
NNadir

(34,841 posts)
Thu Nov 21, 2024, 11:08 PM

The Use of AI for Adding Sense in Characterizing Environmental Microplastics and Trash.

The paper I'll briefly discuss in this post is this one: Microplastics and Trash Cleaning and Harmonization (MaTCH): Semantic Data Ingestion and Harmonization Using Artificial Intelligence, Hannah Hapich, Win Cowger, and Andrew B. Gray, Environmental Science & Technology 2024, 58 (46), 20502-20512.

I don't have a lot of time on my hands right now; I'm behind on everything in my life, particularly as I'm losing sleep over the impending collapse of the United States. In any case, the paper is open to the public for reading, so I'll just briefly excerpt it and show some pictures.

The ubiquity of microplastics and the vast number of papers on the subject are overwhelming, and in this paper, the authors have suggested a way to harmonize the data in order to make better sense of the literature.

The introduction states the problem:

Studies focused on trash (mismanaged waste > 5 mm in length (1)) and microplastic (plastics 1–5000 μm in length (1)) pollution have increased dramatically in number over recent years. (2) Together, these studies have indicated that the majority of microplastics found in the environment are secondary products of degrading mismanaged plastic waste rather than primary emissions, pointing to a relationship in environmental occurrence between trash and microplastics. (3) Microplastics and trash are diverse environmental pollutants that are difficult to query and quantify, as we generally describe them with incomparable categorical variables, and report environmental concentrations composed of varying reporting metrics and particle size ranges. (4) Microplastics data is not currently standardized and is therefore less easily or reliably comparable between studies, (5,6) leading to many calls for both standardization (6−9) and harmonization. (5,10,11) Nearly all propositions for standardization or harmonization have focused either on nano, micro, (5,9) or macro (8,12) particles, whose size domain thresholds are often arbitrary and inconsistent between groups. (13,14) Given the intrinsic relationship between trash and microplastics in their environmental occurrence and categorical semantics, it follows that data management strategies for microplastics and trash should be harmonious. (15)

Standards have been developed for managing mandated trash assessment data (8) and tabulating microplastics data with respect to specific reporting guidelines. (9) Such strategies serve as hubs for accumulating already standardized data and are often specific to certain geographic regions, study media, or government protocols. (8,10,16) Certain database structures may be better suited for data at the sample level (reported as a concentration) or the particle level (information reported for individual particles). Standardization is limited by the rate at which scientists, government organizations, nongovernmental organizations (NGOs), or industry adapt to such protocols. Additionally, most protocols do not have a strategy to utilize data that does not fit their standardized structures, which alienates potentially useful data. In these cases, users must perform data harmonization manually. It is particularly important in the field of microplastics monitoring to utilize existing databases due to the cost and time prohibitive nature of the field, wherein it commonly requires up to thousands of dollars and tens of hours to process a single sample, making data from each monitoring study highly valuable...


Another Excerpt:

...there exists no approach that automatically harmonizes macro debris and microplastics data from nonspecified formats or with unknown categorical descriptors. (2) Best practices for the development of database structures have remained a manual undertaking that should be performed with the input of a wide array of stakeholders, though the addition of new terms is prohibitively effort-intensive. An automated approach to data harmonization would allow for quick ingestion of data from new studies, leading to larger, more valuable databases.
The field of microplastics and trash is not the first to encounter such issues. Many divisions of the environmental and biological sciences have similar problems, which will worsen over time with ever-growing datasets and a focus on curating “big data” to identify knowledge gaps and answer key questions. (17) Previous work has assessed the use of natural language processing (NLP) algorithms as a means for information retrieval to assemble databases and organize their taxonomic structures. (18,19) Until recently, the technology available consisted of different pattern matching and syntactic/semantic parsing, some of which rely on extracting exact matches, and most have a narrow application range tailored to a specific subfield. (19) Results from early exploration of NLP for scientific data curation were discouraging (20) and may have led to underutilization.
NLP technology has vastly improved in accuracy and efficiency just over the past few years, primarily a result of increases in computing power and the development of open-source artificial intelligence (AI) software capable of employing transformers and embeddings. (21) Transformers are a type of neural network structure able to interpret data nonsequentially... (22)
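
To give a rough feel for what "embeddings" buy you here, the following is a minimal sketch of my own, not the authors' code. It assumes each category label has already been turned into a numeric vector by some language model; the three-dimensional vectors, the label lists, the `align_term` helper, and the 0.8 threshold are all made up for illustration. The idea is simply that an unfamiliar term is mapped to whichever known term has the most similar vector.

```python
import math

# Hypothetical toy "embeddings". A real model would produce vectors
# with hundreds of dimensions; these values are invented.
CANONICAL = {
    "fiber": [0.9, 0.1, 0.0],
    "fragment": [0.1, 0.9, 0.1],
    "film": [0.0, 0.2, 0.9],
}
RAW = {
    "fibre": [0.88, 0.15, 0.02],
    "shard": [0.15, 0.85, 0.20],
    "sheet": [0.05, 0.25, 0.85],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def align_term(vector, canonical=CANONICAL, threshold=0.8):
    """Return (label, score) for the closest canonical term.

    The label is None when nothing is similar enough, so that
    doubtful matches can be left for a person to review.
    """
    best_label, best_score = None, -1.0
    for label, reference in canonical.items():
        score = cosine_similarity(vector, reference)
        if score > best_score:
            best_label, best_score = label, score
    if best_score < threshold:
        return None, best_score
    return best_label, best_score


for term, vec in RAW.items():
    label, score = align_term(vec)
    print(f"{term!r} -> {label!r} (similarity {score:.2f})")
```

The threshold is the design choice worth noticing: a vector that sits about equally far from every known category comes back as None rather than being forced into the nearest bin.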


Figures from the text:



The caption:

Figure 1. Microplastics and trash cleaning and harmonization (MaTCH) workflow schematic. Dark blue boxes represent data inputs and outputs. Light blue boxes represent model operations. Text outside of boxes and medium blue arrows represent decision trees.





The caption:

Figure 3. Change in morphological (a) and polymeric (b) categorical variables before and after semantic alignment. Illustration of hierarchical lumping via sunburst plot with 95% confidence intervals. (PP: polypropylene, PET: polyethylene terephthalate, PE: polyethylene, PA: polyamide, PS: polystyrene, PVC: polyvinyl chloride, and PPS: polyphenylene sulfide).
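
The "hierarchical lumping" mentioned in the caption can be pictured with a small sketch, again mine and not taken from the paper. It assumes a taxonomy stored as a child-to-parent mapping; the `PARENT` table, the `lump` and `lump_counts` helpers, and the particle counts are hypothetical. Each reported label is walked up the hierarchy until it reaches a category we have chosen to keep, and the counts are summed there.

```python
from collections import Counter

# Hypothetical child -> parent taxonomy, for illustration only.
PARENT = {
    "polyethylene terephthalate": "polyester",
    "polybutylene terephthalate": "polyester",
    "polyester": "plastic",
    "polyethylene": "polyolefin",
    "polypropylene": "polyolefin",
    "polyolefin": "plastic",
}


def lump(label, keep, parent=PARENT):
    """Walk up the taxonomy until reaching a label in `keep`.

    Labels with no kept ancestor end at the top of their branch;
    labels missing from the taxonomy are returned unchanged.
    """
    seen = set()
    current = label
    while current not in keep and current in parent and current not in seen:
        seen.add(current)  # guards against accidental cycles
        current = parent[current]
    return current


def lump_counts(counts, keep):
    """Sum particle counts after lumping every label."""
    lumped = Counter()
    for label, n in counts.items():
        lumped[lump(label, keep)] += n
    return dict(lumped)


# Invented counts, not data from the paper.
counts = {
    "polyethylene terephthalate": 12,
    "polybutylene terephthalate": 3,
    "polyethylene": 20,
    "polypropylene": 15,
    "polystyrene": 5,
}
result = lump_counts(counts, keep={"polyester", "polyolefin"})
print(result)
```

Two studies that report polymers at different levels of detail can be compared this way, by lumping both to the coarsest level they share.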





The caption:

Figure 4. Changes in morphological (a) and polymeric (b) composition of microplastics by count in rivers and drinking water before and after size rescaling. Co-polymers, nonpolymeric, and other plastics have been lumped via the taxonomy for visualization. (CA: cellulose acetate, PA: polyamide, PAI: polyamide-imides, PAM: polyacrylamide, PAN: polyacrylonitrile, PBA: polybutylene terephthalate, PE: polyethylene , PEST: polyester, PET: polyethylene terephthalate, PMMA: poly(methyl methacrylate), PP: polypropylene, PPE: polyphenylene ether, PPS: polyphenylene sulfide, PS: polystyrene, PTT: polytrimethylene terephthalate, PU: polyurethane, PVA: polyvinyl acetate, PVC: polyvinyl chloride, PVDF: polyvinylidene fluoride, and SBR: styrene–butadiene rubber).
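
The excerpts above don't say how the "size rescaling" in this caption is carried out, so the following is only a generic sketch under an assumption of mine: that particle abundance falls off with size as a power law, which is one approach used in the microplastics literature. The function name, the size ranges, and the exponent of 2.0 are hypothetical. It computes the factor by which a count measured over one size range would be multiplied to estimate the count over another range.

```python
def rescale_factor(measured, target, alpha):
    """Correction factor between two size ranges under a power law.

    Assumes the number of particles of size x is proportional to
    x ** (-alpha). `measured` and `target` are (lower, upper) size
    limits in the same units, for example micrometers.
    """
    if alpha == 1.0:
        raise ValueError("alpha == 1 needs the logarithmic form instead")
    lower_m, upper_m = measured
    lower_t, upper_t = target
    e = 1.0 - alpha
    # Ratio of the integrals of x ** (-alpha) over the two ranges;
    # the common 1 / (1 - alpha) factor cancels.
    return (upper_t ** e - lower_t ** e) / (upper_m ** e - lower_m ** e)


# Hypothetical example: a study counted particles from 100 to 5000
# micrometers, and we want an estimate for 1 to 5000 micrometers.
factor = rescale_factor(measured=(100.0, 5000.0), target=(1.0, 5000.0), alpha=2.0)
print(f"multiply the measured concentration by about {factor:.1f}")
```

Because small particles dominate under a power law, extending the lower size limit downward changes the estimate far more than extending the upper limit, which is why studies with different size cutoffs are so hard to compare directly.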


The authors briefly discuss the limitations of their approach and suggest future refinements of this work.

