Systematic tracking of nitrogen sources in complex river catchments: Machine learning approach based on microbial metagenomics.
Water Res; 253: 121255, 2024 Apr 01.
Article in En | MEDLINE | ID: mdl-38341971
ABSTRACT
Tracking nitrogen pollution sources is crucial for effective water quality management; however, it is challenging because of the complex contamination scenarios in freshwater systems. Variations in contamination patterns can induce rapid responses in aquatic microorganisms, making them sensitive indicators of pollution origins. In this study, the Soil and Water Assessment Tool (SWAT), accompanied by a detailed pollution source database, was used to identify the main nitrogen pollution sources in each sub-basin of the Liuyang River watershed. Each sub-basin was then assigned to a known class according to the SWAT outputs, including point source pollution-dominated areas, crop cultivation pollution-dominated areas, and septic tank pollution-dominated areas. Based on these outputs, a random forest (RF) model was developed to predict the main pollution sources in different river ecosystems using a series of input variable groups (e.g., natural macroscopic characteristics, river physicochemical properties, 16S rRNA microbial taxonomic composition, microbial metagenomic data containing taxonomic and functional information, and their combinations). Accuracy and the Kappa coefficient were used as the performance metrics for the RF model. Across all input variable groups, the prediction performance of the RF model improved significantly when metagenomic indices were used as inputs. Among the metagenomic data-based models, the combination of taxonomic information with functional information of all species achieved the highest accuracy (0.84) and an increased median Kappa coefficient (0.70). Feature importance analysis was used to identify key features that could serve as indicators for sudden pollution accidents and contribute to the overall function of the river system.
The bacterial taxa Rhabdochromatium marinum, Frankia, Actinomycetia, and Competibacteraceae were the most important, with mean decrease Gini indices of 0.0023, 0.0021, 0.0019, and 0.0018, respectively, although their relative abundances ranged only from 0.0004 to 0.1 %. Among the top 30 important variables, functional variables constituted more than half, demonstrating the remarkable variation in microbial functions among sites with distinct pollution sources and the key role of functionality in predicting pollution sources. Many functional indicators related to the metabolism of Mycobacterium tuberculosis, such as K24693, K25621, K16048, and K14952, emerged as significantly important factors in distinguishing nitrogen pollution origins. Given the shortage of pollution source data in developing regions, the proposed approach offers an economical, quick, and accurate way to locate the origins of nitrogen pollution in water using metagenomic data of microbial communities.
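The abstract reports accuracy, the Kappa coefficient, and mean decrease Gini without defining them. The following is a minimal Python sketch of the standard textbook definitions, not the authors' implementation: the function names, class labels, and toy data are all hypothetical, and the Gini function shows only the impurity decrease of a single split, which a random forest would aggregate over all splits and trees that use a given feature.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of sites whose predicted class matches the reference class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e).

    p_o is the observed agreement (accuracy) and p_e is the agreement
    expected by chance from the marginal class frequencies.
    """
    n = len(y_true)
    p_o = accuracy(y_true, y_pred)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_e == 1.0:
        # Kappa is undefined when only one class occurs; returning 0.0
        # is a convention chosen for this sketch.
        return 0.0
    return (p_o - p_e) / (1.0 - p_e)


def gini_impurity(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(parent, left, right):
    """Impurity decrease achieved by splitting `parent` into two children."""
    n = len(parent)
    weighted_children = (
        len(left) / n * gini_impurity(left)
        + len(right) / n * gini_impurity(right)
    )
    return gini_impurity(parent) - weighted_children


# Hypothetical toy labels for six sub-basins (not data from the study).
y_true = ["point", "point", "crop", "crop", "septic", "septic"]
y_pred = ["point", "point", "crop", "septic", "septic", "septic"]

print("accuracy:", accuracy(y_true, y_pred))
print("kappa:", cohen_kappa(y_true, y_pred))

# A split that separates two classes perfectly removes all parent impurity.
parent = ["point", "point", "crop", "crop"]
print("gini decrease:", gini_decrease(parent, parent[:2], parent[2:]))
```

Kappa corrects accuracy for agreement expected by chance, which is why it is reported alongside accuracy when classes may be unevenly represented among sub-basins.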
Full text: 1
Collections: 01-internacional
Database: MEDLINE
Health context: 2_ODS3 / 3_ND
Health problem: 2_quimicos_contaminacion / 3_tuberculosis
Main subject: Water Pollutants, Chemical / Microbiota
Type of study: Prognostic_studies
Country/Region as subject: Asia
Language: En
Journal: Water Res
Publication year: 2024
Document type: Article