The Utility of Machine Learning Models for Predicting Chemical Contaminants in Drinking Water: Promise, Challenges, and Opportunities.

Hu, Xindi C; Dai, Mona; Sun, Jennifer M; Sunderland, Elsie M

Affiliation

Hu XC; Mathematica, Inc., 505 14th St, #800, Oakland, CA, 94612, USA. chu@mathematica-mpr.com.
Dai M; Harvard John A. Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, MA, 02138, USA.
Sun JM; Harvard John A. Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, MA, 02138, USA.
Sunderland EM; Harvard John A. Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, MA, 02138, USA.

Curr Environ Health Rep; 10(1): 45-60, March 2023.

Article in English | MEDLINE | ID: mdl-36527604

ABSTRACT

PURPOSE OF REVIEW: This review aims to better understand the utility of machine learning algorithms for predicting spatial patterns of contaminants in United States (U.S.) drinking water. RECENT FINDINGS: We found 27 U.S. drinking water studies from the past ten years that used machine learning algorithms to predict water quality. The largest share of studies (42%) developed random forest classification models for groundwater. Continuous models show low predictive power, suggesting that larger datasets and additional predictors are needed. Categorical/classification models that predict exceedances of pollution thresholds for arsenic and nitrate are the most common in the literature, because these contaminants have good national-scale data coverage and are priority environmental health concerns. Most groundwater data used to develop models were obtained from the United States Geological Survey (USGS) National Water Information System (NWIS). Predictors were similar across contaminants, but challenges are posed by the lack of a standard methodology for imputation and pre-processing and by differing availability of data across regions. SUMMARY: We reviewed 27 articles that focused on seven drinking water contaminants. Good performance metrics were reported for binary models that classified chemical concentrations above a threshold value by finding significant predictors. Classification models are especially useful for assisting in the design of sampling efforts by identifying high-risk areas. Only a few studies have developed continuous models, and obtaining good predictive performance for such models is still challenging. Improving continuous models is important for potential future use in epidemiological studies, where they could fill data gaps in exposure assessments for drinking water contaminants. While significant progress has been made over the past decade, methodological advances are still needed for selecting appropriate model performance metrics and accounting for spatial autocorrelation in data. Finally, improved infrastructure for code and data sharing would accelerate advances in machine learning models for drinking water quality.

Subject(s)

Drinking Water; Groundwater; Water Pollutants, Chemical; United States; Humans; Water Quality; Nitrates/analysis; Machine Learning; Water Pollutants, Chemical/analysis; Environmental Monitoring/methods

Keywords

Drinking water; Health-based standards; Heavy metals; Machine learning; Risk prediction

Full text: 1; Collection: 01-international; Database: MEDLINE; Main subject: Water Pollutants, Chemical / Drinking Water / Groundwater; Study type: Prognostic_studies / Risk_factors_studies; Limits: Humans; Country/Region as subject: North America; Language: English; Journal: Curr Environ Health Rep; Year: 2023; Document type: Article; Country of affiliation: United States; Country of publication: Switzerland