Using machine learning models to estimate Escherichia coli concentration in an irrigation pond from water quality and drone-based RGB imagery data.
Hong, Seok Min; Morgan, Billie J; Stocker, Matthew D; Smith, Jaclyn E; Kim, Moon S; Cho, Kyung Hwa; Pachepsky, Yakov A.
Affiliation
  • Hong SM; USDA-ARS Environmental Microbial and Food Safety Laboratory, 10300 Baltimore Ave, Bldg. 173, Beltsville, MD, 20705, USA; Department of Civil Urban Earth and Environmental Engineering, Ulsan National Institute of Science and Technology, UNIST-gil 50, Ulsan, 44919, South Korea.
  • Morgan BJ; USDA-ARS Environmental Microbial and Food Safety Laboratory, 10300 Baltimore Ave, Bldg. 173, Beltsville, MD, 20705, USA.
  • Stocker MD; USDA-ARS Environmental Microbial and Food Safety Laboratory, 10300 Baltimore Ave, Bldg. 173, Beltsville, MD, 20705, USA.
  • Smith JE; USDA-ARS Environmental Microbial and Food Safety Laboratory, 10300 Baltimore Ave, Bldg. 173, Beltsville, MD, 20705, USA.
  • Kim MS; USDA-ARS Environmental Microbial and Food Safety Laboratory, 10300 Baltimore Ave, Bldg. 173, Beltsville, MD, 20705, USA.
  • Cho KH; School of Civil, Environmental and Architectural Engineering, Korea University, Seoul, 02841, South Korea. Electronic address: khcho80@korea.ac.kr.
  • Pachepsky YA; USDA-ARS Environmental Microbial and Food Safety Laboratory, 10300 Baltimore Ave, Bldg. 173, Beltsville, MD, 20705, USA. Electronic address: yakov.pachepsky@usda.gov.
Water Res; 260: 121861, 2024 Aug 15.
Article in En | MEDLINE | ID: mdl-38875854
ABSTRACT
The rapid and efficient quantification of Escherichia coli concentrations is crucial for monitoring water quality. Remote sensing techniques and machine learning algorithms have been used to detect E. coli in water and estimate its concentrations. The application of these approaches, however, is challenged by limited sample availability and unbalanced water quality datasets. In this study, we estimated the E. coli concentration in an irrigation pond in Maryland, USA, during the summer season using demosaiced natural color (red, green, and blue; RGB) imagery in the visible and infrared spectral ranges, and a set of 14 water quality parameters. We did this by deploying four machine learning models - Random Forest (RF), Gradient Boosting Machine (GBM), Extreme Gradient Boosting (XGB), and K-nearest Neighbor (KNN) - under three data utilization scenarios: water quality parameters only, combined water quality and small unmanned aircraft system (sUAS)-based RGB data, and RGB data only. To select the training and test datasets, we applied two data-splitting methods: ordinary and quantile data splitting. These methods provided a constant splitting ratio in each decile of the E. coli concentration distribution. Quantile data splitting resulted in better model performance metrics and smaller differences between the metrics for the training and test datasets. When trained with quantile data splitting after hyperparameter optimization, the RF, GBM, and XGB models had R² values above 0.847 for the training dataset and above 0.689 for the test dataset. The combination of water quality and RGB imagery data resulted in a higher R² value (>0.896) for the test dataset. Shapley additive explanations (SHAP) of the relative importance of variables revealed that the visible blue spectrum intensity and water temperature were the most influential parameters in the RF model. Demosaiced RGB imagery served as a useful predictor of E. coli concentration in the studied irrigation pond.
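The abstract gives no implementation details for quantile data splitting, so the following is only a minimal sketch of one way to keep a constant train/test ratio in each decile of the target distribution. The function name `quantile_split`, the 20% test fraction, the random seed, and the synthetic log-normal concentrations are all assumptions made for illustration; they are not taken from the study.

```python
import random


def quantile_split(values, test_fraction=0.2, n_bins=10, seed=0):
    """Split sample indices so each quantile bin has about the same test share.

    Hypothetical helper: `values` is the target (e.g. E. coli concentration),
    and n_bins=10 corresponds to splitting within deciles.
    """
    rng = random.Random(seed)
    # Rank sample indices by target value.
    order = sorted(range(len(values)), key=lambda i: values[i])
    n = len(order)
    train_idx, test_idx = [], []
    for b in range(n_bins):
        # Ranked indices that fall into this quantile bin.
        bin_idx = order[b * n // n_bins:(b + 1) * n // n_bins]
        rng.shuffle(bin_idx)
        # Hold out the same fraction of every bin for testing.
        n_test = round(len(bin_idx) * test_fraction)
        test_idx.extend(bin_idx[:n_test])
        train_idx.extend(bin_idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Synthetic, skewed concentrations (assumed data, not from the study).
data_rng = random.Random(1)
values = [data_rng.lognormvariate(3.0, 1.0) for _ in range(100)]
train_idx, test_idx = quantile_split(values)
print(len(train_idx), len(test_idx))
```

An ordinary random split of a skewed dataset can leave the highest concentrations entirely in one subset; splitting within each decile places low, medium, and high values in both the training and test sets, which is consistent with the smaller train/test metric gap reported above.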

Full text: 1 Database: MEDLINE Main subject: Water Quality / Ponds / Escherichia coli / Agricultural Irrigation / Machine Learning Language: En Publication year: 2024 Document type: Article
