Influence of resampling techniques on Bayesian network performance in predicting increased algal activity.
Zeinolabedini Rezaabad, Maryam; Lacey, Heather; Marshall, Lucy; Johnson, Fiona.
Affiliation
  • Zeinolabedini Rezaabad M; Water Research Centre, School of Civil and Environmental Engineering, University of New South Wales, Kensington, New South Wales, Australia; ARC Training Centre Data Analytics for Resources and Environments, School of Life and Environmental Sciences, The University of Sydney, Camperdown, New South W
  • Lacey H; WaterNSW, New South Wales, Australia.
  • Marshall L; Water Research Centre, School of Civil and Environmental Engineering, University of New South Wales, Kensington, New South Wales, Australia; ARC Training Centre Data Analytics for Resources and Environments, School of Life and Environmental Sciences, The University of Sydney, Camperdown, New South W
  • Johnson F; Water Research Centre, School of Civil and Environmental Engineering, University of New South Wales, Kensington, New South Wales, Australia; ARC Training Centre Data Analytics for Resources and Environments, School of Life and Environmental Sciences, The University of Sydney, Camperdown, New South W
Water Res ; 244: 120558, 2023 Oct 01.
Article in English | MEDLINE | ID: mdl-37666153
ABSTRACT
Early warning of increased algal activity is important to mitigate potential impacts on aquatic life and human health. While many methods have been developed to predict increased algal activity, an ongoing issue is that severe algal blooms often occur with low frequency in water bodies. This results in imbalanced data sets available for model specification, leading to poor predictions of the frequency of increased algal activity. One approach to address this is to resample data sets of increased algal activity to increase the prevalence of higher than normal algal activity in calibration data and ultimately improve model predictions. This study aims to investigate the use of resampling techniques to address the imbalanced dataset and determine if such methods can improve the prediction of increased algal activity. Three techniques were investigated: K-means under-sampling (US_Kmeans), synthetic minority over-sampling technique (SMOTE), and 'SMOTE and cluster-based under-sampling technique' (SCUT). The resampling methods were applied to a Bayesian network (BN) model of Lake Burragorang in New South Wales, Australia. The model was developed to predict chlorophyll-a (chl-a) using a range of water quality parameters as predictors. The original data and each of the balanced datasets were used for BN structure and parameter learning. The results showed that the best graphical structure was obtained by adding synthetic data from SMOTE, which gave the highest true positive rate (TPR) and area under the curve (AUC). When compared using a fixed graphical structure for the BN, all resampling techniques increased the ability of the BN to detect events with higher probability of increased algal activity. The resampling model results can also be used to better understand the most important influences on high chl-a concentrations and suggest future data collection and model development priorities.
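As an illustration of the over-sampling idea named in the abstract, the following is a minimal NumPy sketch of SMOTE-style interpolation. It is not the authors' implementation: the function name `smote_sketch`, the toy data, and the choice of `k` are all assumptions made for this example. Each synthetic point is placed at a random position on the line segment between a minority-class sample and one of its nearest minority-class neighbours.

```python
import numpy as np

def smote_sketch(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples (hypothetical helper).

    Each sample is an interpolation between a randomly chosen minority
    point and one of its k nearest minority-class neighbours.
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    k = min(k, n - 1)  # cannot have more neighbours than other points

    # Pairwise Euclidean distances within the minority class.
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # a point is not its own neighbour
    neighbours = np.argsort(d, axis=1)[:, :k]

    base = rng.integers(0, n, size=n_new)  # starting points
    picked = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # interpolation fraction in [0, 1)
    return X_min[base] + gap * (X_min[picked] - X_min[base])

# Toy imbalanced data (assumed values, not from the study):
# 95 "normal" records and 5 "increased activity" records.
rng = np.random.default_rng(1)
X_major = rng.normal(0.0, 1.0, size=(95, 2))
X_minor = rng.normal(3.0, 0.5, size=(5, 2))

# Over-sample the minority class until both classes have 95 records.
X_syn = smote_sketch(X_minor, n_new=len(X_major) - len(X_minor), k=3)
X_bal = np.vstack([X_major, X_minor, X_syn])
y_bal = np.array([0] * len(X_major) + [1] * (len(X_minor) + len(X_syn)))
```

Because every synthetic point is a convex combination of two observed minority samples, the balanced set never extends beyond the range of the original minority data. In the setting described above, such a balanced set would then be discretised and used for BN structure and parameter learning.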

Full text: 1 Collection: 01-international Database: MEDLINE Main subject: Eutrophication Type of study: Prognostic_studies / Risk_factors_studies Limits: Humans Country/Region as subject: Oceania Language: English Journal: Water Res Year: 2023 Document type: Article
