Tackling data challenges in forecasting effluent characteristics of wastewater treatment plants.

Roohi, Ali Mohammad; Nazif, Sara; Ramazi, Pouria

Roohi AM; School of Civil Engineering, College of Engineering, University of Tehran, Tehran, Iran.
Nazif S; School of Civil Engineering, College of Engineering, University of Tehran, Tehran, Iran. Electronic address: snazif@ut.ac.ir.
Ramazi P; Department of Mathematics and Statistics, Brock University, St. Catharines, ON, L2S 3A1, Canada.

J Environ Manage; 354: 120324, 2024 Mar.

Article in English | MEDLINE | ID: mdl-38364537

ABSTRACT

In wastewater treatment plants (WWTPs), the stochastic nature of influent wastewater, together with operational and weather conditions, causes fluctuations in effluent quality. Data-driven models can forecast effluent quality a few hours ahead in response to the influent characteristics, providing enough time to adjust system operations and avoid undesired consequences. However, existing data for training models are often incomplete and contain missing values, while collecting additional data by installing new sensors is costly. The trade-off between using existing incomplete data and collecting costly new data results in three data challenges when developing data-driven WWTP effluent forecasters: determining the important variables to be measured, the minimum number of required data instances, and the maximum percentage of missing values that can be tolerated without impeding the development of an accurate model. Because these issues have not been discussed in previous studies, this research provides, for the first time, a comprehensive analysis to answer them. Another issue that arises in all data-driven modeling is how to select an appropriate forecasting model. This paper addresses these issues by first testing nine machine learning models on data collected from three wastewater treatment plants located in Iran, Australia, and Spain. The most accurate forecaster, a Bayesian network, was then used to address the articulated challenges. Key variables in forecasting effluent characteristics were flow rate, total suspended solids, electrical conductivity, phosphorus compounds, wastewater temperature, and air temperature. A minimum of 250 samples was needed during model training to achieve a substantial reduction in the forecasting error. Moreover, the error increased steeply when the portion of missing values exceeded 10%. The results assist plant managers in estimating the data collection effort necessary to obtain an accurate forecaster, contributing to the quality of the effluent.
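The missing-value question raised in the abstract can be illustrated with a small experiment. The following is a minimal sketch only, not the authors' method: it assumes synthetic data, column-mean imputation, and an ordinary least-squares forecaster in place of the Bayesian network used in the paper. The function name `rmse_under_missingness`, the weights, and all parameter values are hypothetical and chosen for illustration.

```python
import numpy as np


def rmse_under_missingness(missing_fraction, n_train=400, n_test=200, seed=0):
    """Test RMSE of a linear model trained on data with values removed at random.

    Assumptions (not from the paper): synthetic Gaussian features, a linear
    target, missing-completely-at-random masking, and column-mean imputation.
    """
    rng = np.random.default_rng(seed)
    weights = np.array([2.0, -1.0, 0.5, 1.5])  # hypothetical true coefficients

    # Synthetic "influent" features and a noisy "effluent" target.
    X_train = rng.normal(size=(n_train, weights.size))
    X_test = rng.normal(size=(n_test, weights.size))
    y_train = X_train @ weights + rng.normal(scale=0.1, size=n_train)
    y_test = X_test @ weights + rng.normal(scale=0.1, size=n_test)

    # Remove a fraction of the training entries completely at random.
    mask = rng.random(X_train.shape) < missing_fraction
    X_missing = np.where(mask, np.nan, X_train)

    # Fill each gap with the mean of the observed values in its column.
    col_means = np.nanmean(X_missing, axis=0)
    X_imputed = np.where(mask, col_means, X_missing)

    # Fit least squares with an intercept on the imputed training data.
    A_train = np.column_stack([X_imputed, np.ones(n_train)])
    coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)

    # Evaluate on complete (unmasked) test data.
    A_test = np.column_stack([X_test, np.ones(n_test)])
    residuals = A_test @ coef - y_test
    return float(np.sqrt(np.mean(residuals ** 2)))


if __name__ == "__main__":
    for fraction in (0.0, 0.05, 0.10, 0.20, 0.40):
        error = rmse_under_missingness(fraction)
        print(f"missing={fraction:.0%}  RMSE={error:.3f}")
```

Sweeping `n_train` instead of `missing_fraction` gives the analogous learning-curve experiment for the minimum-sample-size question. The thresholds reported in the abstract (250 samples, 10% missing values) come from the authors' real plant data and should not be expected from this toy setup.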

Subject(s)

Wastewater; Water Purification; Bayes Theorem; Water Purification/methods; Australia; Iran; Waste Disposal, Fluid/methods

Keywords

Bayesian network; Data quality; Effluent quality prediction; Missing data; Sewage treatment; WWTP

Full text: 1 Database: MEDLINE Main subject: Water Purification / Wastewater Country as subject: Asia / Oceania Language: En Year: 2024 Document type: Article
