ABSTRACT
Wastewater Treatment Plants (WWTPs) involve complex, highly variable biochemical processes that are difficult to predict. This study presents an innovative approach using Machine Learning (ML) models to predict wastewater quality parameters. In particular, the models are applied to datasets from both a simulated WWTP, built with DHI WEST software (WEST WWTP), and a real-world WWTP database from Santa Catarina Brewery AMBEV, located in Lages/SC - Brazil (AMBEV WWTP). A distinctive aspect is the evaluation of predictive performance in continuous data scenarios and of the impact of changes in WWTP operations, including changes in plant layout, on predictive model performance. For both plants, three different scenarios were addressed, and the quality of predictions by random forest (RF), support vector machine (SVM), and multilayer perceptron (MLP) models was evaluated. The MLP model reached an R² of 0.72 for TN prediction in the WEST WWTP output, and the RF model adapted better to the real data of the AMBEV WWTP, despite the significant discrepancy observed between the real and the predicted data. Techniques such as Partial Dependence Plots (PDP) and Permutation Importance (PI) were used to assess the importance of features, particularly in the simulated WEST tool scenario, showing a strong correlation of prediction results with influent parameters related to nitrogen content. The results of this study highlight the importance of collecting and storing high-quality data and the need for information on changes in WWTP operation for predictive model performance. These contributions advance the understanding of predictive modeling for wastewater quality and provide valuable insights for future practice in wastewater treatment.
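The Permutation Importance idea mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy data, the `toy_model` function and the feature roles (column 0 standing in for an influent nitrogen parameter, column 1 for an unrelated signal) are hypothetical. The importance of a feature is taken here as the mean drop in R² when that feature's column is shuffled.

```python
import random


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean drop in R2 per feature when that feature's column is shuffled."""
    rng = random.Random(seed)
    baseline = r2_score(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and y
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            permuted = r2_score(y, [predict(row) for row in X_perm])
            drops.append(baseline - permuted)
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical toy data (not from the study): column 0 drives the target,
# column 1 is unrelated to it.
data_rng = random.Random(42)
X = [[data_rng.uniform(0, 10), data_rng.uniform(0, 10)] for _ in range(200)]
y = [2.0 * row[0] for row in X]


def toy_model(row):
    """Stand-in for a fitted regressor; it only uses column 0."""
    return 2.0 * row[0]


importances = permutation_importance(toy_model, X, y)
print(importances)
```

In this sketch the unrelated column gets an importance of zero because the stand-in model never reads it, while shuffling the informative column sharply lowers R².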
Subject(s)
Wastewater , Water Purification , Water Purification/methods , Machine Learning , Nitrogen/analysis , Neural Networks, Computer , Waste Disposal, Fluid/methods
ABSTRACT
The objective of this study was to identify signs and symptoms for classifying pediatric patients at risk of chronic kidney disease (CKD), using decision tree and extreme gradient boost models to predict outcomes. A case-control study was carried out involving 376 children with chronic kidney disease (cases) and a control group of healthy children (n = 376). A family member responsible for each child answered a questionnaire with variables potentially associated with the disease. Decision tree and extreme gradient boost ("XGBoost") models were developed to test signs and symptoms for the classification of children. The decision tree model revealed 6 variables associated with CKD, whereas the "XGBoost" model identified 12 variables that distinguish children with CKD from healthy children. The "XGBoost" model performed best (ROC AUC = 0.939, 95% CI: 0.911 to 0.977), while the decision tree model scored slightly lower (ROC AUC = 0.896, 95% CI: 0.850 to 0.942). Cross-validation showed that accuracy on the evaluation dataset was similar to that on the training dataset. CONCLUSION: A dozen symptoms that are easy to verify clinically emerged as risk indicators for chronic kidney disease. This information can help raise awareness of the diagnosis, mainly in primary care settings. Healthcare professionals can therefore select patients for more detailed investigation, which will reduce the chance of wasting time and improve early disease detection. WHAT IS KNOWN: • Late diagnosis of chronic kidney disease in children is common, increasing morbidity. • Mass screening of the whole population is not cost-effective. WHAT IS NEW: • With two machine-learning methods, this study revealed 12 symptoms to aid early CKD diagnosis. • These symptoms are easily obtainable and can be useful mainly in primary care settings.
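The ROC AUC values reported in this abstract can be read as the probability that a randomly chosen case receives a higher model score than a randomly chosen control. The sketch below computes it by pairwise comparison, counting ties as half. It is only an illustration: the function name and the labels and scores in the example are hypothetical, not data from the study.

```python
def roc_auc(labels, scores):
    """ROC AUC as the fraction of (case, control) pairs ranked correctly.

    labels: 1 for cases, 0 for controls; scores: model outputs.
    Ties between a case and a control count as half a correct pair.
    """
    cases = [s for lab, s in zip(labels, scores) if lab == 1]
    controls = [s for lab, s in zip(labels, scores) if lab == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    correct = 0.0
    for c in cases:
        for h in controls:
            if c > h:
                correct += 1.0
            elif c == h:
                correct += 0.5
    return correct / (len(cases) * len(controls))


# Hypothetical scores for two controls (0) and two cases (1).
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 3 of 4 pairs correct -> 0.75
```

The pairwise loop is quadratic in the sample size, which is fine for a case-control sample of a few hundred children; a rank-based formula would be preferable for large datasets.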
Subject(s)
Renal Insufficiency, Chronic , Humans , Child , Case-Control Studies , Renal Insufficiency, Chronic/diagnosis , Risk Factors , Early Diagnosis , Machine Learning
ABSTRACT
BACKGROUND: We evaluated different machine learning (ML) models for predicting soybean productivity up to 1 month in advance for the Matopiba agricultural frontier (States of Maranhão, Tocantins, Piauí, and Bahia). We collected meteorological data from the NASA-POWER platform and soybean yield data from the SIDRA/IBGE database for 2008 to 2017. The ML models evaluated were random forest (RF), artificial neural networks, radial basis function support vector machines (SVM_RBF), linear model and polynomial regression. Model performance was assessed by cross-validation, with precision measured by R², accuracy by root mean square error (RMSE), and trend by the mean error of the estimate (EME). RESULTS: The RF algorithm achieved the highest precision and accuracy, with R² of 0.81, RMSE of 176.93 kg ha⁻¹ and trend (EME) of 1.99 kg ha⁻¹. The SVM_RBF algorithm showed the lowest performance, with R² of 0.74, RMSE of 213.58 kg ha⁻¹ and EME of -15.06 kg ha⁻¹. The average yield values predicted by the models were within the expected range for the region, which has a historical average of 2,730 kg ha⁻¹. CONCLUSION: All models had acceptable precision, accuracy and trend indices, so any of the algorithms can be applied to predict soybean crop yield, provided the particularities of the region under study are taken into account. The models are also a useful tool for agricultural planning and decision making in soy-producing regions such as the Brazilian Cerrado. © 2021 Society of Chemical Industry.
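The three indices named in this abstract can be computed as in the sketch below. It rests on assumptions the abstract does not confirm: R² is taken as the coefficient of determination, and the mean error is taken as predicted minus observed, so a positive value means overestimation. The yield values in the example are hypothetical and are not data from the study.

```python
import math


def regression_metrics(observed, predicted):
    """Return R2, RMSE and mean error (bias) for paired values.

    Assumed conventions: R2 = 1 - SS_res / SS_tot, and
    mean error = mean(predicted - observed).
    """
    n = len(observed)
    mean_obs = sum(observed) / n
    errors = [p - o for o, p in zip(observed, predicted)]
    ss_res = sum(e ** 2 for e in errors)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mean_error": sum(errors) / n,
    }


# Hypothetical yields in kg per hectare, for illustration only.
observed = [2600.0, 2700.0, 2800.0, 2900.0]
predicted = [2650.0, 2650.0, 2850.0, 2950.0]
print(regression_metrics(observed, predicted))
```

RMSE and the mean error share the unit of the target, which is why the abstract reports both in kg ha⁻¹, whereas R² is dimensionless.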
Subject(s)
Fabaceae , Glycine max , Algorithms , Brazil , Machine Learning , Support Vector Machine
ABSTRACT
INTRODUCTION: After the initial wave of antibiotic discovery, few novel classes of antibiotics have emerged, with the latest dating back to the 1980s. Furthermore, the pace of antibiotic drug discovery is unable to keep up with the increasing prevalence of antibiotic drug resistance. However, the increasing amount of available data promotes the use of machine learning techniques (MLT) in drug discovery projects (e.g. construction of regression/classification models and ranking/virtual screening of compounds). AREAS COVERED: In this review, the authors cover some of the applications of MLT in medicinal chemistry, focusing on the development of new antibiotics and the prediction of resistance and its mechanisms. The aim of this review is to illustrate the main advantages and disadvantages and the major trends from studies over the past 5 years. EXPERT OPINION: The application of MLT to antibacterial drug discovery can aid the selection of new and potent lead compounds, with desirable pharmacokinetic and toxicity profiles, for further optimization. The increasing volume of available data, along with the constant improvement in computational power and algorithms, means that we are experiencing a transition in the way we face modern issues such as drug resistance, where our decisions are data-driven and experiments can be focused by data-suggested hypotheses.
Subject(s)
Anti-Bacterial Agents/administration & dosage , Drug Development/methods , Machine Learning , Algorithms , Animals , Anti-Bacterial Agents/adverse effects , Anti-Bacterial Agents/pharmacology , Drug Design , Drug Discovery/methods , Drug Resistance, Bacterial , Humans
ABSTRACT
Trichomonas vaginalis is the causative agent of trichomoniasis, a highly prevalent sexually transmitted infection worldwide. Nitroimidazole drugs, such as metronidazole and tinidazole, are the only recommended treatment, but resistant cases account for at least 5%. In case of resistance or therapeutic failure, higher-dose regimens are used, which increases the toxic effects of the treatment. In this context, the development of new drugs becomes an urgent necessity. Hologram quantitative structure-activity relationship (HQSAR) models using nitroimidazole derivatives were generated to discover the relationship between the different chemical structures and the activity against cells and the selectivity against susceptible and resistant strains. One model for each strain was chosen for interpretation; both showed good internal validation coefficients (q²LOO values: 0.607 for the susceptible strain and 0.646 for the resistant strain subsets) and strong values in other internal and external validation metrics. From the contribution of fragments to the HQSAR models, several differences between the most and least potent compounds were found: 5-nitroimidazole contributes positively, whereas 4-nitroimidazole contributes negatively. QSAR models employing the random forest machine learning technique (RF-QSAR) were also built, and a robust model was obtained for predicting activity against the resistant strain (q²LOO = 0.618). The constructed HQSAR and RF-QSAR models were employed to predict the activity of three newly planned nitroimidazole derivatives in the design of new drug candidates against T. vaginalis strains.
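The leave-one-out cross-validated coefficient q²LOO quoted in this abstract can be sketched as follows. A one-descriptor least-squares line stands in for the HQSAR and RF-QSAR models, which are not reproduced here, and the descriptor and activity values are hypothetical. The sketch uses the common definition q² = 1 - PRESS / SS_tot with the overall mean activity in SS_tot; other conventions exist.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def q2_loo(xs, ys):
    """Leave-one-out q2: refit without compound i, then predict compound i."""
    press = 0.0  # predictive residual sum of squares
    for i in range(len(ys)):
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (slope * xs[i] + intercept)) ** 2
    mean_y = sum(ys) / len(ys)
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - press / ss_tot


# Hypothetical descriptor values and activities, for illustration only.
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0]
activity = [1.0, 2.5, 2.8, 4.2, 4.9]
print(q2_loo(descriptor, activity))
```

Because each compound is predicted by a model that never saw it, q²LOO is lower than the fitted R² of the same model and can even be negative for a model with no predictive ability.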
Subject(s)
Antiprotozoal Agents/pharmacology , Nitroimidazoles/pharmacology , Trichomonas vaginalis/drug effects , Drug Resistance/drug effects , Quantitative Structure-Activity Relationship
ABSTRACT
This is a pioneering work in South America to model the exposure of cyclists to black carbon (BC) while riding in an urban area with high spatiotemporal variability of BC concentrations. We report on mobile BC concentrations sampled during 10 biking sessions in the city of Curitiba (Brazil), during weekday rush hours, covering four routes and totaling 178 km. Moreover, simultaneous BC measurements were conducted within a street canyon (street and rooftop levels) and at a site located 13 km from the city center. We used two statistical approaches to model the BC concentrations: multiple linear regression (MLR) and a machine-learning technique called random forests (RF). A pool of 25 candidate variables was created, including pollution measurements, traffic characteristics, street geometry and meteorology. The aggregated mean BC concentration within 30-m buffers along the four routes was 7.09 µg m⁻³, with large spatial variability (5th and 95th percentiles of 1.75 and 16.83 µg m⁻³, respectively). On average, the concentrations at the street canyon façade (5 m height) were lower than the mobile data but higher than the urban background levels. The MLR model explained a low percentage of variance (24%), but was within the values found in the literature for on-road BC mobile data. RF explained a larger share of the variance (54%), with the additional advantage of having lower requirements for the target and predictor variables. The most impactful predictor for both models was the traffic rate of heavy-duty vehicles. Thus, to reduce the BC exposure of cyclists and residents living close to busy streets, we emphasize the importance of renewing and/or retrofitting the diesel-powered fleet, particularly public buses with old vehicle technologies. Urban planners could also use this valuable information to design bicycle lanes with greater separation from the circulation of heavy-duty diesel vehicles.