ABSTRACT
In this study, we introduce the count-based Morgan fingerprint (C-MF) to represent chemical structures of contaminants and develop machine learning (ML)-based predictive models for their activities and properties. Compared with the binary Morgan fingerprint (B-MF), C-MF not only qualifies the presence or absence of an atom group but also quantifies its counts in a molecule. We employ six different ML algorithms (ridge regression, SVM, KNN, RF, XGBoost, and CatBoost) to develop models on 10 contaminant-related data sets based on C-MF and B-MF to compare them in terms of the model's predictive performance, interpretation, and applicability domain (AD). Our results show that C-MF outperforms B-MF in nine of 10 data sets in terms of model predictive performance. The advantage of C-MF over B-MF is dependent on the ML algorithm, and the performance enhancements are proportional to the difference in the chemical diversity of data sets calculated by B-MF and C-MF. Model interpretation results show that the C-MF-based model can elucidate the effect of atom group counts on the target and have a wider range of SHAP values. AD analysis shows that C-MF-based models have an AD similar to that of B-MF-based ones. Finally, we developed a "ContaminaNET" platform to deploy these C-MF-based models for free use.
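The difference between the two representations can be illustrated with a small sketch. It does not use RDKit or the authors' code: the atom-group names and counts are hypothetical, and the min/max (generalized Tanimoto) similarity is assumed here only as one common way to compare count vectors.

```python
# Toy atom-group counts for two hypothetical molecules (not real fingerprints).
mol_a = {"C-Cl": 1, "aromatic C": 6, "O-H": 1}
mol_b = {"C-Cl": 3, "aromatic C": 6, "O-H": 1}


def to_binary(counts):
    """Binary view: keep only the presence or absence of each atom group."""
    return {group: 1 for group, n in counts.items() if n > 0}


def tanimoto(x, y):
    """Generalized Tanimoto similarity: sum of minima over sum of maxima."""
    groups = set(x) | set(y)
    shared = sum(min(x.get(g, 0), y.get(g, 0)) for g in groups)
    total = sum(max(x.get(g, 0), y.get(g, 0)) for g in groups)
    return shared / total if total else 1.0


# The binary vectors are identical, the count vectors are not.
binary_similarity = tanimoto(to_binary(mol_a), to_binary(mol_b))
count_similarity = tanimoto(mol_a, mol_b)
print(binary_similarity, count_similarity)
```

In this toy case the binary representation cannot tell the two molecules apart, while the count representation registers the two extra C-Cl groups.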
Subjects
Algorithms, Machine Learning, Chemical Water Pollution, Water/chemistry, Chemical Water Pollution/analysis
ABSTRACT
Ultrafiltration (UF) as one of the mainstream membrane-based technologies has been widely used in water and wastewater treatment. Increasing demand for clean and safe water requires the rational design of UF membranes with antifouling potential, while maintaining high water permeability and removal efficiency. This work employed a machine learning (ML) method to establish and understand the correlation of five membrane performance indices as well as three major performance-determining membrane properties with membrane fabrication conditions. The loading of additives, specifically nanomaterials (A_wt %), at loading amounts of >1.0 wt % was found to be the most significant feature affecting all of the membrane performance indices. The polymer content (P_wt %), molecular weight of the pore maker (M_Da), and pore maker content (M_wt %) also made considerable contributions to predicting membrane performance. Notably, M_Da was more important than M_wt % for predicting membrane performance. The feature analysis of ML models in terms of membrane properties (i.e., mean pore size, overall porosity, and contact angle) provided an unequivocal explanation of the effects of fabrication conditions on membrane performance. Our approach can provide practical aid in guiding the design of fit-for-purpose separation membranes through data-driven virtual experiments.
Subjects
Nanostructures, Ultrafiltration, Ultrafiltration/methods, Artificial Membranes, Polymers, Water
ABSTRACT
Iron-associated reductants play a crucial role in providing electrons for various reductive transformations. However, developing reliable predictive tools for estimating abiotic reduction rate constants (logk) in such systems has been impeded by the intricate nature of these systems. Our recent study developed a machine learning (ML) model based on 60 organic compounds toward one soluble Fe(II)-reductant. In this study, we built a comprehensive kinetic data set covering the reactivity of 117 organic and 10 inorganic compounds toward four major types of Fe(II)-associated reductants. Separate ML models were developed for organic and inorganic compounds, and the feature importance analysis demonstrated the significance of resonance structures, reducible functional groups, reductant descriptors, and pH in logk prediction. Mechanistic interpretation validated that the models accurately learned the impact of various factors such as aromatic substituents, complexation, bond dissociation energy, reduction potential, LUMO energy, and dominant reductant species. Finally, we found that 38% of the 850,000 compounds in the Distributed Structure-Searchable Toxicity (DSSTox) database contain at least one reducible functional group, and the logk of 285,184 compounds could be reasonably predicted using our model. Overall, the study is a significant step toward reliable predictive tools for anticipating abiotic reduction rate constants in iron-associated reductant systems.
Subjects
Iron, Reducing Agents, Reducing Agents/chemistry, Oxidation-Reduction, Iron/chemistry, Organic Compounds, Ferrous Compounds/chemistry
ABSTRACT
Microalgal biotechnology holds the potential for renewable biofuels, bioproducts, and carbon capture applications due to unparalleled photosynthetic efficiency and diversity. Outdoor open raceway pond (ORP) cultivation enables utilization of sunlight and atmospheric carbon dioxide to drive microalgal biomass synthesis for production of bioproducts including biofuels; however, environmental conditions are highly dynamic and fluctuate both diurnally and seasonally, making ORP productivity prediction challenging without time-intensive physical measurements and location-specific calibrations. Here, for the first time, we present an image-based deep learning method for the prediction of ORP productivity. Our method is based on parameter profile plot images of sensor parameters, including pH, dissolved oxygen, temperature, photosynthetically active radiation, and total dissolved solids. These parameters can be remotely monitored without physical interaction with ORPs. We apply the model to data we generated during the Unified Field Studies of the Algae Testbed Public-Private-Partnership (ATP3 UFS), the largest publicly available ORP data set to date, which includes millions of sensor records and 598 productivities from 32 ORPs operated in 5 states in the United States. We demonstrate that this approach significantly outperforms an average value based traditional machine learning method (R2 = 0.77 ≫ R2 = 0.39) without considering bioprocess parameters (e.g., biomass density, hydraulic retention time, and nutrient concentrations). We then evaluate the sensitivity of image and monitoring data resolutions and input parameter variations. Our results demonstrate ORP productivity can be effectively predicted from remote monitoring data, providing an inexpensive tool for microalgal production and operational forecasting.
Subjects
Deep Learning, Microalgae, Ponds, Biofuels, Sunlight, Biomass
ABSTRACT
To develop predictive models for the reactivity of organic contaminants toward four oxidants (SO4•-, HClO, O3, and ClO2), all with small sample sizes, we proposed two approaches: combining small data sets and transferring knowledge between them. We first merged these data sets and developed a unified model using machine learning (ML), which showed better predictive performance than the individual models for HClO (RMSEtest: 2.1 to 2.04), O3 (2.06 to 1.94), ClO2 (1.77 to 1.49), and SO4•- (0.75 to 0.70) because the model "corrected" the wrongly learned effects of several atom groups. We further developed knowledge transfer models for three pairs of the data sets and observed different predictive performances: improved for O3 (RMSEtest: 2.06 to 2.01)/HClO (2.10 to 1.98), mixed for O3 (2.06 to 2.01)/ClO2 (1.77 to 1.95), and unchanged for ClO2 (1.77 to 1.77)/HClO (2.1 to 2.1). The effectiveness of the latter approach depended on whether there was consistent knowledge shared between the data sets and on the performance of the individual models. We also compared our approaches with multitask learning and image-based transfer learning and found that our approaches consistently improved the predictive performance for all data sets while the other two did not. This study demonstrated the effectiveness of combining small, similar data sets and transferring knowledge between them to improve ML model performance.
Subjects
Oxidants, Ozone, Machine Learning, Quantitative Structure-Activity Relationship
ABSTRACT
Polymeric membrane design is a multidimensional process involving selection of membrane materials and optimization of fabrication conditions from an infinite candidate space. It is impossible to explore the entire space by trial-and-error experimentation. Here, we present a membrane design strategy utilizing machine learning-based Bayesian optimization to precisely identify the optimal combinations of unexplored monomers and their fabrication conditions from an infinite space. We developed ML models to accurately predict water permeability and salt rejection from membrane monomer types (represented by the Morgan fingerprint) and fabrication conditions. We applied Bayesian optimization on the built ML model to inversely identify sets of monomer/fabrication condition combinations with the potential to break the upper bound for water/salt selectivity and permeability. We fabricated eight membranes under the identified combinations and found that they exceeded the present upper bound. Our findings demonstrate that ML-based Bayesian optimization represents a paradigm shift for next-generation separation membrane design.
Subjects
Machine Learning, Artificial Membranes, Bayes Theorem, Permeability, Water
ABSTRACT
The rapid increase in both the quantity and complexity of data that are being generated daily in the field of environmental science and engineering (ESE) demands accompanied advancement in data analytics. Advanced data analysis approaches, such as machine learning (ML), have become indispensable tools for revealing hidden patterns or deducing correlations for which conventional analytical methods face limitations or challenges. However, ML concepts and practices have not been widely utilized by researchers in ESE. This feature explores the potential of ML to revolutionize data analysis and modeling in the ESE field, and covers the essential knowledge needed for such applications. First, we use five examples to illustrate how ML addresses complex ESE problems. We then summarize four major types of applications of ML in ESE: making predictions; extracting feature importance; detecting anomalies; and discovering new materials or chemicals. Next, we introduce the essential knowledge required and current shortcomings in ML applications in ESE, with a focus on three important but often overlooked components when applying ML: correct model development, proper model interpretation, and sound applicability analysis. Finally, we discuss challenges and future opportunities in the application of ML tools in ESE to highlight the potential of ML in this field.
Subjects
Environmental Science, Machine Learning
ABSTRACT
Predictive models are useful tools for aqueous adsorption research; existing models such as multilinear regression (MLR), however, can only predict adsorption under specific equilibrium concentrations or for certain adsorption isotherm models. Also, few studies have discussed data processing beyond applying different modeling algorithms to improve the prediction accuracy. In this research, we employed a cosine similarity approach that focused on mining the available data before developing models; this approach can mine the most relevant data concerning the prediction target to build models and was found to considerably improve the prediction accuracy. We then built a machine-learning modeling process based on neural networks (NN), a group-selection data-splitting strategy for grouped adsorption data for adsorbent-adsorbate pairs under different equilibrium concentrations, and polyparameter linear free energy relationships (pp-LFERs) for aqueous adsorption of 165 organic compounds onto 50 biochars, 34 carbon nanotubes, 35 GACs, and 30 polymeric resins. The final NN-LFER models were successfully applied to various equilibrium concentrations regardless of the adsorption isotherm models and showed less prediction deviations than the published models with the root-mean-square errors 0.23-0.31 versus 0.23-0.97 log unit, and the predictions were improved by adding two key descriptors (BET surface area and pore volume) for the adsorbents. Finally, interpreting the NN-LFER models based on the Shapley values suggested that not considering equilibrium concentration and properties of the adsorbents in the existing MLR models is a possible reason for their higher prediction deviations.
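The data-mining step described above can be sketched as a ranking by cosine similarity. This is only an illustration under stated assumptions: the descriptor vectors, compound names, and the `most_relevant` helper are hypothetical, and the abstract does not specify which features or cutoff the authors used.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_relevant(target, candidates, k):
    """Return the names of the k candidates most similar to the target."""
    ranked = sorted(
        candidates,
        key=lambda name: cosine_similarity(target, candidates[name]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical descriptor vectors for a prediction target and a data pool.
target = [1.0, 0.5, 0.0]
candidates = {
    "compound_1": [2.0, 1.0, 0.0],  # same direction as the target
    "compound_2": [0.0, 0.0, 1.0],  # orthogonal to the target
    "compound_3": [1.0, 0.0, 0.5],
}
selected = most_relevant(target, candidates, k=2)
print(selected)
```

Only the selected subset would then be passed on to model training, which is the sense in which the similarity step precedes modeling.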
Subjects
Charcoal, Carbon Nanotubes, Adsorption, Machine Learning
ABSTRACT
Titanium dioxide (TiO2) is a well-known photocatalyst in the applications of water contaminant treatment. Traditionally, the kinetics of photo-degradation rates are obtained from experiments, which consumes enormous labor and experimental investments. Here, a generalized predictive model was developed for prediction of the photo-degradation rate constants of organic contaminants in the presence of TiO2 nanoparticles and ultraviolet irradiation in aqueous solution. This model combines an artificial neural network (ANN) with a variety of factors that affect the photo-degradation performance, i.e., ultraviolet intensity, TiO2 dosage, organic contaminant type and initial concentration in water, and initial pH of the solution. The molecular fingerprints (MF) were used to interpret the organic contaminants as binary vectors, a format that is machine-readable in computational linguistics. A dataset of 446 data points for training and testing was collected from the literature. This predictive model shows a good accuracy with a root mean square error (RMSE) of 0.173.
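For reference, the root mean square error quoted above is the square root of the mean squared difference between observed and predicted values. The sketch below uses made-up numbers, not data from the study.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between two equal-length sequences."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical observed and predicted values (illustration only).
observed = [-1.0, -2.0, -3.0]
predicted = [-1.1, -1.9, -3.2]
error = rmse(observed, predicted)
print(round(error, 3))
```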
Subjects
Chemical Water Pollutants, Water, Catalysis, Kinetics, Neural Networks (Computer), Titanium
ABSTRACT
Manganese dioxides (MnO2) are among important environmental oxidants in contaminant removal; however, most existing work has only focused on naturally abundant MnO2. We herein report the effects of different phase structures of synthetic MnO2 on their oxidative activity with regard to contaminant degradation. Bisphenol A (BPA), a frequently detected contaminant in the environment, was used as a probe compound. A total of eight MnO2 with five different phase structures (α-, β-, γ-, δ-, and λ-MnO2) were successfully synthesized with different methods. The oxidative reactivity of MnO2, as quantified by pseudo-first-order rate constants of BPA oxidation, followed the order of δ-MnO2-1 > δ-MnO2-2 > α-MnO2-1 > α-MnO2-2 ≈ γ-MnO2 > λ-MnO2 > β-MnO2-2 > β-MnO2-1. Extensive characterization was then conducted for MnO2 crystal structure, morphology, surface area, reduction potential, conductivity, and surface Mn oxidation states and oxygen species. The results showed that the MnO2 oxidative reactivity correlated highly positively with surface Mn(III) content and negatively with surface Mn average oxidation state but correlated poorly with all other properties. This indicates that surface Mn(III) played an important role in MnO2 oxidative reactivity. For the same MnO2 phase structure synthesized by different methods, higher surface area, reduction potential, conductivity, or surface adsorbed oxygen led to higher reactivity, suggesting that these properties play a secondary role in the reactivity. These findings provide general guidance for designing active MnO2 for cost-effective water and wastewater treatment.
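A pseudo-first-order rate constant of the kind used above to rank reactivity follows from ln(C/C0) = -kt. The sketch below assumes a least-squares fit through the origin and uses synthetic concentrations generated with k = 0.1 per unit time; the abstract does not state how the authors fitted their data, and none of the numbers come from the study.

```python
import math


def pseudo_first_order_k(times, concentrations):
    """Fit ln(C/C0) = -k*t through the origin and return k.

    Assumes times[0] == 0, so concentrations[0] is C0.
    """
    c0 = concentrations[0]
    numerator = 0.0
    denominator = 0.0
    for t, c in zip(times, concentrations):
        numerator += t * math.log(c / c0)
        denominator += t * t
    return -numerator / denominator


# Synthetic decay curve (hypothetical data, true k = 0.1).
times = [0.0, 5.0, 10.0, 20.0]
concentrations = [math.exp(-0.1 * t) for t in times]
k = pseudo_first_order_k(times, concentrations)
print(round(k, 3))
```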
Subjects
Benzhydryl Compounds, Manganese Compounds, Oxidation-Reduction, Oxidative Stress, Oxides, Phenols
Subjects
Carbon Nanotubes, Water Purification, Adsorption, Charcoal, Machine Learning
ABSTRACT
Polyvinyl chloride (PVC) membrane-based ion-selective electrode (ISE) sensors are common tools for water assessments, but their development relies on time-consuming and costly experimental investigations. To address this challenge, this study combines machine learning (ML), Morgan fingerprint, and Bayesian optimization technologies with experimental results to develop high-performance PVC-based ISE cation sensors. By using 1745 data sets collected from 20 years of literature, appropriate ML models are trained to enable accurate prediction and a deep understanding of the relationship between ISE components and sensor performance (R2 = 0.75). Rapid ionophore screening is achieved using the Morgan fingerprint based on atomic groups derived from ML model interpretation. Bayesian optimization is then applied to identify optimal combinations of ISE materials with the potential to deliver desirable ISE sensor performance. Na+, Mg2+, and Al3+ sensors fabricated from Bayesian optimization results exhibit excellent Nernst slopes with less than 8.2% deviation from the ideal value and superb detection limits at 10^-7 M level based on experimental validation results. This approach can potentially transform sensor development into a more time-efficient, cost-effective, and rational design process, guided by ML-based techniques.
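The ideal Nernst slope against which such deviations are judged is RT ln(10)/(zF) per decade of ion activity. The sketch below evaluates it for the three cations named above, assuming a temperature of 25 °C (298.15 K), which the abstract does not state; the `percent_deviation` helper and the measured value passed to it are hypothetical.

```python
import math

R = 8.314462618   # molar gas constant, J mol^-1 K^-1
F = 96485.33212   # Faraday constant, C mol^-1


def ideal_nernst_slope_mv(charge, temperature_k=298.15):
    """Ideal electrode slope in mV per decade for an ion of given charge."""
    return 1000.0 * R * temperature_k * math.log(10) / (charge * F)


def percent_deviation(measured_mv, charge, temperature_k=298.15):
    """Relative deviation of a measured slope from the ideal value, in %."""
    ideal = ideal_nernst_slope_mv(charge, temperature_k)
    return 100.0 * abs(measured_mv - ideal) / ideal


ion_charges = {"Na+": 1, "Mg2+": 2, "Al3+": 3}
slopes = {ion: ideal_nernst_slope_mv(z) for ion, z in ion_charges.items()}
for ion, slope in slopes.items():
    print(ion, round(slope, 2))

# Hypothetical measured slope for a monovalent sensor (illustration only).
example_deviation = percent_deviation(56.0, charge=1)
```

Under the 25 °C assumption the ideal slopes are about 59.2, 29.6, and 19.7 mV per decade for charges of 1, 2, and 3.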
ABSTRACT
The selective transformation of organics from wastewater to value-added chemicals is considered an upcycling process beneficial for carbon neutrality. Herein, we present an innovative electrocatalytic oxidation (ECO) system aimed at achieving the selective conversion of phenols in wastewater to para-benzoquinone (p-BQ), a valuable chemical widely utilized in the manufacturing and chemical industries. Notably, 96.4% of phenol abatement and 78.9% of p-BQ yield are synchronously obtained over a preferred carbon cloth-supported ruthenium nanoparticles (Ru/C) anode. Such unprecedented results stem from the weak Ru-O bond between the Ru active sites and generated p-BQ, which facilitates the desorption of p-BQ from the anode surface. This property not only prevents the excessive oxidation of the generated p-BQ but also reinstates the Ru active sites essential for the rapid ECO of phenol. Furthermore, this ECO system operates at ambient conditions and obviates the need for potent chemical oxidants, establishing a sustainable avenue for p-BQ production. Importantly, the system efficacy can be adaptable in actual phenol-containing coking wastewater, highlighting its potential practical application prospect. As a proof of concept, we construct an electrified Ru/C membrane for ECO of phenol, attaining phenol removal of 95.8% coupled with p-BQ selectivity of 73.1%, which demonstrates the feasibility of the ECO system in a scalable flow-through operation mode. This work provides a promising ECO strategy for realizing both phenols removal and valuable organics recovery from phenolic wastewater.
Subjects
Benzoquinones, Wastewater, Chemical Water Pollutants, Phenol/chemistry, Phenols, Carbon, Chemical Water Pollutants/chemistry
ABSTRACT
Organic contaminants can be removed from water/wastewater by oxidative degradation using oxidants such as manganese oxides and/or aqueous manganese ions. The Mn species show a wide range of activity, which is related to the oxidation state of Mn. Here, we use ab initio molecular dynamics simulations to address Mn oxidation states in these systems. We first develop a correlation between Mn partial atomic charge and the oxidation state based on results of 31 simulations on known Mn aqueous complexes. The results collapse to a master curve; the dependence of partial atomic charge on oxidation state weakens with increasing oxidation state, which concurs with a previously proposed feedback effect. This correlation is then used to address oxidation states in Mn systems used as oxidants. Simulations of MnO2 polymorphs immersed in water give average oxidation states (AOS) in excellent agreement with experimental results, in that β-MnO2 has the highest AOS, α-MnO2 has an intermediate AOS, and δ-MnO2 has the lowest AOS. Furthermore, the oxidation state varies substantially with the atom's environment, and these structures include Mn(III) and Mn(V) species that are expected to be active. In regard to the MnO4-/HSO3-/O2 system that has been shown to be a highly effective oxidant, we propose a novel Mn complex that could give rise to the oxidative activity, where Mn(III) is stabilized by sulfite and dissolved O2 ligands. Our simulations also show that the O2 would be activated to O22- in this complex under acidic conditions, and could lead to the formation of OH radicals that serve as oxidants.
Subjects
Manganese Compounds, Oxides, Manganese, Oxidation-Reduction, Oxidative Stress
ABSTRACT
Due to the increasing diversity of organic contaminants discharged into anoxic water environments, reactivity prediction is necessary for chemical persistence evaluation for water treatment and risk assessment purposes. Almost all quantitative structure activity relationships (QSARs) that describe rates of contaminant transformation apply only to narrowly-defined, relatively homogenous families of reactants (e.g., dechlorination of alkyl halides). In this work, we develop predictive models for abiotic reduction of 60 organic compounds with diverse reducible functional groups, including nitroaromatic compounds (NACs), aliphatic nitro-compounds (ANCs), aromatic N-oxides (ANOs), isoxazoles (ISXs), polyhalogenated alkanes (PHAs), sulfoxides and sulfones (SOs), and others. Rate constants for their reduction were measured using a model reductant system, Fe(II)-tiron. Qualitatively, the rates followed the order NACs > ANOs ≈ ISXs ≈ PHAs > ANCs > SOs. To develop QSARs, both conventional chemical descriptor-based and machine learning (ML)-based approaches were investigated. Conventional univariate QSARs based on a molecular descriptor ELUMO (energy of the lowest-unoccupied molecular orbital) gave good correlations within classes. Multivariate QSARs combining ELUMO with Abraham descriptors for physico-chemical properties gave slightly improved correlations within classes for NCs and NACs, but little improvement in correlation within other classes or among classes. The ML model obtained covers reduction rates for all classes of compounds and all of the conditions studied with the prediction accuracy similar to those of the conventional QSARs for individual classes (r2 = 0.41-0.98 for univariate QSARs, 0.71-0.94 for multivariate QSARs, and 0.83 for the ML model). Both approaches required a scheme for a priori classification of the compounds for model training. 
This work offers two alternative modeling approaches to comprehensive abiotic reactivity prediction for persistence evaluation of organic compounds in anoxic water environments.
Subjects
Organic Compounds, Quantitative Structure-Activity Relationship, Ferrous Compounds, Humans, Machine Learning, Water
ABSTRACT
Ligands can significantly increase the oxidation rates of phenolic compounds by MnO4-. This was often explained by the in situ formed Mn(III)- or Mn(X)-ligand complexes that can oxidize phenols faster than MnO4- can. This work discovered that Mn(III)-ligand complexes also acted as catalysts for the oxidation of phenolic compounds by MnO4- (i.e., the catalytic role of Mn(III)-ligand). First, when phenol was mixed with MnO4- and pyrophosphate (PP, a representative ligand), Mn(III)-PP was found to form while phenol was quickly oxidized. However, the amount of phenol that was directly oxidized by Mn(III)-PP only accounted for ~25% of the phenol that was oxidized in the mixture, indicating that there were other pathways. Then, when pentachlorophenol (PCP) was used as another phenolic probe, the externally prepared Mn(III)-PP was observed to only slightly oxidize PCP, but its addition significantly accelerated PCP oxidation by MnO4-. The Mn(III)-PP concentration also remained unchanged during the above reaction, thus suggesting the catalytic role of Mn(III)-PP. This new pathway was further validated by successfully explaining all the experimental observations obtained so far, including the effect of pH, effects of different ligand amounts and types, product patterns, and the induction period. Finally, possible catalytic mechanisms of Mn(III)-ligand were discussed based on the experimental results.
ABSTRACT
This work combined a Deep Neural Network (DNN) with molecular fingerprints (MF) to develop models to predict the OH radical rate constants of 593 organic contaminants. Molecular descriptors, most often used in establishing quantitative structure-activity relationships (QSARs), were not used here because of their complicated generation processes that rely on advanced physicochemical and computational knowledge. Instead, we only fed the most basic information of the contaminant structures, i.e., MF encoding the types of atoms and how they are connected, to the DNN, which then developed predictive models automatically. Here, a dataset containing 457 contaminants and their OH rate constants was first used to develop predictive models by DNN-MF. The resulting models showed comparable accuracy to the traditional QSARs. The root mean square error (RMSE) values of the test sets were 0.358-0.384. The length of 2048 bits for the MF and 3 hidden layers (each with 1024 neurons) were found to be the optimal parameters for DNN. The model containing an additional 89 micropollutants in the training set was then successfully applied to predict the OH rate constants of 17 organophosphorus flame retardants and 29 additional micropollutants, with comparable accuracy to the reported molecular descriptors-based QSARs.
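To give a sense of scale for the reported optimum (2048-bit fingerprint input, three hidden layers of 1024 neurons), the sketch below counts the trainable parameters of such a network. It assumes fully connected layers with biases and a single output neuron for the rate constant; neither assumption is stated in the abstract.

```python
def dense_parameter_count(layer_sizes):
    """Weights plus biases for a stack of fully connected layers."""
    return sum(
        n_in * n_out + n_out
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )


# Input bits, three hidden layers, and an assumed single regression output.
layer_sizes = [2048, 1024, 1024, 1024, 1]
total_parameters = dense_parameter_count(layer_sizes)
print(total_parameters)
```

Under these assumptions the network has roughly 4.2 million parameters, far more than the 457 training compounds mentioned above.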
ABSTRACT
A recently discovered bisulfite(HSO3-)/permanganate(MnO4-) system was reported to produce highly reactive free Mn(III) that can oxidize organic compounds in milliseconds. However, this characteristic reactivity was not found in all other known reaction systems that can also produce free Mn(III). Why can Mn(III) in NaHSO3/KMnO4 be so active? Here, we found NaHSO3 and O2 acted as catalysts for the reaction between Mn(III) and organic compounds. Without O2, 0% of organic compounds were oxidized in NaHSO3/KMnO4, indicating the absence of O2 inactivated Mn(III) reactivity. When the reaction between NaHSO3 and KMnO4 was monitored in air, Mn(III) catalyzed rapid oxidation of NaHSO3 by O2. Then, the Mn(III) that could oxidize organic compounds was found to be the ones involved in the catalytic reaction between NaHSO3 and O2, thus the link between O2 and Mn(III) reactivity was established. Finally, NaHSO3/O2 can be viewed as catalysts for the reaction between Mn(III) and organic compounds because 1) when Mn(III) was involved in oxidizing organic compounds, it stopped being the catalyst for the reaction between NaHSO3 and O2 so that they were consumed to a much smaller extent; and 2) without NaHSO3 and O2, Mn(III) lost its oxidation ability. To the best of our knowledge, this is the first report on "catalytic role exchange" where Mn(III) is the catalyst for NaHSO3/O2 reaction while NaHSO3/O2 are the catalysts for Mn(III)/organic compounds reaction. Understanding the critical role of oxygen in NaHSO3/KMnO4 will enable us to apply this technology more efficiently toward contaminant removal.