Search | BVS Breastfeeding

1.

When Do Quantum Mechanical Descriptors Help Graph Neural Networks to Predict Chemical Properties?

Li, Shih-Cheng; Wu, Haoyang; Menon, Angiras; Spiekermann, Kevin A; Li, Yi-Pei; Green, William H.

J Am Chem Soc ; 146(33): 23103-23120, 2024 Aug 21.

Article in English | MEDLINE | ID: mdl-39106041

ABSTRACT

Deep graph neural networks are extensively utilized to predict chemical reactivity and molecular properties. However, because of the complexity of chemical space, such models often have difficulty extrapolating beyond the chemistry contained in the training set. Augmenting the model with quantum mechanical (QM) descriptors is anticipated to improve its generalizability. However, obtaining QM descriptors often requires CPU-intensive computational chemistry calculations. To identify when QM descriptors help graph neural networks predict chemical properties, we conduct a systematic investigation of the impact of atom, bond, and molecular QM descriptors on the performance of directed message passing neural networks (D-MPNNs) for predicting 16 molecular properties. The analysis surveys computational and experimental targets, as well as classification and regression tasks, and varied data set sizes from several hundred to hundreds of thousands of data points. Our results indicate that QM descriptors are mostly beneficial for D-MPNN performance on small data sets, provided that the descriptors correlate well with the targets and can be readily computed with high accuracy. Otherwise, using QM descriptors can add cost without benefit or even introduce unwanted noise that can degrade model performance. Strategic integration of QM descriptors with D-MPNN unlocks potential for physics-informed, data-efficient modeling with some interpretability that can streamline de novo drug and material designs. To facilitate the use of QM descriptors in machine learning workflows for chemistry, we provide a set of guidelines regarding when and how to best leverage QM descriptors, a high-throughput workflow to compute them, and an enhancement to Chemprop, a widely adopted open-source D-MPNN implementation for chemical property prediction.
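
To make the descriptor-augmentation idea concrete, here is a toy, scalar-valued sketch of directed message passing in which precomputed molecular descriptors are concatenated to the learned embedding at readout. It is not Chemprop's implementation. The weights `w_in` and `w_msg` are fixed hypothetical constants rather than learned matrices, bond features are omitted, and readout concatenation is only one way to inject descriptors (the study also covers atom- and bond-level descriptors).

```python
from typing import List, Tuple


def relu(x: float) -> float:
    return max(0.0, x)


def dmpnn_embedding(
    atom_feats: List[float],
    bonds: List[Tuple[int, int]],
    mol_descriptors: List[float],
    depth: int = 3,
    w_in: float = 1.0,
    w_msg: float = 0.5,
) -> List[float]:
    """Toy D-MPNN with scalar features; molecular descriptors appended at readout."""
    # Each undirected bond becomes two directed edges, v->w and w->v.
    edges = [(v, w) for v, w in bonds] + [(w, v) for v, w in bonds]
    # Initial hidden state of edge v->w, computed from the source atom's feature.
    h0 = {(v, w): relu(w_in * atom_feats[v]) for v, w in edges}
    h = dict(h0)
    for _ in range(depth - 1):
        new_h = {}
        for v, w in edges:
            # Messages into v from every neighbor k except w (no immediate backtracking).
            msg = sum(h[(k, u)] for k, u in edges if u == v and k != w)
            new_h[(v, w)] = relu(h0[(v, w)] + w_msg * msg)
        h = new_h
    # Atom embedding = own feature + all incoming edge states; sum-pool over atoms.
    atom_emb = [
        atom_feats[v] + sum(h[(k, u)] for k, u in edges if u == v)
        for v in range(len(atom_feats))
    ]
    # Concatenate the learned embedding with the precomputed molecular QM descriptors.
    return [sum(atom_emb)] + list(mol_descriptors)


# Hypothetical three-atom chain with one precomputed molecular descriptor.
print(dmpnn_embedding([1.0, 1.0, 1.0], [(0, 1), (1, 2)], [0.25]))  # [8.0, 0.25]
```

The key D-MPNN detail is the `k != w` filter: the message on edge v->w ignores the reverse edge w->v.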

2.

Chemprop: A Machine Learning Package for Chemical Property Prediction.

Heid, Esther; Greenman, Kevin P; Chung, Yunsie; Li, Shih-Cheng; Graff, David E; Vermeire, Florence H; Wu, Haoyang; Green, William H; McGill, Charles J.

J Chem Inf Model ; 64(1): 9-17, 2024 01 08.

Article in English | MEDLINE | ID: mdl-38147829

ABSTRACT

Deep learning has become a powerful and frequently employed tool for the prediction of molecular properties, thus creating a need for open-source and versatile software solutions that can be operated by nonexperts. Among the current approaches, directed message-passing neural networks (D-MPNNs) have proven to perform well on a variety of property prediction tasks. The software package Chemprop implements the D-MPNN architecture and offers simple, easy, and fast access to machine-learned molecular properties. Compared to its initial version, we present a multitude of new Chemprop functionalities such as the support of multimolecule properties, reactions, atom/bond-level properties, and spectra. Further, we incorporate various uncertainty quantification and calibration methods along with related metrics as well as pretraining and transfer learning workflows, improved hyperparameter optimization, and other customization options concerning loss functions or atom/bond features. We benchmark D-MPNN models trained using Chemprop with the new reaction, atom-level, and spectra functionality on a variety of property prediction data sets, including MoleculeNet and SAMPL, and observe state-of-the-art performance on the prediction of water-octanol partition coefficients, reaction barrier heights, atomic partial charges, and absorption spectra. Chemprop enables out-of-the-box training of D-MPNN models for a variety of problem settings in fast, user-friendly, and open-source software.

Subjects

Machine Learning , Software , Neural Networks (Computer) , Chemical Phenomena , Water

3.

Subgraph Isomorphic Decision Tree to Predict Radical Thermochemistry with Bounded Uncertainty Estimation.

Pang, Hao-Wei; Dong, Xiaorui; Johnson, Matthew S; Green, William H.

J Phys Chem A ; 128(14): 2891-2907, 2024 Apr 11.

Article in English | MEDLINE | ID: mdl-38536892

ABSTRACT

Detailed chemical kinetic models offer valuable mechanistic insights into industrial applications. Automatic generation of reliable kinetic models requires fast and accurate radical thermochemistry estimation. Kineticists often prefer hydrogen bond increment (HBI) corrections from a closed-shell molecule to the corresponding radical for their interpretability, physical meaning, and facilitation of error cancellation as a relative quantity. Tree estimators, used due to limited data, currently rely on expert knowledge and manual construction, posing challenges in maintenance and improvement. In this work, we extend the subgraph isomorphic decision tree (SIDT) algorithm originally developed for rate estimation to estimate HBI corrections. We introduce a physics-aware splitting criterion, explore a bounded weighted uncertainty estimation method, and evaluate aleatoric uncertainty-based and model variance reduction-based prepruning methods. Moreover, we compile a data set of thermochemical parameters for 2210 radicals involving C, O, N, and H based on quantum chemical calculations from recently published works. We leverage the collected data set to train the SIDT model. Compared to existing empirical tree estimators, the SIDT model (1) offers an automatic approach to generating and extending the tree estimator for thermochemistry, (2) has better accuracy and R², (3) provides significantly more realistic uncertainty estimates, and (4) has a tree structure that is much faster to descend. Overall, the SIDT estimator marks a great leap in kinetic modeling, offering more precise, reliable, and scalable predictions for radical thermochemistry.

4.

Accurately Predicting Barrier Heights for Radical Reactions in Solution Using Deep Graph Networks.

Spiekermann, Kevin A; Dong, Xiaorui; Menon, Angiras; Green, William H; Pfeifle, Mark; Sandfort, Frederik; Welz, Oliver; Bergeler, Maike.

J Phys Chem A ; 2024 Sep 19.

Article in English | MEDLINE | ID: mdl-39298746

ABSTRACT

Quantitative estimates of reaction barriers and solvent effects are essential for developing kinetic mechanisms and predicting reaction outcomes. Here, we create a new data set of 5,600 unique elementary radical reactions calculated using the M06-2X/def2-QZVP//B3LYP-D3(BJ)/def2-TZVP level of theory. A conformer search is done for each species using TPSS/def2-TZVP. Gibbs free energies of activation and of reaction for these radical reactions in 40 common solvents are obtained using COSMO-RS for solvation effects. These balanced reactions involve the elements H, C, N, O, and S, contain up to 19 heavy atoms, and have atom-mapped SMILES. All transition states are verified by an intrinsic reaction coordinate calculation. We next train a deep graph network to directly estimate the Gibbs free energy of activation and of reaction in both gas and solution phases using only the atom-mapped SMILES of the reactant and product and the SMILES of the solvent. This simple input representation avoids computationally expensive optimizations for the reactant, transition state, and product structures during inference, making our model well-suited for high-throughput predictive chemistry and quickly providing information for (retro-)synthesis planning tools. To properly measure model performance, we report results on both interpolative and extrapolative data splits and also compare to several baseline models. During training and testing, the data set is augmented by including the reverse direction of each reaction and variants with different resonance structures. After data augmentation, we have around 2 million entries to train the model, which achieves a testing set mean absolute error of 1.16 kcal mol-1 for the Gibbs free energy of activation in solution. We anticipate this model will accelerate predictions for high-throughput screening to quickly identify relevant reactions in solution, and our data set will serve as a benchmark for future studies.
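
The reverse-direction augmentation described above works because both directions share one transition state, so the reverse barrier is the forward barrier minus the reaction free energy. A minimal sketch under that assumption follows. The `Reaction` container and the mapped SMILES string are hypothetical placeholders, and the resonance-structure variants mentioned in the abstract are not handled.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Reaction:
    rxn_smiles: str  # atom-mapped "reactants>>products"
    dg_act: float    # Gibbs free energy of activation, kcal/mol
    dg_rxn: float    # Gibbs free energy of reaction, kcal/mol


def reverse(rxn: Reaction) -> Reaction:
    """Reverse a reaction; the transition state is shared by both directions."""
    reactants, products = rxn.rxn_smiles.split(">>")
    # dG_act(reverse) = dG_act(forward) - dG_rxn(forward); dG_rxn flips sign.
    return Reaction(f"{products}>>{reactants}", rxn.dg_act - rxn.dg_rxn, -rxn.dg_rxn)


def augment(data: List[Reaction]) -> List[Reaction]:
    """Return each reaction followed by its reverse (resonance variants not handled)."""
    return [r for rxn in data for r in (rxn, reverse(rxn))]


# Hypothetical H-abstraction entry with illustrative free energies.
fwd = Reaction("[CH3:1][H:2].[OH:3]>>[CH3:1].[H:2][OH:3]", dg_act=5.0, dg_rxn=-15.0)
for r in augment([fwd]):
    print(r)
```

Applying `reverse` twice returns the original entry, which is a cheap consistency check on augmented data.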

5.

Toward Accurate Quantum Mechanical Thermochemistry: (1) Extensible Implementation and Comparison of Bond Additivity Corrections and Isodesmic Reactions.

Wu, Haoyang; Payne, A Mark; Pang, Hao-Wei; Menon, Angiras; Grambow, Colin A; Ranasinghe, Duminda S; Dong, Xiaorui; Grinberg Dana, Alon; Green, William H.

J Phys Chem A ; 128(21): 4335-4352, 2024 May 30.

Article in English | MEDLINE | ID: mdl-38752854

ABSTRACT

Obtaining accurate enthalpies of formation of chemical species, ΔHf, often requires empirical corrections that connect the results of quantum mechanical (QM) calculations with the experimental enthalpies of elements in their standard state. One approach is to use atomization energy corrections followed by bond additivity corrections (BACs), such as those defined by Petersson et al. or Anantharaman and Melius. Another approach is to utilize isodesmic reactions (IDRs) as shown by Buerger et al. We implement both approaches in Arkane, an open-source software that can calculate species thermochemistry using results from various QM software packages. In this work, we collect 421 reference species from the literature to derive ΔHf corrections and fit atomization energy corrections and BACs for 15 commonly used model chemistries. We find that both types of BACs yield similar accuracy, although Anantharaman- and Melius-type BACs appear to generalize better. Furthermore, BACs tend to achieve better accuracy than IDRs for commonly used model chemistries, and IDRs can be less robust because of the sensitivity to the chosen reference species and reactions. Overall, Anantharaman- and Melius-type BACs are our recommended approach for achieving accurate QM corrections for enthalpies.
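
A Petersson-type BAC adds one fitted correction per bond type, weighted by how many times that bond occurs in the molecule. A minimal sketch of that form follows. The BAC values and the QM enthalpy are hypothetical numbers, not fitted values from Arkane or the paper, and the Anantharaman- and Melius-type scheme (which adds atom and neighbor terms) is not shown.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Bond = Tuple[str, str, int]  # (element, element, bond order)


def bond_key(a: str, b: str, order: int) -> str:
    """Order-independent label such as 'C-H' or 'C=O'."""
    symbol = {1: "-", 2: "=", 3: "#"}[order]
    x, y = sorted((a, b))
    return f"{x}{symbol}{y}"


def petersson_bac(bonds: Iterable[Bond], bac: Dict[str, float]) -> float:
    """Sum of per-bond-type corrections, weighted by how often each bond type occurs."""
    counts = Counter(bond_key(*b) for b in bonds)
    return sum(n * bac[key] for key, n in counts.items())


# Hypothetical BAC values in kcal/mol (illustrative only).
bac_params = {"C-H": -0.10, "C-O": 0.20, "H-O": -0.05}
methanol = [("C", "H", 1)] * 3 + [("C", "O", 1), ("O", "H", 1)]
hf_qm = -48.0  # hypothetical atomization-corrected QM enthalpy of formation, kcal/mol
print(hf_qm + petersson_bac(methanol, bac_params))
```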

6.

Predicting Critical Properties and Acentric Factors of Fluids Using Multitask Machine Learning.

Biswas, Sayandeep; Chung, Yunsie; Ramirez, Josephine; Wu, Haoyang; Green, William H.

J Chem Inf Model ; 63(15): 4574-4588, 2023 08 14.

Article in English | MEDLINE | ID: mdl-37487557

ABSTRACT

Knowledge of critical properties, such as critical temperature, pressure, density, as well as acentric factor, is essential to calculate thermophysical properties of chemical compounds. Experiments to determine critical properties and acentric factors are expensive and time intensive; therefore, we developed a machine learning (ML) model that can predict these molecular properties given the SMILES representation of a chemical species. We explored directed message passing neural network (D-MPNN) and graph attention network as ML architecture choices. Additionally, we investigated featurization with additional atomic and molecular features, multitask training, and pretraining using estimated data to optimize model performance. Our final model utilizes a D-MPNN layer to learn the molecular representation and is supplemented by Abraham parameters. A multitask training scheme was used to train a single model to predict all the critical properties and acentric factors along with boiling point, melting point, enthalpy of vaporization, and enthalpy of fusion. The model was evaluated on both random and scaffold splits, on which it achieves state-of-the-art accuracy. The extensive data set of critical properties and acentric factors contains 1144 chemical compounds and is made available in the public domain together with the source code that can be used for further exploration.

Subjects

Machine Learning , Neural Networks (Computer) , Temperature , Transition Temperature

7.

Characterizing Uncertainty in Machine Learning for Chemistry.

Heid, Esther; McGill, Charles J; Vermeire, Florence H; Green, William H.

J Chem Inf Model ; 63(13): 4012-4029, 2023 07 10.

Article in English | MEDLINE | ID: mdl-37338239

ABSTRACT

Characterizing uncertainty in machine learning models has recently gained interest in the context of machine learning reliability, robustness, safety, and active learning. Here, we separate the total uncertainty into contributions from noise in the data (aleatoric) and shortcomings of the model (epistemic), further dividing epistemic uncertainty into model bias and variance contributions. We systematically address the influence of noise, model bias, and model variance in the context of chemical property predictions, where the diverse nature of target properties and the vast chemical space give rise to many distinct sources of prediction error. We demonstrate that different sources of error can each be significant in different contexts and must be individually addressed during model development. Through controlled experiments on data sets of molecular properties, we show important trends in model performance associated with the level of noise in the data set, size of the data set, model architecture, molecule representation, ensemble size, and data set splitting. In particular, we show that 1) noise in the test set can limit a model's observed performance when the actual performance is much better, 2) using size-extensive model aggregation structures is crucial for extensive property prediction, and 3) ensembling is a reliable tool for uncertainty quantification and improvement specifically for the contribution of model variance. We develop general guidelines on how to improve an underperforming model when falling into different uncertainty contexts.
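
Point 3) can be illustrated with a minimal ensemble sketch: average the members' predictions and use their spread as the uncertainty estimate. Consistent with the abstract, that spread reflects only the model-variance contribution and says nothing about data noise or model bias. The function name and the prediction values are hypothetical.

```python
from statistics import fmean, pvariance
from typing import List, Sequence, Tuple


def ensemble_mean_and_variance(
    member_preds: Sequence[Sequence[float]],
) -> List[Tuple[float, float]]:
    """member_preds[m][i] is model m's prediction for molecule i.

    Returns (ensemble mean, spread across members) per molecule. The spread only
    reflects model variance; it does not capture data noise or systematic model bias.
    """
    n_molecules = len(member_preds[0])
    results = []
    for i in range(n_molecules):
        values = [preds[i] for preds in member_preds]
        results.append((fmean(values), pvariance(values)))
    return results


# Three hypothetical ensemble members predicting two molecules.
preds = [[1.0, 2.0], [3.0, 2.0], [2.0, 2.0]]
print(ensemble_mean_and_variance(preds))
```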

Subjects

Machine Learning , Uncertainty , Reproducibility of Results

8.

Computing Kinetic Solvent Effects and Liquid Phase Rate Constants Using Quantum Chemistry and COSMO-RS Methods.

Chung, Yunsie; Green, William H.

J Phys Chem A ; 127(27): 5637-5651, 2023 Jul 13.

Article in English | MEDLINE | ID: mdl-37381077

ABSTRACT

Many industrially and environmentally relevant reactions occur in the liquid phase. An accurate prediction of the rate constants is needed to analyze the intricate kinetic mechanisms of condensed phase systems. Quantum chemistry and continuum solvation models are commonly used to compute liquid phase rate constants; yet, their exact computational errors remain largely unknown, and a consistent computational workflow has not been well established. In this study, the accuracies of various quantum chemical and COSMO-RS levels of theory are assessed for the predictions of liquid phase rate constants and kinetic solvent effects. The prediction is made by first obtaining gas phase rate constants and subsequently applying solvation corrections. The calculation errors are evaluated using the experimental data of 191 rate constants that comprise 15 neutral closed-shell or free radical reactions and 49 solvents. The ωB97XD/def2-TZVP level of theory combined with the COSMO-RS method at the BP-TZVP level is shown to achieve the best performance with a mean absolute error of 0.90 in log10(k_liq). Relative rate constants are additionally compared to determine the errors associated with the solvation calculations alone. Very accurate predictions of relative rate constants are achieved at nearly all levels of theory with a mean absolute error of 0.27 in log10(k_solvent1/k_solvent2).
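
The two-step workflow (gas-phase rate constant, then solvation correction) can be sketched with the common transition-state-theory form k_liq = k_gas · exp(−ΔΔG‡_solv/RT), where ΔΔG‡_solv is the solvation free energy of the transition state minus that of the reactants. This is an assumption about the general form, not a reproduction of the paper's exact equations. Standard-state consistency is also assumed rather than handled, and all numbers are hypothetical. Note that k_gas cancels in a ratio of two solvents, which is why relative rate constants isolate the solvation error.

```python
import math
from typing import Sequence

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol K)


def k_liquid(k_gas: float, dg_solv_ts: float, dg_solv_reactants: Sequence[float],
             temperature: float = 298.15) -> float:
    """Apply a solvation correction to a gas-phase rate constant.

    Assumes all solvation free energies (kcal/mol) use the same concentration-based
    standard state as k_gas.
    """
    ddg_act_solv = dg_solv_ts - sum(dg_solv_reactants)
    return k_gas * math.exp(-ddg_act_solv / (R_KCAL * temperature))


def mae_log10(k_pred: Sequence[float], k_ref: Sequence[float]) -> float:
    """Mean absolute error in log10(k), the error metric quoted in the abstract."""
    return sum(abs(math.log10(p) - math.log10(r)) for p, r in zip(k_pred, k_ref)) / len(k_pred)


# Hypothetical solvation free energies for one bimolecular reaction in two solvents.
k1 = k_liquid(1.0e6, dg_solv_ts=-9.0, dg_solv_reactants=[-4.0, -4.5])
k2 = k_liquid(1.0e6, dg_solv_ts=-6.0, dg_solv_reactants=[-3.5, -3.0])
print(math.log10(k1 / k2))  # independent of the shared k_gas
```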

9.

Experimental Compilation and Computation of Hydration Free Energies for Ionic Solutes.

Zheng, Jonathan W; Green, William H.

J Phys Chem A ; 127(48): 10268-10281, 2023 Dec 07.

Article in English | MEDLINE | ID: mdl-38010212

ABSTRACT

Although charged solutes are common in many chemical systems, traditional solvation models perform poorly in calculating solvation energies of ions. One major obstacle is the scarcity of experimental data for solvated ions. In this study, we release an experiment-based aqueous ionic solvation energy data set, IonSolv-Aq, that contains hydration free energies for 118 anions and 155 cations, more than 2 times larger than the set of hydration free energies for singly charged ions contained in the 2012 Minnesota Solvation Database commonly used in benchmarking studies. We discuss sources of systematic uncertainty in the data set and use the data to examine the accuracy of popular implicit solvation models COSMO-RS and SMD for predicting solvation free energies of singly charged ionic solutes in water. Our results indicate that most SMD and COSMO-RS modeling errors for ionic solutes are systematic and correctable with empirical parameters. We discuss two systematic offsets: one across all ions and one that depends on the functional group of the ionization site. After correcting for these offsets, solvation energies of singly charged ions are predicted using COSMO-RS to 3.1 kcal mol-1 MAE against a challenging test set and 1.7 kcal mol-1 MAE (about 3% relative error) with a filtered test set. The performance of SMD is similar, with MAE against those same test sets of 2.7 and 1.7 kcal mol-1. These results underscore the importance of compiling larger experimental data sets to improve solvation model parametrization and fairly assess performance.
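
The two systematic offsets (one shared by all ions, one per ionization-site functional group) can be fitted as simple mean residuals. A minimal sketch under that assumption follows. The paper's actual fitting procedure may differ, and the group labels and energies are hypothetical, not taken from IonSolv-Aq.

```python
from collections import defaultdict
from statistics import fmean
from typing import Dict, List, Tuple

# (ionization-site group, predicted dG_hyd, experimental dG_hyd), kcal/mol
Record = Tuple[str, float, float]


def fit_offsets(train: List[Record]) -> Tuple[float, Dict[str, float]]:
    """Fit one offset shared by all ions, then one residual offset per group."""
    global_offset = fmean(exp - pred for _, pred, exp in train)
    residuals: Dict[str, List[float]] = defaultdict(list)
    for group, pred, exp in train:
        residuals[group].append(exp - pred - global_offset)
    return global_offset, {g: fmean(r) for g, r in residuals.items()}


def corrected(group: str, pred: float, global_offset: float,
              group_offsets: Dict[str, float]) -> float:
    # Groups unseen in training fall back to the global offset only.
    return pred + global_offset + group_offsets.get(group, 0.0)


def mae(records: List[Record], global_offset: float, group_offsets: Dict[str, float]) -> float:
    return fmean(abs(corrected(g, p, global_offset, group_offsets) - e) for g, p, e in records)


# Hypothetical training records.
train = [("carboxylate", -70.0, -75.0), ("carboxylate", -72.0, -77.0),
         ("ammonium", -60.0, -61.0), ("ammonium", -62.0, -63.0)]
g_off, grp_off = fit_offsets(train)
print(g_off, grp_off, mae(train, g_off, grp_off))
```

In practice the offsets would be fitted on a training split and the MAE reported on a held-out test set, as the abstract does for its challenging and filtered test sets.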

10.

Butyl Acetate Pyrolysis and Combustion Chemistry: Mechanism Generation and Shock Tube Experiments.

Dong, Xiaorui; Pio, Gianmaria; Arafin, Farhan; Laich, Andrew; Baker, Jessica; Ninnemann, Erik; Vasu, Subith S; Green, William H.

J Phys Chem A ; 127(14): 3231-3245, 2023 Apr 13.

Article in English | MEDLINE | ID: mdl-36999979

ABSTRACT

The combustion and pyrolysis behaviors of light esters and fatty acid methyl esters have been widely studied due to their relevance as biofuel and fuel additives. However, a knowledge gap exists for midsize alkyl acetates, especially ones with long alkoxyl groups. Butyl acetate, in particular, is a promising biofuel with its economic and robust production possibilities and ability to enhance blendstock performance and reduce soot formation. However, it is little studied from both experimental and modeling aspects. This work created detailed oxidation mechanisms for the four butyl acetate isomers (normal-, sec-, tert-, and iso-butyl acetate) at temperatures varying from 650 to 2000 K and pressures up to 100 atm using the Reaction Mechanism Generator. About 60% of species in each model have thermochemical parameters from published data or in-house quantum calculations, including fuel molecules and intermediate combustion products. Kinetics of essential primary reactions, retro-ene and hydrogen atom abstraction by OH or HO2, governing the fuel oxidation pathways, were also calculated quantum-mechanically. Simulation of the developed mechanisms indicates that the majority of the fuel will decompose into acetic acid and relevant butenes at elevated temperatures, making their ignition behaviors similar to butenes. The adaptability of the developed models to high-temperature pyrolysis systems was tested against newly collected high-pressure shock experiments; the simulated CO mole fraction time histories have a reasonable agreement with the laser measurement in the shock tube. This work reveals the high-temperature oxidation chemistry of butyl acetates and demonstrates the validity of predictive models for biofuel chemistry established on accurate thermochemical and kinetic parameters.

11.

On the accuracy of the chemically significant eigenvalue method.

Holtorf, Flemming; Green, William H.

J Chem Phys ; 159(14)2023 Oct 14.

Article in English | MEDLINE | ID: mdl-37811829

ABSTRACT

We study the accuracy and convergence properties of the chemically significant eigenvalues method as proposed by Georgievskii et al. [J. Phys. Chem. A 117, 12146-12154 (2013)] and its close relative, dominant subspace truncation, for reduction of the energy-grained master equation. We formally derive the connection between both reduction techniques and provide hard error bounds for the accuracy of the latter which confirm the empirically excellent accuracy and convergence properties but also unveil practically relevant cases in which both methods are bound to fall short. We propose the use of balanced truncation as an effective alternative in these cases.

12.

Predicting Solubility Limits of Organic Solutes for a Wide Range of Solvents and Temperatures.

Vermeire, Florence H; Chung, Yunsie; Green, William H.

J Am Chem Soc ; 144(24): 10785-10797, 2022 06 22.

Article in English | MEDLINE | ID: mdl-35687887

ABSTRACT

The solubility of organic molecules is crucial in organic synthesis and industrial chemistry; it is important in the design of many phase separation and purification units, and it controls the migration of many species into the environment. To decide which solvents and temperatures can be used in the design of new processes, trial and error is often used, as the choice is restricted by unknown solid solubility limits. Here, we present a fast and convenient computational method for estimating the solubility of solid neutral organic molecules in water and many organic solvents for a broad range of temperatures. The model is developed by combining fundamental thermodynamic equations with machine learning models for solvation free energy, solvation enthalpy, Abraham solute parameters, and aqueous solid solubility at 298 K. We provide free open-source and online tools for the prediction of solid solubility limits and a curated data collection (SolProp) that includes more than 5000 experimental solid solubility values for validation of the model. The model predictions are accurate for aqueous systems and for a huge range of organic solvents up to 550 K or higher. Methods to further improve solid solubility predictions by providing experimental data on the solute of interest in another solvent, or on the solute's sublimation enthalpy, are also presented.
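
The step that combines aqueous solubility with solvation free energies follows from a thermodynamic cycle. The solid-to-gas step is the same in every solvent, so at saturation ln S_solvent = ln S_water + (ΔG_solv,water − ΔG_solv,solvent)/RT. Below is a minimal sketch of that 298 K step only, assuming a dilute solute, the same solid form in both solvents, and a 1 M gas / 1 M solution standard state. The temperature extrapolation using solvation enthalpy is not shown, and the inputs are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol K)


def log10_solubility(log10_s_water: float, dg_solv_water: float, dg_solv_solvent: float,
                     temperature: float = 298.15) -> float:
    """Transfer an aqueous solid solubility (log10 mol/L) to another solvent.

    Thermodynamic cycle: solid -> gas is identical in both solvents, so only the
    gas -> solution step differs. Assumes a dilute solute, the same solid form in
    both solvents, and solvation free energies (kcal/mol) on a 1 M gas / 1 M
    solution standard state.
    """
    return log10_s_water + (dg_solv_water - dg_solv_solvent) / (
        math.log(10) * R_KCAL * temperature)


# Hypothetical solute: log10 S_water = -3, solvation more favorable in the organic solvent.
print(log10_solubility(-3.0, dg_solv_water=-8.0, dg_solv_solvent=-10.0))
```

At 298 K, RT·ln 10 is about 1.36 kcal/mol, so each 1.36 kcal/mol of extra solvation stabilization raises log10 S by one unit.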

Subjects

Water , Data Collection , Solubility , Solutions , Solvents/chemistry , Temperature , Thermodynamics , Water/chemistry

13.

Kinetic Modeling of API Oxidation: (2) Imipramine Stress Testing.

Wu, Haoyang; Grinberg Dana, Alon; Ranasinghe, Duminda S; Pickard, Frank C; Wood, Geoffrey P F; Zelesky, Todd; Sluggett, Gregory W; Mustakis, Jason; Green, William H.

Mol Pharm ; 19(5): 1526-1539, 2022 05 02.

Article in English | MEDLINE | ID: mdl-35435696

ABSTRACT

Gauging the chemical stability of active pharmaceutical ingredients (APIs) is critical at various stages of pharmaceutical development to identify potential risks from drug degradation and ensure the quality and safety of the drug product. Stress testing has been the major experimental method to study API stability, but this analytical approach is time-consuming, resource-intensive, and limited by API availability, especially during the early stages of drug development. Novel computational chemistry methods may assist in screening for API chemical stability prior to synthesis and augment contemporary API stress testing studies, with the potential to significantly accelerate drug development and reduce costs. In this work, we leverage quantum chemical calculations and automated reaction mechanism generation to provide new insights into API degradation studies. Continuing part one of this series of studies [Grinberg Dana et al., Mol. Pharm. 2021 18 (8), 3037-3049], we have generated the first ab initio predictive chemical kinetic model of free-radical oxidative degradation for API stress testing. We focused on imipramine oxidation in an azobis(isobutyronitrile) (AIBN)/H2O/CH3OH solution and compared the model's predictions with concurrent experimental observations. We analytically determined iminodibenzyl and desimipramine as imipramine's two major degradation products under industry-standard AIBN stress testing conditions, and our ab initio kinetic model successfully identified both of them in its prediction for the top three degradation products. This work shows the potential and utility of predictive chemical kinetic modeling and quantum chemical computations to elucidate API chemical stability issues. Further, we envision an automated digital workflow that integrates first-principles models with data-driven methods that, when actively and iteratively combined with high-throughput experiments, can substantially accelerate and transform future API chemical stability studies.

Subjects

Imipramine , Chemical Models , Drug Stability , Free Radicals , Kinetics , Oxidation-Reduction

14.

Concluding remarks: Faraday Discussion on unimolecular reactions.

Green, William H.

Faraday Discuss ; 238(0): 741-766, 2022 Oct 21.

Article in English | MEDLINE | ID: mdl-36093929

ABSTRACT

This Faraday Discussion, marking the centenary of Lindemann's explanation of the pressure-dependence of unimolecular reactions, presented recent advances in measuring and computing collisional energy transfer efficiencies, microcanonical rate coefficients, and pressure-dependent (phenomenological) rate coefficients, and the incorporation of these rate coefficients in kinetic models. Several of the presentations featured systems where breakdown of the Born-Oppenheimer approximation is key to understanding the measured rates/products. Many of the reaction systems presented were quite complex, which can make it difficult to go from "plausible proposed explanation" to "quantitative agreement between model and experiment". This complexity highlights the need for better automation of the calculations, better documentation and benchmarking to catch any errors and to make the calculations more easily reproducible, and continued (and even closer) cooperation of experimentalists and modelers. In some situations the correct definition of a "species" is debatable, since the population distributions and time evolution are so distorted from the perfect-Boltzmann Lewis-structure zero-order concept of a chemical species. Despite all these challenges, the field has made tremendous advances, and several cases were presented which demonstrated both excellent understanding of very complicated reaction chemistry and quantitatively accurate predictions of complicated experiments. Some of the interesting contributions to this Discussion are highlighted here, with some comments and suggestions for next steps.

15.

Examining the accuracy of methods for obtaining pressure dependent rate coefficients.

Johnson, Matthew S; Green, William H.

Faraday Discuss ; 238(0): 380-404, 2022 10 21.

Article in English | MEDLINE | ID: mdl-35792089

ABSTRACT

The full energy-grained master equation (ME) is too large to be conveniently used in kinetic modeling, so almost always it is replaced by a reduced model using phenomenological rate coefficients. The accuracy of several methods for obtaining these pressure-dependent phenomenological rate coefficients, and so for constructing a reduced model, is tested against direct numerical solutions of the full ME, and the deviations are sometimes quite large. An algebraic expression for the error between the popular chemically-significant eigenvalue (CSE) method and the exact ME solution is derived. An alternative way to compute phenomenological rate coefficients, simulation least-squares (SLS), is presented. SLS is often about as accurate as CSE, and sometimes has significant advantages over CSE. One particular variant of SLS, using the matrix exponential, is as fast as CSE, and seems to be more robust. However, all of the existing methods for constructing reduced models to approximate the ME, including CSE and SLS, are inaccurate under some conditions, and sometimes they fail dramatically due to numerical problems. The challenge of constructing useful reduced models that more reliably emulate the full ME solution is discussed.

Subjects

Theoretical Models , Kinetics , Computer Simulation , Least-Squares Analysis

16.

Machine Learning of Reaction Properties via Learned Representations of the Condensed Graph of Reaction.

Heid, Esther; Green, William H.

J Chem Inf Model ; 62(9): 2101-2110, 2022 05 09.

Article in English | MEDLINE | ID: mdl-34734699

ABSTRACT

The estimation of chemical reaction properties such as activation energies, rates, or yields is a central topic of computational chemistry. In contrast to molecular properties, where machine learning approaches such as graph convolutional neural networks (GCNNs) have excelled for a wide variety of tasks, no general and transferable adaptations of GCNNs for reactions have been developed yet. We therefore combined a popular cheminformatics reaction representation, the so-called condensed graph of reaction (CGR), with a recent GCNN architecture to arrive at a versatile, robust, and compact deep learning model. The CGR is a superposition of the reactant and product graphs of a chemical reaction and thus an ideal input for graph-based machine learning approaches. The model learns to create a data-driven, task-dependent reaction embedding that does not rely on expert knowledge, similar to current molecular GCNNs. Our approach outperforms current state-of-the-art models in accuracy, is applicable even to imbalanced reactions, and possesses excellent predictive capabilities for diverse target properties, such as activation energies, reaction enthalpies, rate constants, yields, or reaction classes. We furthermore curated a large set of atom-mapped reactions along with their target properties, which can serve as benchmark data sets for future work. All data sets and the developed reaction GCNN model are available online, free of charge, and open source.
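
The CGR superposition can be illustrated by labeling every atom-mapped bond in the union of reactant and product graphs with the pair (order in reactants, order in products). This minimal sketch covers only that bond labeling. The actual model also builds combined atom features and feeds the CGR to a D-MPNN. Hydrogen atoms not involved in the reaction are omitted for brevity, and the map numbers are hypothetical.

```python
from typing import Dict, FrozenSet, Set, Tuple

Pair = FrozenSet[int]  # two atom-map numbers


def condensed_graph(reactant_bonds: Dict[Pair, float],
                    product_bonds: Dict[Pair, float]) -> Dict[Pair, Tuple[float, float]]:
    """Superimpose reactant and product graphs on shared atom-map numbers.

    Each bond in the union is labeled (order in reactants, order in products);
    0.0 means the bond is absent on that side.
    """
    return {pair: (reactant_bonds.get(pair, 0.0), product_bonds.get(pair, 0.0))
            for pair in reactant_bonds.keys() | product_bonds.keys()}


def reaction_center(cgr: Dict[Pair, Tuple[float, float]]) -> Set[Pair]:
    """Bonds that are broken, formed, or change order."""
    return {pair for pair, (r, p) in cgr.items() if r != p}


def bond(a: int, b: int) -> Pair:
    return frozenset((a, b))


# CH3-H + OH -> CH3 + H-OH with map numbers C=1, transferred H=2, O=3, hydroxyl H=4.
reactants = {bond(1, 2): 1.0, bond(3, 4): 1.0}
products = {bond(2, 3): 1.0, bond(3, 4): 1.0}
cgr = condensed_graph(reactants, products)
print(reaction_center(cgr))
```

Because unchanged bonds carry identical labels on both sides, the reaction center is visible directly in the input graph, and no separate reactant and product encoders are needed.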

Subjects

Machine Learning , Neural Networks (Computer) , Cheminformatics

17.

Influence of Template Size, Canonicalization, and Exclusivity for Retrosynthesis and Reaction Prediction Applications.

Heid, Esther; Liu, Jiannan; Aude, Andrea; Green, William H.

J Chem Inf Model ; 62(1): 16-26, 2022 01 10.

Article in English | MEDLINE | ID: mdl-34939786

ABSTRACT

Heuristic and machine learning models for rank-ordering reaction templates comprise an important basis for computer-aided organic synthesis regarding both product prediction and retrosynthetic pathway planning. Their viability relies heavily on the quality and characteristics of the underlying template database. With the advent of automated reaction and template extraction software and consequently the creation of template databases too large for manual curation, a data-driven approach to assess and improve the quality of template sets is needed. We therefore systematically studied the influence of template generality, canonicalization, and exclusivity on the performance of different template ranking models. We find that duplicate and nonexclusive templates, i.e., templates which describe the same chemical transformation on identical or overlapping sets of molecules, decrease both the accuracy of the ranking algorithm and the applicability of the respective top-ranked templates significantly. To remedy the negative effects of nonexclusivity, we developed a general and computationally efficient framework to deduplicate and hierarchically correct templates. As a result, performance improved considerably for both heuristic and machine learning template ranking models, as well as multistep retrosynthetic planning models. The canonicalization and correction code is made freely available.

Subjects

Algorithms , Software , Computers , Heuristics , Machine Learning

18.

Group Contribution and Machine Learning Approaches to Predict Abraham Solute Parameters, Solvation Free Energy, and Solvation Enthalpy.

Chung, Yunsie; Vermeire, Florence H; Wu, Haoyang; Walker, Pierre J; Abraham, Michael H; Green, William H.

J Chem Inf Model ; 62(3): 433-446, 2022 02 14.

Article in English | MEDLINE | ID: mdl-35044781

ABSTRACT

We present a group contribution method (SoluteGC) and a machine learning model (SoluteML) to predict the Abraham solute parameters, as well as a machine learning model (DirectML) to predict solvation free energy and enthalpy at 298 K. The proposed group contribution method uses atom-centered functional groups with corrections for ring and polycyclic strain while the machine learning models adopt a directed message passing neural network. The solute parameters predicted from SoluteGC and SoluteML are used to calculate solvation energy and enthalpy via linear free energy relationships. Extensive data sets containing 8366 solute parameters, 20,253 solvation free energies, and 6322 solvation enthalpies are compiled in this work to train the models. The three models are each evaluated on the same test sets using both random and substructure-based solute splits for solvation energy and enthalpy predictions. The results show that the DirectML model is superior to the SoluteML and SoluteGC models for both predictions and can provide accuracy comparable to that of advanced quantum chemistry methods. Yet, even though the DirectML model performs better in general, all three models are useful for various purposes. Uncertain predicted values can be identified by comparing the three models, and when the 3 models are combined together, they can provide even more accurate predictions than any one of them individually. Finally, we present our compiled solute parameter, solvation energy, and solvation enthalpy databases (SoluteDB, dGsolvDBx, dHsolvDB) and provide public access to our final prediction models through a simple web-based tool, software packages, and source code.
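
The step that turns predicted solute parameters into a solvation free energy uses the standard Abraham linear free energy relationship for gas-to-solvent partitioning, log K = c + eE + sS + aA + bB + lL, with ΔG_solv = −RT·ln(10)·log K. A minimal sketch follows. The solute parameters and solvent coefficients are hypothetical, and solvation enthalpy would use an analogous linear relation with its own coefficients.

```python
import math
from typing import Mapping

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol K)
DESCRIPTORS = ("E", "S", "A", "B", "L")


def log_k_gas_to_solvent(solute: Mapping[str, float], solvent: Mapping[str, float]) -> float:
    """Abraham LFER: log K = c + e*E + s*S + a*A + b*B + l*L."""
    return solvent["c"] + sum(solvent[d.lower()] * solute[d] for d in DESCRIPTORS)


def dg_solv(solute: Mapping[str, float], solvent: Mapping[str, float],
            temperature: float = 298.15) -> float:
    """Solvation free energy (kcal/mol) from the gas-to-solvent partition coefficient K."""
    return -math.log(10) * R_KCAL * temperature * log_k_gas_to_solvent(solute, solvent)


# Hypothetical solute parameters and solvent coefficients (illustrative only).
solute = {"E": 0.8, "S": 0.5, "A": 0.0, "B": 0.2, "L": 3.0}
solvent = {"c": 0.1, "e": -0.2, "s": 1.0, "a": 2.0, "b": 0.5, "l": 0.9}
print(round(dg_solv(solute, solvent), 2))
```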

Subjects

Machine Learning , Neural Networks, Computer , Entropy , Solutions , Solvents , Thermodynamics
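The SoluteGC/SoluteML abstract converts predicted Abraham solute parameters into solvation free energies through a linear free energy relationship, and it compares or combines the three models to flag uncertain values. The following is a minimal Python sketch of both steps, assuming the common Abraham form log K = c + eE + sS + aA + bB + lL for gas-to-solvent partitioning and dG_solv = -RT ln(10) log K. The solvent coefficients and solute parameters in the usage lines are hypothetical placeholders. The unweighted mean and the spread threshold used to combine and flag predictions are illustrative assumptions, not the authors' procedure.

```python
import math
from dataclasses import dataclass
from statistics import mean, pstdev

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


@dataclass
class SoluteParameters:
    """Abraham solute descriptors E, S, A, B, L."""
    E: float
    S: float
    A: float
    B: float
    L: float


@dataclass
class SolventCoefficients:
    """Solvent-specific LFER coefficients c, e, s, a, b, l."""
    c: float
    e: float
    s: float
    a: float
    b: float
    l: float


def log_k(solute: SoluteParameters, solvent: SolventCoefficients) -> float:
    """log10 of the gas-to-solvent partition coefficient from the Abraham LFER."""
    return (solvent.c + solvent.e * solute.E + solvent.s * solute.S
            + solvent.a * solute.A + solvent.b * solute.B + solvent.l * solute.L)


def solvation_free_energy(solute, solvent, temperature=298.15):
    """Solvation free energy in kcal/mol: dG = -RT ln(10) log K."""
    return -R_KCAL * temperature * math.log(10) * log_k(solute, solvent)


def combine_predictions(predictions, max_spread=0.5):
    """Average several model predictions and flag them when the models disagree.

    Assumption: an unweighted mean, and a population standard deviation above
    max_spread (kcal/mol, hypothetical threshold) marks the value as uncertain.
    """
    center = mean(predictions)
    spread = pstdev(predictions)
    return center, spread, spread > max_spread


# Hypothetical placeholder numbers, for illustration only.
example_solute = SoluteParameters(E=0.8, S=0.5, A=0.3, B=0.4, L=3.0)
example_solvent = SolventCoefficients(c=-0.2, e=0.1, s=1.0, a=3.0, b=0.5, l=0.9)
dg_lfer = solvation_free_energy(example_solute, example_solvent)

# Pretend two other models returned these values for the same solute.
center, spread, uncertain = combine_predictions([dg_lfer, dg_lfer + 0.3, dg_lfer - 0.1])
```

A positive log K (the solute prefers the solvent over the gas phase) gives a negative, favorable solvation free energy, which is why the sign in `solvation_free_energy` is negative.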

19.

RMG Database for Chemical Property Prediction.

Johnson, Matthew S; Dong, Xiaorui; Grinberg Dana, Alon; Chung, Yunsie; Farina, David; Gillis, Ryan J; Liu, Mengjie; Yee, Nathan W; Blondal, Katrin; Mazeau, Emily; Grambow, Colin A; Payne, A Mark; Spiekermann, Kevin A; Pang, Hao-Wei; Goldsmith, C Franklin; West, Richard H; Green, William H.

J Chem Inf Model ; 62(20): 4906-4915, 2022 Oct 24.

Article in English | MEDLINE | ID: mdl-36222558

ABSTRACT

The Reaction Mechanism Generator (RMG) database for chemical property prediction is presented. The RMG database consists of curated datasets and estimators for accurately predicting the parameters needed to construct a wide variety of chemical kinetic mechanisms. Most of these datasets and estimators have been published, and they enable prediction of thermodynamics, kinetics, solvation effects, and transport properties. For thermochemistry prediction, the RMG database contains 45 libraries of thermochemical parameters with a combined 4564 entries; a group additivity scheme with 1580 total curated groups and 9 types of corrections, including radical, polycyclic, and surface adsorption corrections; and parameters for a graph convolutional neural network trained using transfer learning from a set of >130 000 DFT calculations to 10 000 high-quality values. Correction schemes for solvent-solute effects, important for thermochemistry in the liquid phase, are also available. They include tabulated values for 195 pure solvents and 152 common solutes and a group additivity scheme for predicting the properties of arbitrary solutes. For kinetics estimation, the database contains 92 libraries of kinetic parameters with a combined 21 000 reactions, as well as rate rule schemes for 87 reaction classes trained on 8655 curated training reactions. Additional libraries and estimators are available for transport properties. All of this information is easily accessible through the graphical user interface at https://rmg.mit.edu. Bulk or on-the-fly use is possible by interfacing directly with the RMG Python package, which can be installed from Anaconda. The RMG database provides kineticists with easy access to estimates of the many parameters they need to model and analyze kinetic systems. This speeds up and facilitates kinetic analysis by enabling easy hypothesis testing of pathways, providing parameters for model construction, and providing checks on kinetic parameters from other sources.

Subjects

Models, Chemical , Kinetics , Thermodynamics , Databases, Factual , Solvents
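The group additivity scheme in the RMG abstract estimates a property as a sum of contributions from the groups in a molecule, plus correction terms. The following is a minimal Python sketch of that idea, assuming the molecule has already been decomposed into group labels by hand (RMG derives the groups from the molecular graph, which is not shown here). The two group values are illustrative numbers close to classic Benson values for the standard enthalpy of formation at 298 K. They are placeholders, not entries read from the RMG database, and the `corrections` argument is a hypothetical stand-in for RMG's correction types.

```python
from collections import Counter

# Illustrative group contributions to the standard enthalpy of formation at
# 298 K, in kcal/mol. Placeholder values close to classic Benson numbers,
# NOT entries taken from the RMG database.
GROUP_DHF_298 = {
    "C-(C)(H)3": -10.20,   # methyl carbon bonded to one carbon
    "C-(C)2(H)2": -4.93,   # methylene carbon bonded to two carbons
}


def group_additivity_dhf(groups, corrections=()):
    """Sum group contributions, then add any correction terms.

    groups: iterable of group labels, one entry per atom-centered group.
    corrections: hypothetical extra terms (e.g. ring strain), in kcal/mol.
    """
    total = 0.0
    for label, count in Counter(groups).items():
        if label not in GROUP_DHF_298:
            raise KeyError(f"no group value available for {label!r}")
        total += count * GROUP_DHF_298[label]
    return total + sum(corrections)


# Propane, CH3-CH2-CH3: two methyl groups and one methylene group.
propane = ["C-(C)(H)3", "C-(C)2(H)2", "C-(C)(H)3"]
dhf_propane = group_additivity_dhf(propane)  # about -25.33 kcal/mol
```

Raising `KeyError` for an unknown group mirrors the practical limit of any group table: a molecule can only be estimated if every one of its groups has a value, which is one reason a database like RMG's curates a large set of groups.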

20.

An Integrated Assessment of Emissions, Air Quality, and Public Health Impacts of China's Transition to Electric Vehicles.

Hsieh, I-Yun Lisa; Chossière, Guillaume P; Gençer, Emre; Chen, Hao; Barrett, Steven; Green, William H.

Environ Sci Technol ; 2022 Feb 16.

Article in English | MEDLINE | ID: mdl-35171556

ABSTRACT

Electric vehicles (EVs) are a promising pathway to cleaner personal mobility. China provides substantial support to increase EV market share. This study provides an extensive analysis of the currently unclear environmental and health benefits of these incentives at the provincial level. EVs in China have modest cradle-to-gate CO2 benefits (29% on average) compared to conventional internal combustion engine vehicles (ICEVs), but have carbon emissions similar to those of hybrid electric vehicles. A well-to-wheel assessment of air pollutant emissions shows that emissions associated with ICEVs come mainly from gasoline production, not the tailpipe, suggesting that tighter emission controls on refineries are needed to combat air pollution effectively. By integrating a vehicle fleet model into policy scenario analysis, we quantify the policy impacts associated with passenger vehicles in the major Chinese provinces: broader EV penetration, especially combined with cleaner power generation, could deliver greater air quality and health benefits, but not necessarily significant climate change mitigation. The total value to society of the climate and mortality benefits in 2030 is found to be comparable to a prior estimate of the EV policy's economic costs.
