Results 1 - 20 of 115
1.
J Phys Chem A ; 128(21): 4335-4352, 2024 May 30.
Article in English | MEDLINE | ID: mdl-38752854

ABSTRACT

Obtaining accurate enthalpies of formation of chemical species, ΔHf, often requires empirical corrections that connect the results of quantum mechanical (QM) calculations with the experimental enthalpies of elements in their standard state. One approach is to use atomization energy corrections followed by bond additivity corrections (BACs), such as those defined by Petersson et al. or Anantharaman and Melius. Another approach is to utilize isodesmic reactions (IDRs) as shown by Buerger et al. We implement both approaches in Arkane, an open-source software that can calculate species thermochemistry using results from various QM software packages. In this work, we collect 421 reference species from the literature to derive ΔHf corrections and fit atomization energy corrections and BACs for 15 commonly used model chemistries. We find that both types of BACs yield similar accuracy, although Anantharaman- and Melius-type BACs appear to generalize better. Furthermore, BACs tend to achieve better accuracy than IDRs for commonly used model chemistries, and IDRs can be less robust because of the sensitivity to the chosen reference species and reactions. Overall, Anantharaman- and Melius-type BACs are our recommended approach for achieving accurate QM corrections for enthalpies.

2.
J Phys Chem A ; 128(14): 2891-2907, 2024 Apr 11.
Article in English | MEDLINE | ID: mdl-38536892

ABSTRACT

Detailed chemical kinetic models offer valuable mechanistic insights into industrial applications. Automatic generation of reliable kinetic models requires fast and accurate radical thermochemistry estimation. Kineticists often prefer hydrogen bond increment (HBI) corrections from a closed-shell molecule to the corresponding radical for their interpretability, physical meaning, and facilitation of error cancellation as a relative quantity. Tree estimators, used due to limited data, currently rely on expert knowledge and manual construction, posing challenges in maintenance and improvement. In this work, we extend the subgraph isomorphic decision tree (SIDT) algorithm originally developed for rate estimation to estimate HBI corrections. We introduce a physics-aware splitting criterion, explore a bounded weighted uncertainty estimation method, and evaluate aleatoric uncertainty-based and model variance reduction-based prepruning methods. Moreover, we compile a data set of thermochemical parameters for 2210 radicals involving C, O, N, and H based on quantum chemical calculations from recently published works. We leverage the collected data set to train the SIDT model. Compared to existing empirical tree estimators, the SIDT model (1) offers an automatic approach to generating and extending the tree estimator for thermochemistry, (2) has better accuracy and R2, (3) provides significantly more realistic uncertainty estimates, and (4) has a tree structure much more advantageous in descent speed. Overall, the SIDT estimator marks a great leap in kinetic modeling, offering more precise, reliable, and scalable predictions for radical thermochemistry.

3.
Chem Sci ; 15(7): 2410-2424, 2024 Feb 14.
Article in English | MEDLINE | ID: mdl-38362410

ABSTRACT

Fast and accurate prediction of solvent effects on reaction rates is crucial for kinetic modeling, chemical process design, and high-throughput solvent screening. Despite recent advances in machine learning, a scarcity of reliable data has hindered the development of predictive models that are generalizable for diverse reactions and solvents. In this work, we generate a large set of data with the COSMO-RS method for over 28 000 neutral reactions and 295 solvents and train a machine learning model to predict the solvation free energy and solvation enthalpy of activation (ΔΔG‡solv, ΔΔH‡solv) for a solution phase reaction. On unseen reactions, the model achieves mean absolute errors of 0.71 and 1.03 kcal mol-1 for ΔΔG‡solv and ΔΔH‡solv, respectively, relative to the COSMO-RS calculations. The model also provides reliable predictions of relative rate constants within a factor of 4 when tested on experimental data. The presented model can provide nearly instantaneous predictions of kinetic solvent effects or relative rate constants for a broad range of neutral closed-shell or free radical reactions and solvents based only on atom-mapped reaction SMILES and solvent SMILES strings.
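
As a rough illustration of how a solvation free energy of activation maps onto a relative rate constant, the sketch below assumes the usual transition-state-theory relation k_A/k_B = exp(-(ΔΔG‡solv,A - ΔΔG‡solv,B)/RT). The function name and the example values are hypothetical and are not taken from the paper.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1


def relative_rate_constant(ddg_solv_a, ddg_solv_b, temperature=298.15):
    """Return k_A / k_B for one reaction in solvents A and B.

    ddg_solv_a, ddg_solv_b: solvation free energies of activation in
    kcal/mol (hypothetical inputs, e.g. from a model or from COSMO-RS).
    Assumes transition state theory with an unchanged gas-phase barrier.
    """
    rt = R_KCAL * temperature
    return math.exp(-(ddg_solv_a - ddg_solv_b) / rt)


# A solvent that lowers the barrier by 1 kcal/mol relative to another
# speeds the reaction up at room temperature.
speedup = relative_rate_constant(-1.0, 0.0)
```

Under the same assumption, a 0.71 kcal/mol error in ΔΔG‡solv at 298.15 K corresponds to a multiplicative error of roughly 3.3 in the rate constant.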

4.
J Chem Inf Model ; 64(1): 9-17, 2024 Jan 08.
Article in English | MEDLINE | ID: mdl-38147829

ABSTRACT

Deep learning has become a powerful and frequently employed tool for the prediction of molecular properties, thus creating a need for open-source and versatile software solutions that can be operated by nonexperts. Among the current approaches, directed message-passing neural networks (D-MPNNs) have proven to perform well on a variety of property prediction tasks. The software package Chemprop implements the D-MPNN architecture and offers simple, easy, and fast access to machine-learned molecular properties. Compared to its initial version, we present a multitude of new Chemprop functionalities such as the support of multimolecule properties, reactions, atom/bond-level properties, and spectra. Further, we incorporate various uncertainty quantification and calibration methods along with related metrics as well as pretraining and transfer learning workflows, improved hyperparameter optimization, and other customization options concerning loss functions or atom/bond features. We benchmark D-MPNN models trained using Chemprop with the new reaction, atom-level, and spectra functionality on a variety of property prediction data sets, including MoleculeNet and SAMPL, and observe state-of-the-art performance on the prediction of water-octanol partition coefficients, reaction barrier heights, atomic partial charges, and absorption spectra. Chemprop enables out-of-the-box training of D-MPNN models for a variety of problem settings in fast, user-friendly, and open-source software.


Subject(s)
Machine Learning; Software; Neural Networks, Computer; Chemical Phenomena; Water

5.
Science ; 382(6677): eadi1407, 2023 Dec 22.
Article in English | MEDLINE | ID: mdl-38127734

ABSTRACT

A closed-loop, autonomous molecular discovery platform driven by integrated machine learning tools was developed to accelerate the design of molecules with desired properties. We demonstrated two case studies on dye-like molecules, targeting absorption wavelength, lipophilicity, and photooxidative stability. In the first study, the platform experimentally realized 294 unreported molecules across three automatic iterations of molecular design-make-test-analyze cycles while exploring the structure-function space of four rarely reported scaffolds. In each iteration, the property prediction models that guided exploration learned the structure-property space of diverse scaffold derivatives, which were realized with multistep syntheses and a variety of reactions. The second study exploited property models trained on the explored chemical space and previously reported molecules to discover nine top-performing molecules within a lightly explored structure-property space.

6.
Chem Sci ; 14(48): 14229-14242, 2023 Dec 13.
Article in English | MEDLINE | ID: mdl-38098707

ABSTRACT

Enzymatic reactions are an ecofriendly, selective, and versatile addition to, and sometimes even an alternative to, organic reactions for the synthesis of chemical compounds such as pharmaceuticals or fine chemicals. To identify suitable reactions, computational models to predict the activity of enzymes on non-native substrates, to perform retrosynthetic pathway searches, or to predict the outcomes of reactions including regio- and stereoselectivity are becoming increasingly important. However, current approaches are substantially hindered by the limited amount of available data, especially if balanced and atom mapped reactions are needed and if the models feature machine learning components. We therefore constructed a high-quality dataset (EnzymeMap) by developing a large set of correction and validation algorithms for recorded reactions in the literature and showcase its significant positive impact on machine learning models of retrosynthesis, forward prediction, and regioselectivity prediction, outperforming previous approaches by a large margin. Our dataset allows for deep learning models of enzymatic reactions with unprecedented accuracy, and is freely available online.

7.
J Phys Chem A ; 127(48): 10268-10281, 2023 Dec 07.
Article in English | MEDLINE | ID: mdl-38010212

ABSTRACT

Although charged solutes are common in many chemical systems, traditional solvation models perform poorly in calculating solvation energies of ions. One major obstacle is the scarcity of experimental data for solvated ions. In this study, we release an experiment-based aqueous ionic solvation energy data set, IonSolv-Aq, that contains hydration free energies for 118 anions and 155 cations, more than 2 times larger than the set of hydration free energies for singly charged ions contained in the 2012 Minnesota Solvation Database commonly used in benchmarking studies. We discuss sources of systematic uncertainty in the data set and use the data to examine the accuracy of popular implicit solvation models COSMO-RS and SMD for predicting solvation free energies of singly charged ionic solutes in water. Our results indicate that most SMD and COSMO-RS modeling errors for ionic solutes are systematic and correctable with empirical parameters. We discuss two systematic offsets: one across all ions and one that depends on the functional group of the ionization site. After correcting for these offsets, solvation energies of singly charged ions are predicted using COSMO-RS to 3.1 kcal mol-1 MAE against a challenging test set and 1.7 kcal mol-1 MAE (about 3% relative error) with a filtered test set. The performance of SMD is similar, with MAE against those same test sets of 2.7 and 1.7 kcal mol-1. These results underscore the importance of compiling larger experimental data sets to improve solvation model parametrization and fairly assess performance.

8.
J Phys Chem B ; 127(47): 10151-10170, 2023 Nov 30.
Article in English | MEDLINE | ID: mdl-37966798

ABSTRACT

Predicting Gibbs free energy of solution is key to understanding the solvent effects on thermodynamics and reaction rates for kinetic modeling. Accurately computing solution free energies requires the enumeration and evaluation of relevant solute conformers in solution. However, even after generation of relevant conformers, determining their free energy of solution requires an expensive workflow consisting of several ab initio computational chemistry calculations. To help address this challenge, we generate a large data set of solution free energies for nearly 44,000 solutes with almost 9 million conformers calculated in 41 different solvents using density functional theory and COSMO-RS and quantify the impact of solute conformers on the solution free energy. We then train a message passing neural network to predict the relative solution free energies of a set of solute conformers, enabling the identification of a small subset of thermodynamically relevant conformers. The model offers substantial computational time savings, with predictions usually well within 1 kcal/mol of the free energy of solution calculated by using computational chemical methods.
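
The abstract does not spell out how conformer free energies are combined, so the sketch below assumes a standard Boltzmann-weighted ensemble, G = -RT ln Σ exp(-G_i/RT), and a simple population cutoff for picking "relevant" conformers. All function names, the cutoff value, and the example energies are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1


def boltzmann_populations(free_energies, temperature=298.15):
    """Fractional population of each conformer from its free energy (kcal/mol)."""
    rt = R_KCAL * temperature
    g_min = min(free_energies)  # shift by the minimum for numerical stability
    weights = [math.exp(-(g - g_min) / rt) for g in free_energies]
    total = sum(weights)
    return [w / total for w in weights]


def ensemble_free_energy(free_energies, temperature=298.15):
    """Boltzmann-weighted free energy of the whole conformer ensemble."""
    rt = R_KCAL * temperature
    g_min = min(free_energies)
    partition = sum(math.exp(-(g - g_min) / rt) for g in free_energies)
    return g_min - rt * math.log(partition)


def relevant_conformers(free_energies, min_population=0.01, temperature=298.15):
    """Indices of conformers whose population reaches the assumed cutoff."""
    populations = boltzmann_populations(free_energies, temperature)
    return [i for i, p in enumerate(populations) if p >= min_population]


# Relative free energies in kcal/mol for three hypothetical conformers.
kept = relevant_conformers([0.0, 0.1, 5.0])
```

Only relative free energies matter for the populations, which is why a model that predicts relative values per conformer is enough to select the subset worth refining with more expensive calculations.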

9.
J Chem Phys ; 159(14), 2023 Oct 14.
Article in English | MEDLINE | ID: mdl-37811829

ABSTRACT

We study the accuracy and convergence properties of the chemically significant eigenvalues method as proposed by Georgievskii et al. [J. Phys. Chem. A 117, 12146-12154 (2013)] and its close relative, dominant subspace truncation, for reduction of the energy-grained master equation. We formally derive the connection between both reduction techniques and provide hard error bounds for the accuracy of the latter which confirm the empirically excellent accuracy and convergence properties but also unveil practically relevant cases in which both methods are bound to fall short. We propose the use of balanced truncation as an effective alternative in these cases.

10.
J Chem Inf Model ; 63(15): 4574-4588, 2023 Aug 14.
Article in English | MEDLINE | ID: mdl-37487557

ABSTRACT

Knowledge of critical properties, such as critical temperature, pressure, density, as well as acentric factor, is essential to calculate thermo-physical properties of chemical compounds. Experiments to determine critical properties and acentric factors are expensive and time intensive; therefore, we developed a machine learning (ML) model that can predict these molecular properties given the SMILES representation of a chemical species. We explored directed message passing neural network (D-MPNN) and graph attention network as ML architecture choices. Additionally, we investigated featurization with additional atomic and molecular features, multitask training, and pretraining using estimated data to optimize model performance. Our final model utilizes a D-MPNN layer to learn the molecular representation and is supplemented by Abraham parameters. A multitask training scheme was used to train a single model to predict all the critical properties and acentric factors along with boiling point, melting point, enthalpy of vaporization, and enthalpy of fusion. The model was evaluated on both random and scaffold splits where it shows state-of-the-art accuracies. The extensive data set of critical properties and acentric factors contains 1144 chemical compounds and is made available in the public domain together with the source code that can be used for further exploration.


Subject(s)
Machine Learning; Neural Networks, Computer; Temperature; Transition Temperature

11.
J Phys Chem A ; 127(27): 5637-5651, 2023 Jul 13.
Article in English | MEDLINE | ID: mdl-37381077

ABSTRACT

Many industrially and environmentally relevant reactions occur in the liquid phase. An accurate prediction of the rate constants is needed to analyze the intricate kinetic mechanisms of condensed phase systems. Quantum chemistry and continuum solvation models are commonly used to compute liquid phase rate constants; yet, their exact computational errors remain largely unknown, and a consistent computational workflow has not been well established. In this study, the accuracies of various quantum chemical and COSMO-RS levels of theory are assessed for the predictions of liquid phase rate constants and kinetic solvent effects. The prediction is made by first obtaining gas phase rate constants and subsequently applying solvation corrections. The calculation errors are evaluated using the experimental data of 191 rate constants that comprise 15 neutral closed-shell or free radical reactions and 49 solvents. The ωB97XD/def2-TZVP level of theory combined with the COSMO-RS method at the BP-TZVP level is shown to achieve the best performance with a mean absolute error of 0.90 in log10(kliq). Relative rate constants are additionally compared to determine the errors associated with the solvation calculations alone. Very accurate predictions of relative rate constants are achieved at nearly all levels of theory with a mean absolute error of 0.27 in log10(ksolvent1/ksolvent2).
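
The error metric quoted above, a mean absolute error in log10(k), can be sketched as follows. This is a minimal illustration with hypothetical names and made-up rate constants; it is not the authors' evaluation script.

```python
import math


def mae_log10(k_predicted, k_experimental):
    """Mean absolute error between predicted and measured rate constants,
    evaluated on a log10 scale (both inputs must be positive)."""
    if len(k_predicted) != len(k_experimental) or not k_predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [
        abs(math.log10(kp) - math.log10(ke))
        for kp, ke in zip(k_predicted, k_experimental)
    ]
    return sum(errors) / len(errors)


# Two made-up rate constants, each off by one order of magnitude.
example_mae = mae_log10([10.0, 100.0], [1.0, 1000.0])
```

Because the error is taken on a log scale, a value of 0.90 corresponds to a typical deviation of about a factor of 8 in the rate constant (10^0.90 ≈ 7.9), and 0.27 to a factor of about 1.9.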

12.
J Chem Inf Model ; 63(13): 4012-4029, 2023 Jul 10.
Article in English | MEDLINE | ID: mdl-37338239

ABSTRACT

Characterizing uncertainty in machine learning models has recently gained interest in the context of machine learning reliability, robustness, safety, and active learning. Here, we separate the total uncertainty into contributions from noise in the data (aleatoric) and shortcomings of the model (epistemic), further dividing epistemic uncertainty into model bias and variance contributions. We systematically address the influence of noise, model bias, and model variance in the context of chemical property predictions, where the diverse nature of target properties and the vast chemical space give rise to many distinct sources of prediction error. We demonstrate that different sources of error can each be significant in different contexts and must be individually addressed during model development. Through controlled experiments on data sets of molecular properties, we show important trends in model performance associated with the level of noise in the data set, size of the data set, model architecture, molecule representation, ensemble size, and data set splitting. In particular, we show that 1) noise in the test set can limit a model's observed performance when the actual performance is much better, 2) using size-extensive model aggregation structures is crucial for extensive property prediction, and 3) ensembling is a reliable tool for uncertainty quantification and improvement specifically for the contribution of model variance. We develop general guidelines on how to improve an underperforming model when falling into different uncertainty contexts.


Subject(s)
Machine Learning; Uncertainty; Reproducibility of Results

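
For the uncertainty decomposition described in item 12, a common ensemble-based estimate can be sketched as below. It assumes each ensemble member returns both a prediction and a noise (aleatoric) variance for the same input; the function name and numbers are hypothetical, and the sketch does not capture model bias, which an ensemble spread alone cannot reveal.

```python
from statistics import mean, pvariance


def decompose_uncertainty(member_predictions, member_noise_variances):
    """Split ensemble output for one input into three parts.

    member_predictions: point prediction from each ensemble member.
    member_noise_variances: noise variance predicted by each member.
    Returns (ensemble prediction, aleatoric variance, model variance).
    """
    prediction = mean(member_predictions)
    aleatoric = mean(member_noise_variances)      # average predicted data noise
    model_variance = pvariance(member_predictions)  # spread across members
    return prediction, aleatoric, model_variance


# Two hypothetical ensemble members that disagree by 2 units.
result = decompose_uncertainty([1.0, 3.0], [0.5, 0.5])
```

Averaging more members shrinks the variance contribution to the ensemble's error, while the aleatoric term is a property of the data and stays put, which matches the abstract's point that ensembling specifically addresses model variance.
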
13.
J Phys Chem A ; 127(14): 3231-3245, 2023 Apr 13.
Article in English | MEDLINE | ID: mdl-36999979

ABSTRACT

The combustion and pyrolysis behaviors of light esters and fatty acid methyl esters have been widely studied due to their relevance as biofuel and fuel additives. However, a knowledge gap exists for midsize alkyl acetates, especially ones with long alkoxyl groups. Butyl acetate, in particular, is a promising biofuel with its economic and robust production possibilities and ability to enhance blendstock performance and reduce soot formation. However, it is little studied from both experimental and modeling aspects. This work created detailed oxidation mechanisms for the four butyl acetate isomers (normal-, sec-, tert-, and iso-butyl acetate) at temperatures varying from 650 to 2000 K and pressures up to 100 atm using the Reaction Mechanism Generator. About 60% of species in each model have thermochemical parameters from published data or in-house quantum calculations, including fuel molecules and intermediate combustion products. Kinetics of essential primary reactions, retro-ene and hydrogen atom abstraction by OH or HO2, governing the fuel oxidation pathways, were also calculated quantum-mechanically. Simulation of the developed mechanisms indicates that the majority of the fuel will decompose into acetic acid and relevant butenes at elevated temperatures, making their ignition behaviors similar to butenes. The adaptability of the developed models to high-temperature pyrolysis systems was tested against newly collected high-pressure shock experiments; the simulated CO mole fraction time histories have a reasonable agreement with the laser measurement in the shock tube. This work reveals the high-temperature oxidation chemistry of butyl acetates and demonstrates the validity of predictive models for biofuel chemistry established on accurate thermochemical and kinetic parameters.

18.
J Chem Inf Model ; 62(20): 4906-4915, 2022 Oct 24.
Article in English | MEDLINE | ID: mdl-36222558

ABSTRACT

The Reaction Mechanism Generator (RMG) database for chemical property prediction is presented. The RMG database consists of curated datasets and estimators for accurately predicting the parameters necessary for constructing a wide variety of chemical kinetic mechanisms. These datasets and estimators are mostly published and enable prediction of thermodynamics, kinetics, solvation effects, and transport properties. For thermochemistry prediction, the RMG database contains 45 libraries of thermochemical parameters with a combination of 4564 entries and a group additivity scheme with 9 types of corrections including radical, polycyclic, and surface absorption corrections with 1580 total curated groups and parameters for a graph convolutional neural network trained using transfer learning from a set of >130 000 DFT calculations to 10 000 high-quality values. Correction schemes for solvent-solute effects, important for thermochemistry in the liquid phase, are available. They include tabulated values for 195 pure solvents and 152 common solutes and a group additivity scheme for predicting the properties of arbitrary solutes. For kinetics estimation, the database contains 92 libraries of kinetic parameters containing a combined 21 000 reactions and contains rate rule schemes for 87 reaction classes trained on 8655 curated training reactions. Additional libraries and estimators are available for transport properties. All of this information is easily accessible through the graphical user interface at https://rmg.mit.edu. Bulk or on-the-fly use can be facilitated by interfacing directly with the RMG Python package which can be installed from Anaconda. The RMG database provides kineticists with easy access to estimates of the many parameters they need to model and analyze kinetic systems. This helps to speed up and facilitate kinetic analysis by enabling easy hypothesis testing on pathways, by providing parameters for model construction, and by providing checks on kinetic parameters from other sources.


Subject(s)
Models, Chemical; Kinetics; Thermodynamics; Databases, Factual; Solvents

19.
Faraday Discuss ; 238(0): 741-766, 2022 Oct 21.
Article in English | MEDLINE | ID: mdl-36093929

ABSTRACT

This Faraday Discussion, marking the centenary of Lindemann's explanation of the pressure-dependence of unimolecular reactions, presented recent advances in measuring and computing collisional energy transfer efficiencies, microcanonical rate coefficients, and pressure-dependent (phenomenological) rate coefficients, and the incorporation of these rate coefficients in kinetic models. Several of the presentations featured systems where breakdown of the Born-Oppenheimer approximation is key to understanding the measured rates/products. Many of the reaction systems presented were quite complex, which can make it difficult to go from "plausible proposed explanation" to "quantitative agreement between model and experiment". This complexity highlights the need for better automation of the calculations, better documentation and benchmarking to catch any errors and to make the calculations more easily reproducible, and continued (and even closer) cooperation of experimentalists and modelers. In some situations the correct definition of a "species" is debatable, since the population distributions and time evolution are so distorted from the perfect-Boltzmann Lewis-structure zero-order concept of a chemical species. Despite all these challenges, the field has made tremendous advances, and several cases were presented which demonstrated both excellent understanding of very complicated reaction chemistry and quantitatively accurate predictions of complicated experiments. Some of the interesting contributions to this Discussion are highlighted here, with some comments and suggestions for next steps.

20.
Faraday Discuss ; 238(0): 380-404, 2022 Oct 21.
Article in English | MEDLINE | ID: mdl-35792089

ABSTRACT

The full energy-grained master equation (ME) is too large to be conveniently used in kinetic modeling, so almost always it is replaced by a reduced model using phenomenological rate coefficients. The accuracy of several methods for obtaining these pressure-dependent phenomenological rate coefficients, and so for constructing a reduced model, is tested against direct numerical solutions of the full ME, and the deviations are sometimes quite large. An algebraic expression for the error between the popular chemically-significant eigenvalue (CSE) method and the exact ME solution is derived. An alternative way to compute phenomenological rate coefficients, simulation least-squares (SLS), is presented. SLS is often about as accurate as CSE, and sometimes has significant advantages over CSE. One particular variant of SLS, using the matrix exponential, is as fast as CSE, and seems to be more robust. However, all of the existing methods for constructing reduced models to approximate the ME, including CSE and SLS, are inaccurate under some conditions, and sometimes they fail dramatically due to numerical problems. The challenge of constructing useful reduced models that more reliably emulate the full ME solution is discussed.


Subject(s)
Models, Theoretical; Kinetics; Computer Simulation; Least-Squares Analysis