Results 1-20 of 115
1.
J Chem Inf Model ; 64(1): 9-17, 2024 Jan 08.
Article in English | MEDLINE | ID: mdl-38147829

ABSTRACT

Deep learning has become a powerful and frequently employed tool for predicting molecular properties, creating a need for open-source, versatile software that nonexperts can operate. Among current approaches, directed message-passing neural networks (D-MPNNs) have proven to perform well on a variety of property prediction tasks. The software package Chemprop implements the D-MPNN architecture and offers simple, fast access to machine-learned molecular properties. Compared with the initial version, we present a multitude of new Chemprop functionalities, such as support for multimolecule properties, reactions, atom/bond-level properties, and spectra. Further, we incorporate various uncertainty quantification and calibration methods along with related metrics, as well as pretraining and transfer learning workflows, improved hyperparameter optimization, and other customization options concerning loss functions or atom/bond features. We benchmark D-MPNN models trained using Chemprop with the new reaction, atom-level, and spectra functionality on a variety of property prediction data sets, including MoleculeNet and SAMPL, and observe state-of-the-art performance on the prediction of water-octanol partition coefficients, reaction barrier heights, atomic partial charges, and absorption spectra. Chemprop enables out-of-the-box training of D-MPNN models for a variety of problem settings in fast, user-friendly, open-source software.


Subject(s)
Machine Learning; Software; Neural Networks, Computer; Chemical Phenomena; Water
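The directed message passing described in this abstract can be illustrated with a toy, scalar-valued version. This is a minimal sketch that assumes the general D-MPNN update scheme: hidden states live on directed bonds, and the message to bond v→w sums the states of the bonds entering v, except the reverse bond w→v. It is not Chemprop's code. The function `toy_dmpnn` and the scalar weights `w_i`, `w_m`, and `w_a` are hypothetical stand-ins for learned weight matrices, and the feature values are made up.

```python
from collections import defaultdict


def relu(x: float) -> float:
    return max(0.0, x)


def toy_dmpnn(atom_feats, bonds, bond_feats, depth=3, w_i=0.5, w_m=0.8, w_a=1.0):
    """Scalar toy D-MPNN; w_i, w_m, w_a are hypothetical stand-ins for weight matrices."""
    incoming = defaultdict(list)  # atom w -> source atoms k of directed edges k->w
    edges = []                    # directed edges (v, w, bond_feature)
    for (a, b), e in zip(bonds, bond_feats):
        for v, w in ((a, b), (b, a)):
            edges.append((v, w, e))
            incoming[w].append(v)

    # Initial hidden state of each directed edge v->w from source-atom and bond features.
    h0 = {(v, w): relu(w_i * (atom_feats[v] + e)) for v, w, e in edges}
    h = dict(h0)
    for _ in range(depth - 1):
        # Message to v->w sums the edges k->v entering v, skipping the reverse edge w->v.
        h = {
            (v, w): relu(h0[(v, w)] + w_m * sum(h[(k, v)] for k in incoming[v] if k != w))
            for v, w, _ in edges
        }

    # Atom readout: combine atom features with all incoming edge states, then sum over atoms.
    return sum(
        relu(w_a * (atom_feats[v] + sum(h[(k, v)] for k in incoming[v])))
        for v in range(len(atom_feats))
    )


# Three-atom chain 0-1-2 with made-up features.
print(toy_dmpnn([1.0, 1.0, 1.0], [(0, 1), (1, 2)], [0.0, 0.0], depth=2))
```

Excluding the reverse bond keeps a message from bouncing straight back along the bond it arrived on. With `depth=1`, no message passing occurs and only the initial bond states reach the readout.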
2.
J Phys Chem A ; 128(14): 2891-2907, 2024 Apr 11.
Article in English | MEDLINE | ID: mdl-38536892

ABSTRACT

Detailed chemical kinetic models offer valuable mechanistic insights into industrial applications. Automatic generation of reliable kinetic models requires fast and accurate radical thermochemistry estimation. Kineticists often prefer hydrogen bond increment (HBI) corrections from a closed-shell molecule to the corresponding radical for their interpretability, physical meaning, and facilitation of error cancellation as a relative quantity. Tree estimators, which are used because data are limited, currently rely on expert knowledge and manual construction, which makes them difficult to maintain and improve. In this work, we extend the subgraph isomorphic decision tree (SIDT) algorithm, originally developed for rate estimation, to estimate HBI corrections. We introduce a physics-aware splitting criterion, explore a bounded weighted uncertainty estimation method, and evaluate prepruning methods based on aleatoric uncertainty and on model variance reduction. Moreover, we compile a data set of thermochemical parameters for 2210 radicals involving C, O, N, and H based on quantum chemical calculations from recently published works. We use the collected data set to train the SIDT model. Compared to existing empirical tree estimators, the SIDT model (1) offers an automatic approach to generating and extending the tree estimator for thermochemistry, (2) has better accuracy and R², (3) provides significantly more realistic uncertainty estimates, and (4) has a tree structure that is much faster to descend. Overall, the SIDT estimator marks a major advance in kinetic modeling, offering more precise, reliable, and scalable predictions for radical thermochemistry.

3.
J Phys Chem A ; 128(21): 4335-4352, 2024 May 30.
Article in English | MEDLINE | ID: mdl-38752854

ABSTRACT

Obtaining accurate enthalpies of formation of chemical species, ΔHf, often requires empirical corrections that connect the results of quantum mechanical (QM) calculations with the experimental enthalpies of elements in their standard state. One approach is to use atomization energy corrections followed by bond additivity corrections (BACs), such as those defined by Petersson et al. or Anantharaman and Melius. Another approach is to utilize isodesmic reactions (IDRs) as shown by Buerger et al. We implement both approaches in Arkane, an open-source software that can calculate species thermochemistry using results from various QM software packages. In this work, we collect 421 reference species from the literature to derive ΔHf corrections and fit atomization energy corrections and BACs for 15 commonly used model chemistries. We find that both types of BACs yield similar accuracy, although Anantharaman- and Melius-type BACs appear to generalize better. Furthermore, BACs tend to achieve better accuracy than IDRs for commonly used model chemistries, and IDRs can be less robust because of the sensitivity to the chosen reference species and reactions. Overall, Anantharaman- and Melius-type BACs are our recommended approach for achieving accurate QM corrections for enthalpies.
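In a Petersson-type scheme, the bond additivity correction is a sum over bond types, each term being the bond count times a parameter fitted for that model chemistry. The sketch below only applies such a correction; the least-squares fit against reference species is omitted, and so are the extra atom and neighbor terms of Anantharaman- and Melius-type BACs. The parameter values and function names are hypothetical and are not Arkane's API.

```python
from collections import Counter

# Hypothetical Petersson-type bond corrections in kcal/mol; real values are fitted
# per model chemistry against reference species and are NOT these numbers.
EXAMPLE_BAC_PARAMS = {"C-H": -0.11, "C-C": -0.30, "C-O": 0.05, "O-H": -0.20}


def bond_additivity_correction(bonds, params):
    """Sum per-bond corrections; bond labels must already be canonical ('C-H', not 'H-C')."""
    counts = Counter(bonds)
    missing = sorted(set(counts) - set(params))
    if missing:
        raise KeyError(f"no BAC parameter for bond type(s): {missing}")
    return sum(n * params[bond] for bond, n in counts.items())


def corrected_enthalpy(hf_qm, bonds, params=EXAMPLE_BAC_PARAMS):
    """Apply the bond additivity correction to a QM-derived enthalpy of formation."""
    return hf_qm + bond_additivity_correction(bonds, params)


# Ethanol: five C-H bonds plus one each of C-C, C-O, and O-H.
ethanol_bonds = ["C-H"] * 5 + ["C-C", "C-O", "O-H"]
print(corrected_enthalpy(-50.0, ethanol_bonds))
```

Raising an error for unknown bond types, rather than silently skipping them, avoids reporting a correction that quietly ignores part of the molecule.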

4.
J Chem Inf Model ; 63(15): 4574-4588, 2023 Aug 14.
Article in English | MEDLINE | ID: mdl-37487557

ABSTRACT

Knowledge of critical properties, such as critical temperature, pressure, density, as well as acentric factor, is essential to calculate thermo-physical properties of chemical compounds. Experiments to determine critical properties and acentric factors are expensive and time intensive; therefore, we developed a machine learning (ML) model that can predict these molecular properties given the SMILES representation of a chemical species. We explored directed message passing neural network (D-MPNN) and graph attention network as ML architecture choices. Additionally, we investigated featurization with additional atomic and molecular features, multitask training, and pretraining using estimated data to optimize model performance. Our final model utilizes a D-MPNN layer to learn the molecular representation and is supplemented by Abraham parameters. A multitask training scheme was used to train a single model to predict all the critical properties and acentric factors along with boiling point, melting point, enthalpy of vaporization, and enthalpy of fusion. The model was evaluated on both random and scaffold splits where it shows state-of-the-art accuracies. The extensive data set of critical properties and acentric factors contains 1144 chemical compounds and is made available in the public domain together with the source code that can be used for further exploration.


Subject(s)
Machine Learning; Neural Networks, Computer; Temperature; Transition Temperature
5.
J Chem Inf Model ; 63(13): 4012-4029, 2023 Jul 10.
Article in English | MEDLINE | ID: mdl-37338239

ABSTRACT

Characterizing uncertainty in machine learning models has recently gained interest in the context of machine learning reliability, robustness, safety, and active learning. Here, we separate the total uncertainty into contributions from noise in the data (aleatoric) and shortcomings of the model (epistemic), further dividing epistemic uncertainty into model bias and variance contributions. We systematically address the influence of noise, model bias, and model variance in the context of chemical property predictions, where the diverse nature of target properties and the vast chemical space give rise to many distinct sources of prediction error. We demonstrate that different sources of error can each be significant in different contexts and must be individually addressed during model development. Through controlled experiments on data sets of molecular properties, we show important trends in model performance associated with the level of noise in the data set, size of the data set, model architecture, molecule representation, ensemble size, and data set splitting. In particular, we show that 1) noise in the test set can limit a model's observed performance when the actual performance is much better, 2) using size-extensive model aggregation structures is crucial for extensive property prediction, and 3) ensembling is a reliable tool for uncertainty quantification and improvement specifically for the contribution of model variance. We develop general guidelines on how to improve an underperforming model when falling into different uncertainty contexts.


Subject(s)
Machine Learning; Uncertainty; Reproducibility of Results
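Two of the findings above can be checked numerically in a few lines. First, a model that predicts the true values exactly still shows an observed RMSE roughly equal to the label noise when it is scored on noisy test labels. Second, the spread across ensemble members estimates the model-variance part of the uncertainty, but not the bias or the aleatoric part. This is a minimal sketch on synthetic data. The noise level, the data range, and the function names are illustrative assumptions, not taken from the paper.

```python
import math
import random
import statistics


def rmse(pred, obs):
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(pred))


def observed_rmse_of_perfect_model(n=10_000, noise_sd=0.5, seed=0):
    """RMSE of a model that predicts the true values exactly, scored on noisy test labels."""
    rng = random.Random(seed)
    truth = [rng.uniform(-5.0, 5.0) for _ in range(n)]
    noisy_labels = [t + rng.gauss(0.0, noise_sd) for t in truth]
    return rmse(truth, noisy_labels)


def ensemble_mean_and_variance(member_preds):
    """member_preds[m][i] is the prediction of ensemble member m for sample i."""
    per_sample = list(zip(*member_preds))
    means = [statistics.fmean(s) for s in per_sample]
    # Spread across members: an estimate of model variance only, not bias or data noise.
    variances = [statistics.pvariance(s) for s in per_sample]
    return means, variances


print(observed_rmse_of_perfect_model())  # close to noise_sd, although the model is exact
```

In this setup, no amount of model improvement can push the observed RMSE much below the noise standard deviation. That is why the abstract warns that test-set noise can mask a much better actual performance.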
6.
J Phys Chem A ; 127(27): 5637-5651, 2023 Jul 13.
Article in English | MEDLINE | ID: mdl-37381077

ABSTRACT

Many industrially and environmentally relevant reactions occur in the liquid phase. An accurate prediction of the rate constants is needed to analyze the intricate kinetic mechanisms of condensed phase systems. Quantum chemistry and continuum solvation models are commonly used to compute liquid phase rate constants; yet, their exact computational errors remain largely unknown, and a consistent computational workflow has not been well established. In this study, the accuracies of various quantum chemical and COSMO-RS levels of theory are assessed for the predictions of liquid phase rate constants and kinetic solvent effects. The prediction is made by first obtaining gas phase rate constants and subsequently applying solvation corrections. The calculation errors are evaluated using the experimental data of 191 rate constants that comprise 15 neutral closed-shell or free radical reactions and 49 solvents. The ωB97XD/def2-TZVP level of theory combined with the COSMO-RS method at the BP-TZVP level is shown to achieve the best performance with a mean absolute error of 0.90 in log10(k_liq). Relative rate constants are additionally compared to determine the errors associated with the solvation calculations alone. Very accurate predictions of relative rate constants are achieved at nearly all levels of theory with a mean absolute error of 0.27 in log10(k_solvent1/k_solvent2).
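The two-step workflow above (a gas-phase rate constant followed by a solvation correction) can be sketched with a transition-state-theory style relation, k_liq = k_gas · exp(−ΔΔG_solv / RT), where ΔΔG_solv = ΔG_solv(TS) − ΣΔG_solv(reactants). This sketch assumes that all solvation free energies share a consistent molar standard state. It is an illustration, not the authors' exact workflow, and the function names are hypothetical. The relative-rate function shows why solvent-to-solvent ratios can be more accurate: the gas-phase rate constant cancels out.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1


def liquid_rate_constant(k_gas, dg_solv_ts, dg_solv_reactants, temperature):
    """Apply a TST-style solvation correction to a gas-phase rate constant.

    Assumes the solvation free energies (kcal/mol) share a consistent molar standard state.
    """
    ddg = dg_solv_ts - sum(dg_solv_reactants)
    return k_gas * math.exp(-ddg / (R_KCAL * temperature))


def log10_relative_rate(ddg_solvent1, ddg_solvent2, temperature):
    """log10(k_solvent1 / k_solvent2); the gas-phase rate constant cancels out."""
    return -(ddg_solvent1 - ddg_solvent2) / (math.log(10) * R_KCAL * temperature)


def mae_log10(k_pred, k_exp):
    """Mean absolute error in log10(k), the metric quoted in the abstract."""
    return sum(abs(math.log10(p) - math.log10(e)) for p, e in zip(k_pred, k_exp)) / len(k_pred)


# A TS stabilized by about 1.36 kcal/mol more than the reactants speeds up the reaction ~10x at 298 K.
print(log10_relative_rate(-1.3642, 0.0, 298.15))
```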

7.
J Phys Chem A ; 127(48): 10268-10281, 2023 Dec 07.
Article in English | MEDLINE | ID: mdl-38010212

ABSTRACT

Although charged solutes are common in many chemical systems, traditional solvation models perform poorly in calculating solvation energies of ions. One major obstacle is the scarcity of experimental data for solvated ions. In this study, we release an experiment-based aqueous ionic solvation energy data set, IonSolv-Aq, that contains hydration free energies for 118 anions and 155 cations, more than twice as many as the hydration free energies for singly charged ions contained in the 2012 Minnesota Solvation Database commonly used in benchmarking studies. We discuss sources of systematic uncertainty in the data set and use the data to examine the accuracy of popular implicit solvation models COSMO-RS and SMD for predicting solvation free energies of singly charged ionic solutes in water. Our results indicate that most SMD and COSMO-RS modeling errors for ionic solutes are systematic and correctable with empirical parameters. We discuss two systematic offsets: one across all ions and one that depends on the functional group of the ionization site. After correcting for these offsets, solvation energies of singly charged ions are predicted using COSMO-RS to 3.1 kcal mol⁻¹ MAE against a challenging test set and 1.7 kcal mol⁻¹ MAE (about 3% relative error) against a filtered test set. SMD performs similarly, with MAEs of 2.7 and 1.7 kcal mol⁻¹ against the same two test sets. These results underscore the importance of compiling larger experimental data sets to improve solvation model parametrization and fairly assess performance.

8.
J Phys Chem A ; 127(14): 3231-3245, 2023 Apr 13.
Article in English | MEDLINE | ID: mdl-36999979

ABSTRACT

The combustion and pyrolysis behaviors of light esters and fatty acid methyl esters have been widely studied due to their relevance as biofuels and fuel additives. However, a knowledge gap exists for midsize alkyl acetates, especially ones with long alkoxyl groups. Butyl acetate, in particular, is a promising biofuel: it can be produced economically and robustly, and it can enhance blendstock performance and reduce soot formation. However, it has been little studied from both experimental and modeling perspectives. In this work, we created detailed oxidation mechanisms for the four butyl acetate isomers (normal-, sec-, tert-, and iso-butyl acetate) at temperatures from 650 to 2000 K and pressures up to 100 atm using the Reaction Mechanism Generator. About 60% of the species in each model, including fuel molecules and intermediate combustion products, have thermochemical parameters from published data or in-house quantum calculations. Kinetics of the essential primary reactions governing the fuel oxidation pathways, retro-ene reactions and hydrogen atom abstraction by OH or HO2, were also calculated quantum-mechanically. Simulations with the developed mechanisms indicate that most of the fuel decomposes into acetic acid and the corresponding butenes at elevated temperatures, making the fuels' ignition behavior similar to that of butenes. The applicability of the developed models to high-temperature pyrolysis systems was tested against newly collected high-pressure shock tube experiments; the simulated CO mole fraction time histories agree reasonably well with the laser measurements in the shock tube. This work reveals the high-temperature oxidation chemistry of butyl acetates and demonstrates the validity of predictive biofuel chemistry models built on accurate thermochemical and kinetic parameters.

9.
J Chem Phys ; 159(14), 2023 Oct 14.
Article in English | MEDLINE | ID: mdl-37811829

ABSTRACT

We study the accuracy and convergence properties of the chemically significant eigenvalues method as proposed by Georgievskii et al. [J. Phys. Chem. A 117, 12146-12154 (2013)] and its close relative, dominant subspace truncation, for reduction of the energy-grained master equation. We formally derive the connection between both reduction techniques and provide hard error bounds for the accuracy of the latter, which confirm its empirically excellent accuracy and convergence properties but also reveal practically relevant cases in which both methods are bound to fall short. We propose balanced truncation as an effective alternative in these cases.
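The idea behind the chemically significant eigenvalue (CSE) method can be seen on a toy linear system dp/dt = M p. A few eigenvalues of M are separated by a large gap from the fast internal relaxation modes, and these slow eigenvalues correspond to chemical reactions. This sketch assumes a hypothetical 4-state matrix: two "grains" per well, fast transfer within each well, and one slow A↔B link. It is not an energy-grained master equation, and it is not the full CSE procedure, which also uses the eigenvectors. The sketch needs NumPy.

```python
import numpy as np


def toy_rate_matrix(fast=100.0, slow=1.0):
    """Hypothetical 4-state rate matrix for dp/dt = M p.

    States 0, 1 stand for grains of well A and states 2, 3 for well B.
    Transfer within each well is fast; the only A<->B link (1<->2) is slow.
    Each column sums to zero, so total population is conserved.
    """
    links = {(0, 1): fast, (1, 2): slow, (2, 3): fast}
    m = np.zeros((4, 4))
    for (i, j), k in links.items():
        # Equal forward and reverse rates keep this toy matrix symmetric.
        m[j, i] += k
        m[i, i] -= k
        m[i, j] += k
        m[j, j] -= k
    return m


def chemically_significant_rate(m):
    """Return eigenvalues (descending) and the A<->B phenomenological rate coefficient."""
    eigvals = np.linalg.eigvalsh(m)[::-1]  # symmetric matrix, so eigenvalues are real
    # eigvals[0] ~ 0 is the equilibrium mode; eigvals[1] is the single chemically
    # significant eigenvalue, far from the fast internal relaxation modes (~ -200).
    # For A <-> B with equal forward and reverse coefficients, eigvals[1] = -(kf + kb) = -2k.
    return eigvals, -eigvals[1] / 2.0


vals, k = chemically_significant_rate(toy_rate_matrix())
print(vals, k)
```

When the gap shrinks (for example, as `slow` approaches `fast`), the chemically significant eigenvalue no longer stands apart from the relaxation modes. That is the kind of regime in which, as the abstract notes, eigenvalue-based reduction can fall short.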

10.
J Am Chem Soc ; 144(24): 10785-10797, 2022 Jun 22.
Article in English | MEDLINE | ID: mdl-35687887

ABSTRACT

The solubility of organic molecules is crucial in organic synthesis and industrial chemistry; it is important in the design of many phase separation and purification units, and it controls the migration of many species into the environment. To decide which solvents and temperatures can be used in the design of new processes, trial and error is often used, because the choice is restricted by unknown solid solubility limits. Here, we present a fast and convenient computational method for estimating the solubility of solid neutral organic molecules in water and many organic solvents over a broad range of temperatures. The model is developed by combining fundamental thermodynamic equations with machine learning models for solvation free energy, solvation enthalpy, Abraham solute parameters, and aqueous solid solubility at 298 K. We provide free open-source and online tools for the prediction of solid solubility limits and a curated data collection (SolProp) that includes more than 5000 experimental solid solubility values for validation of the model. The model predictions are accurate for aqueous systems and for a wide range of organic solvents up to 550 K or higher. Methods to further improve solid solubility predictions by providing experimental data on the solute of interest in another solvent, or on the solute's sublimation enthalpy, are also presented.


Subject(s)
Water; Data Collection; Solubility; Solutions; Solvents/chemistry; Temperature; Thermodynamics; Water/chemistry
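One thermodynamic relation that connects solvation free energies to solid solubility works as follows. If the same solid is in equilibrium with dilute saturated solutions in two solvents, then log10 S_target = log10 S_ref + (ΔG_solv,ref − ΔG_solv,target) / (RT ln 10). The sketch below assumes that relation at a single temperature, with solvation free energies on a consistent molar standard state. It is not claimed to be SolProp's exact formulation, which the abstract says also uses solvation enthalpies and Abraham parameters. The function name and the example numbers are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1


def log10_solubility_in_solvent(log10_s_ref, dg_solv_ref, dg_solv_target, temperature=298.15):
    """Transfer a known solubility (log10 mol/L) from a reference solvent to a target solvent.

    Assumes dilute saturated solutions, the same solid form in both cases, and solvation
    free energies (kcal/mol) on a consistent molar standard state.
    """
    return log10_s_ref + (dg_solv_ref - dg_solv_target) / (math.log(10) * R_KCAL * temperature)


# Hypothetical solute: log10 S(water) = -3.0, dG_solv = -8.0 (water) and -10.0 (target solvent).
print(log10_solubility_in_solvent(-3.0, -8.0, -10.0))  # about -1.53
```

A solute that is stabilized more strongly by the target solvent (a more negative ΔG_solv) is predicted to be more soluble there. Each ~1.36 kcal/mol of extra stabilization at 298 K adds about one log unit.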
11.
Mol Pharm ; 19(5): 1526-1539, 2022 May 02.
Article in English | MEDLINE | ID: mdl-35435696

ABSTRACT

Gauging the chemical stability of active pharmaceutical ingredients (APIs) is critical at various stages of pharmaceutical development to identify potential risks from drug degradation and ensure the quality and safety of the drug product. Stress testing has been the major experimental method to study API stability, but this analytical approach is time-consuming, resource-intensive, and limited by API availability, especially during the early stages of drug development. Novel computational chemistry methods may assist in screening for API chemical stability prior to synthesis and augment contemporary API stress testing studies, with the potential to significantly accelerate drug development and reduce costs. In this work, we leverage quantum chemical calculations and automated reaction mechanism generation to provide new insights into API degradation studies. Continuing part one of this series of studies [Grinberg Dana et al., Mol. Pharm. 2021 18 (8), 3037-3049], we have generated the first ab initio predictive chemical kinetic model of free-radical oxidative degradation for API stress testing. We focused on imipramine oxidation in an azobis(isobutyronitrile) (AIBN)/H2O/CH3OH solution and compared the model's predictions with concurrent experimental observations. We analytically determined iminodibenzyl and desimipramine as imipramine's two major degradation products under industry-standard AIBN stress testing conditions, and our ab initio kinetic model successfully identified both of them among its top three predicted degradation products. This work shows the potential and utility of predictive chemical kinetic modeling and quantum chemical computations to elucidate API chemical stability issues. Further, we envision an automated digital workflow that integrates first-principles models with data-driven methods that, when actively and iteratively combined with high-throughput experiments, can substantially accelerate and transform future API chemical stability studies.


Subject(s)
Imipramine; Models, Chemical; Drug Stability; Free Radicals; Kinetics; Oxidation-Reduction
12.
Faraday Discuss ; 238(0): 741-766, 2022 Oct 21.
Article in English | MEDLINE | ID: mdl-36093929

ABSTRACT

This Faraday Discussion, marking the centenary of Lindemann's explanation of the pressure-dependence of unimolecular reactions, presented recent advances in measuring and computing collisional energy transfer efficiencies, microcanonical rate coefficients, and pressure-dependent (phenomenological) rate coefficients, and the incorporation of these rate coefficients in kinetic models. Several of the presentations featured systems where breakdown of the Born-Oppenheimer approximation is key to understanding the measured rates/products. Many of the reaction systems presented were quite complex, which can make it difficult to go from "plausible proposed explanation" to "quantitative agreement between model and experiment". This complexity highlights the need for better automation of the calculations, better documentation and benchmarking to catch any errors and to make the calculations more easily reproducible, and continued (and even closer) cooperation of experimentalists and modelers. In some situations the correct definition of a "species" is debatable, since the population distributions and time evolution are so distorted from the perfect-Boltzmann Lewis-structure zero-order concept of a chemical species. Despite all these challenges, the field has made tremendous advances, and several cases were presented which demonstrated both excellent understanding of very complicated reaction chemistry and quantitatively accurate predictions of complicated experiments. Some of the interesting contributions to this Discussion are highlighted here, with some comments and suggestions for next steps.

13.
Faraday Discuss ; 238(0): 380-404, 2022 Oct 21.
Article in English | MEDLINE | ID: mdl-35792089

ABSTRACT

The full energy-grained master equation (ME) is too large to be conveniently used in kinetic modeling, so almost always it is replaced by a reduced model using phenomenological rate coefficients. The accuracy of several methods for obtaining these pressure-dependent phenomenological rate coefficients, and so for constructing a reduced model, is tested against direct numerical solutions of the full ME, and the deviations are sometimes quite large. An algebraic expression for the error between the popular chemically-significant eigenvalue (CSE) method and the exact ME solution is derived. An alternative way to compute phenomenological rate coefficients, simulation least-squares (SLS), is presented. SLS is often about as accurate as CSE, and sometimes has significant advantages over CSE. One particular variant of SLS, using the matrix exponential, is as fast as CSE, and seems to be more robust. However, all of the existing methods for constructing reduced models to approximate the ME, including CSE and SLS, are inaccurate under some conditions, and sometimes they fail dramatically due to numerical problems. The challenge of constructing useful reduced models that more reliably emulate the full ME solution is discussed.


Subject(s)
Models, Theoretical; Kinetics; Computer Simulation; Least-Squares Analysis
14.
J Chem Inf Model ; 62(9): 2101-2110, 2022 May 09.
Article in English | MEDLINE | ID: mdl-34734699

ABSTRACT

The estimation of chemical reaction properties such as activation energies, rates, or yields is a central topic of computational chemistry. In contrast to molecular properties, where machine learning approaches such as graph convolutional neural networks (GCNNs) have excelled for a wide variety of tasks, no general and transferable adaptations of GCNNs for reactions have been developed yet. We therefore combined a popular cheminformatics reaction representation, the so-called condensed graph of reaction (CGR), with a recent GCNN architecture to arrive at a versatile, robust, and compact deep learning model. The CGR is a superposition of the reactant and product graphs of a chemical reaction and thus an ideal input for graph-based machine learning approaches. The model learns to create a data-driven, task-dependent reaction embedding that does not rely on expert knowledge, similar to current molecular GCNNs. Our approach outperforms current state-of-the-art models in accuracy, is applicable even to imbalanced reactions, and possesses excellent predictive capabilities for diverse target properties, such as activation energies, reaction enthalpies, rate constants, yields, or reaction classes. We furthermore curated a large set of atom-mapped reactions along with their target properties, which can serve as benchmark data sets for future work. All data sets and the developed reaction GCNN model are available online, free of charge, and open source.


Subject(s)
Machine Learning; Neural Networks, Computer; Cheminformatics
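The condensed graph of reaction (CGR) described above superimposes the atom-mapped reactant and product graphs. Each bond then carries its state on both sides of the reaction, so a single graph encodes which bonds form, break, or change order. The sketch below keeps only bond orders. The published model also combines atom and bond features from both sides and feeds them to a GCNN, which is not shown. The helper names are hypothetical.

```python
def bond(i, j):
    """Undirected bond key between two atom-map numbers."""
    return frozenset((i, j))


def condensed_graph_of_reaction(reactant_bonds, product_bonds):
    """Superimpose atom-mapped reactant and product graphs.

    Inputs map bond(i, j) -> bond order; a missing key means no bond.
    Each CGR bond is labeled (order in reactants, order in products).
    """
    return {
        pair: (reactant_bonds.get(pair, 0), product_bonds.get(pair, 0))
        for pair in reactant_bonds.keys() | product_bonds.keys()
    }


def reaction_center(cgr):
    """Bonds that are formed, broken, or change order."""
    return {pair for pair, (before, after) in cgr.items() if before != after}


# H abstraction CH4 + OH -> CH3 + H2O, keeping only the atoms that matter:
# map 1 = C, 2 = transferred H, 3 = O, 4 = hydroxyl H (other C-H bonds omitted).
reactants = {bond(1, 2): 1, bond(3, 4): 1}
products = {bond(2, 3): 1, bond(3, 4): 1}
cgr = condensed_graph_of_reaction(reactants, products)
print(sorted(tuple(sorted(p)) for p in reaction_center(cgr)))  # [(1, 2), (2, 3)]
```

Because the CGR is an ordinary graph with paired labels, any molecular graph network can consume it. This is the property the abstract highlights.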
15.
J Chem Inf Model ; 62(1): 16-26, 2022 Jan 10.
Article in English | MEDLINE | ID: mdl-34939786

ABSTRACT

Heuristic and machine learning models for rank-ordering reaction templates comprise an important basis for computer-aided organic synthesis regarding both product prediction and retrosynthetic pathway planning. Their viability relies heavily on the quality and characteristics of the underlying template database. With the advent of automated reaction and template extraction software and consequently the creation of template databases too large for manual curation, a data-driven approach to assess and improve the quality of template sets is needed. We therefore systematically studied the influence of template generality, canonicalization, and exclusivity on the performance of different template ranking models. We find that duplicate and nonexclusive templates, i.e., templates which describe the same chemical transformation on identical or overlapping sets of molecules, decrease both the accuracy of the ranking algorithm and the applicability of the respective top-ranked templates significantly. To remedy the negative effects of nonexclusivity, we developed a general and computationally efficient framework to deduplicate and hierarchically correct templates. As a result, performance improved considerably for both heuristic and machine learning template ranking models, as well as multistep retrosynthetic planning models. The canonicalization and correction code is made freely available.


Subject(s)
Algorithms; Software; Computers; Heuristics; Machine Learning
16.
J Chem Inf Model ; 62(3): 433-446, 2022 Feb 14.
Article in English | MEDLINE | ID: mdl-35044781

ABSTRACT

We present a group contribution method (SoluteGC) and a machine learning model (SoluteML) to predict the Abraham solute parameters, as well as a machine learning model (DirectML) to predict solvation free energy and enthalpy at 298 K. The proposed group contribution method uses atom-centered functional groups with corrections for ring and polycyclic strain, while the machine learning models adopt a directed message passing neural network. The solute parameters predicted from SoluteGC and SoluteML are used to calculate solvation energy and enthalpy via linear free energy relationships. Extensive data sets containing 8366 solute parameters, 20,253 solvation free energies, and 6322 solvation enthalpies are compiled in this work to train the models. The three models are each evaluated on the same test sets using both random and substructure-based solute splits for solvation energy and enthalpy predictions. The results show that the DirectML model is superior to the SoluteML and SoluteGC models for both predictions and can provide accuracy comparable to that of advanced quantum chemistry methods. Yet, even though the DirectML model performs better in general, all three models are useful for various purposes. Uncertain predicted values can be identified by comparing the three models, and when the three models are combined, they can provide even more accurate predictions than any one of them individually. Finally, we present our compiled solute parameter, solvation energy, and solvation enthalpy databases (SoluteDB, dGsolvDBx, dHsolvDB) and provide public access to our final prediction models through a simple web-based tool, software packages, and source code.


Subject(s)
Machine Learning; Neural Networks, Computer; Entropy; Solutions; Solvents; Thermodynamics
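The abstract states that predicted Abraham solute parameters are converted to solvation energies through linear free energy relationships. A common form is the Abraham gas-to-solvent equation, log10 K = c + eE + sS + aA + bB + lL, combined with ΔG_solv = −RT ln(10) log10 K. The sketch below assumes that form, with K taken as a molar concentration ratio. The analogous enthalpy relation is not shown. The class and function names and all numeric values are hypothetical and do not come from the authors' code or databases.

```python
import math
from dataclasses import dataclass

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1


@dataclass(frozen=True)
class SoluteParams:
    """Abraham solute descriptors."""
    E: float
    S: float
    A: float
    B: float
    L: float


@dataclass(frozen=True)
class SolventCoeffs:
    """Solvent-specific LFER coefficients for the gas-to-solvent partition equation."""
    c: float
    e: float
    s: float
    a: float
    b: float
    l: float


def log10_partition_coefficient(solute, solvent):
    """log10 K = c + eE + sS + aA + bB + lL."""
    return (solvent.c + solvent.e * solute.E + solvent.s * solute.S
            + solvent.a * solute.A + solvent.b * solute.B + solvent.l * solute.L)


def solvation_free_energy(solute, solvent, temperature=298.15):
    """dG_solv = -RT ln(10) log10 K in kcal/mol, assuming K is a molar concentration ratio."""
    return -math.log(10) * R_KCAL * temperature * log10_partition_coefficient(solute, solvent)


# Made-up descriptors and coefficients, for illustration only.
solute = SoluteParams(E=1.0, S=2.0, A=0.5, B=0.25, L=4.0)
solvent = SolventCoeffs(c=0.1, e=0.2, s=0.3, a=0.4, b=0.5, l=0.6)
print(solvation_free_energy(solute, solvent))
```

Separating solute descriptors from solvent coefficients is what lets one set of predicted solute parameters serve every solvent whose coefficients are known.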
17.
J Chem Inf Model ; 62(20): 4906-4915, 2022 Oct 24.
Article in English | MEDLINE | ID: mdl-36222558

ABSTRACT

The Reaction Mechanism Generator (RMG) database for chemical property prediction is presented. The RMG database consists of curated datasets and estimators for accurately predicting the parameters necessary for constructing a wide variety of chemical kinetic mechanisms. These datasets and estimators are mostly published and enable prediction of thermodynamics, kinetics, solvation effects, and transport properties. For thermochemistry prediction, the RMG database contains 45 libraries of thermochemical parameters with a combined 4564 entries; a group additivity scheme with 9 types of corrections, including radical, polycyclic, and surface adsorption corrections, with 1580 curated groups in total; and parameters for a graph convolutional neural network trained by transfer learning from a set of >130 000 DFT calculations to 10 000 high-quality values. Correction schemes for solvent-solute effects, important for thermochemistry in the liquid phase, are available. They include tabulated values for 195 pure solvents and 152 common solutes and a group additivity scheme for predicting the properties of arbitrary solutes. For kinetics estimation, the database contains 92 libraries of kinetic parameters covering a combined 21 000 reactions, as well as rate rule schemes for 87 reaction classes trained on 8655 curated training reactions. Additional libraries and estimators are available for transport properties. All of this information is easily accessible through the graphical user interface at https://rmg.mit.edu. Bulk or on-the-fly use is facilitated by interfacing directly with the RMG Python package, which can be installed from Anaconda. The RMG database provides kineticists with easy access to estimates of the many parameters they need to model and analyze kinetic systems. This speeds up and facilitates kinetic analysis by enabling easy hypothesis testing on pathways, by providing parameters for model construction, and by providing checks on kinetic parameters from other sources.


Subject(s)
Models, Chemical; Kinetics; Thermodynamics; Databases, Factual; Solvents
18.
Environ Sci Technol ; 2022 Feb 16.
Article in English | MEDLINE | ID: mdl-35171556

ABSTRACT

Electric vehicles (EVs) are a promising pathway to cleaner personal mobility. China provides substantial support to increase EV market share. This study provides an extensive analysis, at the provincial level, of the currently unclear environmental and health benefits of these incentives. EVs in China have modest cradle-to-gate CO2 benefits (on average 29%) compared to conventional internal combustion engine vehicles (ICEVs) but have carbon emissions similar to those of hybrid electric vehicles. A well-to-wheel assessment of air pollutant emissions shows that emissions associated with ICEVs come mainly from gasoline production, not the tailpipe, suggesting that tighter emissions controls on refineries are needed to combat air pollution problems effectively. By integrating a vehicle fleet model into policy scenario analysis, we quantify the policy impacts associated with passenger vehicles in the major Chinese provinces: broader EV penetration, especially combined with cleaner power generation, could deliver greater air quality and health benefits, but not necessarily significant climate change mitigation. The total value to society of the climate and mortality benefits in 2030 is found to be comparable to a prior estimate of the EV policy's economic costs.

19.
J Phys Chem A ; 126(25): 3976-3986, 2022 Jun 30.
Article in English | MEDLINE | ID: mdl-35727075

ABSTRACT

Quantitative estimates of reaction barriers are essential for developing kinetic mechanisms and predicting reaction outcomes. However, the lack of experimental data and the steep scaling of accurate quantum calculations often hinder the ability to obtain reliable kinetic values. Here, we train a directed message passing neural network on nearly 24,000 diverse gas-phase reactions calculated at CCSD(T)-F12a/cc-pVDZ-F12//ωB97X-D3/def2-TZVP. Our model uses 75% fewer parameters than models in previous studies, an improved reaction representation, and proper data splits to accurately estimate performance on unseen reactions. Using information from only the reactant and product, our model quickly predicts barrier heights with a testing MAE of 2.6 kcal mol⁻¹ relative to the coupled-cluster data, making it more accurate than a good density functional theory calculation. Furthermore, our results show that future efforts to model reaction properties would benefit significantly from fine-tuning calibration using a transfer learning technique. We anticipate that this model will accelerate and improve kinetic predictions for small-molecule chemistry.


Subject(s)
Thermodynamics; Kinetics
20.
Mol Pharm ; 18(8): 3037-3049, 2021 Aug 02.
Article in English | MEDLINE | ID: mdl-34236207

ABSTRACT

Stress testing of active pharmaceutical ingredients (API) is an important tool used to gauge chemical stability and identify potential degradation products. While different flavors of API stress testing systems have been used in experimental investigations for decades, the detailed kinetics of such systems as well as the chemical composition of prominent reactive species, specifically reactive oxygen species, are unknown. As a first step toward understanding and modeling API oxidation in stress testing, we investigated a typical radical "soup" solution an API is subject to during stress testing. Here we applied ab initio electronic structure calculations to automatically generate and refine a detailed chemical kinetics model, taking a fresh look at API oxidation. We generated a detailed kinetic model for a representative azobis(isobutyronitrile) (AIBN)/H2O/CH3OH stress-testing system with a varied cosolvent ratio (50%/50%-99.5%/0.5% vol water/methanol) for 5.0 mM AIBN and representative pH values of 4-10 at 40 °C that was stirred and open to the atmosphere. At acidic conditions, hydroxymethyl alkoxyl is the dominant alkoxyl radical, and at basic conditions, for most studied initial methanol concentrations, cyanoisopropyl alkoxyl becomes the dominant alkoxyl radical, albeit at an overall lower concentration. At acidic conditions, the levels of cyanoisopropyl peroxyl, hydroxymethyl peroxyl, and hydroperoxyl radicals are relatively high and comparable, while, at both neutral and basic pH conditions, superoxide becomes the prominent radical in the system. The present work reveals the prominent species in a common model API stress testing system at various cosolvent and pH conditions, sets the stage for an in-depth quantitative API kinetic study, and demonstrates the usage of novel software tools for automated chemical kinetic model generation and ab initio refinement.


Subject(s)
Methanol/chemistry; Models, Chemical; Nitriles/chemistry; Water/chemistry; Alcohols/chemistry; Computer Simulation; Free Radicals/chemistry; Hydrogen-Ion Concentration; Kinetics; Oxidation-Reduction; Reactive Oxygen Species/chemistry; Software; Temperature