Results 1 - 11 of 11
1.
Sci Data ; 11(1): 859, 2024 Aug 09.
Article in English | MEDLINE | ID: mdl-39122750

ABSTRACT

Computational and machine learning approaches to model the conformational landscape of macrocyclic peptides have the potential to enable rational design and optimization. However, accurate, fast, and scalable methods for modeling macrocycle geometries remain elusive. Recent deep learning approaches have significantly accelerated protein structure prediction and the generation of small-molecule conformational ensembles, yet similar progress has not been made for macrocyclic peptides due to their unique properties. Here, we introduce CREMP, a resource generated for the rapid development and evaluation of machine learning models for macrocyclic peptides. CREMP contains 36,198 unique macrocyclic peptides and their high-quality structural ensembles generated using the Conformer-Rotamer Ensemble Sampling Tool (CREST). Altogether, this new dataset contains nearly 31.3 million unique macrocycle geometries, each annotated with energies derived from semi-empirical extended tight-binding (xTB) DFT calculations. Additionally, we include 3,258 macrocycles with reported passive permeability data to couple conformational ensembles to experiment. We anticipate that this dataset will enable the development of machine learning models that can improve peptide design and optimization for novel therapeutics.


Subjects
Machine Learning , Peptides/chemistry , Protein Conformation , Macrocyclic Compounds/chemistry , Cyclic Peptides/chemistry
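
One common way to use per-conformer energies such as those described in the abstract above is to turn them into Boltzmann populations. The following is a minimal sketch under stated assumptions: the function name and the three example energies are hypothetical, energies are taken to be in kcal/mol, and nothing here reflects the dataset's actual file format or tooling.

```python
import math

R_KCAL = 0.0019872041  # gas constant in kcal/(mol*K)

def boltzmann_weights(energies_kcal, temperature=298.15):
    """Normalized Boltzmann populations for conformer energies in kcal/mol."""
    e_min = min(energies_kcal)  # shift by the minimum for numerical stability
    factors = [math.exp(-(e - e_min) / (R_KCAL * temperature)) for e in energies_kcal]
    total = sum(factors)
    return [f / total for f in factors]

# Hypothetical relative energies for three conformers of one macrocycle
weights = boltzmann_weights([0.0, 0.5, 2.0])
```

The weights sum to one, and lower-energy conformers receive larger populations. An ensemble-averaged property would then be the weight-averaged value over conformers.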
2.
J Phys Chem A ; 128(21): 4335-4352, 2024 May 30.
Article in English | MEDLINE | ID: mdl-38752854

ABSTRACT

Obtaining accurate enthalpies of formation of chemical species, ΔHf, often requires empirical corrections that connect the results of quantum mechanical (QM) calculations with the experimental enthalpies of elements in their standard state. One approach is to use atomization energy corrections followed by bond additivity corrections (BACs), such as those defined by Petersson et al. or Anantharaman and Melius. Another approach is to utilize isodesmic reactions (IDRs) as shown by Buerger et al. We implement both approaches in Arkane, an open-source software that can calculate species thermochemistry using results from various QM software packages. In this work, we collect 421 reference species from the literature to derive ΔHf corrections and fit atomization energy corrections and BACs for 15 commonly used model chemistries. We find that both types of BACs yield similar accuracy, although Anantharaman- and Melius-type BACs appear to generalize better. Furthermore, BACs tend to achieve better accuracy than IDRs for commonly used model chemistries, and IDRs can be less robust because of the sensitivity to the chosen reference species and reactions. Overall, Anantharaman- and Melius-type BACs are our recommended approach for achieving accurate QM corrections for enthalpies.
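
The arithmetic behind a bond-count-based correction of the kind the abstract above attributes to Petersson et al. can be sketched as follows. This is an illustration under stated assumptions, not Arkane's implementation: the function name, the sign convention (the bond term is added), and every number are hypothetical placeholders chosen only to make the arithmetic easy to follow.

```python
def corrected_enthalpy_of_formation(h_qm, atoms, bonds, atom_refs, bac_params):
    """Atom-reference shift plus a per-bond additive correction.

    h_qm       : QM enthalpy of the molecule (any consistent energy unit)
    atoms      : element counts, e.g. {"C": 1, "H": 4}
    bonds      : bond-type counts, e.g. {"C-H": 4}
    atom_refs  : per-element reference energy (assumed here to be the QM atom
                 energy minus the experimental atomic enthalpy of formation)
    bac_params : fitted correction per bond type
    """
    atom_term = sum(n * atom_refs[el] for el, n in atoms.items())
    bac_term = sum(n * bac_params[b] for b, n in bonds.items())
    return h_qm - atom_term + bac_term

# Placeholder values only; they are not real energies or fitted parameters.
example = corrected_enthalpy_of_formation(
    h_qm=-100.0,
    atoms={"C": 1, "H": 4},
    bonds={"C-H": 4},
    atom_refs={"C": -60.0, "H": -9.0},
    bac_params={"C-H": -0.25},
)
```

With these placeholders the atom term is -96.0 and the bond term is -1.0, so the corrected value is -5.0. Fitting then amounts to choosing `atom_refs` and `bac_params` so that the corrected values match reference enthalpies as closely as possible.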

3.
J Chem Inf Model ; 62(20): 4906-4915, 2022 Oct 24.
Article in English | MEDLINE | ID: mdl-36222558

ABSTRACT

The Reaction Mechanism Generator (RMG) database for chemical property prediction is presented. The RMG database consists of curated datasets and estimators for accurately predicting the parameters necessary for constructing a wide variety of chemical kinetic mechanisms. These datasets and estimators are mostly published and enable prediction of thermodynamics, kinetics, solvation effects, and transport properties. For thermochemistry prediction, the RMG database contains 45 libraries of thermochemical parameters with a combined 4564 entries; a group additivity scheme with 9 types of corrections (including radical, polycyclic, and surface adsorption corrections) and 1580 total curated groups; and parameters for a graph convolutional neural network trained using transfer learning from a set of >130 000 DFT calculations to 10 000 high-quality values. Correction schemes for solvent-solute effects, important for thermochemistry in the liquid phase, are available. They include tabulated values for 195 pure solvents and 152 common solutes and a group additivity scheme for predicting the properties of arbitrary solutes. For kinetics estimation, the database contains 92 libraries of kinetic parameters with a combined 21 000 reactions, along with rate rule schemes for 87 reaction classes trained on 8655 curated training reactions. Additional libraries and estimators are available for transport properties. All of this information is easily accessible through the graphical user interface at https://rmg.mit.edu. Bulk or on-the-fly use can be facilitated by interfacing directly with the RMG Python package, which can be installed from Anaconda. The RMG database provides kineticists with easy access to estimates of the many parameters they need to model and analyze kinetic systems. This helps to speed up and facilitate kinetic analysis by enabling easy hypothesis testing on pathways, by providing parameters for model construction, and by providing checks on kinetic parameters from other sources.


Subjects
Chemical Models , Kinetics , Thermodynamics , Factual Databases , Solvents
4.
J Chem Inf Model ; 61(6): 2686-2696, 2021 Jun 28.
Article in English | MEDLINE | ID: mdl-34048230

ABSTRACT

In chemical kinetics research, kinetic models containing hundreds of species and tens of thousands of elementary reactions are commonly used to understand and predict the behavior of reactive chemical systems. Reaction Mechanism Generator (RMG) is a software suite developed to automatically generate such models by incorporating and extrapolating from a database of known thermochemical and kinetic parameters. Here, we present the recent version 3 release of RMG and highlight improvements since the previously published description of RMG v1.0. Most notably, RMG can now generate heterogeneous catalysis models in addition to the previously available gas- and liquid-phase capabilities. For model analysis, new methods for local and global uncertainty analysis have been implemented to supplement first-order sensitivity analysis. The RMG database of thermochemical and kinetic parameters has been significantly expanded to cover more types of chemistry. The present release includes parallelization for faster model generation and a new molecule isomorphism approach to improve computational performance. RMG has also been updated to use Python 3, ensuring compatibility with the latest cheminformatics and machine learning packages. Overall, RMG v3.0 includes many changes which improve the accuracy of the generated chemical mechanisms and allow for exploration of a wider range of chemical systems.


Subjects
Cheminformatics , Software , Kinetics , Machine Learning
5.
Phys Chem Chem Phys ; 22(41): 23618-23626, 2020 Oct 28.
Article in English | MEDLINE | ID: mdl-33112304

ABSTRACT

Lack of quality data and difficulty generating these data hinder quantitative understanding of reaction kinetics. Specifically, conventional methods to generate transition state structures are deficient in speed, accuracy, or scope. We describe a novel method to generate three-dimensional transition state structures for isomerization reactions using reactant and product geometries. Our approach relies on a graph neural network to predict the transition state distance matrix and a least squares optimization to reconstruct the coordinates based on which entries of the distance matrix the model perceives to be important. We feed the structures generated by our algorithm through a rigorous quantum mechanics workflow to ensure the predicted transition state corresponds to the ground truth reactant and product. In both generating viable geometries and predicting accurate transition states, our method achieves excellent results. We envision workflows like this, which combine neural networks and quantum chemistry calculations, will become the preferred methods for computing chemical reactions.
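
The coordinate-reconstruction step described in the abstract above can be illustrated with a small weighted least-squares problem: find coordinates whose pairwise distances match a target distance matrix, with per-entry weights standing in for the entries the model considers important. This is a toy sketch, not the authors' implementation. Plain gradient descent is used as the optimizer, and the function names, weights, step size, and four-point geometry are all assumptions made for illustration.

```python
import math

def pair_distance(p, q):
    """Euclidean distance between two 3D points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def reconstruct_coords(target, weights, guess, lr=0.05, steps=3000):
    """Gradient descent on sum_{i<j} w_ij * (|x_i - x_j| - d_ij)**2."""
    coords = [list(p) for p in guess]
    n = len(coords)
    for _ in range(steps):
        grad = [[0.0, 0.0, 0.0] for _ in range(n)]
        for i in range(n):
            for j in range(i + 1, n):
                diff = [a - b for a, b in zip(coords[i], coords[j])]
                dist = max(math.sqrt(sum(c * c for c in diff)), 1e-12)
                scale = 2.0 * weights[i][j] * (dist - target[i][j]) / dist
                for k in range(3):
                    grad[i][k] += scale * diff[k]
                    grad[j][k] -= scale * diff[k]
        for i in range(n):
            for k in range(3):
                coords[i][k] -= lr * grad[i][k]
    return coords

# Toy geometry: the target distance matrix comes from known coordinates.
true_xyz = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
n_atoms = len(true_xyz)
target = [[pair_distance(p, q) for q in true_xyz] for p in true_xyz]
weights = [[1.0] * n_atoms for _ in range(n_atoms)]
weights[2][3] = weights[3][2] = 0.1  # pretend this entry is trusted less

# Deterministically perturbed starting guess
guess = [(x + 0.1 * i, y - 0.05 * i, z + 0.08 * (i % 2))
         for i, (x, y, z) in enumerate(true_xyz)]

result = reconstruct_coords(target, weights, guess)
max_error = max(abs(pair_distance(result[i], result[j]) - target[i][j])
                for i in range(n_atoms) for j in range(i + 1, n_atoms))
```

The recovered coordinates match the target only up to rotation, translation, and reflection, which is why the check compares distances rather than coordinates. With predicted rather than exact distances the minimum is generally nonzero, and the weights decide which mismatches the fit tolerates.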

6.
Sci Data ; 7(1): 137, 2020 May 08.
Article in English | MEDLINE | ID: mdl-32385318

ABSTRACT

Reaction times, activation energies, branching ratios, yields, and many other quantitative attributes are important for precise organic syntheses and generating detailed reaction mechanisms. Often, it would be useful to be able to classify proposed reactions as fast or slow. However, quantitative chemical reaction data, especially for atom-mapped reactions, are difficult to find in existing databases. Therefore, we used automated potential energy surface exploration to generate 12,000 organic reactions involving H, C, N, and O atoms calculated at the ωB97X-D3/def2-TZVP quantum chemistry level. We report the results of geometry optimizations and frequency calculations for reactants, products, and transition states of all reactions. Additionally, we extracted atom-mapped reaction SMILES, activation energies, and enthalpies of reaction. We believe that these data will accelerate progress in automated methods for organic synthesis and reaction mechanism generation, for example by enabling the development of novel machine learning models for quantitative reaction prediction.
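
To make the link between an activation energy and a "fast or slow" label concrete, a rough sketch using the Arrhenius expression k = A exp(-Ea / RT) is given below. The pre-exponential factor, the temperature, the 1 s^-1 cutoff, and the function names are assumptions for illustration; the dataset itself reports activation energies, not these labels.

```python
import math

R = 8.314462618  # gas constant in J/(mol*K)

def arrhenius_rate(prefactor, ea_kj_per_mol, temperature):
    """k = A * exp(-Ea / (R * T)), with Ea given in kJ/mol."""
    return prefactor * math.exp(-ea_kj_per_mol * 1000.0 / (R * temperature))

def classify(rate, threshold=1.0):
    """Label a first-order rate constant (s^-1) against a hypothetical cutoff."""
    return "fast" if rate >= threshold else "slow"

# Same assumed prefactor, two different barriers, room temperature
low_barrier = arrhenius_rate(1e13, 50.0, 298.15)
high_barrier = arrhenius_rate(1e13, 150.0, 298.15)
labels = (classify(low_barrier), classify(high_barrier))
```

Because the rate depends exponentially on Ea / RT, an error of a few kJ/mol in the barrier changes the rate by a large factor at room temperature, so any such label is sensitive to the accuracy of the activation energy.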

7.
J Chem Inf Model ; 60(6): 2697-2717, 2020 Jun 22.
Article in English | MEDLINE | ID: mdl-32243154

ABSTRACT

Advances in deep neural network (DNN)-based molecular property prediction have recently led to the development of models of remarkable accuracy and generalization ability, with graph convolutional neural networks (GCNNs) reporting state-of-the-art performance for this task. However, some challenges remain, and one of the most important that needs to be fully addressed concerns uncertainty quantification. DNN performance is affected by the volume and the quality of the training samples. Therefore, establishing when and to what extent a prediction can be considered reliable is just as important as outputting accurate predictions, especially when out-of-domain molecules are targeted. Recently, several methods to account for uncertainty in DNNs have been proposed, most of which are based on approximate Bayesian inference. Among these, only a few scale to the large data sets required in applications. Evaluating and comparing these methods has recently attracted great interest, but results are generally fragmented and absent for molecular property prediction. In this paper, we quantitatively compare scalable techniques for uncertainty estimation in GCNNs. We introduce a set of quantitative criteria to capture different uncertainty aspects and then use these criteria to compare MC-dropout, Deep Ensembles, and bootstrapping, both theoretically in a unified framework that separates aleatoric/epistemic uncertainty and experimentally on public data sets. Our experiments quantify the performance of the different uncertainty estimation methods and their impact on uncertainty-related error reduction. Our findings indicate that Deep Ensembles and bootstrapping consistently outperform MC-dropout, with different context-specific pros and cons. Our analysis leads to a better understanding of the role of aleatoric/epistemic uncertainty, also in relation to the target data set features, and highlights the challenge posed by out-of-domain uncertainty.


Subjects
Deep Learning , Bayes Theorem , Computer Neural Networks , Uncertainty
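
The aleatoric/epistemic split discussed in the abstract above is often computed for an ensemble through the law of total variance: the average of the members' predicted variances is read as aleatoric, and the spread of the members' predicted means as epistemic. The sketch below assumes each ensemble member outputs a mean and a variance for one molecule. The function name and numbers are hypothetical, and the paper's exact formulation may differ.

```python
import statistics

def decompose_uncertainty(member_means, member_variances):
    """Split total predictive variance into aleatoric and epistemic parts."""
    aleatoric = statistics.fmean(member_variances)   # average predicted noise
    epistemic = statistics.pvariance(member_means)   # disagreement between members
    return aleatoric, epistemic, aleatoric + epistemic

# Hypothetical outputs of a three-member ensemble for one molecule
aleatoric, epistemic, total = decompose_uncertainty([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

In this toy case the members agree on the noise level but disagree on the mean, so most of the total variance is epistemic. That is the pattern one would expect for an out-of-domain molecule.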
8.
J Phys Chem Lett ; 11(8): 2992-2997, 2020 Apr 16.
Article in English | MEDLINE | ID: mdl-32216310

ABSTRACT

Quantitative predictions of reaction properties, such as activation energy, have been limited due to a lack of available training data. Such predictions would be useful for computer-assisted reaction mechanism generation and organic synthesis planning. We develop a template-free deep learning model to predict the activation energy given reactant and product graphs and train the model on a new, diverse data set of gas-phase quantum chemistry reactions. We demonstrate that our model achieves accurate predictions and agrees with an intuitive understanding of chemical reactivity. With the continued generation of quantitative chemical reaction data and the development of methods that leverage such data, we expect many more methods for reactivity prediction to become available in the near future.

9.
J Phys Chem A ; 123(27): 5826-5835, 2019 Jul 11.
Article in English | MEDLINE | ID: mdl-31246465

ABSTRACT

Machine learning provides promising new methods for accurate yet rapid prediction of molecular properties, including thermochemistry, which is an integral component of many computer simulations, particularly automated reaction mechanism generation. Often, very large data sets with tens of thousands of molecules are required for training the models, but most data sets of experimental or high-accuracy quantum mechanical quality are much smaller. To overcome these limitations, we calculate new high-level data sets and derive bond additivity corrections to significantly improve enthalpies of formation. We adopt a transfer learning technique to train neural network models that achieve good performance even with a relatively small set of high-accuracy data. The training data for the entropy model are carefully selected so that important conformational effects are captured. The resulting models are generally applicable thermochemistry predictors for organic compounds with oxygen and nitrogen heteroatoms that approach experimental and coupled cluster accuracy while only requiring molecular graph inputs. Due to their versatility and the ease of adding new training data, they are poised to replace conventional estimation methods for thermochemical parameters in reaction mechanism generation. Since high-accuracy data are often sparse, similar transfer learning approaches are expected to be useful for estimating many other molecular properties.

10.
J Phys Chem A ; 123(10): 2142-2152, 2019 Mar 14.
Article in English | MEDLINE | ID: mdl-30758953

ABSTRACT

Because collecting precise and accurate chemistry data is often challenging, chemistry data sets usually only span a small region of chemical space, which limits the performance and the scope of applicability of data-driven models. To address this issue, we integrated an active learning machine with automatic ab initio calculations to form a self-evolving model that can continuously adapt to new species appointed by the users. In the present work, we demonstrate the self-evolving concept by modeling the formation enthalpies of stable closed-shell polycyclic species calculated at the B3LYP/6-31G(2df,p) level of theory. By combining a molecular graph convolutional neural network with a dropout training strategy, the model we developed can predict density functional theory (DFT) enthalpies for a broad range of polycyclic species and assess the quality of each predicted value. For the species which the current model is uncertain about, the automatic ab initio calculations provide additional training data to improve the performance of the model. For a test set composed of 2858 cyclic and polycyclic hydrocarbons and oxygenates, the enthalpies predicted by the model agree with the reference DFT values with a root-mean-square error of 2.62 kcal/mol. We found that a model originally trained on hydrocarbons and oxygenates can broaden its prediction coverage to nitrogen-containing species via an active learning process, suggesting that the continuous learning strategy is not only able to improve the model accuracy but is also capable of expanding the predictive capacity of a model to unseen species domains.
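
The selection step of the active learning loop described in the abstract above can be sketched as follows: species whose predictions vary strongly across stochastic (dropout) forward passes are flagged for new ab initio calculations, and model quality is tracked with a root-mean-square error. The function names, the threshold, and the sample values are hypothetical. The sketch leaves out the neural network and the quantum chemistry calculations.

```python
import math
import statistics

def rmse(predicted, reference):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(predicted, reference)) / len(reference))

def select_for_calculation(dropout_predictions, threshold):
    """Return species whose spread across stochastic passes exceeds the threshold."""
    return [name for name, samples in dropout_predictions.items()
            if statistics.pstdev(samples) > threshold]

# Hypothetical enthalpy predictions (kcal/mol) from three dropout passes each
dropout_predictions = {
    "species_a": [10.0, 10.2, 9.8],
    "species_b": [5.0, 12.0, -1.0],
}
to_compute = select_for_calculation(dropout_predictions, threshold=1.0)
error = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```

Here only `species_b` is flagged, because its predictions disagree by several kcal/mol between passes. In a full loop, the new reference values for flagged species would be added to the training set before retraining.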

11.
J Am Chem Soc ; 140(3): 1035-1048, 2018 Jan 24.
Article in English | MEDLINE | ID: mdl-29271202

ABSTRACT

Ketohydroperoxides are important in liquid-phase autoxidation and in gas-phase partial oxidation and pre-ignition chemistry, but because of their low concentration, instability, and various analytical chemistry limitations, it has been challenging to experimentally determine their reactivity, and only a few pathways are known. In the present work, 75 elementary-step unimolecular reactions of the simplest γ-ketohydroperoxide, 3-hydroperoxypropanal, were discovered by a combination of density functional theory with several automated transition-state search algorithms: the Berny algorithm coupled with the freezing string method, single- and double-ended growing string methods, the heuristic KinBot algorithm, and the single-component artificial force induced reaction method (SC-AFIR). The present joint approach significantly outperforms previous manual and automated transition-state searches: 68 of the reactions of γ-ketohydroperoxide discovered here were previously unknown and completely unexpected. All of the methods found the lowest-energy transition state, which corresponds to the first step of the Korcek mechanism, but each algorithm except for SC-AFIR detected several reactions not found by any of the other methods. We show that the low-barrier chemical reactions involve promising new chemistry that may be relevant in atmospheric and combustion systems. Our study highlights the complexity of chemical space exploration and the advantage of combined application of several approaches. Overall, the present work demonstrates both the power and the weaknesses of existing fully automated approaches for reaction discovery, which suggest possible directions for further method development and assessment in order to enable reliable discovery of all important reactions of any specified reactant(s).
