Pesquisa | Portal Regional da BVS

1.

On the use of real-world datasets for reaction yield prediction.

Saebi, Mandana; Nan, Bozhao; Herr, John E; Wahlers, Jessica; Guo, Zhichun; Zuranski, Andrzej M; Kogej, Thierry; Norrby, Per-Ola; Doyle, Abigail G; Chawla, Nitesh V; Wiest, Olaf.

Chem Sci ; 14(19): 4997-5005, 2023 May 17.

Artigo em Inglês | MEDLINE | ID: mdl-37206399

RESUMO

The lack of publicly available, large, and unbiased datasets is a key bottleneck for the application of machine learning (ML) methods in synthetic chemistry. Data from electronic laboratory notebooks (ELNs) could provide less biased, large datasets, but no such datasets have been made publicly available. The first real-world dataset from the ELNs of a large pharmaceutical company is disclosed and its relationship to high-throughput experimentation (HTE) datasets is described. For chemical yield predictions, a key task in chemical synthesis, an attributed graph neural network (AGNN) performs as well as or better than the best previous models on two HTE datasets for the Suzuki-Miyaura and Buchwald-Hartwig reactions. However, training the AGNN on an ELN dataset does not lead to a predictive model. The implications of using ELN data for training ML-based models are discussed in the context of yield predictions.

2.

SPICE, A Dataset of Drug-like Molecules and Peptides for Training Machine Learning Potentials.

Eastman, Peter; Behara, Pavan Kumar; Dotson, David L; Galvelis, Raimondas; Herr, John E; Horton, Josh T; Mao, Yuezhi; Chodera, John D; Pritchard, Benjamin P; Wang, Yuanqing; De Fabritiis, Gianni; Markland, Thomas E.

Sci Data ; 10(1): 11, 2023 01 04.

Artigo em Inglês | MEDLINE | ID: mdl-36599873

RESUMO

Machine learning potentials are an important tool for molecular simulation, but their development is held back by a shortage of high quality datasets to train them on. We describe the SPICE dataset, a new quantum chemistry dataset for training potentials relevant to simulating drug-like small molecules interacting with proteins. It contains over 1.1 million conformations for a diverse set of small molecules, dimers, dipeptides, and solvated amino acids. It includes 15 elements, charged and uncharged molecules, and a wide range of covalent and non-covalent interactions. It provides both forces and energies calculated at the ωB97M-D3(BJ)/def2-TZVPPD level of theory, along with other useful quantities such as multipole moments and bond orders. We train a set of machine learning potentials on it and demonstrate that they can achieve chemical accuracy across a broad region of chemical space. It can serve as a valuable resource for the creation of transferable, ready to use potential functions for use in molecular simulations.

3.

End-to-end differentiable construction of molecular mechanics force fields.

Wang, Yuanqing; Fass, Josh; Kaminow, Benjamin; Herr, John E; Rufa, Dominic; Zhang, Ivy; Pulido, Iván; Henry, Mike; Bruce Macdonald, Hannah E; Takaba, Kenichiro; Chodera, John D.

Chem Sci ; 13(41): 12016-12033, 2022 Oct 26.

Artigo em Inglês | MEDLINE | ID: mdl-36349096

RESUMO

Molecular mechanics (MM) potentials have long been a workhorse of computational chemistry. Leveraging accuracy and speed, these functional forms find use in a wide variety of applications in biomolecular modeling and drug discovery, from rapid virtual screening to detailed free energy calculations. Traditionally, MM potentials have relied on human-curated, inflexible, and poorly extensible discrete chemical perception rules (atom types) for applying parameters to small molecules or biopolymers, making it difficult to optimize both types and parameters to fit quantum chemical or physical property data. Here, we propose an alternative approach that uses graph neural networks to perceive chemical environments, producing continuous atom embeddings from which valence and nonbonded parameters can be predicted using invariance-preserving layers. Since all stages are built from smooth neural functions, the entire process-spanning chemical perception to parameter assignment-is modular and end-to-end differentiable with respect to model parameters, allowing new force fields to be easily constructed, extended, and applied to arbitrary molecules. We show that this approach is not only sufficiently expressive to reproduce legacy atom types, but that it can learn to accurately reproduce and extend existing molecular mechanics force fields. Trained with arbitrary loss functions, it can construct entirely new force fields self-consistently applicable to both biopolymers and small molecules directly from quantum chemical calculations, with superior fidelity than traditional atom or parameter typing schemes. When adapted to simultaneously fit partial charge models, espaloma delivers high-quality partial atomic charges orders of magnitude faster than current best-practices with low inaccuracy. When trained on the same quantum chemical small molecule dataset used to parameterize the Open Force Field ("Parsley") openff-1.2.0 small molecule force field augmented with a peptide dataset, the resulting espaloma model shows superior accuracy vis-á-vis experiments in computing relative alchemical free energy calculations for a popular benchmark. This approach is implemented in the free and open source package espaloma, available at https://github.com/choderalab/espaloma.

4.

Compressing physics with an autoencoder: Creating an atomic species representation to improve machine learning models in the chemical sciences.

Herr, John E; Koh, Kevin; Yao, Kun; Parkhill, John.

J Chem Phys ; 151(8): 084103, 2019 Aug 28.

Artigo em Inglês | MEDLINE | ID: mdl-31470722

RESUMO

We define a vector quantity which corresponds to atomic species identity by compressing a set of physical properties with an autoencoder. This vector, referred to here as the elemental modes, provides many advantages in downstream machine learning tasks. Using the elemental modes directly as the feature vector, we trained a neural network to predict formation energies of elpasolites with improved accuracy over previous works on the same task. Combining the elemental modes with geometric features used in high-dimensional neural network potentials (HD-NNPs) solves many problems of scaling and efficiency in the development of such neural network potentials. Whereas similar models in the past have been limited to typically four atomic species (H, C, N, and O), our implementation does not scale in cost by adding more atomic species and allows us to train an HD-NNP model which treats molecules containing H, C, N, O, F, P, S, Cl, Se, Br, and I. Finally, we establish that our implementation allows us to define feature vectors for alchemical intermediate states in the HD-NNP model, which opens up new possibilities for performing alchemical free energy calculations on systems where bond breaking/forming is important.

5.

Metadynamics for training neural network model chemistries: A competitive assessment.

Herr, John E; Yao, Kun; McIntyre, Ryker; Toth, David W; Parkhill, John.

J Chem Phys ; 148(24): 241710, 2018 Jun 28.

Artigo em Inglês | MEDLINE | ID: mdl-29960377

RESUMO

Neural network model chemistries (NNMCs) promise to facilitate the accurate exploration of chemical space and simulation of large reactive systems. One important path to improving these models is to add layers of physical detail, especially long-range forces. At short range, however, these models are data driven and data limited. Little is systematically known about how data should be sampled, and "test data" chosen randomly from some sampling techniques can provide poor information about generality. If the sampling method is narrow, "test error" can appear encouragingly tiny while the model fails catastrophically elsewhere. In this manuscript, we competitively evaluate two common sampling methods: molecular dynamics (MD), normal-mode sampling, and one uncommon alternative, Metadynamics (MetaMD), for preparing training geometries. We show that MD is an inefficient sampling method in the sense that additional samples do not improve generality. We also show that MetaMD is easily implemented in any NNMC software package with cost that scales linearly with the number of atoms in a sample molecule. MetaMD is a black-box way to ensure samples always reach out to new regions of chemical space, while remaining relevant to chemistry near kbT. It is a cheap tool to address the issue of generalization.

6.

The TensorMol-0.1 model chemistry: a neural network augmented with long-range physics.

Yao, Kun; Herr, John E; Toth, David W; Mckintyre, Ryker; Parkhill, John.

Chem Sci ; 9(8): 2261-2269, 2018 Feb 28.

Artigo em Inglês | MEDLINE | ID: mdl-29719699

RESUMO

Traditional force fields cannot model chemical reactivity, and suffer from low generality without re-fitting. Neural network potentials promise to address these problems, offering energies and forces with near ab initio accuracy at low cost. However a data-driven approach is naturally inefficient for long-range interatomic forces that have simple physical formulas. In this manuscript we construct a hybrid model chemistry consisting of a nearsighted neural network potential with screened long-range electrostatic and van der Waals physics. This trained potential, simply dubbed "TensorMol-0.1", is offered in an open-source Python package capable of many of the simulation types commonly used to study chemistry: geometry optimizations, harmonic spectra, open or periodic molecular dynamics, Monte Carlo, and nudged elastic band calculations. We describe the robustness and speed of the package, demonstrating its millihartree accuracy and scalability to tens-of-thousands of atoms on ordinary laptops. We demonstrate the performance of the model by reproducing vibrational spectra, and simulating the molecular dynamics of a protein. Our comparisons with electronic structure theory and experimental data demonstrate that neural network molecular dynamics is poised to become an important tool for molecular simulation, lowering the resource barrier to simulating chemistry.

7.

Origin of the Size-Dependent Stokes Shift in CsPbBr₃ Perovskite Nanocrystals.

Brennan, Michael C; Herr, John E; Nguyen-Beck, Triet S; Zinna, Jessica; Draguta, Sergiu; Rouvimov, Sergei; Parkhill, John; Kuno, Masaru.

J Am Chem Soc ; 139(35): 12201-12208, 2017 09 06.

Artigo em Inglês | MEDLINE | ID: mdl-28772067

RESUMO

The origin of the size-dependent Stokes shift in CsPbBr3 nanocrystals (NCs) is explained for the first time. Stokes shifts range from 82 to 20 meV for NCs with effective edge lengths varying from â¼4 to 13 nm. We show that the Stokes shift is intrinsic to the NC electronic structure and does not arise from extrinsic effects such as residual ensemble size distributions, impurities, or solvent-related effects. The origin of the Stokes shift is elucidated via first-principles calculations. Corresponding theoretical modeling of the CsPbBr3 NC density of states and band structure reveals the existence of an intrinsic confined hole state 260 to 70 meV above the valence band edge state for NCs with edge lengths from â¼2 to 5 nm. A size-dependent Stokes shift is therefore predicted and is in quantitative agreement with the experimental data. Comparison between bulk and NC calculations shows that the confined hole state is exclusive to NCs. At a broader level, the distinction between absorbing and emitting states in CsPbBr3 is likely a general feature of other halide perovskite NCs and can be tuned via NC size to enhance applications involving these materials.

8.

Intrinsic Bond Energies from a Bonds-in-Molecules Neural Network.

Yao, Kun; Herr, John E; Brown, Seth N; Parkhill, John.

J Phys Chem Lett ; 8(12): 2689-2694, 2017 Jun 15.

Artigo em Inglês | MEDLINE | ID: mdl-28573865

RESUMO

Neural networks are being used to make new types of empirical chemical models as inexpensive as force fields, but with accuracy similar to the ab initio methods used to build them. In this work, we present a neural network that predicts the energies of molecules as a sum of intrinsic bond energies. The network learns the total energies of the popular GDB9 database to a competitive MAE of 0.94 kcal/mol on molecules outside of its training set, is naturally linearly scaling, and applicable to molecules consisting of thousands of bonds. More importantly, it gives chemical insight into the relative strengths of bonds as a function of their molecular environment, despite only being trained on total energy information. We show that the network makes predictions of relative bond strengths in good agreement with measured trends and human predictions. A Bonds-in-Molecules Neural Network (BIM-NN) learns heuristic relative bond strengths like expert synthetic chemists, and compares well with ab initio bond order measures such as NBO analysis.

9.

The many-body expansion combined with neural networks.

Yao, Kun; Herr, John E; Parkhill, John.

J Chem Phys ; 146(1): 014106, 2017 Jan 07.

Artigo em Inglês | MEDLINE | ID: mdl-28063436

RESUMO

Fragmentation methods such as the many-body expansion (MBE) are a common strategy to model large systems by partitioning energies into a hierarchy of decreasingly significant contributions. The number of calculations required for chemical accuracy is still prohibitively expensive for the ab initio MBE to compete with force field approximations for applications beyond single-point energies. Alongside the MBE, empirical models of ab initio potential energy surfaces have improved, especially non-linear models based on neural networks (NNs) which can reproduce ab initio potential energy surfaces rapidly and accurately. Although they are fast, NNs suffer from their own curse of dimensionality; they must be trained on a representative sample of chemical space. In this paper we examine the synergy of the MBE and NN's and explore their complementarity. The MBE offers a systematic way to treat systems of arbitrary size while reducing the scaling problem of large systems. NN's reduce, by a factor in excess of 106, the computational overhead of the MBE and reproduce the accuracy of ab initio calculations without specialized force fields. We show that for a small molecule extended system like methanol, accuracy can be achieved with drastically different chemical embeddings. To assess this we test a new chemical embedding which can be inverted to predict molecules with desired properties. We also provide our open-source code for the neural network many-body expansion, Tensormol.

RESUMO

RESUMO

RESUMO

RESUMO

RESUMO

RESUMO

RESUMO

RESUMO

RESUMO

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

DETALHE DA PESQUISA