Search | Portal Regional da BVS

1.

A community-powered search of machine learning strategy space to find NMR property prediction models.

Bratholm, Lars A; Gerrard, Will; Anderson, Brandon; Bai, Shaojie; Choi, Sunghwan; Dang, Lam; Hanchar, Pavel; Howard, Addison; Kim, Sanghoon; Kolter, Zico; Kondor, Risi; Kornbluth, Mordechai; Lee, Youhan; Lee, Youngsoo; Mailoa, Jonathan P; Nguyen, Thanh Tu; Popovic, Milos; Rakocevic, Goran; Reade, Walter; Song, Wonho; Stojanovic, Luka; Thiede, Erik H; Tijanic, Nebojsa; Torrubia, Andres; Willmott, Devin; Butts, Craig P; Glowacki, David R.

PLoS One ; 16(7): e0253612, 2021.

Article in English | MEDLINE | ID: mdl-34283864

ABSTRACT

The rise of machine learning (ML) has created an explosion in the potential strategies for using data to make scientific predictions. For physical scientists wishing to apply ML strategies to a particular domain, it can be difficult to assess in advance what strategy to adopt within a vast space of possibilities. Here we outline the results of an online community-powered effort to swarm search the space of ML strategies and develop algorithms for predicting atomic-pairwise nuclear magnetic resonance (NMR) properties in molecules. Using an open-source dataset, we worked with Kaggle to design and host a 3-month competition which received 47,800 ML model predictions from 2,700 teams in 84 countries. Within 3 weeks, the Kaggle community produced models with comparable accuracy to our best previously published 'in-house' efforts. A meta-ensemble model constructed as a linear combination of the top predictions has a prediction accuracy which exceeds that of any individual model, 7-19x better than our previous state-of-the-art. The results highlight the potential of transformer architectures for predicting quantum mechanical (QM) molecular properties.

Subjects

Citizen Science/methods, Citizen Science/trends, Forecasting/methods, Algorithms, Community Participation, Humans, Machine Learning/trends, Magnetic Resonance Imaging/methods, Magnetic Resonance Spectroscopy/methods, Statistical Models
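
The meta-ensemble idea in the abstract above, a linear combination of several models' predictions, can be sketched as an ordinary least-squares fit. This is a minimal illustration under stated assumptions, not the authors' procedure: the data are synthetic, the function names are hypothetical, and the weights are fitted and scored on the same samples, which is optimistic compared with a held-out evaluation.

```python
import numpy as np

def fit_ensemble_weights(predictions, targets):
    """Least-squares weights for a linear combination of model predictions.

    predictions: array (n_samples, n_models), one column per model.
    targets:     array (n_samples,), reference values.
    """
    weights, *_ = np.linalg.lstsq(predictions, targets, rcond=None)
    return weights

def rmse(predicted, targets):
    """Root-mean-square error between two arrays."""
    return float(np.sqrt(np.mean((predicted - targets) ** 2)))

# Synthetic stand-in data (assumption): each "model" is the true value
# plus independent Gaussian noise of a different magnitude.
rng = np.random.default_rng(0)
targets = rng.normal(size=500)
predictions = np.column_stack(
    [targets + rng.normal(scale=s, size=500) for s in (0.3, 0.4, 0.5)]
)

weights = fit_ensemble_weights(predictions, targets)
ensemble = predictions @ weights

individual_rmse = [rmse(predictions[:, j], targets) for j in range(3)]
ensemble_rmse = rmse(ensemble, targets)
```

On the fitting data the combined prediction cannot have a larger squared error than any single model, because each single model corresponds to one admissible weight vector.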

2.

Training atomic neural networks using fragment-based data generated in virtual reality.

Amabilino, Silvia; Bratholm, Lars A; Bennie, Simon J; O'Connor, Michael B; Glowacki, David R.

J Chem Phys ; 153(15): 154105, 2020 Oct 21.

Article in English | MEDLINE | ID: mdl-33092381

ABSTRACT

The ability to understand and engineer molecular structures relies on having accurate descriptions of the energy as a function of atomic coordinates. Here, we outline a new paradigm for deriving energy functions of hyperdimensional molecular systems, which involves generating data for low-dimensional systems in virtual reality (VR) to then efficiently train atomic neural networks (ANNs). This generates high-quality data for specific areas of interest within the hyperdimensional space that characterizes a molecule's potential energy surface (PES). We demonstrate the utility of this approach by gathering data within VR to train ANNs on chemical reactions involving fewer than eight heavy atoms. This strategy enables us to predict the energies of much higher-dimensional systems, e.g., containing nearly 100 atoms. Training on datasets containing only 15k geometries, this approach generates mean absolute errors around 2 kcal mol⁻¹. This represents one of the first times that an ANN-PES for a large reactive radical has been generated using such a small dataset. Our results suggest that VR enables the intelligent curation of high-quality data, which accelerates the learning process.

3.

IMPRESSION - prediction of NMR parameters for 3-dimensional chemical structures using machine learning with near quantum chemical accuracy.

Gerrard, Will; Bratholm, Lars A; Packer, Martin J; Mulholland, Adrian J; Glowacki, David R; Butts, Craig P.

Chem Sci ; 11(2): 508-515, 2020 Jan 14.

Article in English | MEDLINE | ID: mdl-32190270

ABSTRACT

The IMPRESSION (Intelligent Machine PREdiction of Shift and Scalar information Of Nuclei) machine learning system provides an efficient and accurate method for the prediction of NMR parameters from 3-dimensional molecular structures. Here we demonstrate that machine learning predictions of NMR parameters, trained on quantum chemical computed values, can be as accurate as, but computationally much more efficient (tens of milliseconds per molecular structure) than, quantum chemical calculations (hours/days per molecular structure) starting from the same 3-dimensional structure. Training the machine learning system on quantum chemical predictions, rather than experimental data, circumvents the need for the existence of large, structurally diverse, error-free experimental databases and makes IMPRESSION applicable to solving 3-dimensional problems such as molecular conformation and stereoisomerism.

4.

FCHL revisited: Faster and more accurate quantum machine learning.

Christensen, Anders S; Bratholm, Lars A; Faber, Felix A; Anatole von Lilienfeld, O.

J Chem Phys ; 152(4): 044107, 2020 Jan 31.

Article in English | MEDLINE | ID: mdl-32007071

ABSTRACT

We introduce the FCHL19 representation for atomic environments in molecules or condensed-phase systems. Machine learning models based on FCHL19 are able to yield predictions of atomic forces and energies of query compounds with chemical accuracy on the scale of milliseconds. FCHL19 is a revision of our previous work [F. A. Faber et al., J. Chem. Phys. 148, 241717 (2018)] where the representation is discretized and the individual features are rigorously optimized using Monte Carlo optimization. Combined with a Gaussian kernel function that incorporates elemental screening, chemical accuracy is reached for energy learning on the QM7b and QM9 datasets after training for minutes and hours, respectively. The model also shows good performance for non-bonded interactions in the condensed phase for a set of water clusters, with a mean absolute error (MAE) in binding energy of less than 0.1 kcal/mol/molecule after training on 3200 samples. For force learning on the MD17 dataset, our optimized model similarly displays state-of-the-art accuracy with a regressor based on Gaussian process regression. When the revised FCHL19 representation is combined with the operator quantum machine learning regressor, forces and energies can be predicted in only a few milliseconds per atom. The model presented herein is fast and lightweight enough for use in general chemistry problems as well as molecular dynamics simulations.
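
To make the role of the Gaussian kernel concrete, the following is a minimal sketch of kernel ridge regression with a Gaussian kernel on a toy one-dimensional problem. It does not implement the FCHL19 representation or elemental screening; the feature vectors, the function names, and the values of `sigma` and `regularization` are assumptions chosen for illustration only.

```python
import numpy as np

def gaussian_kernel(a, b, sigma):
    """K[i, j] = exp(-||a_i - b_j||^2 / (2 sigma^2))."""
    sq_dists = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma**2))

def train_krr(features, targets, sigma, regularization=1e-8):
    """Solve (K + lambda I) alpha = y for the regression coefficients."""
    kernel = gaussian_kernel(features, features, sigma)
    kernel[np.diag_indices_from(kernel)] += regularization
    return np.linalg.solve(kernel, targets)

def predict_krr(train_features, alphas, query_features, sigma):
    """Prediction is a kernel-weighted sum over the training points."""
    return gaussian_kernel(query_features, train_features, sigma) @ alphas

# Toy stand-in for "representation -> property": learn sin(x) on [0, 2*pi].
x_train = np.linspace(0.0, 2.0 * np.pi, 40)[:, None]
y_train = np.sin(x_train[:, 0])
alphas = train_krr(x_train, y_train, sigma=0.5)

x_query = np.array([[1.0], [2.5]])
y_pred = predict_krr(x_train, alphas, x_query, sigma=0.5)
```

Training reduces to one linear solve and prediction to one matrix-vector product, which is the general reason kernel models of this kind can be evaluated quickly.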

5.

Training Neural Nets To Learn Reactive Potential Energy Surfaces Using Interactive Quantum Chemistry in Virtual Reality.

Amabilino, Silvia; Bratholm, Lars A; Bennie, Simon J; Vaucher, Alain C; Reiher, Markus; Glowacki, David R.

J Phys Chem A ; 123(20): 4486-4499, 2019 May 23.

Article in English | MEDLINE | ID: mdl-30892040

ABSTRACT

Not so long ago, the primary bottleneck in a number of computational workflows was processing power; the rise of machine learning technologies has since produced a paradigm shift that places increasing value on issues related to data curation, that is, data size, quality, bias, format, and coverage. Increasingly, data-related issues are as important as the algorithmic methods used to process and learn from the data. Here we introduce an open-source graphics processing unit-accelerated neural network (NN) framework for learning reactive potential energy surfaces (PESs). To obtain training data for this NN framework, we investigate the use of real-time interactive ab initio molecular dynamics in virtual reality (iMD-VR) as a new data curation strategy that enables human users to rapidly sample geometries along reaction pathways. Focusing on hydrogen abstraction reactions of CN radical with isopentane, we compare the performance of NNs trained using iMD-VR data versus NNs trained using a more traditional method, namely, molecular dynamics (MD) constrained to sample a predefined grid of points along the hydrogen abstraction reaction coordinate. Both the NN trained using iMD-VR data and the NN trained using the constrained MD data reproduce important qualitative features of the reactive PESs, such as a low and early barrier to abstraction. Quantitative analysis shows that NN learning is sensitive to the data set used for training. Our results show that user-sampled structures obtained with the quantum chemical iMD-VR machinery enable excellent sampling in the vicinity of the minimum energy path (MEP). As a result, the NN trained on the iMD-VR data does very well predicting energies that are close to the MEP but less well predicting energies for "off-path" structures. The NN trained on the constrained MD data does better predicting high-energy off-path structures, given that it included a number of such structures in its training set.

6.

Low dimensional representations along intrinsic reaction coordinates and molecular dynamics trajectories using interatomic distance matrices.

Hare, Stephanie R; Bratholm, Lars A; Glowacki, David R; Carpenter, Barry K.

Chem Sci ; 10(43): 9954-9968, 2019 Nov 21.

Article in English | MEDLINE | ID: mdl-32055352

ABSTRACT

Most chemical transformations (reactions or conformational changes) that are of interest to researchers have many degrees of freedom, usually too many to visualize without reducing the dimensionality of the system to include only the most important atomic motions. In this article, we describe a method of using Principal Component Analysis (PCA) for analyzing a series of molecular geometries (e.g., a reaction pathway or molecular dynamics trajectory) and determining the reduced dimensional space that captures the most structural variance in the fewest dimensions. The software written to carry out this method is called PathReducer, which permits (1) visualizing the geometries in a reduced dimensional space, (2) determining the axes that make up the reduced dimensional space, and (3) projecting the series of geometries into the low-dimensional space for visualization. We investigated two options to represent molecular structures within PathReducer: aligned Cartesian coordinates and matrices of interatomic distances. We found that interatomic distance matrices better captured non-linear motions in a smaller number of dimensions. To demonstrate the utility of PathReducer, we have carried out a number of applications where we have projected molecular dynamics trajectories into a reduced dimensional space defined by an intrinsic reaction coordinate. The visualizations provided by this analysis show that dynamic paths can differ greatly from the minimum energy pathway on a potential energy surface. Viewing intrinsic reaction coordinates and trajectories in this way provides a quick way to gather qualitative information about the pathways trajectories take relative to a minimum energy path. Given that the outputs from PCA are linear combinations of the input molecular structure coordinates (i.e., Cartesian coordinates or interatomic distances), they can be easily transferred to other types of calculations that require the definition of a reduced dimensional space (e.g., biased molecular dynamics simulations).
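
The comparison between Cartesian coordinates and interatomic distances as PCA inputs can be illustrated on a toy bending motion. The sketch below is not PathReducer: the function names are hypothetical, the three-atom path is invented for illustration, and the Cartesian frames are used as generated rather than optimally aligned.

```python
import numpy as np

def distance_features(geometries):
    """Flatten each frame into its vector of unique interatomic distances."""
    n_atoms = geometries.shape[1]
    i, j = np.triu_indices(n_atoms, k=1)
    return np.linalg.norm(geometries[:, i, :] - geometries[:, j, :], axis=-1)

def pca(features, n_components):
    """Return projections, principal axes and explained-variance fractions."""
    centered = features - features.mean(axis=0)
    _, singular_values, axes = np.linalg.svd(centered, full_matrices=False)
    explained = singular_values**2 / np.sum(singular_values**2)
    projected = centered @ axes[:n_components].T
    return projected, axes[:n_components], explained[:n_components]

# Hypothetical path: atom 0 at the origin, atom 1 fixed on the x axis,
# atom 2 swept on a unit circle so the 1-0-2 angle opens from 60 to 180 deg.
angles = np.radians(np.linspace(60.0, 180.0, 50))
geometries = np.zeros((50, 3, 3))
geometries[:, 1, 0] = 1.0
geometries[:, 2, 0] = np.cos(angles)
geometries[:, 2, 1] = np.sin(angles)

cart_projected, _, cart_explained = pca(geometries.reshape(50, -1), 2)
dist_projected, _, dist_explained = pca(distance_features(geometries), 2)
```

In this example only one interatomic distance changes along the path, so a single principal component captures essentially all of the variance in distance space, whereas the curved Cartesian path spreads its variance over two components.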

7.

Protein structure refinement using a quantum mechanics-based chemical shielding predictor.

Bratholm, Lars A; Jensen, Jan H.

Chem Sci ; 8(3): 2061-2072, 2017 Mar 01.

Article in English | MEDLINE | ID: mdl-28451325

ABSTRACT

The accurate prediction of protein chemical shifts using a quantum mechanics (QM)-based method has been the subject of intense research for more than 20 years but so far empirical methods for chemical shift prediction have proven more accurate. In this paper we show that a QM-based predictor of protein backbone and CB chemical shifts (ProCS15, PeerJ, 2016, 3, e1344) is of comparable accuracy to empirical chemical shift predictors after chemical shift-based structural refinement that removes small structural errors. We present a method by which quantum chemistry based predictions of isotropic chemical shielding values (ProCS15) can be used to refine protein structures using Markov Chain Monte Carlo (MCMC) simulations, relating the chemical shielding values to the experimental chemical shifts probabilistically. Two kinds of MCMC structural refinement simulations were performed using force field geometry optimized X-ray structures as starting points: simulated annealing of the starting structure and constant temperature MCMC simulation followed by simulated annealing of a representative ensemble structure. Annealing of the CHARMM structure changes the CA-RMSD by an average of 0.4 Å but lowers the chemical shift RMSD by 1.0 and 0.7 ppm for CA and N. Conformational averaging has a relatively small effect (0.1-0.2 ppm) on the overall agreement with carbon chemical shifts but lowers the error for nitrogen chemical shifts by 0.4 ppm. If an amino acid specific offset is included the ProCS15 predicted chemical shifts have RMSD values relative to experiments that are comparable to popular empirical chemical shift predictors. The annealed representative ensemble structures differ in CA-RMSD relative to the initial structures by an average of 2.0 Å, with >2.0 Å difference for six proteins. In four of the cases, the largest structural differences arise in structurally flexible regions of the protein as determined by NMR, and in the remaining two cases, the large structural change may be due to force field deficiencies. The overall accuracy of the empirical methods is slightly improved by annealing the CHARMM structure with ProCS15, which may suggest that the minor structural changes introduced by ProCS15-based annealing improve the accuracy of the protein structures. Having established that QM-based chemical shift prediction can deliver the same accuracy as empirical shift predictors we hope this can help increase the accuracy of related approaches such as QM/MM or linear scaling approaches or interpreting protein structural dynamics from QM-derived chemical shift.
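
The simulated-annealing step mentioned in the abstract above can be sketched as a generic Metropolis loop with a decreasing temperature. This is only a schematic under stated assumptions: the one-dimensional `toy_energy` stands in for a structure-dependent score, and neither ProCS15 nor the paper's probabilistic model of chemical shifts is reproduced; all names and parameter values are hypothetical.

```python
import math
import random

def simulated_annealing(energy, initial_state, propose, temperatures, rng):
    """Metropolis sampling over a cooling schedule; returns the best state seen."""
    state, current = initial_state, energy(initial_state)
    best_state, best_energy = state, current
    for temperature in temperatures:
        candidate = propose(state, rng)
        candidate_energy = energy(candidate)
        delta = candidate_energy - current
        # Always accept downhill moves; accept uphill moves with
        # probability exp(-delta / T), which shrinks as T is lowered.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            state, current = candidate, candidate_energy
            if current < best_energy:
                best_state, best_energy = state, current
    return best_state, best_energy

def toy_energy(x):
    """Assumed stand-in score: squared deviation from a target value."""
    return (x - 1.5) ** 2

def propose(x, rng):
    """Small Gaussian perturbation of the current state."""
    return x + rng.gauss(0.0, 0.2)

# Geometric cooling from T = 1.0 down to T = 0.001 over 2000 steps.
temperatures = [1.0 * 0.001 ** (k / 1999) for k in range(2000)]
rng = random.Random(42)
best_state, best_energy = simulated_annealing(
    toy_energy, -3.0, propose, temperatures, rng
)
```

Early high-temperature steps let the walker cross barriers, while the late low-temperature steps behave almost like a local minimization.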

8.

ProCS15: a DFT-based chemical shift predictor for backbone and Cβ atoms in proteins.

Larsen, Anders S; Bratholm, Lars A; Christensen, Anders S; Channir, Maher; Jensen, Jan H.

PeerJ ; 3: e1344, 2015.

Article in English | MEDLINE | ID: mdl-26623185

ABSTRACT

We present ProCS15: a program that computes the isotropic chemical shielding values of backbone and Cβ atoms given a protein structure in less than a second. ProCS15 is based on around 2.35 million OPBE/6-31G(d,p)//PM6 calculations on tripeptides and small structural models of hydrogen-bonding. The ProCS15-predicted chemical shielding values are compared to experimentally measured chemical shifts for Ubiquitin and the third IgG-binding domain of Protein G through linear regression and yield RMSD values of up to 2.2, 0.7, and 4.8 ppm for carbon, hydrogen, and nitrogen atoms. These RMSD values are very similar to corresponding RMSD values computed using OPBE/6-31G(d,p) for the entire structure for each protein. These maximum RMSD values can be reduced by using NMR-derived structural ensembles of Ubiquitin. For example, for the largest ensemble the largest RMSD values are 1.7, 0.5, and 3.5 ppm for carbon, hydrogen, and nitrogen. The corresponding RMSD values predicted by several empirical chemical shift predictors range between 0.7-1.1, 0.2-0.4, and 1.8-2.8 ppm for carbon, hydrogen, and nitrogen atoms, respectively.
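
The comparison described above, mapping computed shieldings onto measured shifts by linear regression and reporting an RMSD, can be sketched in a few lines. The numbers below are invented for illustration and are not ProCS15 output or experimental data; the function names are hypothetical.

```python
import math

def linear_fit(x, y):
    """Ordinary least-squares slope and intercept for y ~ slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def rmsd(a, b):
    """Root-mean-square deviation between two equally long sequences."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / len(a))

# Hypothetical values in ppm: computed isotropic shieldings and the
# corresponding measured chemical shifts for five carbon atoms.
shieldings = [130.2, 125.7, 128.9, 121.4, 133.0]
shifts = [56.1, 60.3, 57.6, 64.9, 53.2]

slope, intercept = linear_fit(shieldings, shifts)
fitted_shifts = [slope * s + intercept for s in shieldings]
error = rmsd(fitted_shifts, shifts)
```

A slope close to -1 is expected in this kind of fit, since a chemical shift is approximately a reference shielding minus the shielding of the nucleus.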

9.

Bayesian inference of protein structure from chemical shift data.

Bratholm, Lars A; Christensen, Anders S; Hamelryck, Thomas; Jensen, Jan H.

PeerJ ; 3: e861, 2015.

Article in English | MEDLINE | ID: mdl-25825683

ABSTRACT

Protein chemical shifts are routinely used to augment molecular mechanics force fields in protein structure simulations, with weights of the chemical shift restraints determined empirically. These weights, however, might not be an optimal descriptor of a given protein structure and predictive model, and a bias is introduced which might result in incorrect structures. In the inferential structure determination framework, both the unknown structure and the disagreement between experimental and back-calculated data are formulated as a joint probability distribution, thus utilizing the full information content of the data. Here, we present the formulation of such a probability distribution where the error in chemical shift prediction is described by either a Gaussian or Cauchy distribution. The methodology is demonstrated and compared to a set of empirically weighted potentials through Markov chain Monte Carlo simulations of three small proteins (ENHD, Protein G and the SMN Tudor Domain) using the PROFASI force field and the chemical shift predictor CamShift. Using a clustering criterion for identifying the best structure, together with the addition of a solvent exposure scoring term, the simulations suggest that sampling both the structure and the uncertainties in chemical shift prediction leads to more accurate structures compared to conventional methods using empirically determined weights. The Cauchy distribution, using either sampled uncertainties or predetermined weights, did, however, result in overall better convergence to the native fold, suggesting that both types of distribution might be useful in different aspects of the protein structure prediction.
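
The difference between the two error models named above can be seen by evaluating their log-likelihoods on the same prediction residuals. The sketch below uses the standard Gaussian and Cauchy densities with a fixed scale parameter; it does not sample the scale as the paper does, and the residual values and function names are assumptions for illustration.

```python
import math

def gaussian_log_likelihood(residuals, sigma):
    """Sum of log N(r | 0, sigma^2) over all residuals."""
    n = len(residuals)
    squared = sum(r * r for r in residuals)
    return -n * math.log(sigma * math.sqrt(2.0 * math.pi)) - squared / (2.0 * sigma**2)

def cauchy_log_likelihood(residuals, gamma):
    """Sum of log Cauchy(r | 0, gamma) over all residuals."""
    return sum(
        -math.log(math.pi * gamma * (1.0 + (r / gamma) ** 2)) for r in residuals
    )

# Hypothetical residuals (predicted minus experimental shift, in ppm),
# with and without a single badly predicted nucleus.
clean = [0.1, -0.2, 0.15]
with_outlier = clean + [5.0]

gaussian_penalty = (
    gaussian_log_likelihood(clean, 1.0) - gaussian_log_likelihood(with_outlier, 1.0)
)
cauchy_penalty = (
    cauchy_log_likelihood(clean, 1.0) - cauchy_log_likelihood(with_outlier, 1.0)
)
```

Because the Cauchy density has heavier tails, one large residual lowers its log-likelihood far less than it lowers the Gaussian one, so a few poorly predicted shifts have less influence on which structures are favoured.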
