Results 1 - 20 of 68
1.
Annu Rev Phys Chem ; 75(1): 371-395, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38941524

ABSTRACT

In the past two decades, machine learning potentials (MLPs) have driven significant developments in chemical, biological, and material sciences. The construction and training of MLPs enable fast and accurate simulations and analysis of thermodynamic and kinetic properties. This review focuses on the application of MLPs to reaction systems with consideration of bond breaking and formation. We review the development of MLP models, primarily with neural network and kernel-based algorithms, and recent applications of reactive MLPs (RMLPs) to systems at different scales. We show how RMLPs are constructed, how they speed up the calculation of reactive dynamics, and how they facilitate the study of reaction trajectories, reaction rates, free energy calculations, and many other calculations. Different data sampling strategies applied in building RMLPs are also discussed with a focus on how to collect structures for rare events and how to further improve their performance with active learning.

2.
Chem Sci ; 15(23): 8800-8812, 2024 Jun 12.
Article in English | MEDLINE | ID: mdl-38873063

ABSTRACT

The Critical Assessment of Computational Hit-Finding Experiments (CACHE) Challenge series is focused on identifying small molecule inhibitors of protein targets using computational methods. Each challenge contains two phases, hit-finding and follow-up optimization, each of which is followed by experimental validation of the computational predictions. For the CACHE Challenge #1, the Leucine-Rich Repeat Kinase 2 (LRRK2) WD40 Repeat (WDR) domain was selected as the target for in silico hit-finding and optimization. Mutations in LRRK2 are the most common genetic cause of the familial form of Parkinson's disease. The LRRK2 WDR domain is an understudied drug target with no known molecular inhibitors. Herein we detail the first phase of our winning submission to the CACHE Challenge #1. We developed a framework for the high-throughput structure-based virtual screening of a chemically diverse small molecule space. Hit identification was performed using the large-scale Deep Docking (DD) protocol followed by absolute binding free energy (ABFE) simulations. ABFEs were computed using an automated molecular dynamics (MD)-based thermodynamic integration (TI) approach. 4.1 billion ligands from Enamine REAL were screened with DD followed by ABFEs computed by MD TI for 793 ligands. 76 ligands were prioritized for experimental validation, with 59 compounds successfully synthesized and 5 compounds identified as hits, yielding an 8.5% hit rate. Our results demonstrate the efficacy of the combined DD and ABFE approaches for hit identification for a target with no previously known hits. This approach is widely applicable for the efficient screening of ultra-large chemical libraries as well as rigorous protein-ligand binding affinity estimation leveraging modern computational resources.
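
The quoted 8.5% hit rate can be reproduced from the counts given in the abstract, assuming it is taken relative to the 59 synthesized compounds rather than the 76 prioritized ones. The function name `hit_rate` is illustrative and does not come from the paper.

```python
# Counts quoted in the abstract above.
PRIORITIZED = 76   # ligands prioritized for experimental validation
SYNTHESIZED = 59   # compounds successfully synthesized
HITS = 5           # compounds identified as hits


def hit_rate(n_hits: int, n_tested: int) -> float:
    """Return the hit rate as a percentage of the tested compounds."""
    if n_tested <= 0:
        raise ValueError("n_tested must be positive")
    return 100.0 * n_hits / n_tested


# Assumption: the reported rate uses the synthesized set as denominator.
print(f"{hit_rate(HITS, SYNTHESIZED):.1f}% of synthesized compounds")
# For comparison, the same hits relative to all prioritized ligands.
print(f"{hit_rate(HITS, PRIORITIZED):.1f}% of prioritized ligands")
```

Using the prioritized set as the denominator gives a lower figure, so the denominator matters when comparing hit rates across campaigns.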

4.
Nat Chem ; 16(5): 727-734, 2024 May.
Article in English | MEDLINE | ID: mdl-38454071

ABSTRACT

Atomistic simulation has a broad range of applications from drug design to materials discovery. Machine learning interatomic potentials (MLIPs) have become an efficient alternative to computationally expensive ab initio simulations. For this reason, chemistry and materials science would greatly benefit from a general reactive MLIP, that is, an MLIP that is applicable to a broad range of reactive chemistry without the need for refitting. Here we develop a general reactive MLIP (ANI-1xnr) through automated sampling of condensed-phase reactions. ANI-1xnr is then applied to study five distinct systems: carbon solid-phase nucleation, graphene ring formation from acetylene, biofuel additives, combustion of methane and the spontaneous formation of glycine from early earth small molecules. In all studies, ANI-1xnr closely matches experiment (when available) and/or previous studies using traditional model chemistry methods. As such, ANI-1xnr proves to be a highly general reactive MLIP for C, H, N and O elements in the condensed phase, enabling high-throughput in silico reactive chemistry experimentation.

5.
J Chem Theory Comput ; 20(3): 1193-1213, 2024 Feb 13.
Article in English | MEDLINE | ID: mdl-38270978

ABSTRACT

Machine learning (ML) is increasingly becoming a common tool in computational chemistry. At the same time, the rapid development of ML methods requires a flexible software framework for designing custom workflows. MLatom 3 is a program package designed to leverage the power of ML to enhance typical computational chemistry simulations and to create complex workflows. This open-source package provides plenty of choice to the users who can run simulations with the command-line options, input files, or with scripts using MLatom as a Python package, both on their computers and on the online XACS cloud computing service at XACScloud.com. Computational chemists can calculate energies and thermochemical properties, optimize geometries, run molecular and quantum dynamics, and simulate (ro)vibrational, one-photon UV/vis absorption, and two-photon absorption spectra with ML, quantum mechanical, and combined models. The users can choose from an extensive library of methods containing pretrained ML models and quantum mechanical approximations such as AIQM1 approaching coupled-cluster accuracy. The developers can build their own models using various ML algorithms. The great flexibility of MLatom is largely due to the extensive use of the interfaces to many state-of-the-art software packages and libraries.

6.
Nat Rev Drug Discov ; 23(2): 141-155, 2024 02.
Article in English | MEDLINE | ID: mdl-38066301

ABSTRACT

Quantitative structure-activity relationship (QSAR) modelling, an approach that was introduced 60 years ago, is widely used in computer-aided drug design. In recent years, progress in artificial intelligence techniques, such as deep learning, the rapid growth of databases of molecules for virtual screening and dramatic improvements in computational power have supported the emergence of a new field of QSAR applications that we term 'deep QSAR'. Marking a decade from the pioneering applications of deep QSAR to tasks involved in small-molecule drug discovery, we herein describe key advances in the field, including deep generative and reinforcement learning approaches in molecular design, deep learning models for synthetic planning and the application of deep QSAR models in structure-based virtual screening. We also reflect on the emergence of quantum computing, which promises to further accelerate deep QSAR applications and the need for open-source and democratized resources to support computer-aided drug design.


Subject(s)
Deep Learning, Quantitative Structure-Activity Relationship, Humans, Artificial Intelligence, Computing Methodologies, Quantum Theory, Drug Discovery/methods, Drug Design
7.
Mol Inform ; 43(1): e202300262, 2024 Jan.
Article in English | MEDLINE | ID: mdl-37833243

ABSTRACT

The COVID-19 pandemic continues to pose a substantial threat to human lives and is likely to do so for years to come. Despite the availability of vaccines, searching for efficient small-molecule drugs that are widely available, including in low- and middle-income countries, is an ongoing challenge. In this work, we report the results of an open science community effort, the "Billion molecules against COVID-19 challenge", to identify small-molecule inhibitors against SARS-CoV-2 or relevant human receptors. Participating teams used a wide variety of computational methods to screen a minimum of 1 billion virtual molecules against 6 protein targets. Overall, 31 teams participated, and they suggested a total of 639,024 molecules, which were subsequently ranked to find 'consensus compounds'. The organizing team coordinated with various contract research organizations (CROs) and collaborating institutions to synthesize and test 878 compounds for biological activity against proteases (Nsp5, Nsp3, TMPRSS2), nucleocapsid N, RdRP (only the Nsp12 domain), and (alpha) spike protein S. Overall, 27 compounds with weak inhibition/binding were experimentally identified by binding-, cleavage-, and/or viral suppression assays and are presented here. Open science approaches such as the one presented here contribute to the knowledge base of future drug discovery efforts in finding better SARS-CoV-2 treatments.


Subject(s)
COVID-19, SARS-CoV-2, Humans, Pandemics, Biological Assay, Drug Discovery
8.
Chem Sci ; 14(46): 13392-13401, 2023 Nov 29.
Article in English | MEDLINE | ID: mdl-38033903

ABSTRACT

The emergence of Δ-learning models, whereby machine learning (ML) is used to predict a correction to a low-level energy calculation, provides a versatile route to accelerate high-level energy evaluations at a given geometry. However, Δ-learning models are inapplicable to reaction properties like heats of reaction and activation energies that require both a high-level geometry and energy evaluation. Here, a Δ2-learning model is introduced that can predict high-level activation energies based on low-level critical-point geometries. The Δ2 model uses an atom-wise featurization typical of contemporary ML interatomic potentials (MLIPs) and is trained on a dataset of ∼167 000 reactions, using the GFN2-xTB energy and critical-point geometry as a low-level input and the B3LYP-D3/TZVP energy calculated at the B3LYP-D3/TZVP critical point as a high-level target. The excellent performance of the Δ2 model on unseen reactions demonstrates the surprising ease with which the model implicitly learns the geometric deviations between the low-level and high-level geometries that condition the activation energy prediction. The transferability of the Δ2 model is validated on several external testing sets where it shows near chemical accuracy, illustrating the benefits of combining ML models with readily available physical-based information from semi-empirical quantum chemistry calculations. Fine-tuning of the Δ2 model on a small number of Gaussian-4 calculations produced a 35% accuracy improvement over DFT activation energy predictions while retaining xTB-level cost. The Δ2 model approach proves to be an efficient strategy for accelerating chemical reaction characterization with minimal sacrifice in prediction accuracy.
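
As a minimal sketch of the plain Δ-learning idea described at the start of this abstract (not the paper's Δ2 model, which also accounts for geometry differences), the snippet below fits a correction to a cheap low-level energy so that their sum approximates a high-level energy. All numbers are synthetic, the single scalar descriptor is hypothetical, and a straight-line fit stands in for the ML model.

```python
# Toy Δ-learning: learn the difference between a high-level and a
# low-level energy, then add the learned correction to the low-level value.
# All data below are synthetic and purely illustrative.


def fit_line(xs, ys):
    """Ordinary least squares for y = a * x + b (closed form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical descriptor values and energies (arbitrary units).
descriptor = [0.0, 1.0, 2.0, 3.0, 4.0]
e_low = [10.0, 12.0, 14.5, 16.0, 18.5]
e_high = [11.0, 13.5, 16.5, 18.5, 21.5]

# The training target is the correction, not the high-level energy itself.
delta = [high - low for high, low in zip(e_high, e_low)]
slope, intercept = fit_line(descriptor, delta)


def predict_high(x, e_low_value):
    """High-level estimate = low-level energy + learned correction."""
    return e_low_value + (slope * x + intercept)


print(predict_high(2.0, 14.5))
```

The design point is that the correction is usually smoother and smaller than the total energy, so a simpler model or less training data may suffice.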

9.
Chem Sci ; 14(39): 10835-10846, 2023 Oct 11.
Article in English | MEDLINE | ID: mdl-37829036

ABSTRACT

Accurate prediction of reaction yield is the holy grail for computer-assisted synthesis prediction, but current models have failed to generalize to large literature datasets. To understand the causes and inspire future design, we systematically benchmarked the yield prediction task. We carefully curated and augmented a literature dataset of 41 239 amide coupling reactions, each with information on reactants, products, intermediates, yields, and reaction contexts, and provided 3D structures for the molecules. We calculated molecular features related to 2D and 3D structure information, as well as physical and electronic properties. These descriptors were paired with 4 categories of machine learning methods (linear, kernel, ensemble, and neural network), yielding valuable benchmarks about feature and model performance. Despite the excellent performance on a high-throughput experiment (HTE) dataset (R2 around 0.9), no method gave satisfactory results on the literature data. The best performance was an R2 of 0.395 ± 0.020 using the stack technique. Error analysis revealed that reactivity cliff and yield uncertainty are among the main reasons for incorrect predictions. Removing reactivity cliffs and uncertain reactions boosted the R2 to 0.457 ± 0.006. These results highlight that yield prediction models must be sensitive to the reactivity change due to the subtle structure variance, as well as be robust to the uncertainty associated with yield measurements.
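
For readers unfamiliar with the metric, the R2 values above are presumably the coefficient of determination. The abstract does not state the exact definition used, so the sketch below assumes the standard form, 1 minus the ratio of residual to total sum of squares. The function name `r_squared` and the example yields are illustrative only.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


# Made-up yields (percent) for illustration.
measured = [20.0, 50.0, 80.0]
print(r_squared(measured, measured))            # perfect predictions
print(r_squared(measured, [50.0, 50.0, 50.0]))  # always predicting the mean
```

Under this definition a model that always predicts the mean yield scores 0, which puts the reported literature-data values of roughly 0.4 to 0.46 in context.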

10.
J Chem Phys ; 159(11), 2023 Sep 21.
Article in English | MEDLINE | ID: mdl-37712780

ABSTRACT

Catalyzed by enormous success in the industrial sector, many research programs have been exploring data-driven, machine learning approaches. Performance can be poor when the model is extrapolated to new regions of chemical space, e.g., new bonding types, new many-body interactions. Another important limitation is the spatial locality assumption in model architecture, and this limitation cannot be overcome with larger or more diverse datasets. The outlined challenges are primarily associated with the lack of electronic structure information in surrogate models such as interatomic potentials. Given the fast development of machine learning and computational chemistry methods, we expect some limitations of surrogate models to be addressed in the near future; nevertheless, the spatial locality assumption will likely remain a limiting factor for their transferability. Here, we suggest focusing on an equally important effort: the design of physics-informed models that leverage the domain knowledge and employ machine learning only as a corrective tool. In the context of material science, we will focus on semi-empirical quantum mechanics, using machine learning to predict corrections to the reduced-order Hamiltonian model parameters. The resulting models are broadly applicable, retain the speed of semiempirical chemistry, and frequently achieve accuracy on par with much more expensive ab initio calculations. These early results indicate that future work, in which machine learning and quantum chemistry methods are developed jointly, may provide the best of all worlds for chemistry applications that demand both high accuracy and high numerical efficiency.

12.
Chem Sci ; 14(20): 5438-5452, 2023 May 24.
Article in English | MEDLINE | ID: mdl-37234902

ABSTRACT

Deep-HP is a scalable extension of the Tinker-HP multi-GPU molecular dynamics (MD) package enabling the use of PyTorch/TensorFlow Deep Neural Network (DNN) models. Deep-HP increases DNNs' MD capabilities by orders of magnitude, offering access to ns simulations for 100k-atom biosystems while offering the possibility of coupling DNNs to any classical (FFs) and many-body polarizable (PFFs) force fields. It therefore allows the introduction of the ANI-2X/AMOEBA hybrid polarizable potential designed for ligand binding studies, where solvent-solvent and solvent-solute interactions are computed with the AMOEBA PFF while solute-solute ones are computed by the ANI-2X DNN. ANI-2X/AMOEBA explicitly includes AMOEBA's physical long-range interactions via an efficient Particle Mesh Ewald implementation while preserving ANI-2X's solute short-range quantum mechanical accuracy. The DNN/PFF partition can be user-defined, allowing for hybrid simulations to include key ingredients of biosimulation such as polarizable solvents, polarizable counter ions, etc. ANI-2X/AMOEBA is accelerated using a multiple-timestep strategy focusing on the model's contributions to low-frequency modes of nuclear forces. It primarily evaluates AMOEBA forces while including ANI-2X ones only via correction steps, resulting in an order of magnitude acceleration over standard Velocity Verlet integration. Simulating more than 10 µs, we compute charged/uncharged ligand solvation free energies in 4 solvents, and absolute binding free energies of host-guest complexes from SAMPL challenges. ANI-2X/AMOEBA average errors are discussed in terms of statistical uncertainty and appear in the range of chemical accuracy compared to experiment. The availability of the Deep-HP computational platform opens the path towards large-scale hybrid DNN simulations, at force-field cost, in biophysics and drug discovery.

13.
J Am Chem Soc ; 145(16): 8736-8750, 2023 Apr 26.
Article in English | MEDLINE | ID: mdl-37052978

ABSTRACT

Traditional computational approaches to design chemical species are limited by the need to compute properties for a vast number of candidates, e.g., by discriminative modeling. Therefore, inverse design methods aim to start from the desired property and optimize a corresponding chemical structure. From a machine learning viewpoint, the inverse design problem can be addressed through so-called generative modeling. Mathematically, discriminative models are defined by learning the probability distribution function of properties given the molecular or material structure. In contrast, a generative model seeks to exploit the joint probability of a chemical species with target characteristics. The overarching idea of generative modeling is to implement a system that produces novel compounds that are expected to have a desired set of chemical features, effectively sidestepping issues found in the forward design process. In this contribution, we overview and critically analyze popular generative algorithms like generative adversarial networks, variational autoencoders, flow, and diffusion models. We highlight key differences between each of the models, provide insights into recent success stories, and discuss outstanding challenges for realizing generative modeling discovered solutions in chemical applications.
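
The distinction drawn in this abstract can be written compactly. The notation is assumed here rather than taken from the paper: x denotes a molecular or material structure, y its properties, and y* a desired target property.

```latex
\[
\text{discriminative: } p(y \mid x),
\qquad
\text{generative: } p(x, y) = p(x \mid y)\, p(y)
\]
\[
\text{inverse design: sample } x \sim p(x \mid y = y^{*})
  = \frac{p(x, y^{*})}{p(y^{*})}
\]
```

In this reading, a discriminative model answers "what properties does this structure have?", while the conditional obtained from the joint distribution answers "which structures are likely to have these properties?".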

14.
Sci Data ; 10(1): 145, 2023 03 20.
Article in English | MEDLINE | ID: mdl-36935430

ABSTRACT

Existing reaction transition state (TS) databases are comparatively small and lack chemical diversity. Here, this data gap has been addressed using the concept of a graphically-defined model reaction to comprehensively characterize a reaction space associated with C, H, O, and N containing molecules with up to 10 heavy (non-hydrogen) atoms. The resulting dataset is composed of 176,992 organic reactions possessing at least one validated TS, activation energy, heat of reaction, reactant and product geometries, frequencies, and atom-mapping. For 33,032 reactions, more than one TS was discovered by conformational sampling, allowing conformational errors in TS prediction to be assessed. Data is supplied at the GFN2-xTB and B3LYP-D3/TZVP levels of theory. A subset of reactions were recalculated at the CCSD(T)-F12/cc-pVDZ-F12 and ωB97X-D2/def2-TZVP levels to establish relative errors. The resulting collection of reactions and properties are called the Reaction Graph Depth 1 (RGD1) dataset. RGD1 represents the largest and most chemically diverse TS dataset published to date and should find immediate use in developing novel machine learning models for predicting reaction properties.

15.
J Phys Chem A ; 127(11): 2417-2431, 2023 Mar 23.
Article in English | MEDLINE | ID: mdl-36802360

ABSTRACT

Advances in machine learned interatomic potentials (MLIPs), such as those using neural networks, have resulted in short-range models that can infer interaction energies with near ab initio accuracy and orders of magnitude reduced computational cost. For many-atom systems, including macromolecules, biomolecules, and condensed matter, model accuracy can become reliant on the description of short- and long-range physical interactions. The latter terms can be difficult to incorporate into an MLIP framework. Recent research has produced numerous models with considerations for nonlocal electrostatic and dispersion interactions, leading to a large range of applications that can be addressed using MLIPs. In light of this, we present a Perspective focused on key methodologies and models being used where the presence of nonlocal physics and chemistry is crucial for describing system properties. The strategies covered include MLIPs augmented with dispersion corrections, electrostatics calculated with charges predicted from atomic environment descriptors, the use of self-consistency and message passing iterations to propagate nonlocal system information, and charges obtained via equilibration schemes. We aim to provide a pointed discussion to support the development of machine learning-based interatomic potentials for systems where contributions from only nearsighted terms are deficient.

16.
J Chem Inf Model ; 63(2): 583-594, 2023 01 23.
Article in English | MEDLINE | ID: mdl-36599125

ABSTRACT

In silico identification of potent protein inhibitors commonly requires prediction of a ligand binding free energy (BFE). Thermodynamic integration (TI) based on molecular dynamics (MD) simulations is a BFE calculation method capable of yielding accurate BFEs, but it is computationally expensive and time-consuming. In this work, we have developed an efficient automated workflow for identifying compounds with the lowest BFE among thousands of congeneric ligands, which requires only hundreds of TI calculations. Automated machine learning (AutoML) orchestrated by active learning (AL) in an AL-AutoML workflow allows unbiased and efficient search for a small set of best-performing molecules. We have applied this workflow to select inhibitors of the SARS-CoV-2 papain-like protease and were able to find 133 compounds with improved binding affinity, including 16 compounds with better than 100-fold binding affinity improvement. We obtained a hit rate that outperforms that expected of traditional expert medicinal chemist-guided campaigns. Thus, we demonstrate that the combination of AL and AutoML with free energy simulations provides at least 20× speedup relative to the naïve brute force approaches.


Subject(s)
COVID-19, Humans, SARS-CoV-2/metabolism, Drug Design, Proteins/chemistry, Thermodynamics, Molecular Dynamics Simulation, Protein Binding, Ligands
17.
J Chem Inf Model ; 62(22): 5373-5382, 2022 11 28.
Article in English | MEDLINE | ID: mdl-36112860

ABSTRACT

Computational programs accelerate the chemical discovery processes but often need proper three-dimensional molecular information as part of the input. Getting optimal molecular structures is challenging because it requires enumerating and optimizing a huge space of stereoisomers and conformers. We developed the Python-based Auto3D package for generating the low-energy 3D structures using SMILES as the input. Auto3D is based on state-of-the-art algorithms and can automatize the isomer enumeration and duplicate filtering process, 3D building process, geometry optimization, and ranking process. Tested on 50 molecules with multiple unspecified stereocenters, Auto3D is guaranteed to find the stereoconfiguration that yields the lowest-energy conformer. With Auto3D, we provide an extension of the ANI model. The new model, dubbed ANI-2xt, is trained on a tautomer-rich data set. ANI-2xt is benchmarked with DFT methods on geometry optimization and electronic and Gibbs free energy calculations. Compared with ANI-2x, ANI-2xt provides a 42% error reduction for tautomeric reaction energy calculations when using the gold-standard coupled-cluster calculation as the reference. ANI-2xt can accurately predict the energies and is several orders of magnitude faster than DFT methods.


Subject(s)
Algorithms, Neural Networks (Computer), Molecular Structure, Isomerism, Benchmarking
18.
J Chem Inf Model ; 62(14): 3463-3475, 2022 07 25.
Article in English | MEDLINE | ID: mdl-35797142

ABSTRACT

Pyruvate dehydrogenase complex (PDC) deficiency is a major cause of primary lactic acidemia resulting in high morbidity and mortality, with limited therapeutic options. The E1 component of the mitochondrial multienzyme PDC (PDC-E1) is a symmetric dimer of heterodimers (αβ/α'β') encoded by the PDHA1 and PDHB genes, with two symmetric active sites each consisting of highly conserved phosphorylation loops A and B. PDHA1 mutations are responsible for 82-88% of cases. Greater than 85% of E1α residues with disease-causing missense mutations (DMMs) are solvent-inaccessible, with ∼30% among those involved in subunit-subunit interface contact (SSIC). We performed molecular dynamics simulations of wild-type (WT) PDC-E1 and E1 variants with E1α DMMs at R349 and W185 (residues involved in SSIC), to investigate their impact on human PDC-E1 structure. We evaluated the change in E1 structure and dynamics and examined their implications on E1 function with the specific DMMs. We found that the dynamics of phosphorylation Loop A, which is crucial for E1 biological activity, changes with DMMs that are at least about 15 Å away. Because communication is essential for PDC-E1 activity (with alternating active sites), we also investigated the possible communication network within WT PDC-E1 via centrality analysis. We observed that DMMs altered/disrupted the communication network of PDC-E1. Collectively, these results indicate an allosteric effect in PDC-E1, with implications for the development of novel small-molecule therapeutics for specific recurrent E1α DMMs such as replacements of R349 responsible for ∼10% of PDC deficiency due to E1α DMMs.


Subject(s)
Pyruvate Dehydrogenase (Lipoamide), Pyruvate Dehydrogenase Complex Deficiency Disease, Humans, Mitochondria, Mutation, Pyruvate Dehydrogenase (Lipoamide)/chemistry, Pyruvate Dehydrogenase (Lipoamide)/genetics, Pyruvate Dehydrogenase Complex/chemistry, Pyruvate Dehydrogenase Complex/genetics, Pyruvate Dehydrogenase Complex Deficiency Disease/genetics
19.
J Phys Chem Lett ; 13(15): 3479-3491, 2022 Apr 21.
Article in English | MEDLINE | ID: mdl-35416675

ABSTRACT

Enthalpies of formation and reaction are important thermodynamic properties that have a crucial impact on the outcome of chemical transformations. Here we implement the calculation of enthalpies of formation with a general-purpose ANI-1ccx neural network atomistic potential. We demonstrate on a wide range of benchmark sets that both ANI-1ccx and our other general-purpose data-driven method AIQM1 approach the coveted chemical accuracy of 1 kcal/mol with the speed of semiempirical quantum mechanical methods (AIQM1) or faster (ANI-1ccx). It is remarkably achieved without specifically training the machine learning parts of ANI-1ccx or AIQM1 on formation enthalpies. Importantly, we show that these data-driven methods provide statistical means for uncertainty quantification of their predictions, which we use to detect and eliminate outliers and revise reference experimental data. Uncertainty quantification may also help in the systematic improvement of such data-driven methods.

20.
Chem Sci ; 13(8): 2462-2474, 2022 Feb 23.
Article in English | MEDLINE | ID: mdl-35310485

ABSTRACT

The behavior of proteins is closely related to the protonation states of the residues. Therefore, prediction and measurement of pKa are essential to understand the basic functions of proteins. In this work, we develop a new empirical scheme for protein pKa prediction that is based on deep representation learning. It combines machine learning with atomic environment vector (AEV) and learned quantum mechanical representation from ANI-2x neural network potential (J. Chem. Theory Comput. 2020, 16, 4192). The scheme requires only the coordinate information of a protein as the input and separately estimates the pKa for all five titratable amino acid types. The accuracy of the approach was analyzed with both cross-validation and an external test set of proteins. Obtained results were compared with the widely used empirical approach PROPKA. The new empirical model provides accuracy with MAEs below 0.5 for all amino acid types. It surpasses the accuracy of PROPKA and performs significantly better than the null model. Our model is also sensitive to the local conformational changes and molecular interactions.
