1.
Annu Rev Phys Chem ; 75(1): 371-395, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38941524

ABSTRACT

In the past two decades, machine learning potentials (MLPs) have driven significant developments in chemical, biological, and material sciences. The construction and training of MLPs enable fast and accurate simulations and analysis of thermodynamic and kinetic properties. This review focuses on the application of MLPs to reaction systems with consideration of bond breaking and formation. We review the development of MLP models, primarily with neural network and kernel-based algorithms, and recent applications of reactive MLPs (RMLPs) to systems at different scales. We show how RMLPs are constructed, how they speed up the calculation of reactive dynamics, and how they facilitate the study of reaction trajectories, reaction rates, free energy calculations, and many other calculations. Different data sampling strategies applied in building RMLPs are also discussed with a focus on how to collect structures for rare events and how to further improve their performance with active learning.

2.
Nature ; 559(7715): 547-555, 2018 07.
Article in English | MEDLINE | ID: mdl-30046072

ABSTRACT

Here we summarize recent progress in machine learning for the chemical sciences. We outline machine-learning techniques that are suitable for addressing research questions in this domain, as well as future directions for the field. We envisage a future in which the design, synthesis, characterization and application of molecules and materials is accelerated by artificial intelligence.

3.
J Am Chem Soc ; 145(16): 8736-8750, 2023 Apr 26.
Article in English | MEDLINE | ID: mdl-37052978

ABSTRACT

Traditional computational approaches to design chemical species are limited by the need to compute properties for a vast number of candidates, e.g., by discriminative modeling. Therefore, inverse design methods aim to start from the desired property and optimize a corresponding chemical structure. From a machine learning viewpoint, the inverse design problem can be addressed through so-called generative modeling. Mathematically, discriminative models are defined by learning the probability distribution function of properties given the molecular or material structure. In contrast, a generative model seeks to exploit the joint probability of a chemical species with target characteristics. The overarching idea of generative modeling is to implement a system that produces novel compounds that are expected to have a desired set of chemical features, effectively sidestepping issues found in the forward design process. In this contribution, we overview and critically analyze popular generative algorithms like generative adversarial networks, variational autoencoders, flow, and diffusion models. We highlight key differences between each of the models, provide insights into recent success stories, and discuss outstanding challenges for realizing generative modeling discovered solutions in chemical applications.
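
The distinction drawn in this abstract can be written compactly. The notation is an assumption introduced here for illustration, not taken from the article: $x$ denotes a molecular or material structure and $y$ a target property.

```latex
\begin{align}
  \text{discriminative (forward) model:}\quad & p(y \mid x) \\
  \text{generative (joint) model:}\quad & p(x, y) = p(y \mid x)\, p(x) \\
  \text{inverse design (conditional sampling):}\quad & p(x \mid y) = \frac{p(x, y)}{p(y)}
\end{align}
```

Read this way, inverse design amounts to sampling structures $x$ conditioned on the desired property $y$, rather than scoring every candidate with $p(y \mid x)$.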

4.
J Chem Inf Model ; 63(2): 583-594, 2023 01 23.
Article in English | MEDLINE | ID: mdl-36599125

ABSTRACT

In silico identification of potent protein inhibitors commonly requires prediction of a ligand binding free energy (BFE). Thermodynamics integration (TI) based on molecular dynamics (MD) simulations is a BFE calculation method capable of acquiring accurate BFE, but it is computationally expensive and time-consuming. In this work, we have developed an efficient automated workflow for identifying compounds with the lowest BFE among thousands of congeneric ligands, which requires only hundreds of TI calculations. Automated machine learning (AutoML) orchestrated by active learning (AL) in an AL-AutoML workflow allows unbiased and efficient search for a small set of best-performing molecules. We have applied this workflow to select inhibitors of the SARS-CoV-2 papain-like protease and were able to find 133 compounds with improved binding affinity, including 16 compounds with better than 100-fold binding affinity improvement. We obtained a hit rate that outperforms that expected of traditional expert medicinal chemist-guided campaigns. Thus, we demonstrate that the combination of AL and AutoML with free energy simulations provides at least 20× speedup relative to the naïve brute force approaches.


Subjects
COVID-19 , Humans , SARS-CoV-2/metabolism , Drug Design , Proteins/chemistry , Thermodynamics , Molecular Dynamics Simulation , Protein Binding , Ligands
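
The active-learning idea in the abstract above (label only a small fraction of a large candidate pool with an expensive calculation) can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the authors' AL-AutoML workflow: `expensive_oracle` is a hypothetical stand-in for a TI calculation, the surrogate is a simple nearest-neighbour lookup rather than an AutoML model, and the acquisition rule is purely greedy.

```python
import random


def expensive_oracle(x):
    """Hypothetical stand-in for a costly free-energy calculation (lower is better)."""
    return (x - 0.7) ** 2


def surrogate_predict(x, labeled):
    """Cheap surrogate: reuse the value of the nearest already-labeled candidate."""
    nearest = min(labeled, key=lambda xy: abs(xy[0] - x))
    return nearest[1]


def active_learning(pool, n_init=5, n_rounds=10, batch=5, seed=0):
    rng = random.Random(seed)
    unlabeled = list(pool)
    rng.shuffle(unlabeled)
    # Seed round: label a small random subset with the expensive oracle.
    labeled = [(x, expensive_oracle(x)) for x in unlabeled[:n_init]]
    unlabeled = unlabeled[n_init:]
    for _ in range(n_rounds):
        if not unlabeled:
            break
        # Greedy acquisition: rank the remaining pool by the surrogate's prediction.
        unlabeled.sort(key=lambda x: surrogate_predict(x, labeled))
        picked, unlabeled = unlabeled[:batch], unlabeled[batch:]
        # Only the picked batch is sent to the expensive oracle.
        labeled.extend((x, expensive_oracle(x)) for x in picked)
    return labeled


pool = [i / 1000 for i in range(1000)]  # 1000 hypothetical candidates
labeled = active_learning(pool)
best_x, best_y = min(labeled, key=lambda xy: xy[1])
n_oracle_calls = len(labeled)  # 5 initial + 10 rounds * 5 per round
```

In this sketch the oracle is called for 55 of the 1000 candidates; a practical workflow would typically replace the greedy rule with one that also accounts for model uncertainty.
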
5.
J Phys Chem A ; 127(11): 2417-2431, 2023 Mar 23.
Article in English | MEDLINE | ID: mdl-36802360

ABSTRACT

Advances in machine-learned interatomic potentials (MLIPs), such as those using neural networks, have resulted in short-range models that can infer interaction energies with near ab initio accuracy and orders of magnitude reduced computational cost. For many-atom systems, including macromolecules, biomolecules, and condensed matter, model accuracy can become reliant on the description of both short- and long-range physical interactions. The latter terms can be difficult to incorporate into an MLIP framework. Recent research has produced numerous models with considerations for nonlocal electrostatic and dispersion interactions, leading to a large range of applications that can be addressed using MLIPs. In light of this, we present a Perspective focused on key methodologies and models being used where the presence of nonlocal physics and chemistry is crucial for describing system properties. The strategies covered include MLIPs augmented with dispersion corrections, electrostatics calculated with charges predicted from atomic environment descriptors, the use of self-consistency and message passing iterations to propagate nonlocal system information, and charges obtained via equilibration schemes. We aim to provide a pointed discussion to support the development of machine learning-based interatomic potentials for systems where contributions from only nearsighted terms are deficient.

6.
J Chem Phys ; 159(11), 2023 Sep 21.
Article in English | MEDLINE | ID: mdl-37712780

ABSTRACT

Catalyzed by enormous success in the industrial sector, many research programs have been exploring data-driven, machine learning approaches. Performance can be poor when the model is extrapolated to new regions of chemical space, e.g., new bonding types or new many-body interactions. Another important limitation is the spatial locality assumption in model architecture, and this limitation cannot be overcome with larger or more diverse datasets. The outlined challenges are primarily associated with the lack of electronic structure information in surrogate models such as interatomic potentials. Given the fast development of machine learning and computational chemistry methods, we expect some limitations of surrogate models to be addressed in the near future; nevertheless, the spatial locality assumption will likely remain a limiting factor for their transferability. Here, we suggest focusing on an equally important effort: the design of physics-informed models that leverage domain knowledge and employ machine learning only as a corrective tool. In the context of materials science, we will focus on semi-empirical quantum mechanics, using machine learning to predict corrections to the reduced-order Hamiltonian model parameters. The resulting models are broadly applicable, retain the speed of semiempirical chemistry, and frequently achieve accuracy on par with much more expensive ab initio calculations. These early results indicate that future work, in which machine learning and quantum chemistry methods are developed jointly, may provide the best of all worlds for chemistry applications that demand both high accuracy and high numerical efficiency.

7.
Acc Chem Res ; 54(7): 1575-1585, 2021 04 06.
Article in English | MEDLINE | ID: mdl-33715355

ABSTRACT

Machine learning interatomic potentials (MLIPs) are widely used for describing molecular energy and continue bridging the speed and accuracy gap between quantum mechanical (QM) and classical approaches like force fields. In this Account, we focus on the out-of-the-box approaches to developing transferable MLIPs for diverse chemical tasks. First, we introduce the "Accurate Neural Network engine for Molecular Energies," ANAKIN-ME, method (or ANI for short). The ANI model utilizes Justin Smith Symmetry Functions (JSSFs) and realizes training for vast data sets. A training data set several orders of magnitude larger than before has become the key factor in the knowledge transferability and flexibility of MLIPs. As the quantity, quality, and types of interactions included in the training data set will dictate the accuracy of MLIPs, the task of proper data selection and model training could be assisted with advanced methods like active learning (AL), transfer learning (TL), and multitask learning (MTL). Next, we describe the AIMNet "Atoms-in-Molecules Network" that was inspired by the quantum theory of atoms in molecules. The AIMNet architecture lifts multiple limitations in MLIPs. It encodes long-range interactions and learnable representations of chemical elements. We also discuss the AIMNet-ME model that expands the applicability domain of AIMNet from neutral molecules toward open-shell systems. The AIMNet-ME encompasses a dependence of the potential on molecular charge and spin. It brings ML and physical models one step closer, ensuring the correct molecular energy behavior over the total molecular charge. We finally describe perhaps the simplest possible physics-aware model, which combines ML and the extended Hückel method. In ML-EHM, the "Hierarchically Interacting Particle Neural Network," HIP-NN, generates a set of molecule- and environment-dependent Hamiltonian elements αμμ and K‡. As a test example, we show how, in contrast to traditional Hückel theory, ML-EHM correctly describes orbital crossing with bond rotations. Hence it learns the underlying physics, highlighting that the inclusion of proper physical constraints and symmetries could significantly improve ML model generalization.

8.
J Chem Inf Model ; 62(14): 3463-3475, 2022 07 25.
Article in English | MEDLINE | ID: mdl-35797142

ABSTRACT

Pyruvate dehydrogenase complex (PDC) deficiency is a major cause of primary lactic acidemia resulting in high morbidity and mortality, with limited therapeutic options. The E1 component of the mitochondrial multienzyme PDC (PDC-E1) is a symmetric dimer of heterodimers (αβ/α'β') encoded by the PDHA1 and PDHB genes, with two symmetric active sites each consisting of highly conserved phosphorylation loops A and B. PDHA1 mutations are responsible for 82-88% of cases. Greater than 85% of E1α residues with disease-causing missense mutations (DMMs) are solvent-inaccessible, with ∼30% among those involved in subunit-subunit interface contact (SSIC). We performed molecular dynamics simulations of wild-type (WT) PDC-E1 and E1 variants with E1α DMMs at R349 and W185 (residues involved in SSIC), to investigate their impact on human PDC-E1 structure. We evaluated the change in E1 structure and dynamics and examined their implications on E1 function with the specific DMMs. We found that the dynamics of phosphorylation Loop A, which is crucial for E1 biological activity, changes with DMMs that are at least about 15 Å away. Because communication is essential for PDC-E1 activity (with alternating active sites), we also investigated the possible communication network within WT PDC-E1 via centrality analysis. We observed that DMMs altered/disrupted the communication network of PDC-E1. Collectively, these results indicate an allosteric effect in PDC-E1, with implications for the development of novel small-molecule therapeutics for specific recurrent E1α DMMs such as replacements of R349 responsible for ∼10% of PDC deficiency due to E1α DMMs.


Subjects
Pyruvate Dehydrogenase (Lipoamide) , Pyruvate Dehydrogenase Complex Deficiency Disease , Humans , Mitochondria , Mutation , Pyruvate Dehydrogenase (Lipoamide)/chemistry , Pyruvate Dehydrogenase (Lipoamide)/genetics , Pyruvate Dehydrogenase Complex/chemistry , Pyruvate Dehydrogenase Complex/genetics , Pyruvate Dehydrogenase Complex Deficiency Disease/genetics
9.
J Chem Inf Model ; 62(22): 5373-5382, 2022 11 28.
Article in English | MEDLINE | ID: mdl-36112860

ABSTRACT

Computational programs accelerate the chemical discovery processes but often need proper three-dimensional molecular information as part of the input. Getting optimal molecular structures is challenging because it requires enumerating and optimizing a huge space of stereoisomers and conformers. We developed the Python-based Auto3D package for generating the low-energy 3D structures using SMILES as the input. Auto3D is based on state-of-the-art algorithms and can automatize the isomer enumeration and duplicate filtering process, 3D building process, geometry optimization, and ranking process. Tested on 50 molecules with multiple unspecified stereocenters, Auto3D is guaranteed to find the stereoconfiguration that yields the lowest-energy conformer. With Auto3D, we provide an extension of the ANI model. The new model, dubbed ANI-2xt, is trained on a tautomer-rich data set. ANI-2xt is benchmarked with DFT methods on geometry optimization and electronic and Gibbs free energy calculations. Compared with ANI-2x, ANI-2xt provides a 42% error reduction for tautomeric reaction energy calculations when using the gold-standard coupled-cluster calculation as the reference. ANI-2xt can accurately predict the energies and is several orders of magnitude faster than DFT methods.


Subjects
Algorithms , Neural Networks, Computer , Molecular Structure , Isomerism , Benchmarking
10.
Chem Soc Rev ; 50(16): 9121-9151, 2021 Aug 21.
Article in English | MEDLINE | ID: mdl-34212944

ABSTRACT

COVID-19 has resulted in huge numbers of infections and deaths worldwide and brought the most severe disruptions to societies and economies since the Great Depression. Massive experimental and computational research effort to understand and characterize the disease and rapidly develop diagnostics, vaccines, and drugs has emerged in response to this devastating pandemic, and more than 130 000 COVID-19-related research papers have been published in peer-reviewed journals or deposited in preprint servers. Much of the research effort has focused on the discovery of novel drug candidates or repurposing of existing drugs against COVID-19, and many such projects have been either exclusively computational or computer-aided experimental studies. Herein, we provide an expert overview of the key computational methods and their applications for the discovery of COVID-19 small-molecule therapeutics that have been reported in the research literature. We further outline that, after the first year of the COVID-19 pandemic, it appears that drug repurposing has not produced rapid and global solutions. However, several known drugs have been used in the clinic to cure COVID-19 patients, and a few repurposed drugs continue to be considered in clinical trials, along with several novel clinical candidates. We posit that truly impactful computational tools must deliver actionable, experimentally testable hypotheses enabling the discovery of novel drugs and drug combinations, and that open science and rapid sharing of research results are critical to accelerate the development of novel, much needed therapeutics for COVID-19.


Subjects
COVID-19 Drug Treatment , Computer Simulation , Drug Design , Drug Discovery/methods , Drug Repositioning , Antiviral Agents/therapeutic use , COVID-19/virology , Clinical Trials as Topic , Humans , Pandemics , SARS-CoV-2/drug effects
11.
J Am Chem Soc ; 143(42): 17677-17689, 2021 10 27.
Article in English | MEDLINE | ID: mdl-34637304

ABSTRACT

Modern polymer science suffers from the curse of multidimensionality. The large chemical space imposed by including combinations of monomers into a statistical copolymer overwhelms polymer synthesis and characterization technology and limits the ability to systematically study structure-property relationships. To tackle this challenge in the context of 19F magnetic resonance imaging (MRI) agents, we pursued a computer-guided materials discovery approach that combines synergistic innovations in automated flow synthesis and machine learning (ML) method development. A software-controlled, continuous polymer synthesis platform was developed to enable iterative experimental-computational cycles that resulted in the synthesis of 397 unique copolymer compositions within a six-variable compositional space. The nonintuitive design criteria identified by ML, which were accomplished by exploring <0.9% of the overall compositional space, led to the identification of >10 copolymer compositions that outperformed state-of-the-art materials.


Subjects
Contrast Media/chemistry , Polymers/chemistry , Contrast Media/chemical synthesis , Fluorine/chemistry , Machine Learning , Magnetic Resonance Imaging , Polymers/chemical synthesis , Software , Solubility
12.
J Chem Inf Model ; 61(1): 7-13, 2021 01 25.
Article in English | MEDLINE | ID: mdl-33393291

ABSTRACT

Deep learning models have demonstrated outstanding results in many data-rich areas of research, such as computer vision and natural language processing. Currently, there is a rise of deep learning in computational chemistry and materials informatics, where deep learning could be effectively applied in modeling the relationship between chemical structures and their properties. With the immense growth of chemical and materials data, deep learning models can begin to outperform conventional machine learning techniques such as random forest, support vector machines, and nearest neighbor. Herein, we introduce OpenChem, a PyTorch-based deep learning toolkit for computational chemistry and drug design. OpenChem offers easy and fast model development, modular software design, and several data preprocessing modules. It is freely available via the GitHub repository.


Subjects
Deep Learning , Computational Chemistry , Drug Design , Machine Learning , Support Vector Machine
13.
J Chem Phys ; 154(24): 244108, 2021 Jun 28.
Article in English | MEDLINE | ID: mdl-34241371

ABSTRACT

The Hückel Hamiltonian is an incredibly simple tight-binding model known for its ability to capture qualitative physics phenomena arising from electron interactions in molecules and materials. Part of its simplicity arises from using only two types of empirically fit physics-motivated parameters: the first describes the orbital energies on each atom and the second describes electronic interactions and bonding between atoms. By replacing these empirical parameters with machine-learned dynamic values, we vastly increase the accuracy of the extended Hückel model. The dynamic values are generated with a deep neural network, which is trained to reproduce orbital energies and densities derived from density functional theory. The resulting model retains interpretability, while the deep neural network parameterization is smooth and accurate and reproduces insightful features of the original empirical parameterization. Overall, this work shows the promise of utilizing machine learning to formulate simple, accurate, and dynamically parameterized physics models.
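
As a rough illustration of the two parameter types mentioned in this abstract, the sketch below builds a simple (not extended) Hückel Hamiltonian for a linear four-atom chain such as the π system of butadiene. It assumes fixed, hypothetical values `alpha = 0.0` (on-site orbital energy) and `beta = -1.0` (nearest-neighbour coupling) in arbitrary units; the machine-learned, environment-dependent parameters described in the article are not reproduced here. The textbook closed-form levels of the chain are checked against the matrix directly.

```python
import math


def huckel_hamiltonian(n_atoms, bonds, alpha=0.0, beta=-1.0):
    """alpha on the diagonal (orbital energies), beta between bonded atoms."""
    H = [[0.0] * n_atoms for _ in range(n_atoms)]
    for i in range(n_atoms):
        H[i][i] = alpha
    for i, j in bonds:
        H[i][j] = H[j][i] = beta
    return H


def chain_levels(n_atoms, alpha=0.0, beta=-1.0):
    """Closed-form levels of a linear chain: E_k = alpha + 2*beta*cos(k*pi/(n+1))."""
    return [alpha + 2.0 * beta * math.cos(k * math.pi / (n_atoms + 1))
            for k in range(1, n_atoms + 1)]


def chain_orbital(n_atoms, k):
    """Unnormalized coefficients of level k: c_j = sin(j*k*pi/(n+1))."""
    return [math.sin(j * k * math.pi / (n_atoms + 1))
            for j in range(1, n_atoms + 1)]


def matvec(H, v):
    return [sum(h * x for h, x in zip(row, v)) for row in H]


n = 4
bonds = [(0, 1), (1, 2), (2, 3)]
H = huckel_hamiltonian(n, bonds)
levels = chain_levels(n)

# Check H v = E v for every closed-form level and orbital.
residuals = []
for k, E in enumerate(levels, start=1):
    v = chain_orbital(n, k)
    Hv = matvec(H, v)
    residuals.append(max(abs(a - E * b) for a, b in zip(Hv, v)))
```

Replacing the constant `alpha` and `beta` with values predicted per atom and per atom pair is the kind of dynamic parameterization the abstract describes, although the model architecture used for that is outside this sketch.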

14.
Chem Soc Rev ; 49(11): 3525-3564, 2020 06 07.
Article in English | MEDLINE | ID: mdl-32356548

ABSTRACT

Prediction of chemical bioactivity and physical properties has been one of the most important applications of statistical and, more recently, machine learning and artificial intelligence methods in chemical sciences. This field of research, broadly known as quantitative structure-activity relationships (QSAR) modeling, has developed many important algorithms and has found a broad range of applications in physical organic and medicinal chemistry in the past 55+ years. This Perspective summarizes recent technological advances in QSAR modeling, but it also highlights the applicability of algorithms, modeling methods, and validation practices developed in QSAR to a wide range of research areas outside of traditional QSAR boundaries, including synthesis planning, nanotechnology, materials science, biomaterials, and clinical informatics. As modern research methods generate rapidly increasing amounts of data, the knowledge of robust data-driven modeling methods professed within the QSAR field can become essential for scientists working both within and outside of chemical research. We hope that this contribution highlighting the generalizable components of QSAR modeling will serve to address this challenge.


Subjects
Chemistry, Pharmaceutical/methods , Drug-Related Side Effects and Adverse Reactions/metabolism , Pharmaceutical Preparations/chemistry , Algorithms , Animals , Artificial Intelligence , Databases, Factual , Drug Design , History, 20th Century , History, 21st Century , Humans , Models, Molecular , Quantitative Structure-Activity Relationship , Quantum Theory , Reproducibility of Results
16.
Bioinformatics ; 35(19): 3584-3591, 2019 10 01.
Article in English | MEDLINE | ID: mdl-30785185

ABSTRACT

MOTIVATION: Non-ribosomal peptide synthetases (NRPSs) are modular enzymatic machines that catalyze the ribosome-independent production of structurally complex small peptides, many of which have important clinical applications as antibiotics, antifungals and anti-cancer agents. Several groups have tried to expand natural product diversity by intermixing different NRPS modules to create synthetic peptides. This approach has not been as successful as anticipated, suggesting that these modules are not fully interchangeable. RESULTS: We explored whether Inter-Modular Linkers (IMLs) impact the ability of NRPS modules to communicate during the synthesis of NRPs. We developed a parser to extract 39 804 IMLs from both well annotated and putative NRPS biosynthetic gene clusters from 39 232 bacterial genomes and established the first IMLs database. We analyzed these IMLs and identified a striking relationship between IMLs and the amino acid substrates of their adjacent modules. More than 92% of the identified IMLs connect modules that activate a particular pair of substrates, suggesting that significant specificity is embedded within these sequences. We therefore propose that incorporating the correct IML is critical when attempting combinatorial biosynthesis of novel NRPS. AVAILABILITY AND IMPLEMENTATION: The IMLs database as well as the NRPS-Parser have been made available on the web at https://nrps-linker.unc.edu. The entire source code of the project is hosted in GitHub repository (https://github.com/SWFarag/nrps-linker). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Ribosomes , Anti-Bacterial Agents , Biological Products , Peptide Synthases , Peptides
17.
J Chem Inf Model ; 60(7): 3408-3415, 2020 07 27.
Article in English | MEDLINE | ID: mdl-32568524

ABSTRACT

This paper presents TorchANI, a PyTorch-based program for training/inference of ANI (ANAKIN-ME) deep learning models to obtain potential energy surfaces and other physical properties of molecular systems. ANI is an accurate neural network potential originally implemented using C++/CUDA in a program called NeuroChem. Compared with NeuroChem, TorchANI has a design emphasis on being lightweight, user friendly, cross platform, and easy to read and modify for fast prototyping, while allowing an acceptable sacrifice in running performance. Because the computation of atomic environmental vectors and atomic neural networks is implemented entirely with PyTorch operators, TorchANI is able to use PyTorch's autograd engine to automatically compute analytical forces and Hessian matrices, as well as do force training without requiring any additional code. TorchANI is open-source and freely available on GitHub: https://github.com/aiqm/torchani.


Subjects
Deep Learning , Neural Networks, Computer
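
The point about obtaining forces from an energy function by automatic differentiation can be illustrated without any library. The sketch below is not TorchANI and does not use PyTorch: it assumes a toy one-dimensional Lennard-Jones dimer as the energy function and a hand-rolled forward-mode `Dual` number class, whereas PyTorch's autograd works in reverse mode. Only the principle is shared, namely that the force on each coordinate is the negative derivative of the energy, computed exactly rather than by finite differences.

```python
class Dual:
    """Number val + der*eps with eps**2 == 0; der carries the derivative."""

    def __init__(self, val, der=0.0):
        self.val, self.der = val, der

    @staticmethod
    def _wrap(o):
        return o if isinstance(o, Dual) else Dual(o)

    def __add__(self, o):
        o = Dual._wrap(o)
        return Dual(self.val + o.val, self.der + o.der)

    __radd__ = __add__

    def __sub__(self, o):
        o = Dual._wrap(o)
        return Dual(self.val - o.val, self.der - o.der)

    def __rsub__(self, o):
        return Dual._wrap(o) - self

    def __mul__(self, o):
        o = Dual._wrap(o)
        return Dual(self.val * o.val, self.der * o.val + self.val * o.der)

    __rmul__ = __mul__

    def __truediv__(self, o):
        o = Dual._wrap(o)
        return Dual(self.val / o.val,
                    (self.der * o.val - self.val * o.der) / (o.val ** 2))

    def __rtruediv__(self, o):
        return Dual._wrap(o) / self


def energy(coords):
    """Toy Lennard-Jones dimer on a line (epsilon = sigma = 1); works on floats or Duals."""
    r = coords[1] - coords[0]
    inv = 1.0 / r
    inv2 = inv * inv
    inv6 = inv2 * inv2 * inv2
    return 4.0 * (inv6 * inv6 - inv6)


def forces(coords):
    """Force on coordinate i is -dE/dx_i, obtained by seeding der = 1 on x_i."""
    out = []
    for i in range(len(coords)):
        seeded = [Dual(x, 1.0 if j == i else 0.0) for j, x in enumerate(coords)]
        out.append(-energy(seeded).der)
    return out


positions = [0.0, 1.0]
e = energy(positions)
f = forces(positions)
```

At a separation of 1 the toy energy is zero and the two atoms repel, so the computed forces are equal and opposite. The same `energy` function serves both the plain evaluation and the differentiated one, which is the convenience the abstract attributes to building everything from differentiable operators.
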
18.
Phys Chem Chem Phys ; 22(45): 26478-26486, 2020 Nov 25.
Article in English | MEDLINE | ID: mdl-33185200

ABSTRACT

Machine learning solved many challenging problems in computer-assisted synthesis prediction (CASP). We formulate a reaction prediction problem in terms of node-classification in a disconnected graph of source molecules and generalize a graph convolution neural network for disconnected graphs. Here we demonstrate that our approach can successfully predict centres of reaction and atoms of the main product. A set of experiments using the USPTO dataset demonstrates excellent performance and interpretability of the proposed model. Implicitly learned latent vector representation of chemical reactions strongly correlates with the class of the chemical reaction. Reactions with similar templates group together in the latent vector space.

19.
Nature ; 571(7763): 42-43, 2019 07.
Article in English | MEDLINE | ID: mdl-31270488