Search | BVS (Virtual Health Library) Regional Portal

1.

Development of scalable and generalizable machine learned force field for polymers.

Mohanty, Shaswat; Stevenson, James; Browning, Andrea R; Jacobson, Leif; Leswing, Karl; Halls, Mathew D; Afzal, Mohammad Atif Faiz.

Sci Rep ; 13(1): 17251, 2023 Oct 11.

Article in English | MEDLINE | ID: mdl-37821501

ABSTRACT

Understanding and predicting the properties of polymers is vital to developing tailored polymer molecules for desired applications. Classical force fields may fail to capture key properties, for example, the transport properties of certain polymer systems such as polyethylene glycol. As a solution, we present an alternative potential energy surface, a charge recursive neural network (QRNN) model trained on DFT calculations made on smaller atomic clusters that generalizes well to oligomers comprising larger atomic clusters or longer chains. We demonstrate the validity of the polymer QRNN workflow by modeling the oligomers of ethylene glycol. We apply two rounds of active learning (addition of new training clusters based on current model performance) and implement a novel model training approach that uses partial charges from a semi-empirical method. Our developed QRNN model for polymers produces stable molecular dynamics (MD) simulation trajectory and captures the dynamics of polymer chains as indicated by the striking agreement with experimental values. Our model allows working on much larger systems than allowed by DFT simulations, at the same time providing a more accurate force field than classical force fields which provides a promising avenue for large-scale molecular simulations of polymeric systems.

2.

FEP Protocol Builder: Optimization of Free Energy Perturbation Protocols Using Active Learning.

de Oliveira, César; Leswing, Karl; Feng, Shulu; Kanters, René; Abel, Robert; Bhat, Sathesh.

J Chem Inf Model ; 63(17): 5592-5603, 2023 Sep 11.

Article in English | MEDLINE | ID: mdl-37594480

ABSTRACT

Significant improvements have been made in the past decade to methods that rapidly and accurately predict binding affinity through free energy perturbation (FEP) calculations. This has been driven by recent advances in small-molecule force fields and sampling algorithms combined with the availability of low-cost parallel computing. Predictive accuracies of ~1 kcal mol⁻¹ have been regularly achieved, which are sufficient to drive potency optimization in modern drug discovery campaigns. Despite the robustness of these FEP approaches across multiple target classes, there are invariably target systems that do not display expected performance with default FEP settings. Traditionally, these systems required labor-intensive manual protocol development to arrive at parameter settings that produce a predictive FEP model. Due to the (a) relatively large parameter space to be explored, (b) significant compute requirements, and (c) limited understanding of how combinations of parameters can affect FEP performance, manual FEP protocol optimization can take weeks to months to complete, and often does not involve rigorous train-test set splits, resulting in potential overfitting. These manual FEP protocol development timelines do not coincide with tight drug discovery project timelines, essentially preventing the use of FEP calculations for these target systems. Here, we describe an automated workflow termed FEP Protocol Builder (FEP-PB) to rapidly generate accurate FEP protocols for systems that do not perform well with default settings. FEP-PB uses an active-learning workflow to iteratively search the protocol parameter space to develop accurate FEP protocols. To validate this approach, we applied it to pharmaceutically relevant systems where default FEP settings could not produce predictive models. We demonstrate that FEP-PB can rapidly generate accurate FEP protocols for the previously challenging MCL1 system with limited human intervention. We also apply FEP-PB in a real-world drug discovery setting to generate an accurate FEP protocol for the p97 system. FEP-PB is able to generate a more accurate protocol than the expert user, rapidly validating p97 as amenable to free energy calculations. Additionally, through the active-learning workflow, we are able to gain insight into which parameters are most important for a given system. These results suggest that FEP-PB is a robust tool that can aid in rapidly developing accurate FEP protocols and increasing the number of targets that are amenable to the technology.

Subjects

Algorithms; Antineoplastic Combined Chemotherapy Protocols; Humans; Cisplatin; Drug Discovery

3.

Epik: pKa and Protonation State Prediction through Machine Learning.

Johnston, Ryne C; Yao, Kun; Kaplan, Zachary; Chelliah, Monica; Leswing, Karl; Seekins, Sean; Watts, Shawn; Calkins, David; Chief Elk, Jackson; Jerome, Steven V; Repasky, Matthew P; Shelley, John C.

J Chem Theory Comput ; 19(8): 2380-2388, 2023 Apr 25.

Article in English | MEDLINE | ID: mdl-37023332

ABSTRACT

Epik version 7 is a software program that uses machine learning for predicting the pKa values and protonation state distribution of complex, druglike molecules. Using an ensemble of atomic graph convolutional neural networks (GCNNs) trained on over 42,000 pKa values across broad chemical space from both experimental and computed origins, the model predicts pKa values with 0.42 and 0.72 pKa unit median absolute and root mean square errors, respectively, across seven test sets. Epik version 7 also generates protonation states and recovers 95% of the most populated protonation states compared to previous versions. Requiring on average only 47 ms per ligand, Epik version 7 is rapid and accurate enough to evaluate protonation states for crucial molecules and prepare ultra-large libraries of compounds to explore vast regions of chemical space. The simplicity and time required for the training allow for the generation of highly accurate models customized to a program's specific chemistry.

4.

De Novo Design of Molecules with Low Hole Reorganization Energy Based on a Quarter-Million Molecule DFT Screen: Part 2.

Staker, Joshua; Marshall, Kyle; Leswing, Karl; Robertson, Tim; Halls, Mathew D; Goldberg, Alexander; Morisato, Tsuguo; Maeshima, Hiroyuki; Ando, Tatsuhito; Arai, Hideyuki; Sasago, Masaru; Fujii, Eiji; Matsuzawa, Nobuyuki N.

J Phys Chem A ; 126(34): 5837-5852, 2022 Sep 01.

Article in English | MEDLINE | ID: mdl-35984470

ABSTRACT

Organic semiconductors have many desirable properties including improved manufacturing and flexible mechanical properties. Due to the vastness of chemical space, it is essential to efficiently explore chemical space when designing new materials, including through the use of generative techniques. New generative machine learning methods for molecular design continue to be published in the literature at a significant rate but successfully adapting methods to new chemistry and problem domains remains difficult. These challenges necessitate continual method evaluation to probe method viability for use in alternative applications not covered in the original works. In continuation of our previous work, we evaluate four additional machine-learning-based de novo methods for generating molecules with high predicted hole mobility for use in semiconductor applications. The four generative methods evaluated here are (1) Molecule Deep Q-Networks (MolDQN), which utilizes Deep-Q learning to directly optimize molecular structure graphs for desired properties instead of generating SMILES, (2) Graph-based Genetic Algorithm (GraphGA), which uses a genetic algorithm for optimization where crossovers and mutations are defined in terms of RDKit's reaction SMILES, (3) Generative Tensorial Reinforcement Learning (GENTRL), which is a variational autoencoder (VAE) with a learned prior distribution and optimized using reinforcement learning, and (4) Monte Carlo tree search exploration of chemical space in conjunction with a recurrent neural network (RNN) decoder (ChemTS). The generated molecules were evaluated using density functional theory (DFT) and we discovered better performing molecules with the GraphGA method compared to the other approaches.

5.

High-Dimensional Neural Network Potential for Liquid Electrolyte Simulations.

Dajnowicz, Steven; Agarwal, Garvit; Stevenson, James M; Jacobson, Leif D; Ramezanghorbani, Farhad; Leswing, Karl; Friesner, Richard A; Halls, Mathew D; Abel, Robert.

J Phys Chem B ; 126(33): 6271-6280, 2022 Aug 25.

Article in English | MEDLINE | ID: mdl-35972463

ABSTRACT

Liquid electrolytes are one of the most important components of Li-ion batteries, which are a critical technology of the modern world. However, we still lack the computational tools required to accurately calculate key properties of these materials (viscosity and ionic diffusivity) from first principles necessary to support improved designs. In this work, we report a machine learning-based force field for liquid electrolyte simulations, which bridges the gap between the accuracy of range-separated hybrid density functional theory and the efficiency of classical force fields. Predictions of material properties made with this force field are quantitatively accurate compared to experimental data. Our model uses the QRNN deep neural network architecture, which includes both long-range interactions and global charge equilibration. The training data set is composed solely of non-periodic density functional theory (DFT), allowing the practical use of an accurate theory (here, ωB97X-D3BJ/def2-TZVPD), which would be prohibitively expensive for generating large data sets with periodic DFT. In this report, we focus on seven common carbonates and LiPF6, but this methodology has very few assumptions and can be readily applied to any liquid electrolyte system. This provides a promising path forward for large-scale atomistic modeling of many important battery chemistries.

Subjects

Lithium; Molecular Dynamics Simulation; Electric Power Supplies; Electrolytes; Neural Networks, Computer

6.

Transferable Neural Network Potential Energy Surfaces for Closed-Shell Organic Molecules: Extension to Ions.

Jacobson, Leif D; Stevenson, James M; Ramezanghorbani, Farhad; Ghoreishi, Delaram; Leswing, Karl; Harder, Edward D; Abel, Robert.

J Chem Theory Comput ; 18(4): 2354-2366, 2022 Apr 12.

Article in English | MEDLINE | ID: mdl-35290063

ABSTRACT

Transferable high dimensional neural network potentials (HDNNPs) have shown great promise as an avenue to increase the accuracy and domain of applicability of existing atomistic force fields for organic systems relevant to life science. We have previously reported such a potential (Schrödinger-ANI) that has broad coverage of druglike molecules. We extend that work here to cover ionic and zwitterionic druglike molecules expected to be relevant to drug discovery research activities. We report a novel HDNNP architecture, which we call QRNN, that predicts atomic charges and uses these charges as descriptors in an energy model that delivers conformational energies within chemical accuracy when measured against the reference theory it is trained to. Further, we find that delta learning based on a semiempirical level of theory approximately halves the errors. We test the models on torsion energy profiles, relative conformational energies, geometric parameters, and relative tautomer errors.

Subjects

Neural Networks, Computer; Ions; Molecular Conformation
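
The delta-learning idea mentioned in the abstract of record 6 can be illustrated with a minimal sketch: rather than fitting the reference energy directly, a model is fitted to the difference between the reference energy and a cheaper baseline, and the baseline is added back at prediction time. Everything below is a hypothetical toy and not the QRNN architecture: the energies are synthetic functions of a single coordinate, a least-squares line stands in for the neural network, and the example is deliberately constructed so that the baseline captures most of the structure.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares line y = a*x + b (stand-in for a neural network)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = my - a * mx
    return lambda x: a * x + b

def reference_energy(x):
    # Synthetic stand-in for the expensive reference theory (assumption).
    return math.sin(3.0 * x) + 0.2 * x

def baseline_energy(x):
    # Synthetic stand-in for a cheap semiempirical method (assumption).
    return math.sin(3.0 * x)

def rmse(predict, xs):
    """Root mean square error of `predict` against the reference energy."""
    return math.sqrt(sum((predict(x) - reference_energy(x)) ** 2 for x in xs) / len(xs))

x_train = [i / 10.0 - 1.0 for i in range(21)]   # 21 points on [-1, 1]
x_test = [i / 25.0 - 1.0 for i in range(51)]    # 51 points on [-1, 1]

# Direct learning: fit the reference energy itself.
direct_model = fit_line(x_train, [reference_energy(x) for x in x_train])

# Delta learning: fit only the correction (reference minus baseline).
delta_model = fit_line(
    x_train, [reference_energy(x) - baseline_energy(x) for x in x_train]
)

def delta_prediction(x):
    # Final prediction = cheap baseline + learned correction.
    return baseline_energy(x) + delta_model(x)

rmse_direct = rmse(direct_model, x_test)
rmse_delta = rmse(delta_prediction, x_test)
print(f"direct RMSE: {rmse_direct:.3f}, delta RMSE: {rmse_delta:.2e}")
```

In this toy the correction is exactly linear, so the delta model recovers it almost perfectly while the direct fit cannot; real systems would show a smaller gain, such as the approximate halving of errors reported in the abstract.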

7.

Efficient Exploration of Chemical Space with Docking and Deep Learning.

Yang, Ying; Yao, Kun; Repasky, Matthew P; Leswing, Karl; Abel, Robert; Shoichet, Brian K; Jerome, Steven V.

J Chem Theory Comput ; 17(11): 7106-7119, 2021 Nov 09.

Article in English | MEDLINE | ID: mdl-34592101

ABSTRACT

With the advent of make-on-demand commercial libraries, the number of purchasable compounds available for virtual screening and assay has grown explosively in recent years, with several libraries eclipsing one billion compounds. Today's screening libraries are larger and more diverse, enabling the discovery of more-potent hit compounds and unlocking new areas of chemical space, represented by new core scaffolds. Applying physics-based in silico screening methods in an exhaustive manner, where every molecule in the library must be enumerated and evaluated independently, is increasingly cost-prohibitive. Here, we introduce a protocol for machine learning-enhanced molecular docking based on active learning to dramatically increase throughput over traditional docking. We leverage a novel selection protocol that strikes a balance between two objectives: (1) identifying the best scoring compounds and (2) exploring a large region of chemical space, demonstrating superior performance compared to a purely greedy approach. Together with automated redocking of the top compounds, this method captures almost all the high scoring scaffolds in the library found by exhaustive docking. This protocol is applied to our recent virtual screening campaigns against the D4 and AMPC targets that produced dozens of highly potent, novel inhibitors, and a blind test against the MT1 target. Our protocol recovers more than 80% of the experimentally confirmed hits with a 14-fold reduction in compute cost, and more than 90% of the hit scaffolds in the top 5% of model predictions, preserving the diversity of the experimentally confirmed hit compounds.
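
The trade-off described in the abstract of record 7, between picking the best-scoring compounds and exploring chemical space, can be sketched as a batch-selection step in an active-learning loop. The abstract does not specify the authors' selection protocol, so the code below is only a generic illustration under stated assumptions: `select_batch`, `explore_fraction`, and the compound identifiers are hypothetical names, lower predicted scores are assumed to be better, and uniform random sampling stands in for whatever diversity criterion the real protocol uses.

```python
import random

def select_batch(predicted_scores, batch_size, explore_fraction=0.5, seed=0):
    """Choose compound ids to dock in the next active-learning round.

    predicted_scores: dict mapping compound id -> model-predicted docking
    score, where lower means better (an assumption of this sketch).
    A share of the batch is taken greedily from the top of the ranking;
    the rest is sampled at random from the remaining compounds.
    """
    rng = random.Random(seed)
    n_explore = int(batch_size * explore_fraction)
    n_greedy = batch_size - n_explore

    # Rank compound ids from best (lowest) to worst predicted score.
    ranked = sorted(predicted_scores, key=predicted_scores.get)
    greedy = ranked[:n_greedy]

    # Exploration picks come only from compounds not already chosen.
    remaining = ranked[n_greedy:]
    explore = rng.sample(remaining, min(n_explore, len(remaining)))
    return greedy + explore

# Toy library: 100 hypothetical compounds with made-up predicted scores.
scores = {f"cmpd{i}": float(i) for i in range(100)}
batch = select_batch(scores, batch_size=10, explore_fraction=0.5)
print(batch)
```

Setting `explore_fraction=0.0` reduces this to the purely greedy approach that the abstract uses as its point of comparison.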

8.

De Novo Design of Molecules with Low Hole Reorganization Energy Based on a Quarter-Million Molecule DFT Screen.

Marques, Gabriel; Leswing, Karl; Robertson, Tim; Giesen, David; Halls, Mathew D; Goldberg, Alexander; Marshall, Kyle; Staker, Joshua; Morisato, Tsuguo; Maeshima, Hiroyuki; Arai, Hideyuki; Sasago, Masaru; Fujii, Eiji; Matsuzawa, Nobuyuki N.

J Phys Chem A ; 125(33): 7331-7343, 2021 Aug 26.

Article in English | MEDLINE | ID: mdl-34342466

ABSTRACT

Materials exhibiting higher mobilities than conventional organic semiconducting materials such as fullerenes and fused thiophenes are in high demand for applications in printed electronics. To discover new molecules in the heteroacene family that might show improved hole mobility, three de novo design methods were applied. Machine learning (ML) models were generated based on previously calculated hole reorganization energies of a quarter million examples of heteroacenes, where the energies were calculated by applying density functional theory (DFT) and a massive cloud computing environment. The three generative methods applied were (1) the continuous space method, where molecular structures are converted into continuous variables by applying the variational autoencoder/decoder technique; (2) the method based on reinforcement learning of SMILES strings (the REINVENT method); and (3) the junction tree variational autoencoder method that directly generates molecular graphs. Among the three methods, the second and third methods succeeded in obtaining chemical structures whose DFT-calculated hole reorganization energy was lower than the lowest energy in the training dataset. This suggests that an extrapolative materials design protocol can be developed by applying generative modeling to a quantitative structure-property relationship (QSPR) utility function.

9.

Design of Organic Electronic Materials With a Goal-Directed Generative Model Powered by Deep Neural Networks and High-Throughput Molecular Simulations.

Kwak, H Shaun; An, Yuling; Giesen, David J; Hughes, Thomas F; Brown, Christopher T; Leswing, Karl; Abroshan, Hadi; Halls, Mathew D.

Front Chem ; 9: 800370, 2021.

Article in English | MEDLINE | ID: mdl-35111730

ABSTRACT

In recent years, generative machine learning approaches have attracted significant attention as an enabling approach for designing novel molecular materials with minimal design bias and thereby realizing more directed design for a specific materials property space. Further, data-driven approaches have emerged as a new tool to accelerate the development of novel organic electronic materials for organic light-emitting diode (OLED) applications. We demonstrate and validate a goal-directed generative machine learning framework based on a recurrent neural network (RNN) deep reinforcement learning approach for the design of hole transporting OLED materials. These large-scale molecular simulations also demonstrate a rapid, cost-effective method to identify new materials in OLEDs while also enabling expansion into many other verticals such as catalyst design, aerospace, life science, and petrochemicals.

10.

Combining Cloud-Based Free-Energy Calculations, Synthetically Aware Enumerations, and Goal-Directed Generative Machine Learning for Rapid Large-Scale Chemical Exploration and Optimization.

Ghanakota, Phani; Bos, Pieter H; Konze, Kyle D; Staker, Joshua; Marques, Gabriel; Marshall, Kyle; Leswing, Karl; Abel, Robert; Bhat, Sathesh.

J Chem Inf Model ; 60(9): 4311-4325, 2020 Sep 28.

Article in English | MEDLINE | ID: mdl-32484669

ABSTRACT

The hit identification process usually involves the profiling of millions to more recently billions of compounds either via traditional experimental high-throughput screens (HTS) or computational virtual high-throughput screens (vHTS). We have previously demonstrated that, by coupling reaction-based enumeration, active learning, and free energy calculations, a similarly large-scale exploration of chemical space can be extended to the hit-to-lead process. In this work, we augment that approach by coupling large scale enumeration and cloud-based free energy perturbation (FEP) profiling with goal-directed generative machine learning, which results in a higher enrichment of potent ideas compared to large scale enumeration alone, while simultaneously staying within the bounds of predefined drug-like property space. We can achieve this by building the molecular distribution for generative machine learning from the PathFinder rules-based enumeration and optimizing for a weighted sum QSAR-based multiparameter optimization function. We examine the utility of this combined approach by designing potent inhibitors of cyclin-dependent kinase 2 (CDK2) and demonstrate a coupled workflow that can (1) provide a 6.4-fold enrichment improvement in identifying <10 nM compounds over random selection and a 1.5-fold enrichment in identifying <10 nM compounds over our previous method, (2) rapidly explore relevant chemical space outside the bounds of commercial reagents, (3) use generative ML approaches to "learn" the SAR from large scale in silico enumerations and generate novel idea molecules for a flexible receptor site that are both potent and within relevant physicochemical space, and (4) produce over 3 000 000 idea molecules and run 1935 FEP simulations, identifying 69 ideas with a predicted IC50 < 10 nM and 358 ideas with a predicted IC50 < 100 nM. The reported data suggest combining both reaction-based and generative machine learning for ideation results in a higher enrichment of potent compounds over previously described approaches and has the potential to rapidly accelerate the discovery of novel chemical matter within a predefined potency and property space.

Subjects

Drug Discovery; Pharmaceutical Preparations; Computer Simulation; Goals; Machine Learning

11.

Reaction-Based Enumeration, Active Learning, and Free Energy Calculations To Rapidly Explore Synthetically Tractable Chemical Space and Optimize Potency of Cyclin-Dependent Kinase 2 Inhibitors.

Konze, Kyle D; Bos, Pieter H; Dahlgren, Markus K; Leswing, Karl; Tubert-Brohman, Ivan; Bortolato, Andrea; Robbason, Braxton; Abel, Robert; Bhat, Sathesh.

J Chem Inf Model ; 59(9): 3782-3793, 2019 Sep 23.

Article in English | MEDLINE | ID: mdl-31404495

ABSTRACT

The hit-to-lead and lead optimization processes usually involve the design, synthesis, and profiling of thousands of analogs prior to clinical candidate nomination. A hit finding campaign may begin with a virtual screen that explores millions of compounds, if not more. However, this scale of computational profiling is not frequently performed in the hit-to-lead or lead optimization phases of drug discovery. This is likely due to the lack of appropriate computational tools to generate synthetically tractable lead-like compounds in silico, and a lack of computational methods to accurately profile compounds prospectively on a large scale. Recent advances in computational power and methods provide the ability to profile much larger libraries of ligands than previously possible. Herein, we report a new computational technique, referred to as "PathFinder", that uses retrosynthetic analysis followed by combinatorial synthesis to generate novel compounds in synthetically accessible chemical space. In this work, the integration of PathFinder-driven compound generation, cloud-based FEP simulations, and active learning are used to rapidly optimize R-groups, and generate new cores for inhibitors of cyclin-dependent kinase 2 (CDK2). Using this approach, we explored >300 000 ideas, performed >5000 FEP simulations, and identified >100 ligands with a predicted IC50 < 100 nM, including four unique cores. To our knowledge, this is the largest set of FEP calculations disclosed in the literature to date. The rapid turnaround time, and scale of chemical exploration, suggests that this is a useful approach to accelerate the discovery of novel chemical matter in drug discovery campaigns.

Subjects

Cyclin-Dependent Kinase 2/antagonists & inhibitors; Drug Discovery; Machine Learning; Protein Kinase Inhibitors/chemistry; Protein Kinase Inhibitors/pharmacology; Cyclin-Dependent Kinase 2/metabolism; Drug Design; Drug Discovery/methods; Humans; Models, Molecular; Small Molecule Libraries/chemistry; Small Molecule Libraries/pharmacology; Thermodynamics

12.

MoleculeNet: a benchmark for molecular machine learning.

Wu, Zhenqin; Ramsundar, Bharath; Feinberg, Evan N; Gomes, Joseph; Geniesse, Caleb; Pappu, Aneesh S; Leswing, Karl; Pande, Vijay.

Chem Sci ; 9(2): 513-530, 2018 Jan 14.

Article in English | MEDLINE | ID: mdl-29629118

ABSTRACT

Molecular machine learning has been maturing rapidly over the last few years. Improved methods and the presence of larger datasets have enabled machine learning algorithms to make increasingly accurate predictions about molecular properties. However, algorithmic progress has been limited due to the lack of a standard benchmark to compare the efficacy of proposed methods; most new algorithms are benchmarked on different datasets making it challenging to gauge the quality of proposed methods. This work introduces MoleculeNet, a large scale benchmark for molecular machine learning. MoleculeNet curates multiple public datasets, establishes metrics for evaluation, and offers high quality open-source implementations of multiple previously proposed molecular featurization and learning algorithms (released as part of the DeepChem open source library). MoleculeNet benchmarks demonstrate that learnable representations are powerful tools for molecular machine learning and broadly offer the best performance. However, this result comes with caveats. Learnable representations still struggle to deal with complex tasks under data scarcity and highly imbalanced classification. For quantum mechanical and biophysical datasets, the use of physics-aware featurizations can be more important than choice of particular learning algorithm.
