1.
Sci Rep ; 13(1): 17251, 2023 Oct 11.
Article in English | MEDLINE | ID: mdl-37821501

ABSTRACT

Understanding and predicting the properties of polymers is vital to developing tailored polymer molecules for desired applications. Classical force fields may fail to capture key properties, for example, the transport properties of certain polymer systems such as polyethylene glycol. As a solution, we present an alternative potential energy surface, a charge recursive neural network (QRNN) model trained on DFT calculations made on smaller atomic clusters that generalizes well to oligomers comprising larger atomic clusters or longer chains. We demonstrate the validity of the polymer QRNN workflow by modeling the oligomers of ethylene glycol. We apply two rounds of active learning (addition of new training clusters based on current model performance) and implement a novel model training approach that uses partial charges from a semi-empirical method. Our developed QRNN model for polymers produces stable molecular dynamics (MD) simulation trajectories and captures the dynamics of polymer chains, as indicated by the striking agreement with experimental values. Our model allows working on much larger systems than DFT simulations allow while providing a more accurate force field than classical force fields, which offers a promising avenue for large-scale molecular simulations of polymeric systems.

2.
J Chem Inf Model ; 63(17): 5592-5603, 2023 Sep 11.
Article in English | MEDLINE | ID: mdl-37594480

ABSTRACT

Significant improvements have been made in the past decade to methods that rapidly and accurately predict binding affinity through free energy perturbation (FEP) calculations. This has been driven by recent advances in small-molecule force fields and sampling algorithms combined with the availability of low-cost parallel computing. Predictive accuracies of ∼1 kcal mol⁻¹ have been regularly achieved, which are sufficient to drive potency optimization in modern drug discovery campaigns. Despite the robustness of these FEP approaches across multiple target classes, there are invariably target systems that do not display expected performance with default FEP settings. Traditionally, these systems required labor-intensive manual protocol development to arrive at parameter settings that produce a predictive FEP model. Due to the (a) relatively large parameter space to be explored, (b) significant compute requirements, and (c) limited understanding of how combinations of parameters can affect FEP performance, manual FEP protocol optimization can take weeks to months to complete, and often does not involve rigorous train-test set splits, resulting in potential overfitting. These manual FEP protocol development timelines do not coincide with tight drug discovery project timelines, essentially preventing the use of FEP calculations for these target systems. Here, we describe an automated workflow termed FEP Protocol Builder (FEP-PB) to rapidly generate accurate FEP protocols for systems that do not perform well with default settings. FEP-PB uses an active-learning workflow to iteratively search the protocol parameter space to develop accurate FEP protocols. To validate this approach, we applied it to pharmaceutically relevant systems where default FEP settings could not produce predictive models. We demonstrate that FEP-PB can rapidly generate accurate FEP protocols for the previously challenging MCL1 system with limited human intervention. We also apply FEP-PB in a real-world drug discovery setting to generate an accurate FEP protocol for the p97 system. FEP-PB is able to generate a more accurate protocol than the expert user, rapidly validating p97 as amenable to free energy calculations. Additionally, through the active-learning workflow, we are able to gain insight into which parameters are most important for a given system. These results suggest that FEP-PB is a robust tool that can aid in rapidly developing accurate FEP protocols and increasing the number of targets that are amenable to the technology.


Subject(s)
Algorithms; Antineoplastic Combined Chemotherapy Protocols; Humans; Cisplatin; Drug Discovery
3.
J Chem Theory Comput ; 19(8): 2380-2388, 2023 Apr 25.
Article in English | MEDLINE | ID: mdl-37023332

ABSTRACT

Epik version 7 is a software program that uses machine learning for predicting the pKa values and protonation state distribution of complex, druglike molecules. Using an ensemble of atomic graph convolutional neural networks (GCNNs) trained on over 42,000 pKa values across broad chemical space from both experimental and computed origins, the model predicts pKa values with 0.42 and 0.72 pKa unit median absolute and root mean square errors, respectively, across seven test sets. Epik version 7 also generates protonation states and recovers 95% of the most populated protonation states compared to previous versions. Requiring on average only 47 ms per ligand, Epik version 7 is rapid and accurate enough to evaluate protonation states for crucial molecules and prepare ultra-large libraries of compounds to explore vast regions of chemical space. The simplicity and time required for the training allow for the generation of highly accurate models customized to a program's specific chemistry.
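
The two error measures quoted in this abstract (median absolute error and root mean square error) can be computed as in the minimal sketch below. The pKa values are invented for illustration and are not taken from the paper; the function names are likewise assumptions and do not belong to Epik.

```python
import math
import statistics


def median_absolute_error(predicted, reference):
    """Median of |predicted - reference| over paired values."""
    return statistics.median(abs(p - r) for p, r in zip(predicted, reference))


def root_mean_square_error(predicted, reference):
    """Square root of the mean squared difference over paired values."""
    squared = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical pKa values, invented for illustration only.
reference_pka = [4.8, 9.2, 7.1, 3.5, 10.4]
predicted_pka = [5.0, 9.0, 7.1, 4.5, 10.0]

mae = median_absolute_error(predicted_pka, reference_pka)
rmse = root_mean_square_error(predicted_pka, reference_pka)
```

Because squaring weights large deviations more heavily, a single outlier (here the 1.0-unit miss) raises the RMSE well above the median absolute error, which is one reason both figures are usually reported together.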

4.
J Phys Chem A ; 126(34): 5837-5852, 2022 Sep 01.
Article in English | MEDLINE | ID: mdl-35984470

ABSTRACT

Organic semiconductors have many desirable properties including improved manufacturing and flexible mechanical properties. Due to the vastness of chemical space, it is essential to efficiently explore chemical space when designing new materials, including through the use of generative techniques. New generative machine learning methods for molecular design continue to be published in the literature at a significant rate but successfully adapting methods to new chemistry and problem domains remains difficult. These challenges necessitate continual method evaluation to probe method viability for use in alternative applications not covered in the original works. In continuation of our previous work, we evaluate four additional machine-learning-based de novo methods for generating molecules with high predicted hole mobility for use in semiconductor applications. The four generative methods evaluated here are (1) Molecule Deep Q-Networks (MolDQN), which utilizes Deep-Q learning to directly optimize molecular structure graphs for desired properties instead of generating SMILES, (2) Graph-based Genetic Algorithm (GraphGA), which uses a genetic algorithm for optimization where crossovers and mutations are defined in terms of RDKit's reaction SMILES, (3) Generative Tensorial Reinforcement Learning (GENTRL), which is a variational autoencoder (VAE) with a learned prior distribution and optimized using reinforcement learning, and (4) Monte Carlo tree search exploration of chemical space in conjunction with a recurrent neural network (RNN) decoder (ChemTS). The generated molecules were evaluated using density functional theory (DFT) and we discovered better performing molecules with the GraphGA method compared to the other approaches.

5.
J Phys Chem B ; 126(33): 6271-6280, 2022 Aug 25.
Article in English | MEDLINE | ID: mdl-35972463

ABSTRACT

Liquid electrolytes are one of the most important components of Li-ion batteries, which are a critical technology of the modern world. However, we still lack the computational tools required to accurately calculate key properties of these materials (viscosity and ionic diffusivity) from first principles necessary to support improved designs. In this work, we report a machine learning-based force field for liquid electrolyte simulations, which bridges the gap between the accuracy of range-separated hybrid density functional theory and the efficiency of classical force fields. Predictions of material properties made with this force field are quantitatively accurate compared to experimental data. Our model uses the QRNN deep neural network architecture, which includes both long-range interactions and global charge equilibration. The training data set is composed solely of non-periodic density functional theory (DFT), allowing the practical use of an accurate theory (here, ωB97X-D3BJ/def2-TZVPD), which would be prohibitively expensive for generating large data sets with periodic DFT. In this report, we focus on seven common carbonates and LiPF6, but this methodology has very few assumptions and can be readily applied to any liquid electrolyte system. This provides a promising path forward for large-scale atomistic modeling of many important battery chemistries.


Subject(s)
Lithium; Molecular Dynamics Simulation; Electric Power Supplies; Electrolytes; Neural Networks, Computer
6.
J Chem Theory Comput ; 18(4): 2354-2366, 2022 Apr 12.
Article in English | MEDLINE | ID: mdl-35290063

ABSTRACT

Transferable high dimensional neural network potentials (HDNNPs) have shown great promise as an avenue to increase the accuracy and domain of applicability of existing atomistic force fields for organic systems relevant to life science. We have previously reported such a potential (Schrödinger-ANI) that has broad coverage of druglike molecules. We extend that work here to cover ionic and zwitterionic druglike molecules expected to be relevant to drug discovery research activities. We report a novel HDNNP architecture, which we call QRNN, that predicts atomic charges and uses these charges as descriptors in an energy model that delivers conformational energies within chemical accuracy when measured against the reference theory it is trained to. Further, we find that delta learning based on a semiempirical level of theory approximately halves the errors. We test the models on torsion energy profiles, relative conformational energies, geometric parameters, and relative tautomer errors.


Subject(s)
Neural Networks, Computer; Ions; Molecular Conformation
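
The delta-learning idea mentioned in this abstract, training a model on the difference between a reference energy and a cheaper energy rather than on the reference energy itself, can be illustrated with the toy sketch below. It is not the QRNN architecture: a one-variable least-squares line stands in for the neural network, and the descriptor and all energies are hypothetical numbers chosen for illustration.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical data: one descriptor per conformer and two energies (kcal/mol).
descriptor = [0.0, 1.0, 2.0, 3.0]
e_cheap = [0.0, 1.0, 2.5, 4.0]  # stand-in for a semiempirical method
e_ref = [0.5, 2.0, 4.0, 6.0]    # stand-in for the reference theory

# Delta learning: the training target is the difference, not the total energy.
delta = [r - c for r, c in zip(e_ref, e_cheap)]
slope, intercept = fit_line(descriptor, delta)


def predict_energy(x, cheap_energy):
    """Cheap energy plus the learned correction."""
    return cheap_energy + slope * x + intercept
```

At prediction time the cheap method still has to be evaluated for every structure; the learned model only supplies the correction, which is typically a smoother and smaller quantity than the total energy.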
7.
J Chem Theory Comput ; 17(11): 7106-7119, 2021 Nov 09.
Article in English | MEDLINE | ID: mdl-34592101

ABSTRACT

With the advent of make-on-demand commercial libraries, the number of purchasable compounds available for virtual screening and assay has grown explosively in recent years, with several libraries eclipsing one billion compounds. Today's screening libraries are larger and more diverse, enabling the discovery of more-potent hit compounds and unlocking new areas of chemical space, represented by new core scaffolds. Applying physics-based in silico screening methods in an exhaustive manner, where every molecule in the library must be enumerated and evaluated independently, is increasingly cost-prohibitive. Here, we introduce a protocol for machine learning-enhanced molecular docking based on active learning to dramatically increase throughput over traditional docking. We leverage a novel selection protocol that strikes a balance between two objectives: (1) identifying the best scoring compounds and (2) exploring a large region of chemical space, demonstrating superior performance compared to a purely greedy approach. Together with automated redocking of the top compounds, this method captures almost all the high scoring scaffolds in the library found by exhaustive docking. This protocol is applied to our recent virtual screening campaigns against the D4 and AMPC targets that produced dozens of highly potent, novel inhibitors, and a blind test against the MT1 target. Our protocol recovers more than 80% of the experimentally confirmed hits with a 14-fold reduction in compute cost, and more than 90% of the hit scaffolds in the top 5% of model predictions, preserving the diversity of the experimentally confirmed hit compounds.
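
The abstract does not spell out the selection protocol, so the sketch below shows only a generic way to trade off the two stated objectives: part of each batch is taken from the best predicted scores (greedy), and the rest is drawn at random from the remaining compounds (exploration). The function name, the `explore_fraction` parameter, and the scores are all assumptions made for illustration.

```python
import random


def select_batch(predicted_scores, batch_size, explore_fraction, rng):
    """Return compound indices: best predicted scores plus random others.

    Lower scores are treated as better (an assumed docking-score convention).
    """
    n_explore = int(batch_size * explore_fraction)
    n_greedy = batch_size - n_explore
    ranked = sorted(range(len(predicted_scores)),
                    key=lambda i: predicted_scores[i])
    greedy = ranked[:n_greedy]          # exploit: top of the predicted ranking
    remainder = ranked[n_greedy:]
    explore = rng.sample(remainder, min(n_explore, len(remainder)))
    return greedy + explore


rng = random.Random(0)  # fixed seed so the sketch is reproducible
scores = [-7.2, -9.8, -5.1, -8.4, -6.0, -10.3, -4.7, -7.9]  # hypothetical
batch = select_batch(scores, batch_size=4, explore_fraction=0.5, rng=rng)
```

In an active-learning loop, the selected batch would be docked, the new scores added to the training set, and the surrogate model retrained before the next round. Setting `explore_fraction` to 0 recovers the purely greedy baseline that the abstract compares against.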

8.
J Phys Chem A ; 125(33): 7331-7343, 2021 Aug 26.
Article in English | MEDLINE | ID: mdl-34342466

ABSTRACT

Materials exhibiting higher mobilities than conventional organic semiconducting materials such as fullerenes and fused thiophenes are in high demand for applications in printed electronics. To discover new molecules in the heteroacene family that might show improved hole mobility, three de novo design methods were applied. Machine learning (ML) models were generated based on previously calculated hole reorganization energies of a quarter million examples of heteroacenes, where the energies were calculated by applying density functional theory (DFT) and a massive cloud computing environment. The three generative methods applied were (1) the continuous space method, where molecular structures are converted into continuous variables by applying the variational autoencoder/decoder technique; (2) the method based on reinforcement learning of SMILES strings (the REINVENT method); and (3) the junction tree variational autoencoder method that directly generates molecular graphs. Among the three methods, the second and third methods succeeded in obtaining chemical structures whose DFT-calculated hole reorganization energy was lower than the lowest energy in the training dataset. This suggests that an extrapolative materials design protocol can be developed by applying generative modeling to a quantitative structure-property relationship (QSPR) utility function.

9.
Front Chem ; 9: 800370, 2021.
Article in English | MEDLINE | ID: mdl-35111730

ABSTRACT

In recent years, generative machine learning approaches have attracted significant attention as an enabling approach for designing novel molecular materials with minimal design bias and thereby realizing more directed design for a specific materials property space. Further, data-driven approaches have emerged as a new tool to accelerate the development of novel organic electronic materials for organic light-emitting diode (OLED) applications. We demonstrate and validate a goal-directed generative machine learning framework based on a recurrent neural network (RNN) deep reinforcement learning approach for the design of hole transporting OLED materials. These large-scale molecular simulations also demonstrate a rapid, cost-effective method to identify new materials in OLEDs while also enabling expansion into many other verticals such as catalyst design, aerospace, life science, and petrochemicals.

10.
J Chem Inf Model ; 60(9): 4311-4325, 2020 Sep 28.
Article in English | MEDLINE | ID: mdl-32484669

ABSTRACT

The hit identification process usually involves the profiling of millions to, more recently, billions of compounds via either traditional experimental high-throughput screens (HTS) or computational virtual high-throughput screens (vHTS). We have previously demonstrated that, by coupling reaction-based enumeration, active learning, and free energy calculations, a similarly large-scale exploration of chemical space can be extended to the hit-to-lead process. In this work, we augment that approach by coupling large scale enumeration and cloud-based free energy perturbation (FEP) profiling with goal-directed generative machine learning, which results in a higher enrichment of potent ideas compared to large scale enumeration alone, while simultaneously staying within the bounds of predefined drug-like property space. We can achieve this by building the molecular distribution for generative machine learning from the PathFinder rules-based enumeration and optimizing for a weighted sum QSAR-based multiparameter optimization function. We examine the utility of this combined approach by designing potent inhibitors of cyclin-dependent kinase 2 (CDK2) and demonstrate a coupled workflow that can (1) provide a 6.4-fold enrichment improvement in identifying <10 nM compounds over random selection and a 1.5-fold enrichment in identifying <10 nM compounds over our previous method, (2) rapidly explore relevant chemical space outside the bounds of commercial reagents, (3) use generative ML approaches to "learn" the SAR from large scale in silico enumerations and generate novel idea molecules for a flexible receptor site that are both potent and within relevant physicochemical space, and (4) produce over 3 000 000 idea molecules and run 1935 FEP simulations, identifying 69 ideas with a predicted IC50 < 10 nM and 358 ideas with a predicted IC50 < 100 nM. The reported data suggest that combining both reaction-based and generative machine learning for ideation results in a higher enrichment of potent compounds over previously described approaches and has the potential to rapidly accelerate the discovery of novel chemical matter within a predefined potency and property space.


Subject(s)
Drug Discovery; Pharmaceutical Preparations; Computer Simulation; Goals; Machine Learning
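
A fold enrichment such as those quoted in this abstract is commonly understood as the ratio of two hit rates: the fraction of potent compounds in the selected set divided by the fraction in a baseline set (for example, a random selection). The sketch below assumes that definition; the counts are hypothetical and are not the paper's data.

```python
def fold_enrichment(hits_selected, n_selected, hits_baseline, n_baseline):
    """Ratio of the hit rate in a selected set to that in a baseline set."""
    if n_selected == 0 or n_baseline == 0 or hits_baseline == 0:
        raise ValueError("hit rates must be defined and the baseline non-zero")
    return (hits_selected / n_selected) / (hits_baseline / n_baseline)


# Hypothetical counts, not taken from the paper.
enrichment = fold_enrichment(hits_selected=12, n_selected=100,
                             hits_baseline=3, n_baseline=100)
```

With these invented counts the selected set contains potent compounds four times as often as the baseline. The ratio is undefined when the baseline contains no hits, which is why the function raises an error in that case.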
11.
J Chem Inf Model ; 59(9): 3782-3793, 2019 Sep 23.
Article in English | MEDLINE | ID: mdl-31404495

ABSTRACT

The hit-to-lead and lead optimization processes usually involve the design, synthesis, and profiling of thousands of analogs prior to clinical candidate nomination. A hit finding campaign may begin with a virtual screen that explores millions of compounds, if not more. However, this scale of computational profiling is not frequently performed in the hit-to-lead or lead optimization phases of drug discovery. This is likely due to the lack of appropriate computational tools to generate synthetically tractable lead-like compounds in silico, and a lack of computational methods to accurately profile compounds prospectively on a large scale. Recent advances in computational power and methods provide the ability to profile much larger libraries of ligands than previously possible. Herein, we report a new computational technique, referred to as "PathFinder", that uses retrosynthetic analysis followed by combinatorial synthesis to generate novel compounds in synthetically accessible chemical space. In this work, the integration of PathFinder-driven compound generation, cloud-based FEP simulations, and active learning is used to rapidly optimize R-groups and generate new cores for inhibitors of cyclin-dependent kinase 2 (CDK2). Using this approach, we explored >300 000 ideas, performed >5000 FEP simulations, and identified >100 ligands with a predicted IC50 < 100 nM, including four unique cores. To our knowledge, this is the largest set of FEP calculations disclosed in the literature to date. The rapid turnaround time and scale of chemical exploration suggest that this is a useful approach to accelerate the discovery of novel chemical matter in drug discovery campaigns.


Subject(s)
Cyclin-Dependent Kinase 2/antagonists & inhibitors; Drug Discovery; Machine Learning; Protein Kinase Inhibitors/chemistry; Protein Kinase Inhibitors/pharmacology; Cyclin-Dependent Kinase 2/metabolism; Drug Design; Drug Discovery/methods; Humans; Models, Molecular; Small Molecule Libraries/chemistry; Small Molecule Libraries/pharmacology; Thermodynamics
12.
Chem Sci ; 9(2): 513-530, 2018 Jan 14.
Article in English | MEDLINE | ID: mdl-29629118

ABSTRACT

Molecular machine learning has been maturing rapidly over the last few years. Improved methods and the presence of larger datasets have enabled machine learning algorithms to make increasingly accurate predictions about molecular properties. However, algorithmic progress has been limited due to the lack of a standard benchmark to compare the efficacy of proposed methods; most new algorithms are benchmarked on different datasets making it challenging to gauge the quality of proposed methods. This work introduces MoleculeNet, a large scale benchmark for molecular machine learning. MoleculeNet curates multiple public datasets, establishes metrics for evaluation, and offers high quality open-source implementations of multiple previously proposed molecular featurization and learning algorithms (released as part of the DeepChem open source library). MoleculeNet benchmarks demonstrate that learnable representations are powerful tools for molecular machine learning and broadly offer the best performance. However, this result comes with caveats. Learnable representations still struggle to deal with complex tasks under data scarcity and highly imbalanced classification. For quantum mechanical and biophysical datasets, the use of physics-aware featurizations can be more important than choice of particular learning algorithm.
