ABSTRACT
Atomic partial charges are crucial parameters in molecular dynamics simulation, dictating the electrostatic contributions to intermolecular energies and thereby the potential energy landscape. Traditionally, the assignment of partial charges has relied on surrogates of ab initio semiempirical quantum chemical methods such as AM1-BCC and is expensive for large systems or large numbers of molecules. We propose a hybrid physical/graph neural network-based approximation to the widely popular AM1-BCC charge model that is orders of magnitude faster while maintaining accuracy comparable to differences in AM1-BCC implementations. Our hybrid approach couples a graph neural network to a streamlined charge equilibration approach in order to predict molecule-specific atomic electronegativity and hardness parameters, followed by analytical determination of optimal charge-equilibrated parameters that preserve total molecular charge. This hybrid approach scales linearly with the number of atoms, enabling for the first time the use of fully consistent charge models for small molecules and biopolymers for the construction of next-generation self-consistent biomolecular force fields. Implemented in the free and open source package EspalomaCharge, this approach provides drop-in replacements for both AmberTools antechamber and the Open Force Field Toolkit charging workflows, in addition to stand-alone charge generation interfaces. Source code is available at https://github.com/choderalab/espaloma-charge.
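The analytical charge-equilibration step described above can be illustrated with a short sketch. It assumes the common second-order form in which each atom contributes e_i*q_i + 0.5*s_i*q_i^2 to the energy (e_i the electronegativity, s_i the hardness) and the charges are constrained to sum to the total molecular charge. The function name and argument names are hypothetical and are not taken from the EspalomaCharge API. In the published approach the per-atom parameters come from a graph neural network, whereas here they are plain input lists.

```python
def equilibrate_charges(electronegativity, hardness, total_charge=0.0):
    """Closed-form minimizer of sum_i (e_i*q_i + 0.5*s_i*q_i**2)
    subject to sum_i q_i == total_charge (Lagrange multiplier solution).

    Hypothetical helper for illustration only; not the EspalomaCharge API.
    """
    if len(electronegativity) != len(hardness):
        raise ValueError("need one electronegativity and one hardness per atom")
    if any(s <= 0 for s in hardness):
        raise ValueError("hardness values must be positive")

    inv_s = [1.0 / s for s in hardness]
    e_over_s = [e / s for e, s in zip(electronegativity, hardness)]

    # Shared shift that enforces the total-charge constraint.
    shift = (total_charge + sum(e_over_s)) / sum(inv_s)

    # q_i = -e_i/s_i + (1/s_i) * shift
    return [-es + w * shift for es, w in zip(e_over_s, inv_s)]
```

Each charge is obtained with a fixed number of arithmetic operations plus two sums over atoms, which is consistent with the linear scaling in the number of atoms stated in the abstract.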
ABSTRACT
The fragment molecular orbital (FMO) method is a powerful computational tool for structure-based drug design, in which protein-ligand interactions can be described by the inter-fragment interaction energy (IFIE) and its pair interaction energy decomposition analysis (PIEDA). Here, we introduce a dynamically averaged (DA) FMO-based approach in which molecular dynamics simulations are used to generate multiple protein-ligand complex structures for FMO calculations. To assess this approach, we examined the correlation between the experimental binding free energies and DA-IFIEs of six CDK2 inhibitors whose net charges are zero. The correlation between the experimental binding free energies and single-snapshot IFIEs for X-ray crystal structures was R² = 0.75. Using the DA-IFIEs, the correlation improved significantly to R² = 0.99. When an additional CDK2 inhibitor with a net charge of -1 was added, the DA FMO-based scheme using only the dispersion energies still achieved R² = 0.99, whereas R² decreased to 0.32 when all the energy terms of PIEDA were employed.
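The averaging and correlation steps described above can be sketched as follows. This is a minimal illustration that assumes the per-snapshot IFIE totals for each ligand have already been computed elsewhere. The function names are hypothetical, no FMO software is invoked, and no values from the study are reproduced.

```python
from statistics import mean


def dynamically_averaged(snapshot_energies):
    """Average the per-snapshot interaction energies of each ligand.

    snapshot_energies: one list per ligand, holding one IFIE value
    per MD snapshot (hypothetical input layout).
    """
    return [mean(per_ligand) for per_ligand in snapshot_energies]


def r_squared(x, y):
    """Squared Pearson correlation coefficient between two sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)
```

Calling `r_squared(dynamically_averaged(ifie_by_ligand), experimental_dG)` would then give the kind of R² figure quoted in the abstract, where both argument names stand for user-supplied data.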
Subject(s)
Molecular Dynamics Simulation, Proteins, Cyclin-Dependent Kinase 2, Drug Design, Ligands, Protein Binding
ABSTRACT
We developed the world's first web-based public database for the storage, management, and sharing of fragment molecular orbital (FMO) calculation data sets describing the complex interactions between biomacromolecules, named FMO Database (https://drugdesign.riken.jp/FMODB/). Each entry in the database contains relevant background information on how the data was compiled as well as the total energy of each molecular system and interfragment interaction energy (IFIE) and pair interaction energy decomposition analysis (PIEDA) values. Currently, the database contains more than 13,600 FMO calculation data sets, and a comprehensive search function is implemented at the front-end. We describe the procedure for selecting target proteins, the preprocessing of the experimental structures, the construction of the database, and the details of the database front-end. We then demonstrate a use of the FMODB by comparing IFIE value distributions of hydrogen bond, ion-pair, and XH/π interactions obtained by the FMO method to those obtained by a molecular mechanics approach. From this comparison, the statistical analysis of the data provided standard reference values for the three types of interactions that will be useful for determining whether each interaction in a given system is relatively strong or weak compared to the interactions contained within the data in the FMODB. In the final part, we demonstrate the use of the database to examine the contribution of halogen atoms to the binding affinity between human cathepsin L and its inhibitors. We found that the electrostatic term derived by PIEDA correlated strongly with the binding affinities of the halogen-containing cathepsin L inhibitors, indicating the importance of QM calculation for quantitative analysis of halogen interactions. Thus, the FMO calculation data in FMODB will be useful for conducting statistical analyses for drug discovery, for conducting molecular recognition studies in structural biology, and for other studies involving quantum mechanics-based interactions.
Subject(s)
Drug Discovery, Quantum Theory, Humans, Molecular Dynamics Simulation, Proteins, Static Electricity
ABSTRACT
We propose edge expansion parallel cascade selection molecular dynamics (eePaCS-MD) as an efficient adaptive conformational sampling method to investigate the large-amplitude motions of proteins without prior knowledge of the conformational transitions. In this method, multiple independent MD simulations are iteratively conducted from initial structures randomly selected from the vertices of a multi-dimensional principal component subspace. This subspace is defined by an ensemble of protein conformations sampled during previous cycles of eePaCS-MD. The edges and vertices of the conformational subspace are determined by solving the "convex hull problem." The sampling efficiency of eePaCS-MD is achieved by intensively repeating MD simulations from the vertex structures, which increases the probability of rare event occurrence to explore new large-amplitude collective motions. The conformational sampling efficiency of eePaCS-MD was assessed by investigating the open-close transitions of glutamine binding protein, maltose/maltodextrin binding protein, and adenylate kinase and comparing the results to those obtained using related methods. In all cases, the open-close transitions were simulated in ∼10 ns of simulation time or less, offering 1-3 orders of magnitude shorter simulation time compared to conventional MD. Furthermore, we show that the combination of eePaCS-MD and accelerated MD can further enhance conformational sampling efficiency, which reduced the total computational cost of observing the open-close transitions by at most 36%.
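The vertex-selection step of one cycle can be sketched as below. For brevity the sketch assumes that each sampled conformation has already been projected onto just two principal components, whereas the abstract describes a multi-dimensional subspace. The hull is computed with Andrew's monotone chain algorithm, and all function names are hypothetical rather than taken from the authors' implementation.

```python
import random


def _cross(o, a, b):
    """z-component of (a - o) x (b - o); positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def hull_vertices(points):
    """Convex hull vertices of 2D points (tuples), counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        # Drop points that would make a right turn or lie on the edge.
        while len(lower) >= 2 and _cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and _cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


def pick_start_structures(points, n_replicas, rng=None):
    """Randomly choose hull vertices as starting points for the next cycle."""
    rng = rng or random.Random()
    vertices = hull_vertices(points)
    return [rng.choice(vertices) for _ in range(n_replicas)]
```

In a full workflow the selected projections would be mapped back to their MD frames, short simulations would be restarted from those frames, and the hull would be recomputed from the enlarged ensemble at the next cycle.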
Subject(s)
Adenylate Kinase/chemistry, Carrier Proteins/chemistry, Maltose-Binding Proteins/chemistry, Escherichia coli/chemistry, Escherichia coli Proteins/chemistry, Markov Chains, Molecular Dynamics Simulation, Protein Conformation, Thermodynamics
ABSTRACT
Drug discovery is stochastic. The effectiveness of candidate compounds in satisfying design objectives is unknown ahead of time, and the tools used for prioritization (predictive models and assays) are inaccurate and noisy. In a typical discovery campaign, thousands of compounds may be synthesized and tested before design objectives are achieved, with many others ideated but deprioritized. These challenges are well-documented, but assessing potential remedies has been difficult. We introduce DrugGym, a framework for modeling the stochastic process of drug discovery. Emulating biochemical assays with realistic surrogate models, we simulate the progression from weak hits to sub-micromolar leads with viable ADME. We use this testbed to examine how different ideation, scoring, and decision-making strategies impact statistical measures of utility, such as the probability of program success within predefined budgets and the expected costs to achieve target candidate profile (TCP) goals. We also assess the influence of affinity model inaccuracy, chemical creativity, batch size, and multi-step reasoning. Our findings suggest that reducing affinity model inaccuracy from 2 to 0.5 pIC50 units improves budget-constrained success rates tenfold. DrugGym represents a realistic testbed for machine learning methods applied to the hit-to-lead phase. Source code is available at www.drug-gym.org.
ABSTRACT
The development of reliable and extensible molecular mechanics (MM) force fields (fast, empirical models characterizing the potential energy surface of molecular systems) is indispensable for biomolecular simulation and computer-aided drug design. Here, we introduce a generalized and extensible machine-learned MM force field, espaloma-0.3, and an end-to-end differentiable framework using graph neural networks to overcome the limitations of traditional rule-based methods. Trained in a single GPU-day to fit a large and diverse quantum chemical dataset of over 1.1 M energy and force calculations, espaloma-0.3 reproduces quantum chemical energetic properties of chemical domains highly relevant to drug discovery, including small molecules, peptides, and nucleic acids. Moreover, this force field maintains the quantum chemical energy-minimized geometries of small molecules and preserves the condensed phase properties of peptides and folded proteins, self-consistently parametrizing proteins and ligands to produce stable simulations leading to highly accurate predictions of binding free energies. This methodology demonstrates significant promise as a path forward for systematically building more accurate force fields that are easily extensible to new chemical domains of interest.
ABSTRACT
Molecular mechanics (MM) potentials have long been a workhorse of computational chemistry. Leveraging accuracy and speed, these functional forms find use in a wide variety of applications in biomolecular modeling and drug discovery, from rapid virtual screening to detailed free energy calculations. Traditionally, MM potentials have relied on human-curated, inflexible, and poorly extensible discrete chemical perception rules (atom types) for applying parameters to small molecules or biopolymers, making it difficult to optimize both types and parameters to fit quantum chemical or physical property data. Here, we propose an alternative approach that uses graph neural networks to perceive chemical environments, producing continuous atom embeddings from which valence and nonbonded parameters can be predicted using invariance-preserving layers. Since all stages are built from smooth neural functions, the entire process, spanning chemical perception to parameter assignment, is modular and end-to-end differentiable with respect to model parameters, allowing new force fields to be easily constructed, extended, and applied to arbitrary molecules. We show that this approach is not only sufficiently expressive to reproduce legacy atom types, but that it can learn to accurately reproduce and extend existing molecular mechanics force fields. Trained with arbitrary loss functions, it can construct entirely new force fields self-consistently applicable to both biopolymers and small molecules directly from quantum chemical calculations, with higher fidelity than traditional atom or parameter typing schemes. When adapted to simultaneously fit partial charge models, espaloma delivers high-quality partial atomic charges orders of magnitude faster than current best practices with low error. When trained on the same quantum chemical small molecule dataset used to parameterize the Open Force Field ("Parsley") openff-1.2.0 small molecule force field augmented with a peptide dataset, the resulting espaloma model shows superior accuracy vis-à-vis experiments in relative alchemical free energy calculations for a popular benchmark. This approach is implemented in the free and open source package espaloma, available at https://github.com/choderalab/espaloma.
ABSTRACT
Two chemical series of novel protein kinase C ζ (PKCζ) inhibitors, 4,6-disubstituted and 5,7-disubstituted isoquinolines, were rapidly identified using our fragment merging strategy. This methodology involves biochemical screening of a high concentration of a monosubstituted isoquinoline fragment library, then merging hit isoquinoline fragments into a single compound. Our strategy can be applied to the discovery of other challenging kinase inhibitors without protein-ligand structural information. Furthermore, our optimization effort identified the highly potent and orally available 5,7-isoquinoline 37 from the second chemical series. Compound 37 showed good efficacy in a mouse collagen-induced arthritis model. The in vivo studies suggest that PKCζ inhibition is a novel target for rheumatoid arthritis (RA) and that 5,7-disubstituted isoquinoline 37 has the potential to elucidate the biological consequences of PKCζ inhibition, specifically in terms of therapeutic intervention for RA.