ABSTRACT
Engineered macromolecules offer compelling means for targeting conventionally undruggable interactions in human disease. However, their efficacy is limited by barriers to tissue and intracellular delivery. Inspired by recent advances in molecular barcoding and evolution, we developed BarcodeBabel, a generalized method for the design of libraries of peptide barcodes suitable for high-throughput mass spectrometry proteomics. Combined with PeptideBabel, a Monte Carlo sampling algorithm for the design of peptides with evolvable physicochemical properties and sequence complexity, we developed a barcoded library of cell-penetrating peptides (CPPs) with distinct physicochemical features. Using quantitative targeted mass spectrometry, we identified CPPs with improved nuclear and cytoplasmic delivery exceeding hundreds of millions of molecules per human cell while maintaining minimal membrane disruption and negligible toxicity in vitro. These studies provide a proof of concept for peptide barcoding as a homogeneous high-throughput method for macromolecular screening and delivery. BarcodeBabel and PeptideBabel are available open source at https://github.com/kentsisresearchgroup/.
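As an illustration of the kind of Monte Carlo sequence sampling described above, the sketch below runs a Metropolis search over peptide sequences toward a target mean hydropathy. It is a minimal, hypothetical example rather than the PeptideBabel or BarcodeBabel implementation: the Kyte-Doolittle objective, single-residue mutation move, and temperature are all assumptions made here for illustration.

```python
# Minimal sketch of Monte Carlo peptide sampling toward a target physicochemical
# property (here, mean Kyte-Doolittle hydropathy). Illustration only; not the
# PeptideBabel implementation, and the scoring function, move set, and
# temperature are hypothetical choices.
import random
import math

KD = {  # Kyte-Doolittle hydropathy scale
    'A': 1.8, 'R': -4.5, 'N': -3.5, 'D': -3.5, 'C': 2.5, 'Q': -3.5, 'E': -3.5,
    'G': -0.4, 'H': -3.2, 'I': 4.5, 'L': 3.8, 'K': -3.9, 'M': 1.9, 'F': 2.8,
    'P': -1.6, 'S': -0.8, 'T': -0.7, 'W': -0.9, 'Y': -1.3, 'V': 4.2,
}
AMINO_ACIDS = list(KD)

def score(seq, target_hydropathy=-1.0):
    """Penalty: squared deviation of the mean hydropathy from the target."""
    mean_kd = sum(KD[aa] for aa in seq) / len(seq)
    return (mean_kd - target_hydropathy) ** 2

def sample_peptide(length=12, n_steps=5000, temperature=0.1, seed=0):
    rng = random.Random(seed)
    seq = [rng.choice(AMINO_ACIDS) for _ in range(length)]
    current = score(seq)
    for _ in range(n_steps):
        proposal = seq.copy()
        proposal[rng.randrange(length)] = rng.choice(AMINO_ACIDS)  # point mutation move
        new = score(proposal)
        # Metropolis acceptance criterion
        if new <= current or rng.random() < math.exp((current - new) / temperature):
            seq, current = proposal, new
    return ''.join(seq), current

if __name__ == '__main__':
    peptide, penalty = sample_peptide()
    print(peptide, penalty)
```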
Subject(s)
Cell-Penetrating Peptides, Proteomics, Humans, Proteomics/methods, Cell-Penetrating Peptides/chemistry, Algorithms, Mass Spectrometry/methods, Peptide Library, High-Throughput Screening Assays/methods, Macromolecular Substances/chemistry, Macromolecular Substances/analysis
ABSTRACT
A high level of physical detail in a molecular model improves its ability to perform high accuracy simulations but can also significantly affect its complexity and computational cost. In some situations, it is worthwhile to add complexity to a model to capture properties of interest; in others, additional complexity is unnecessary and can make simulations computationally infeasible. In this work, we demonstrate the use of Bayesian inference for molecular model selection, using Monte Carlo sampling techniques accelerated with surrogate modeling to evaluate the Bayes factor evidence for different levels of complexity in the two-centered Lennard-Jones + quadrupole (2CLJQ) fluid model. Examining three nested levels of model complexity, we demonstrate that the use of variable quadrupole and bond length parameters in this model framework is justified only for some chemistries. Through this process, we also obtain detailed information about the distributions and correlations of parameter values, enabling improved parametrization and parameter analysis. We also show how the choice of parameter priors, which encode previous model knowledge, can have substantial effects on the selection of models, penalizing careless introduction of additional complexity. We detail the computational techniques used in this analysis, providing a roadmap for future applications of molecular model selection via Bayesian inference and surrogate modeling.
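To make the model-selection machinery concrete, the following toy example estimates the marginal likelihood (evidence) of two nested models by brute-force Monte Carlo integration over their priors and reports the Bayes factor. This is not the 2CLJQ surrogate-modeling workflow described above; the synthetic data, Gaussian likelihood, and uniform priors are invented purely to show how the evidence integral penalizes an unneeded extra parameter.

```python
# Toy Bayes-factor model selection via Monte Carlo estimation of the evidence
# for two nested models. Data, likelihood, and priors are made up for
# illustration; this is not the surrogate-accelerated 2CLJQ workflow.
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=0.0, scale=1.0, size=20)  # synthetic observations

def log_likelihood(mu, sigma):
    return np.sum(-0.5 * ((data - mu) / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi)))

def evidence_fixed_mean(n_samples=50_000):
    """Model 1 (simpler): mu fixed at 0, sigma ~ Uniform(0.1, 5)."""
    sigmas = rng.uniform(0.1, 5.0, n_samples)
    logL = np.array([log_likelihood(0.0, s) for s in sigmas])
    return np.exp(logL).mean()

def evidence_free_mean(n_samples=50_000):
    """Model 2 (more complex): mu ~ Uniform(-5, 5), sigma ~ Uniform(0.1, 5)."""
    mus = rng.uniform(-5.0, 5.0, n_samples)
    sigmas = rng.uniform(0.1, 5.0, n_samples)
    logL = np.array([log_likelihood(m, s) for m, s in zip(mus, sigmas)])
    return np.exp(logL).mean()

bayes_factor = evidence_fixed_mean() / evidence_free_mean()
print(f"Bayes factor (simple / complex): {bayes_factor:.2f}")
```

Because the data were generated with the mean fixed at zero, the evidence ratio favors the simpler model: the extra parameter buys no additional fit but dilutes the prior, which is the Occam penalty the abstract refers to.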
Subject(s)
Bayes Theorem, Computer Simulation, Monte Carlo Method
ABSTRACT
While Langevin integrators are popular in the study of equilibrium properties of complex systems, it is challenging to estimate the timestep-induced discretization error: the degree to which the sampled phase-space or configuration-space probability density departs from the desired target density due to the use of a finite integration timestep. Sivak et al. introduced a convenient approach to approximating a natural measure of error between the sampled density and the target equilibrium density, the Kullback-Leibler (KL) divergence, in phase space, but did not specifically address the issue of configuration-space properties, which are much more commonly of interest in molecular simulations. Here, we introduce a variant of this near-equilibrium estimator capable of measuring the error in the configuration-space marginal density, validating it against a complex but exact nested Monte Carlo estimator to show that it reproduces the KL divergence with high fidelity. To illustrate its utility, we employ this new near-equilibrium estimator to assess a claim that a recently proposed Langevin integrator introduces extremely small configuration-space density errors up to the stability limit at no extra computational expense. Finally, we show how this approach to quantifying sampling bias can be applied to a wide variety of stochastic integrators by following a straightforward procedure to compute the appropriate shadow work, and describe how it can be extended to quantify the error in arbitrary marginal or conditional distributions of interest.
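For a system whose configuration-space marginal is known exactly, the timestep-induced error can also be measured by brute force, which is the kind of baseline one would validate a near-equilibrium estimator against. The sketch below does this for a 1D harmonic oscillator integrated with BAOAB Langevin dynamics; it illustrates the quantity being estimated, not the shadow-work-based estimator introduced in the text, and all parameters are arbitrary.

```python
# Brute-force (histogram-based) estimate of the configuration-space KL
# divergence induced by a finite timestep, for a 1D harmonic oscillator
# integrated with BAOAB Langevin dynamics. Baseline illustration only; not the
# near-equilibrium shadow-work estimator described in the text.
import numpy as np

def baoab_trajectory(n_steps, dt, mass=1.0, k=1.0, kT=1.0, gamma=1.0, seed=0):
    rng = np.random.default_rng(seed)
    x, v = 0.0, 0.0
    a = np.exp(-gamma * dt)                      # O-step friction factor
    b = np.sqrt(kT * (1.0 - a * a) / mass)       # O-step noise amplitude
    xs = np.empty(n_steps)
    for i in range(n_steps):
        v += 0.5 * dt * (-k * x) / mass          # B
        x += 0.5 * dt * v                        # A
        v = a * v + b * rng.standard_normal()    # O
        x += 0.5 * dt * v                        # A
        v += 0.5 * dt * (-k * x) / mass          # B
        xs[i] = x
    return xs

def kl_to_exact(xs, k=1.0, kT=1.0, bins=100):
    """D_KL(sampled || exact Gaussian) from a histogram of configurations."""
    hist, edges = np.histogram(xs, bins=bins, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    exact = np.sqrt(k / (2 * np.pi * kT)) * np.exp(-0.5 * k * centers**2 / kT)
    mask = hist > 0
    width = edges[1] - edges[0]
    return np.sum(hist[mask] * np.log(hist[mask] / exact[mask])) * width

for dt in (0.1, 0.5, 1.0):
    xs = baoab_trajectory(n_steps=500_000, dt=dt)
    print(f"dt = {dt:>4}: D_KL(configuration marginal) ~ {kl_to_exact(xs):.4f}")
```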
ABSTRACT
Molecular mechanics (MM) potentials have long been a workhorse of computational chemistry. Balancing accuracy and speed, these functional forms find use in a wide variety of applications in biomolecular modeling and drug discovery, from rapid virtual screening to detailed free energy calculations. Traditionally, MM potentials have relied on human-curated, inflexible, and poorly extensible discrete chemical perception rules (atom types) for applying parameters to small molecules or biopolymers, making it difficult to optimize both types and parameters to fit quantum chemical or physical property data. Here, we propose an alternative approach that uses graph neural networks to perceive chemical environments, producing continuous atom embeddings from which valence and nonbonded parameters can be predicted using invariance-preserving layers. Since all stages are built from smooth neural functions, the entire process, spanning chemical perception to parameter assignment, is modular and end-to-end differentiable with respect to model parameters, allowing new force fields to be easily constructed, extended, and applied to arbitrary molecules. We show that this approach is not only sufficiently expressive to reproduce legacy atom types, but that it can learn to accurately reproduce and extend existing molecular mechanics force fields. Trained with arbitrary loss functions, it can construct entirely new force fields self-consistently applicable to both biopolymers and small molecules directly from quantum chemical calculations, with higher fidelity than traditional atom or parameter typing schemes. When adapted to simultaneously fit partial charge models, espaloma delivers high-quality partial atomic charges orders of magnitude faster than current best practices, with low error. When trained on the quantum chemical small-molecule dataset used to parameterize the Open Force Field ("Parsley") openff-1.2.0 small-molecule force field, augmented with a peptide dataset, the resulting espaloma model shows superior accuracy relative to experiment in relative alchemical free energy calculations for a popular benchmark. This approach is implemented in the free and open-source package espaloma, available at https://github.com/choderalab/espaloma.
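The following toy sketch shows the general pattern of graph-based parameter assignment: message passing over the molecular graph yields continuous per-atom embeddings, and a permutation-symmetric readout maps pairs of embeddings to bond parameters. It is a conceptual illustration with random, untrained weights, not the espaloma architecture or API; the featurization, layer sizes, and readout form are assumptions made for brevity.

```python
# Conceptual numpy sketch of graph-based force field parameter assignment:
# message passing produces continuous atom embeddings, and a readout that is
# symmetric under swapping the two bonded atoms predicts bond parameters.
# Weights are random (untrained); not the espaloma architecture or API.
import numpy as np

rng = np.random.default_rng(0)

# Toy molecule: ethanol heavy atoms C-C-O, featurized by one-hot element.
elements = ['C', 'C', 'O']
bonds = [(0, 1), (1, 2)]
onehot = {'C': [1, 0], 'O': [0, 1]}
h = np.array([onehot[e] for e in elements], dtype=float)   # (n_atoms, 2)

# Random stand-ins for learned weight matrices.
W_msg = rng.normal(size=(2, 8))
W_upd = rng.normal(size=(8, 8))
W_bond = rng.normal(size=(8, 2))

def message_pass(h):
    """One round of sum-aggregation message passing over the bond graph."""
    msgs = np.zeros((len(h), 8))
    for i, j in bonds:
        msgs[i] += h[j] @ W_msg
        msgs[j] += h[i] @ W_msg
    return np.tanh(msgs @ W_upd)                            # atom embeddings

def bond_parameters(emb, i, j):
    """Symmetric readout: invariant to swapping the two atoms in the bond."""
    pooled = emb[i] + emb[j]
    k, r0 = np.exp(pooled @ W_bond)   # exp keeps parameters positive
    return k, r0

emb = message_pass(h)
for i, j in bonds:
    k, r0 = bond_parameters(emb, i, j)
    print(f"bond {elements[i]}-{elements[j]}: k={k:.3f}, r0={r0:.3f}")
```

Because every step is a smooth function of the weights, a loss defined on the predicted parameters (or on energies computed from them) can be backpropagated end to end, which is the property the abstract emphasizes.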
ABSTRACT
The computation of tautomer ratios of druglike molecules is enormously important in computer-aided drug discovery, as over a quarter of all approved drugs can populate multiple tautomeric species in solution. Unfortunately, accurate calculation of aqueous tautomer ratios (the degree to which these species must be penalized in order to correctly account for tautomers when modeling binding for computer-aided drug discovery) is surprisingly difficult. While quantum chemical approaches to computing aqueous tautomer ratios using continuum solvent models and rigid-rotor harmonic-oscillator thermochemistry are currently state of the art, these methods are still surprisingly inaccurate despite their enormous computational expense. Here, we show that a major source of this inaccuracy lies in the breakdown of the standard approach to accounting for quantum chemical thermochemistry using rigid-rotor harmonic-oscillator (RRHO) approximations, which are frustrated by the complex conformational landscape introduced by the migration of double bonds, the creation of stereocenters, and the presence of multiple conformations separated by low energy barriers induced by the migration of a single proton. Using quantum machine learning (QML) methods that allow us to compute potential energies with quantum chemical accuracy at a fraction of the cost, we show how rigorous relative alchemical free energy calculations can be used to compute tautomer ratios in vacuum free from the limitations introduced by RRHO approximations. Furthermore, since the parameters of QML methods are tunable, we show how these models can be trained to correct limitations in the underlying learned quantum chemical potential energy surface using free energies, enabling them to generalize tautomer free energies across a broader range of predictions.
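As a concrete, if deliberately artificial, illustration of computing a tautomer ratio from a relative free energy, the sketch below runs thermodynamic integration between two 1D model potentials standing in for the two tautomers and converts the resulting ΔG into a population ratio. No QML potential or RRHO correction is involved; both "tautomer" energy functions and all sampling parameters are invented.

```python
# Toy tautomer-ratio estimate from a relative alchemical free energy:
# thermodynamic integration between two invented 1D potentials representing
# tautomers A and B. Illustration of the thermodynamics only; no QML potential.
import numpy as np

kT = 0.593  # kcal/mol at ~298 K

def u_A(x):  # "tautomer A": harmonic well at x = -1
    return 2.0 * (x + 1.0) ** 2

def u_B(x):  # "tautomer B": stiffer well at x = +1, offset by 1 kcal/mol
    return 4.0 * (x - 1.0) ** 2 + 1.0

def u_lambda(x, lam):
    return (1.0 - lam) * u_A(x) + lam * u_B(x)

def mean_dudl(lam, n_steps=50_000, n_burn=1_000, seed=0):
    """<u_B - u_A> under u_lambda, sampled with a simple Metropolis walker."""
    rng = np.random.default_rng(seed)
    x, total = 0.0, 0.0
    for step in range(n_steps + n_burn):
        x_new = x + rng.normal(scale=0.3)
        if rng.random() < np.exp(-(u_lambda(x_new, lam) - u_lambda(x, lam)) / kT):
            x = x_new
        if step >= n_burn:
            total += u_B(x) - u_A(x)
    return total / n_steps

lambdas = np.linspace(0.0, 1.0, 11)
dudl = [mean_dudl(lam, seed=i) for i, lam in enumerate(lambdas)]
delta_G = sum(0.5 * (dudl[i] + dudl[i + 1]) * (lambdas[i + 1] - lambdas[i])
              for i in range(len(lambdas) - 1))          # trapezoidal TI estimate
ratio_B_over_A = np.exp(-delta_G / kT)
print(f"dG(A->B) = {delta_G:.2f} kcal/mol, population ratio B/A = {ratio_B_over_A:.3f}")
```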
ABSTRACT
Molecular mechanics force fields define how the energy and forces in a molecular system are computed from its atomic positions, thus enabling the study of such systems through computational methods like molecular dynamics and Monte Carlo simulations. Despite progress toward automated force field parametrization, considerable human expertise is required to develop or extend force fields. In particular, human input has long been required to define atom types, which encode the chemically unique environments that determine which parameters will be assigned. However, relying on humans to establish atom types is suboptimal. Human-created atom types are often developed without statistical justification, leading to over- or under-fitting of data. Human-created types are also difficult to extend in a systematic and consistent manner when new chemistries must be modeled or new data become available. Finally, human effort is not scalable when force fields must be generated for new (bio)polymers, compound classes, or materials. To remedy these deficiencies, our long-term goal is to replace human specification of atom types with an automated approach, based on rigorous statistics and driven by experimental and/or quantum chemical reference data. In this work, we describe novel methods that automate the discovery of appropriate chemical perception: SMARTY automates the creation of atom types, while SMIRKY goes further by automating the creation of fragment types (nonbonded, bond, angle, and torsion). These approaches enable the creation of move sets in atom- or fragment-type space, which are used within a Monte Carlo optimization approach. We demonstrate the power of these new methods by automating the rediscovery of human-defined atom types (SMARTY) or fragment types (SMIRKY) in existing small-molecule force fields. We assess these approaches using several molecular data sets, including one that covers a diverse subset of the DrugBank database.
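A drastically simplified stand-in for the idea of Monte Carlo moves in atom-type space is sketched below: candidate SMARTS patterns are added to or removed from a hierarchy and scored by how well the induced typing reproduces a reference (element-based) typing. This is not the SMARTY/SMIRKY code; it requires RDKit, and the pattern pool, scoring function, and greedy acceptance rule are assumptions made for illustration.

```python
# Toy search over a SMARTS-based atom-type hierarchy using create/destroy moves,
# in the spirit of sampling move sets in atom-type space. Simplified stand-in,
# not the SMARTY/SMIRKY implementation. Requires RDKit.
import random
from rdkit import Chem

POOL = ['[#1]', '[#6]', '[#7]', '[#8]', '[#6X4]', '[#8X2]', '[#8X1]', '[#7X3]']
GENERIC = '[*]'  # fallback type that matches every atom

molecules = [Chem.AddHs(Chem.MolFromSmiles(s)) for s in ('CCO', 'CC(=O)N', 'c1ccccc1O')]

def assign_types(mol, patterns):
    """Each atom gets the last pattern in the hierarchy that matches it."""
    assigned = {a.GetIdx(): GENERIC for a in mol.GetAtoms()}
    for patt in patterns:
        query = Chem.MolFromSmarts(patt)
        for (idx,) in mol.GetSubstructMatches(query):
            assigned[idx] = patt
    return assigned

def score(patterns):
    """Count atom pairs whose type agreement matches their element agreement."""
    total = 0
    for mol in molecules:
        types = assign_types(mol, patterns)
        atoms = list(mol.GetAtoms())
        for i in range(len(atoms)):
            for j in range(i + 1, len(atoms)):
                same_elem = atoms[i].GetSymbol() == atoms[j].GetSymbol()
                same_type = types[i] == types[j]
                total += int(same_elem == same_type)
    return total

def sample(n_steps=200, seed=0):
    rng = random.Random(seed)
    current, current_score = [], score([])
    best = (list(current), current_score)
    for _ in range(n_steps):
        proposal = list(current)
        if proposal and rng.random() < 0.5:
            proposal.remove(rng.choice(proposal))            # destroy-type move
        else:
            candidates = [p for p in POOL if p not in proposal]
            if candidates:
                proposal.append(rng.choice(candidates))      # create-type move
        proposal_score = score(proposal)
        if proposal_score >= current_score:                  # simplified greedy acceptance
            current, current_score = proposal, proposal_score
            if current_score > best[1]:
                best = (list(current), current_score)
    return best

patterns, s = sample()
print('accepted type hierarchy:', [GENERIC] + patterns, 'score:', s)
```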
Subject(s)
Molecular Dynamics Simulation, Quantum Theory, Humans, Monte Carlo Method
ABSTRACT
Biomolecular simulations are typically performed in an aqueous environment where the number of ions remains fixed for the duration of the simulation, generally with either a minimally neutralizing ion environment or a number of salt pairs intended to match the macroscopic salt concentration. In contrast, real biomolecules experience local ion environments where the salt concentration is dynamic and may differ from bulk. The degree of salt concentration variability and the average deviation from the macroscopic concentration remain, as yet, unknown. Here, we describe the theory and implementation of a Monte Carlo osmostat that can be added to explicit solvent molecular dynamics or Monte Carlo simulations to sample from a semigrand canonical ensemble in which the number of salt pairs fluctuates dynamically during the simulation. The osmostat reproduces the correct equilibrium statistics for a simulation volume that can exchange ions with a large reservoir at a defined macroscopic salt concentration. To achieve useful Monte Carlo acceptance rates, the method makes use of nonequilibrium candidate Monte Carlo (NCMC) moves in which monovalent ions and water molecules are alchemically transmuted using short nonequilibrium trajectories, with a modified Metropolis-Hastings criterion ensuring correct equilibrium statistics for a (Δμ, N, p, T) ensemble, yielding a ~10^46× boost in acceptance rates. We demonstrate how typical protein (DHFR and the tyrosine kinase Src) and nucleic acid (Drew-Dickerson B-DNA dodecamer) systems exhibit salt concentration distributions that significantly differ from fixed-salt bulk simulations and display fluctuations that are on the same order of magnitude as the average.
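The ensemble targeted by such an osmostat can be illustrated with a deliberately idealized sketch: a non-interacting mixture in which instantaneous Monte Carlo moves transmute pairs of water molecules into a salt pair and back under a chosen chemical potential difference. Because there are no interactions (ΔU = 0), no NCMC relaxation is needed; this shows only the semigrand bookkeeping and is not the osmostat implementation described above, and the delta_mu value and system size are arbitrary.

```python
# Toy semigrand-canonical Monte Carlo: two "water" particles are transmuted into
# a cation/anion pair (and back) at fixed total particle number. Non-interacting
# system (Delta U = 0), so instantaneous moves suffice; schematic of the ensemble
# only, not the NCMC osmostat implementation.
import math
import random

def run_osmostat_toy(n_total=1000, delta_mu=-6.0, n_steps=200_000, seed=0):
    """delta_mu is beta*(mu_cation + mu_anion - 2*mu_water) for one swap."""
    rng = random.Random(seed)
    n_salt, samples = 0, []
    for step in range(n_steps):
        n_water = n_total - 2 * n_salt
        insert = rng.random() < 0.5            # choose insertion or deletion attempt
        if insert and n_water >= 2:
            # two waters -> one cation + one anion
            log_acc = delta_mu + math.log(n_water * (n_water - 1)) \
                      - 2.0 * math.log(n_salt + 1)
            if rng.random() < math.exp(log_acc):
                n_salt += 1
        elif not insert and n_salt >= 1:
            # one cation + one anion -> two waters
            log_acc = -delta_mu + 2.0 * math.log(n_salt) \
                      - math.log((n_water + 2) * (n_water + 1))
            if rng.random() < math.exp(log_acc):
                n_salt -= 1
        if step >= n_steps // 2:               # discard the first half as burn-in
            samples.append(n_salt)
    return sum(samples) / len(samples)

print("mean number of salt pairs:", run_osmostat_toy())
```

In the real osmostat, each accepted transmutation would additionally carry a potential energy change relaxed over a short nonequilibrium trajectory, which is what the NCMC machinery in the abstract provides.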
Subject(s)
DNA, B-Form/chemistry, Salts/chemistry, Tetrahydrofolate Dehydrogenase/chemistry, src-Family Kinases/chemistry, DNA, B-Form/metabolism, Ions/chemistry, Molecular Dynamics Simulation, Monte Carlo Method, Osmolar Concentration, Static Electricity, Tetrahydrofolate Dehydrogenase/metabolism, Water/chemistry, src-Family Kinases/metabolism
ABSTRACT
Accurately predicting protein-ligand binding affinities and binding modes is a major goal in computational chemistry, but even the prediction of ligand binding modes in proteins poses major challenges. Here, we focus on solving the binding mode prediction problem for rigid fragments: that is, computing the dominant placement, conformation, and orientation of a relatively rigid, fragment-like ligand in a receptor, and the populations of the multiple binding modes that may be relevant. This problem is important in its own right, but is even more timely given the recent success of alchemical free energy calculations. Alchemical calculations are increasingly used to predict binding free energies of ligands to receptors. However, the accuracy of these calculations depends on proper sampling of the relevant ligand binding modes. Unfortunately, ligand binding modes may often be uncertain, hard to predict, and/or slow to interconvert on simulation time scales, so proper sampling with current techniques can require prohibitively long simulations. We need new methods that dramatically improve sampling of ligand binding modes. Here, we develop and apply a nonequilibrium candidate Monte Carlo (NCMC) method to improve sampling of ligand binding modes. In this technique, the ligand is rotated and subsequently allowed to relax in its new position through alchemical perturbation before accepting or rejecting the rotation and relaxation as a nonequilibrium Monte Carlo move. When applied to a T4 lysozyme model binding system, this NCMC method shows more than two orders of magnitude improvement in binding mode sampling efficiency compared to a brute-force molecular dynamics simulation. This is a first step toward applying this methodology to pharmaceutically relevant binding of fragments and, eventually, drug-like molecules. We make this approach available via our new Binding modes of ligands using enhanced sampling (BLUES) package, which is freely available on GitHub.
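The structure of such an NCMC move can be sketched on a toy model: the potential is alchemically switched off, a large jump between "binding modes" is applied, and the potential is switched back on with relaxation after each increment, with the accumulated protocol work deciding acceptance. This illustrates the style of move rather than the BLUES implementation (there is no ligand, receptor, or OpenMM here); the double-well potential, switching schedule, and Metropolis relaxation kernel are assumptions made for illustration.

```python
# Minimal NCMC-style move on a 1D double well standing in for two ligand binding
# modes: switch the potential off, jump to the other well, switch it back on with
# relaxation, and accept on the accumulated protocol work. Toy illustration only.
import math
import random

kT = 1.0

def u(x):
    """Double well with a barrier of ~8 kT separating the two 'binding modes'."""
    return 8.0 * (x * x - 1.0) ** 2

def relax(x, lam, rng, n_sub=5):
    """A few Metropolis steps at fixed lambda (a valid fixed-lambda propagator)."""
    for _ in range(n_sub):
        x_new = x + rng.gauss(0.0, 0.2)
        du = lam * (u(x_new) - u(x))
        if du <= 0.0 or rng.random() < math.exp(-du / kT):
            x = x_new
    return x

def ncmc_move(x, rng, n_switch=20):
    """Switch lambda 1 -> 0, apply the large jump, then switch 0 -> 1."""
    work, lam = 0.0, 1.0
    for k in range(1, n_switch + 1):          # switch off
        new_lam = 1.0 - k / n_switch
        work += (new_lam - lam) * u(x)        # perturbation work at fixed x
        lam = new_lam
        x = relax(x, lam, rng)
    x = -x                                    # the large jump between wells at lambda = 0
    for k in range(1, n_switch + 1):          # switch back on
        new_lam = k / n_switch
        work += (new_lam - lam) * u(x)
        lam = new_lam
        x = relax(x, lam, rng)
    return x, work

def run(n_moves=2000, seed=0):
    rng = random.Random(seed)
    x, accepted, in_right_well = -1.0, 0, 0
    for _ in range(n_moves):
        x_new, work = ncmc_move(x, rng)
        if work <= 0.0 or rng.random() < math.exp(-work / kT):
            x, accepted = x_new, accepted + 1
        x = relax(x, 1.0, rng, n_sub=20)      # ordinary sampling between NCMC moves
        in_right_well += int(x > 0.0)
    print(f"NCMC acceptance rate: {accepted / n_moves:.2f}, "
          f"fraction of samples in the second well: {in_right_well / n_moves:.2f}")

run()
```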