Search | VHL Regional Portal

1.

Impact of protein conformations on binding free energy calculations in the beta-secretase 1 system.

Baumann, Hannah M; Mobley, David L.

J Comput Chem ; 2024 May 09.

Article in English | MEDLINE | ID: mdl-38725239

ABSTRACT

In binding free energy calculations, simulations must sample all relevant conformations of the system in order to obtain unbiased results. For instance, different ligands can bind to different metastable states of a protein, and if these protein conformational changes are not sampled in relative binding free energy calculations, the contribution of these states to binding is not accounted for and thus calculated binding free energies are inaccurate. In this work, we investigate the impact of different beta-secretase 1 (BACE1) protein conformations obtained from X-ray crystallography on the binding of BACE1 inhibitors. We highlight how these conformational changes are not adequately sampled in typical molecular dynamics simulations. Furthermore, we show that insufficient sampling of relevant conformations induces substantial error in relative binding free energy calculations, as judged by a variation in calculated relative binding free energies up to 2 kcal/mol depending on the starting protein conformation. These results emphasize the importance of protein conformational sampling and pose this BACE1 system as a challenge case for further method development in the area of enhanced protein conformational sampling, either in combination with binding calculations or as an endpoint correction.

2.

The SAMPL9 host-guest blind challenge: an overview of binding free energy predictive accuracy.

Amezcua, Martin; Setiadi, Jeffry; Mobley, David L.

Phys Chem Chem Phys ; 26(12): 9207-9225, 2024 Mar 20.

Article in English | MEDLINE | ID: mdl-38444308

ABSTRACT

We report the results of the SAMPL9 host-guest blind challenge for predicting binding free energies. The challenge focused on macrocycles from pillar[n]-arene and cyclodextrin host families, including WP6, bCD, and HbCD. A variety of methods were used by participants to submit binding free energy predictions. A machine learning approach based on molecular descriptors achieved the highest accuracy (RMSE of 2.04 kcal mol⁻¹) among the ranked methods in the WP6 dataset. Interestingly, predictions for WP6 obtained via docking tended to outperform all other methods (RMSE of 1.70 kcal mol⁻¹), most of which are MD based and computationally more expensive. In general, methods applying force fields achieved better correlation with experiments for WP6, as opposed to the machine learning and docking models. In the cyclodextrin-phenothiazine challenge, the ATM approach emerged as the top performing method with RMSE less than 1.86 kcal mol⁻¹. Correlation metrics of ranked methods in this dataset were relatively poor compared to WP6. We also highlight several lessons learned to guide future work and help improve studies on the systems discussed. For example, WP6 may be present in microstates other than its -12 state in the presence of certain guests. Machine learning approaches can be used to fine-tune or help train force fields for certain chemistry (i.e. WP6-G4). Certain phenothiazines occupy distinct primary and secondary orientations, some of which were considered individually for accurate binding free energies. The accuracy of predictions from certain methods while starting from a single binding pose/orientation demonstrates the sensitivity of calculated binding free energies to the orientation, and in some cases the likely dominant orientation for the system.
Computational and experimental results suggest that the guest phenothiazine core traverses both the secondary and primary faces of the cyclodextrin hosts, a bulky cationic side chain will primarily occupy the primary face, and the phenothiazine core substituent resides at the larger secondary face.

3.

PopShift: A Thermodynamically Sound Approach to Estimate Binding Free Energies by Accounting for Ligand-Induced Population Shifts from a Ligand-Free Markov State Model.

Smith, Louis G; Novak, Borna; Osato, Meghan; Mobley, David L; Bowman, Gregory R.

J Chem Theory Comput ; 20(3): 1036-1050, 2024 Feb 13.

Article in English | MEDLINE | ID: mdl-38291966

ABSTRACT

Obtaining accurate binding free energies from in silico screens has been a long-standing goal for the computational chemistry community. However, accuracy and computational cost are at odds with one another, limiting the utility of methods that perform this type of calculation. Many methods achieve massive scale by explicitly or implicitly assuming that the target protein adopts a single structure, or undergoes limited fluctuations around that structure, to minimize computational cost. Others simulate each protein-ligand complex of interest, accepting lower throughput in exchange for better predictions of binding affinities. Here, we present the PopShift framework for accounting for the ensemble of structures a protein adopts and their relative probabilities. Protein degrees of freedom are enumerated once, and then arbitrarily many molecules can be screened against this ensemble. Specifically, we use Markov state models (MSMs) as a compressed representation of a protein's thermodynamic ensemble. We start with a ligand-free MSM and then calculate how addition of a ligand shifts the populations of each protein conformational state based on the strength of the interaction between that protein conformation and the ligand. In this work we use docking to estimate the affinity between a given protein structure and ligand, but any estimator of binding affinities could be used in the PopShift framework. We test PopShift on the classic benchmark pocket T4 Lysozyme L99A. We find that PopShift is more accurate than common strategies, such as docking to a single structure and traditional ensemble docking (producing results that compare favorably with alchemical binding free energy calculations in terms of RMSE but not correlation), and may have a more favorable computational cost profile in some applications.
In addition to predicting binding free energies and ligand poses, PopShift also provides insight into how the probability of different protein structures is shifted upon addition of various concentrations of ligand, providing a platform for predicting affinities and allosteric effects of ligand binding. Therefore, we expect PopShift will be valuable for hit finding and for providing insight into phenomena like allostery.

Subject(s)

Proteins , Protein Binding , Ligands , Proteins/chemistry , Entropy , Protein Conformation , Thermodynamics , Binding Sites
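
The population-shift idea described in the PopShift abstract above can be illustrated with a minimal sketch. It assumes the standard statistical-mechanical relation in which each ligand-free state probability is reweighted by the Boltzmann factor of that state's binding free energy; the function names, the two-state example, and the numbers are hypothetical and are not taken from the PopShift software.

```python
import math

KT_KCAL_PER_MOL = 0.593  # approximate k_B * T near 298 K, in kcal/mol


def ensemble_binding_free_energy(apo_probs, state_dgs, kt=KT_KCAL_PER_MOL):
    """Combine per-state binding free energies into one ensemble value.

    apo_probs: ligand-free equilibrium probabilities of each protein state
               (for example, from a Markov state model); must sum to 1.
    state_dgs: binding free energy of the ligand to each state, in kcal/mol
               (for example, a docking score used as an affinity estimate).
    """
    z = sum(p * math.exp(-dg / kt) for p, dg in zip(apo_probs, state_dgs))
    return -kt * math.log(z)


def bound_populations(apo_probs, state_dgs, kt=KT_KCAL_PER_MOL):
    """Probabilities of each protein state once the ligand is bound."""
    weights = [p * math.exp(-dg / kt) for p, dg in zip(apo_probs, state_dgs)]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical two-state protein: a common state that binds weakly and a
# rare state that binds strongly.
apo_probs = [0.9, 0.1]
state_dgs = [-5.0, -8.0]

dg_total = ensemble_binding_free_energy(apo_probs, state_dgs)
shifted = bound_populations(apo_probs, state_dgs)
print(f"ensemble binding free energy: {dg_total:.2f} kcal/mol")
print("bound-state populations:", [round(p, 3) for p in shifted])
```

In this toy case the rare, tightly binding state dominates the bound ensemble, which is the kind of ligand-induced shift that docking to a single structure cannot capture.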

4.

A Fast, Convenient, Polarizable Electrostatic Model for Molecular Dynamics.

Wang, Liangyue; Schauperl, Michael; Mobley, David L; Bayly, Christopher; Gilson, Michael K.

J Chem Theory Comput ; 20(3): 1293-1305, 2024 Feb 13.

Article in English | MEDLINE | ID: mdl-38240687

ABSTRACT

We present an efficient polarizable electrostatic model, utilizing typed, atom-centered polarizabilities and the fast direct approximation, designed for efficient use in molecular dynamics (MD) simulations. The model provides two convenient approaches for assigning partial charges in the context of atomic polarizabilities. One is a generalization of RESP, called RESP-dPol, and the other, AM1-BCC-dPol, is an adaptation of the widely used AM1-BCC method. Both are designed to accurately replicate gas-phase quantum mechanical electrostatic potentials. Benchmarks of this polarizable electrostatic model against gas-phase dipole moments, molecular polarizabilities, bulk liquid densities, and static dielectric constants of organic liquids show good agreement with the reference values. Of note, the model yields markedly more accurate dielectric constants of organic liquids, relative to a matched nonpolarizable force field. MD simulations with this method, which is currently parametrized for molecules containing elements C, N, O, and H, run only about 3.6-fold slower than fixed charge force fields, while simulations with the self-consistent mutual polarization average 4.5-fold slower. Our results suggest that RESP-dPol and AM1-BCC-dPol afford improved accuracy relative to fixed charge force fields and are good starting points for developing general, affordable, and transferable polarizable force fields. The software implementing these approaches has been designed to utilize the force field fitting frameworks developed and maintained by the Open Force Field Initiative, setting the stage for further exploration of this approach to polarizable force field development.
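
The direct approximation named in the abstract above can be sketched as follows. This is a simplified illustration, assuming that each induced dipole responds only to the field of the permanent charges and that no intramolecular exclusions or damping are applied; the function names and the two-site example are hypothetical and do not represent the authors' implementation.

```python
COULOMB_KCAL = 332.0637  # kcal * Angstrom / (mol * e^2)


def permanent_field(positions, charges, i):
    """Electric field at site i from all other permanent charges (e / A^2)."""
    field = [0.0, 0.0, 0.0]
    for j, (pos_j, q_j) in enumerate(zip(positions, charges)):
        if j == i:
            continue
        d = [a - b for a, b in zip(positions[i], pos_j)]
        r2 = sum(c * c for c in d)
        r3 = r2 * r2 ** 0.5
        for k in range(3):
            field[k] += q_j * d[k] / r3
    return field


def direct_polarization(positions, charges, alphas):
    """Induced dipoles (e * A) and polarization energy (kcal/mol).

    Direct approximation: mu_i = alpha_i * E_i, with E_i from permanent
    charges only, so no self-consistent iteration over dipoles is needed.
    alphas are atomic polarizabilities in A^3.
    """
    dipoles = []
    energy = 0.0
    for i, alpha in enumerate(alphas):
        e_field = permanent_field(positions, charges, i)
        mu = [alpha * c for c in e_field]
        dipoles.append(mu)
        energy -= 0.5 * sum(m * c for m, c in zip(mu, e_field))
    return dipoles, COULOMB_KCAL * energy


# Hypothetical example: a unit charge 3 A from a neutral polarizable site.
positions = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
charges = [1.0, 0.0]
alphas = [0.0, 1.0]
dipoles, u_pol = direct_polarization(positions, charges, alphas)
print(dipoles[1], u_pol)
```

Because the dipoles never feed back into the field, the cost is a single pass over pairs, which is consistent with the abstract's observation that the direct model runs faster than self-consistent mutual polarization.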

5.

A transferable double exponential potential for condensed phase simulations of small molecules.

Horton, Joshua T; Boothroyd, Simon; Behara, Pavan Kumar; Mobley, David L; Cole, Daniel J.

Digit Discov ; 2(4): 1178-1187, 2023 Aug 08.

Article in English | MEDLINE | ID: mdl-38013814

ABSTRACT

The Lennard-Jones potential is the most widely-used function for the description of non-bonded interactions in transferable force fields for the condensed phase. This is not because it has an optimal functional form, but rather it is a legacy resulting from when computational expense was a major consideration and this potential was particularly convenient numerically. At present, it persists because the effort that would be required to re-write molecular modelling software and train new force fields has, until now, been prohibitive. Here, we present Smirnoff-plugins as a flexible framework to extend the Open Force Field software stack to allow custom force field functional forms. We deploy Smirnoff-plugins with the automated Open Force Field infrastructure to train a transferable, small molecule force field based on the recently-proposed double exponential functional form, on over 1000 experimental condensed phase properties. Extensive testing of the resulting force field shows improvements in transfer free energies, with acceptable conformational energetics, run times and convergence properties compared to state-of-the-art Lennard-Jones based force fields.
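
For readers unfamiliar with the double exponential form mentioned in the abstract above, the sketch below compares it with Lennard-Jones. The expression is the commonly cited parameterization with a well of depth epsilon at separation r_min; the alpha and beta values are placeholders chosen for illustration, not the fitted parameters from this work, and the function names are hypothetical.

```python
import math


def double_exponential(r, epsilon, r_min, alpha=16.0, beta=4.0):
    """Double exponential pair energy with its minimum (-epsilon) at r_min.

    alpha controls the steepness of the repulsive wall and beta the decay
    of the attractive tail; both defaults are illustrative placeholders.
    """
    x = r / r_min
    repulsion = beta * math.exp(alpha) / (alpha - beta) * math.exp(-alpha * x)
    attraction = alpha * math.exp(beta) / (alpha - beta) * math.exp(-beta * x)
    return epsilon * (repulsion - attraction)


def lennard_jones(r, epsilon, r_min):
    """12-6 Lennard-Jones energy written in terms of r_min."""
    s6 = (r_min / r) ** 6
    return epsilon * (s6 * s6 - 2.0 * s6)


# Both forms share the same well depth and position for matching parameters.
eps, rm = 0.2, 3.5  # kcal/mol and Angstrom, hypothetical values
for r in (3.0, 3.5, 4.5):
    print(r, round(double_exponential(r, eps, rm), 4), round(lennard_jones(r, eps, rm), 4))
```

One practical difference is that the double exponential stays finite as r approaches zero, whereas the Lennard-Jones repulsion diverges.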

6.

Building Block-Based Binding Predictions for DNA-Encoded Libraries.

Zhang, Chris; Pitman, Mary; Dixit, Anjali; Leelananda, Sumudu; Palacci, Henri; Lawler, Meghan; Belyanskaya, Svetlana; Grady, LaShadric; Franklin, Joe; Tilmans, Nicolas; Mobley, David L.

J Chem Inf Model ; 63(16): 5120-5132, 2023 Aug 28.

Article in English | MEDLINE | ID: mdl-37578123

ABSTRACT

DNA-encoded libraries (DELs) provide the means to make and screen millions of diverse compounds against a target of interest in a single experiment. However, despite producing large volumes of binding data at a relatively low cost, the DEL selection process is susceptible to noise, necessitating computational follow-up to increase signal-to-noise ratios. In this work, we present a set of informatics tools to employ data from prior DEL screen(s) to gain information about which building blocks are most likely to be productive when designing new DELs for the same target. We demonstrate that similar building blocks have similar probabilities of forming compounds that bind. We then build a model from the inference that the combined behavior of individual building blocks is predictive of whether an overall compound binds. We illustrate our approach on a set of three-cycle OpenDEL libraries screened against soluble epoxide hydrolase (sEH) and report performance of more than an order of magnitude greater than random guessing on a holdout set, demonstrating that our model can serve as a baseline for comparison against other machine learning models on DEL data. Lastly, we provide a discussion on how we believe this informatics workflow could be applied to benefit researchers in their specific DEL campaigns.

Subject(s)

Drug Discovery , Small Molecule Libraries , Small Molecule Libraries/chemistry , DNA/chemistry , Machine Learning

7.

Broadening the Scope of Binding Free Energy Calculations Using a Separated Topologies Approach.

Baumann, Hannah M; Dybeck, Eric; McClendon, Christopher L; Pickard, Frank C; Gapsys, Vytautas; Pérez-Benito, Laura; Hahn, David F; Tresadern, Gary; Mathiowetz, Alan M; Mobley, David L.

J Chem Theory Comput ; 19(15): 5058-5076, 2023 Aug 08.

Article in English | MEDLINE | ID: mdl-37487138

ABSTRACT

Binding free energy calculations predict the potency of compounds to protein binding sites in a physically rigorous manner and see broad application in prioritizing the synthesis of novel drug candidates. Relative binding free energy (RBFE) calculations have emerged as an industry-standard approach to achieve highly accurate rank-order predictions of the potency of related compounds; however, this approach requires that the ligands share a common scaffold and a common binding mode, restricting the methods' domain of applicability. This is a critical limitation since complex modifications to the ligands, especially core hopping, are very common in drug design. Absolute binding free energy (ABFE) calculations are an alternate method that can be used for ligands that are not congeneric. However, ABFE suffers from a known problem of long convergence times due to the need to sample additional degrees of freedom within each system, such as sampling rearrangements necessary to open and close the binding site. Here, we report on an alternative method for RBFE, called Separated Topologies (SepTop), which overcomes the issues in both of the aforementioned methods by enabling large scaffold changes between ligands with a convergence time comparable to traditional RBFE. Instead of only mutating atoms that vary between two ligands, this approach performs two absolute free energy calculations at the same time in opposite directions, one for each ligand. Defining the two ligands independently allows the comparison of the binding of diverse ligands without the artificial constraints of identical poses or a suitable atom-atom mapping. This approach also avoids the need to sample the unbound state of the protein, making it more efficient than absolute binding free energy calculations. Here, we introduce an implementation of SepTop. We developed a general and efficient protocol for running SepTop, and we demonstrated the method on four diverse, pharmaceutically relevant systems. 
We report the performance of the method, as well as our practical insights into the strengths, weaknesses, and challenges of applying this method in an industrial drug design setting. We find that the accuracy of the approach is sufficiently high to rank order ligands with an accuracy comparable to traditional RBFE calculations while maintaining the additional flexibility of SepTop.

8.

PopShift: A thermodynamically sound approach to estimate binding free energies by accounting for ligand-induced population shifts from a ligand-free MSM.

Smith, Louis G; Novak, Borna; Osato, Meghan; Mobley, David L; Bowman, Gregory R.

bioRxiv ; 2023 Aug 08.

Article in English | MEDLINE | ID: mdl-37503302

ABSTRACT

Obtaining accurate binding free energies from in silico screens has been a longstanding goal for the computational chemistry community. However, accuracy and computational cost are at odds with one another, limiting the utility of methods that perform this type of calculation. Many methods achieve massive scale by explicitly or implicitly assuming that the target protein adopts a single structure, or undergoes limited fluctuations around that structure, to minimize computational cost. Others simulate each protein-ligand complex of interest, accepting lower throughput in exchange for better predictions of binding affinities. Here, we present the PopShift framework for accounting for the ensemble of structures a protein adopts and their relative probabilities. Protein degrees of freedom are enumerated once, and then arbitrarily many molecules can be screened against this ensemble. Specifically, we use Markov state models (MSMs) as a compressed representation of a protein's thermodynamic ensemble. We start with a ligand-free MSM and then calculate how addition of a ligand shifts the populations of each protein conformational state based on the strength of the interaction between that protein conformation and the ligand. In this work we use docking to estimate the affinity between a given protein structure and ligand, but any estimator of binding affinities could be used in the PopShift framework. We test PopShift on the classic benchmark pocket T4 Lysozyme L99A. We find that PopShift is more accurate than common strategies, such as docking to a single structure and traditional ensemble docking (producing results that compare favorably with alchemical binding free energy calculations in terms of RMSE but not correlation), and may have a more favorable computational cost profile in some applications.
In addition to predicting binding free energies and ligand poses, PopShift also provides insight into how the probability of different protein structures is shifted upon addition of various concentrations of ligand, providing a platform for predicting affinities and allosteric effects of ligand binding. Therefore, we expect PopShift will be valuable for hit finding and for providing insight into phenomena like allostery.

9.

Development and Benchmarking of Open Force Field 2.0.0: The Sage Small Molecule Force Field.

Boothroyd, Simon; Behara, Pavan Kumar; Madin, Owen C; Hahn, David F; Jang, Hyesu; Gapsys, Vytautas; Wagner, Jeffrey R; Horton, Joshua T; Dotson, David L; Thompson, Matthew W; Maat, Jessica; Gokey, Trevor; Wang, Lee-Ping; Cole, Daniel J; Gilson, Michael K; Chodera, John D; Bayly, Christopher I; Shirts, Michael R; Mobley, David L.

J Chem Theory Comput ; 19(11): 3251-3275, 2023 Jun 13.

Article in English | MEDLINE | ID: mdl-37167319

ABSTRACT

We introduce the Open Force Field (OpenFF) 2.0.0 small molecule force field for drug-like molecules, code-named Sage, which builds upon our previous iteration, Parsley. OpenFF force fields are based on direct chemical perception, which generalizes easily to highly diverse sets of chemistries based on substructure queries. Like the previous OpenFF iterations, the Sage generation of OpenFF force fields was validated in protein-ligand simulations to be compatible with AMBER biopolymer force fields. In this work, we detail the methodology used to develop this force field, as well as the innovations and improvements introduced since the release of Parsley 1.0.0. One particularly significant feature of Sage is a set of improved Lennard-Jones (LJ) parameters retrained against condensed phase mixture data, the first refit of LJ parameters in the OpenFF small molecule force field line. Sage also includes valence parameters refit to a larger database of quantum chemical calculations than previous versions, as well as improvements in how this fitting is performed. Force field benchmarks show improvements in general metrics of performance against quantum chemistry reference data such as root-mean-square deviations (RMSD) of optimized conformer geometries, torsion fingerprint deviations (TFD), and improved relative conformer energetics (ΔΔE). We present a variety of benchmarks for these metrics against our previous force fields as well as in some cases other small molecule force fields. Sage also demonstrates improved performance in estimating physical properties, including comparison against experimental data from various thermodynamic databases for small molecule properties such as ΔHmix, ρ(x), ΔGsolv, and ΔGtrans. Additionally, we benchmarked against protein-ligand binding free energies (ΔGbind), where Sage yields results statistically similar to previous force fields. 
All the data is made publicly available along with complete details on how to reproduce the training results at https://github.com/openforcefield/openff-sage.

Subject(s)

Benchmarking , Proteins , Ligands , Proteins/chemistry , Thermodynamics , Entropy

10.

To Design Scalable Free Energy Perturbation Networks, Optimal Is Not Enough.

Pitman, Mary; Hahn, David F; Tresadern, Gary; Mobley, David L.

J Chem Inf Model ; 63(6): 1776-1793, 2023 Mar 27.

Article in English | MEDLINE | ID: mdl-36878475

ABSTRACT

Drug discovery is accelerated with computational methods such as alchemical simulations to estimate ligand affinities. In particular, relative binding free energy (RBFE) simulations are beneficial for lead optimization. To use RBFE simulations to compare prospective ligands in silico, researchers first plan the simulation experiment, using graphs where nodes represent ligands and graph edges represent alchemical transformations between ligands. Recent work demonstrated that optimizing the statistical architecture of these perturbation graphs improves the accuracy of the predicted changes in the free energy of ligand binding. Therefore, to improve the success rate of computational drug discovery, we present the open-source software package High Information Mapper (HiMap), a new take on its predecessor, Lead Optimization Mapper (LOMAP). HiMap removes heuristic decisions from design selection and instead finds statistically optimal graphs over ligands clustered with machine learning. Beyond optimal design generation, we present theoretical insights for designing alchemical perturbation maps. Some of these results include that for n nodes, the precision of perturbation maps is stable at n·ln(n) edges. This result indicates that even an "optimal" graph can result in unexpectedly high errors if a plan includes too few alchemical transformations for the given number of ligands and edges. And, as a study compares more ligands, the performance of even optimal graphs will deteriorate with linear scaling of the edge count. In this sense, ensuring an A- or D-optimal topology is not enough to produce robust errors. We additionally find that optimal designs will converge more rapidly than radial and LOMAP designs. Moreover, we derive bounds for how clustering reduces cost for designs with a constant expected relative error per cluster, invariant of the size of the design.
These results inform how to best design perturbation maps for computational drug discovery and have broader implications for experimental design.

Subject(s)

Molecular Dynamics Simulation , Thermodynamics , Ligands , Prospective Studies , Entropy , Protein Binding
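
The n·ln(n) edge count quoted in the HiMap abstract above can be put in context with a small calculation. This sketch only compares edge budgets; it assumes that a radial design is a star graph with n - 1 edges, and the function name is hypothetical rather than part of HiMap.

```python
import math


def edge_budgets(n_ligands):
    """Edge counts for three ways of sizing a perturbation graph.

    radial:   star graph, one reference ligand linked to all others.
    n_ln_n:   the n * ln(n) scaling quoted in the abstract, rounded up
              and capped at the number of edges in a complete graph.
    complete: every ligand pair connected.
    """
    n = n_ligands
    complete = n * (n - 1) // 2
    return {
        "radial": n - 1,
        "n_ln_n": min(complete, math.ceil(n * math.log(n))),
        "complete": complete,
    }


for n in (10, 20, 50):
    print(n, edge_budgets(n))
```

The gap between the radial and n·ln(n) budgets widens as n grows, which matches the abstract's warning that edge counts scaling only linearly with the number of ligands lead to deteriorating precision.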

11.

Molecular-dynamics simulation methods for macromolecular crystallography.

Wych, David C; Aoto, Phillip C; Vu, Lily; Wolff, Alexander M; Mobley, David L; Fraser, James S; Taylor, Susan S; Wall, Michael E.

Acta Crystallogr D Struct Biol ; 79(Pt 1): 50-65, 2023 Jan 01.

Article in English | MEDLINE | ID: mdl-36601807

ABSTRACT

It is investigated whether molecular-dynamics (MD) simulations can be used to enhance macromolecular crystallography (MX) studies. Historically, protein crystal structures have been described using a single set of atomic coordinates. Because conformational variation is important for protein function, researchers now often build models that contain multiple structures. Methods for building such models can fail, however, in regions where the crystallographic density is difficult to interpret, for example at the protein-solvent interface. To address this limitation, a set of MD-MX methods that combine MD simulations of protein crystals with conventional modeling and refinement tools have been developed. In an application to a cyclic adenosine monophosphate-dependent protein kinase at room temperature, the procedure improved the interpretation of ambiguous density, yielding an alternative water model and a revised protein model including multiple conformations. The revised model provides mechanistic insights into the catalytic and regulatory interactions of the enzyme. The same methods may be used in other MX studies to seek mechanistic insights.

Subject(s)

Molecular Dynamics Simulation , Proteins , Protein Conformation , Proteins/chemistry , Solvents/chemistry , Crystallography, X-Ray

12.

Enhanced Grand Canonical Sampling of Occluded Water Sites Using Nonequilibrium Candidate Monte Carlo.

Melling, Oliver J; Samways, Marley L; Ge, Yunhui; Mobley, David L; Essex, Jonathan W.

J Chem Theory Comput ; 19(3): 1050-1062, 2023 Feb 14.

Article in English | MEDLINE | ID: mdl-36692215

ABSTRACT

Water molecules play a key role in many biomolecular systems, particularly when bound at protein-ligand interfaces. However, molecular simulation studies on such systems are hampered by the relatively long time scales over which water exchange between a protein and solvent takes place. Grand canonical Monte Carlo (GCMC) is a simulation technique that avoids this issue by attempting the insertion and deletion of water molecules within a given structure. The approach is constrained by low acceptance probabilities for insertions in congested systems, however. To address this issue, here, we combine GCMC with nonequilibrium candidate Monte Carlo (NCMC) to yield a method that we refer to as grand canonical nonequilibrium candidate Monte Carlo (GCNCMC), in which the water insertions and deletions are carried out in a gradual, nonequilibrium fashion. We validate this new approach by comparing GCNCMC and GCMC simulations of bulk water and three protein binding sites. We find that not only is the efficiency of the water sampling improved by GCNCMC but that it also results in increased sampling of ligand conformations in a protein binding site, revealing new water-mediated ligand-binding geometries that are not observed using alternative enhanced sampling techniques.

13.

Prioritizing Small Sets of Molecules for Synthesis through in-silico Tools: A Comparison of Common Ranking Methods.

Breznik, Marko; Ge, Yunhui; Bluck, Joseph P; Briem, Hans; Hahn, David F; Christ, Clara D; Mortier, Jérémie; Mobley, David L; Meier, Katharina.

ChemMedChem ; 18(1): e202200425, 2023 Jan 03.

Article in English | MEDLINE | ID: mdl-36240514

ABSTRACT

Prioritizing molecules for synthesis is a key role of computational methods within medicinal chemistry. Multiple tools exist for ranking molecules, from the cheap and popular molecular docking methods to more computationally expensive molecular-dynamics (MD)-based methods. It is often questioned whether the accuracy of the more rigorous methods justifies the higher computational cost and associated calculation time. Here, we compared the performance on ranking the binding of small molecules for seven scoring functions from five docking programs, one end-point method (MM/GBSA), and two MD-based free energy methods (PMX, FEP+). We investigated 16 pharmaceutically relevant targets with a total of 423 known binders. The performance of docking methods for ligand ranking was strongly system dependent. We observed that MD-based methods predominantly outperformed docking algorithms and MM/GBSA calculations. Based on our results, we recommend the application of MD-based free energy methods for prioritization of molecules for synthesis in lead optimization, whenever feasible.

Subject(s)

Algorithms , Proteins , Proteins/chemistry , Molecular Docking Simulation , Protein Binding , Thermodynamics , Ligands , Molecular Dynamics Simulation

14.

Open Force Field BespokeFit: Automating Bespoke Torsion Parametrization at Scale.

Horton, Joshua T; Boothroyd, Simon; Wagner, Jeffrey; Mitchell, Joshua A; Gokey, Trevor; Dotson, David L; Behara, Pavan Kumar; Ramaswamy, Venkata Krishnan; Mackey, Mark; Chodera, John D; Anwar, Jamshed; Mobley, David L; Cole, Daniel J.

J Chem Inf Model ; 62(22): 5622-5633, 2022 Nov 28.

Article in English | MEDLINE | ID: mdl-36351167

ABSTRACT

The development of accurate transferable force fields is key to realizing the full potential of atomistic modeling in the study of biological processes such as protein-ligand binding for drug discovery. State-of-the-art transferable force fields, such as those produced by the Open Force Field Initiative, use modern software engineering and automation techniques to yield accuracy improvements. However, force field torsion parameters, which must account for many stereoelectronic and steric effects, are considered to be less transferable than other force field parameters and are therefore often targets for bespoke parametrization. Here, we present the Open Force Field QCSubmit and BespokeFit software packages that, when combined, facilitate the fitting of torsion parameters to quantum mechanical reference data at scale. We demonstrate the use of QCSubmit for simplifying the process of creating and archiving large numbers of quantum chemical calculations, by generating a dataset of 671 torsion scans for druglike fragments. We use BespokeFit to derive individual torsion parameters for each of these molecules, thereby reducing the root-mean-square error in the potential energy surface from 1.1 kcal/mol, using the original transferable force field, to 0.4 kcal/mol using the bespoke version. Furthermore, we employ the bespoke force fields to compute the relative binding free energies of a congeneric series of inhibitors of the TYK2 protein, and demonstrate further improvements in accuracy, compared to the base force field (MUE reduced from 0.56 [0.39, 0.77] to 0.42 [0.28, 0.59] kcal/mol and R² correlation improved from 0.72 [0.35, 0.87] to 0.93 [0.84, 0.97]).

Subject(s)

Proteins , Software , Ligands , Proteins/chemistry , Entropy , Protein Binding

15.

Best practices for constructing, preparing, and evaluating protein-ligand binding affinity benchmarks [Article v0.1].

Hahn, David F; Bayly, Christopher I; Macdonald, Hannah E Bruce; Chodera, John D; Mey, Antonia S J S; Mobley, David L; Benito, Laura Perez; Schindler, Christina E M; Tresadern, Gary; Warren, Gregory L.

Living J Comput Mol Sci ; 4(1), 2022.

Article in English | MEDLINE | ID: mdl-36382113

ABSTRACT

Free energy calculations are rapidly becoming indispensable in structure-enabled drug discovery programs. As new methods, force fields, and implementations are developed, assessing their expected accuracy on real-world systems (benchmarking) becomes critical to provide users with an assessment of the accuracy expected when these methods are applied within their domain of applicability, and developers with a way to assess the expected impact of new methodologies. These assessments require construction of a benchmark: a set of well-prepared, high-quality systems with corresponding experimental measurements designed to ensure the resulting calculations provide a realistic assessment of expected performance when these methods are deployed within their domains of applicability. To date, the community has not yet adopted a common standardized benchmark, and existing benchmark reports suffer from a myriad of issues, including poor data quality, limited statistical power, and statistically deficient analyses, all of which can conspire to produce benchmarks that are poorly predictive of real-world performance. Here, we address these issues by presenting guidelines for (1) curating experimental data to develop meaningful benchmark sets, (2) preparing benchmark inputs according to best practices to facilitate widespread adoption, and (3) analyzing the resulting predictions to enable statistically meaningful comparisons among methods and force fields. We highlight challenges and open questions that remain to be solved in these areas, as well as recommendations for the collection of new datasets that might optimally serve to measure progress as methods become systematically more reliable. Finally, we provide a curated, versioned, open, standardized benchmark set adherent to these standards (PLBenchmarks) and an open source toolkit for implementing standardized best practices assessments (arsenic) for the community to use as a standardized assessment tool.
While our main focus is free energy methods based on molecular simulations, these guidelines should prove useful for assessment of the rapidly growing field of machine learning methods for affinity prediction as well.

16.

Collaborative Assessment of Molecular Geometries and Energies from the Open Force Field.

D'Amore, Lorenzo; Hahn, David F; Dotson, David L; Horton, Joshua T; Anwar, Jamshed; Craig, Ian; Fox, Thomas; Gobbi, Alberto; Lakkaraju, Sirish Kaushik; Lucas, Xavier; Meier, Katharina; Mobley, David L; Narayanan, Arjun; Schindler, Christina E M; Swope, William C; In 't Veld, Pieter J; Wagner, Jeffrey; Xue, Bai; Tresadern, Gary.

J Chem Inf Model ; 62(23): 6094-6104, 2022 Dec 12.

Article in English | MEDLINE | ID: mdl-36433835

ABSTRACT

Force fields form the basis for classical molecular simulations, and their accuracy is crucial for the quality of, for instance, protein-ligand binding simulations in drug discovery. The huge diversity of small-molecule chemistry makes it a challenge to build and parameterize a suitable force field. The Open Force Field Initiative is a combined industry and academic consortium developing a state-of-the-art small-molecule force field. In this report, industry members of the consortium worked together to objectively evaluate the performance of the force fields (referred to here as OpenFF) produced by the initiative on a combined public and proprietary dataset of 19,653 relevant molecules selected from their internal research and compound collections. This evaluation was important because it was completely blind; at most partners, none of the molecules or data were used in force field development or testing prior to this work. We compare the Open Force Field "Sage" version 2.0.0 and "Parsley" version 1.3.0 with GAFF-2.11-AM1BCC, OPLS4, and SMIRNOFF99Frosst. We analyzed force-field-optimized geometries and conformer energies compared to reference quantum mechanical data. We show that OPLS4 performs best, and the latest Open Force Field release shows a clear improvement compared to its predecessors. The performance of established force fields such as GAFF-2.11 was generally worse. While OpenFF researchers were involved in building the benchmarking infrastructure used in this work, benchmarking was done entirely in-house within industrial organizations and the resulting assessment is reported here. This work assesses the force field performance using separate benchmarking steps, external datasets, and involving external research groups. This effort may also be unique in terms of the number of different industrial partners involved, with 10 different companies participating in the benchmark efforts.

Subject(s)

Proteins , Thermodynamics , Ligands , Proteins/chemistry , Physical Phenomena

17.

An overview of the SAMPL8 host-guest binding challenge.

Amezcua, Martin; Setiadi, Jeffry; Ge, Yunhui; Mobley, David L.

J Comput Aided Mol Des ; 36(10): 707-734, 2022 10.

Article in English | MEDLINE | ID: mdl-36229622

ABSTRACT

The SAMPL series of challenges aims to focus the community on specific modeling challenges, while testing and hopefully driving progress of computational methods to help guide pharmaceutical drug discovery. In this study, we report on the results of the SAMPL8 host-guest blind challenge for predicting absolute binding affinities. SAMPL8 focused on two host-guest datasets, one involving the cucurbituril CB8 (with a series of common drugs of abuse) and another involving two different Gibb deep-cavity cavitands. The latter dataset involved a previously featured deep cavity cavitand (TEMOA) as well as a new variant (TEETOA), both binding to a series of relatively rigid fragment-like guests. Challenge participants employed a reasonably wide variety of methods, though many of these were based on molecular simulations, and predictive accuracy was mixed. As in some previous SAMPL iterations (SAMPL6 and SAMPL7), we found that one approach to achieve greater accuracy was to apply empirical corrections to the binding free energy predictions, taking advantage of prior data on binding to these hosts. Another approach which performed well was a hybrid MD-based approach with reweighting to a force matched QM potential. In the cavitand challenge, an alchemical method using the AMOEBA polarizable force field achieved the best success with RMSE less than 1 kcal/mol, while another alchemical approach (ATM/GAFF2-AM1BCC/TIP3P/HREM) had RMSE less than 1.75 kcal/mol. The work discussed here also highlights several important lessons; for example, retrospective studies of reference calculations demonstrate the sensitivity of predicted binding free energies to ethyl group sampling and/or guest starting pose, providing guidance to help improve future studies on these systems.

Subject(s)

Molecular Dynamics Simulation , Proteins , Humans , Ligands , Thermodynamics , Protein Binding , Proteins/chemistry , Retrospective Studies , Pharmaceutical Preparations

18.

Enhancing sampling of water rehydration upon ligand binding using variants of grand canonical Monte Carlo.

Ge, Yunhui; Melling, Oliver J; Dong, Weiming; Essex, Jonathan W; Mobley, David L.

J Comput Aided Mol Des ; 36(10): 767-779, 2022 10.

Article in English | MEDLINE | ID: mdl-36198874

ABSTRACT

Water plays an important role in mediating protein-ligand interactions. Water rearrangement upon ligand binding or modification can be very slow and beyond typical timescales used in molecular dynamics (MD) simulations. Thus, inadequate sampling of slow water motions in MD simulations often impairs the accuracy of ligand binding free energy calculations. Previous studies suggest grand canonical Monte Carlo (GCMC) outperforms normal MD simulations for water sampling, thus GCMC has been applied to help improve the accuracy of ligand binding free energy calculations. However, in prior work we observed that protein and/or ligand motions impaired how well GCMC performs at water rehydration, suggesting more work is needed to improve this method to handle water sampling. In this work, we applied GCMC in 21 protein-ligand systems to assess the performance of GCMC for rehydrating buried water sites. While our results show that GCMC can rapidly rehydrate all selected water sites for most systems, it fails in five systems. In most failed systems, we observe that protein/ligand motions, which occur in the absence of water, combine to close water sites and block instantaneous GCMC water insertion moves. For these five failed systems, we both extended our GCMC simulations and tested a new technique named grand canonical nonequilibrium candidate Monte Carlo (GCNCMC). GCNCMC combines GCMC with the nonequilibrium candidate Monte Carlo (NCMC) sampling technique to improve the probability of a successful water insertion/deletion. Our results show that GCNCMC and extended GCMC can rehydrate all target water sites for three of the five problematic systems and GCNCMC is more efficient than GCMC in two out of the three systems. In one system, only GCNCMC can rehydrate all target water sites, while GCMC fails. Both GCNCMC and GCMC fail in one system. This work suggests this new GCNCMC method is promising for water rehydration especially when protein/ligand motions may block water insertion/removal.

Subject(s)

Molecular Dynamics Simulation , Water , Water/chemistry , Ligands , Monte Carlo Method , Proteins , Fluid Therapy

19.

Absolute Binding Free Energy Calculations for Buried Water Molecules.

Ge, Yunhui; Baumann, Hannah M; Mobley, David L.

J Chem Theory Comput ; 18(11): 6482-6499, 2022 Nov 08.

Article in English | MEDLINE | ID: mdl-36197451

ABSTRACT

Water often plays a key role in mediating protein-ligand interactions. Understanding contributions from active-site water molecules to binding thermodynamics of a ligand is important in predicting binding free energies for ligand optimization. In this work, we tested a non-equilibrium switching method for absolute binding free energy calculations on water molecules in binding sites of 13 systems. We discuss the lessons we learned about identified issues that affected our calculations and ways to address them. This work fits with our larger focus on how to do accurate ligand binding free energy calculations when water rearrangements are very slow, such as rearrangements due to ligand modification (as in relative free energy calculations) or ligand binding (as in absolute free energy calculations). The method studied in this work can potentially be used to account for limited water sampling via providing endpoint corrections to free energy calculations using our calculated binding free energy of water.

Subject(s)

Molecular Dynamics Simulation , Water , Ligands , Water/chemistry , Thermodynamics , Entropy , Binding Sites , Protein Binding

20.

Improving Force Field Accuracy by Training against Condensed-Phase Mixture Properties.

Boothroyd, Simon; Madin, Owen C; Mobley, David L; Wang, Lee-Ping; Chodera, John D; Shirts, Michael R.

J Chem Theory Comput ; 18(6): 3577-3592, 2022 Jun 14.

Article in English | MEDLINE | ID: mdl-35533269

ABSTRACT

Developing a sufficiently accurate classical force field representation of molecules is key to realizing the full potential of molecular simulations as a route to gaining a fundamental insight into a broad spectrum of chemical and biological phenomena. This is only possible, however, if the many complex interactions between molecules of different species in the system are accurately captured by the model. Historically, the intermolecular van der Waals (vdW) interactions have primarily been trained against densities and enthalpies of vaporization of pure (single-component) systems, with occasional usage of hydration free energies. In this study, we demonstrate how including physical property data of binary mixtures can better inform these parameters, encoding more information about the underlying physics of the system in complex chemical mixtures. To demonstrate this, we retrain a select number of Lennard-Jones parameters describing the vdW interactions of the OpenFF 1.0.0 (Parsley) fixed charge force field against training sets composed of densities and enthalpies of mixing for binary liquid mixtures as well as densities and enthalpies of vaporization of pure liquid systems and assess the performance of each of these combinations. We show that retraining against the mixture data improves the force field's ability to reproduce mixture properties, including solvation free energies, correcting some systematic errors that exist when training vdW interactions against properties of pure systems only.

Subject(s)

Thermodynamics
