Results 1 - 20 of 136
1.
Chem Sci ; 15(32): 12861-12878, 2024 Aug 14.
Article in English | MEDLINE | ID: mdl-39148808

ABSTRACT

The development of reliable and extensible molecular mechanics (MM) force fields (fast, empirical models characterizing the potential energy surface of molecular systems) is indispensable for biomolecular simulation and computer-aided drug design. Here, we introduce a generalized and extensible machine-learned MM force field, espaloma-0.3, and an end-to-end differentiable framework using graph neural networks to overcome the limitations of traditional rule-based methods. Trained in a single GPU-day to fit a large and diverse quantum chemical dataset of over 1.1 M energy and force calculations, espaloma-0.3 reproduces quantum chemical energetic properties of chemical domains highly relevant to drug discovery, including small molecules, peptides, and nucleic acids. Moreover, this force field maintains the quantum chemical energy-minimized geometries of small molecules and preserves the condensed phase properties of peptides and folded proteins, self-consistently parametrizing proteins and ligands to produce stable simulations leading to highly accurate predictions of binding free energies. This methodology demonstrates significant promise as a path forward for systematically building more accurate force fields that are easily extensible to new chemical domains of interest.

2.
J Phys Chem B ; 128(32): 7888-7902, 2024 Aug 15.
Article in English | MEDLINE | ID: mdl-39087913

ABSTRACT

A wide range of density functional methods and basis sets are available to derive the electronic structure and properties of molecules. Quantum mechanical calculations are too computationally intensive for routine simulation of molecules in the condensed phase, prompting the development of computationally efficient force fields based on quantum mechanical data. Parametrizing general force fields, which cover a vast chemical space, necessitates the generation of sizable quantum mechanical data sets with optimized geometries and torsion scans. To achieve this efficiently, choosing a quantum mechanical method that balances computational cost and accuracy is crucial. In this study, we seek to assess the accuracy of quantum mechanical theory for specific properties such as conformer energies and torsion energetics. To comprehensively evaluate various methods, we focus on a representative set of 59 diverse small molecules, comparing approximately 25 combinations of functional and basis sets against the reference level coupled cluster calculations at the complete basis set limit.

3.
J Phys Chem B ; 128(29): 7043-7067, 2024 Jul 25.
Article in English | MEDLINE | ID: mdl-38989715

ABSTRACT

Force fields are a key component of physics-based molecular modeling, describing the energies and forces in a molecular system as a function of the positions of the atoms and molecules involved. Here, we provide a review and scientific status report on the work of the Open Force Field (OpenFF) Initiative, which focuses on the science, infrastructure and data required to build the next generation of biomolecular force fields. We introduce the OpenFF Initiative and the related OpenFF Consortium, describe its approach to force field development and software, and discuss accomplishments to date as well as future plans. OpenFF releases both software and data under open and permissive licensing agreements to enable rapid application, validation, extension, and modification of its force fields and software tools. We discuss lessons learned to date in this new approach to force field development. We also highlight ways that other force field researchers can get involved, as well as some recent successes of outside researchers taking advantage of OpenFF tools and data.

4.
J Chem Inf Model ; 64(13): 5063-5076, 2024 Jul 08.
Article in English | MEDLINE | ID: mdl-38895959

ABSTRACT

In drug discovery, the in silico prediction of binding affinity is one of the major means to prioritize compounds for synthesis. Alchemical relative binding free energy (RBFE) calculations based on molecular dynamics (MD) simulations are nowadays a popular approach for the accurate affinity ranking of compounds. MD simulations rely on empirical force field parameters, which strongly influence the accuracy of the predicted affinities. Here, we evaluate the ability of six different small-molecule force fields to predict experimental protein-ligand binding affinities in RBFE calculations on a set of 598 ligands and 22 protein targets. The public force fields OpenFF Parsley and Sage, GAFF, and CGenFF show comparable accuracy, while OPLS3e is significantly more accurate. However, a consensus approach using Sage, GAFF, and CGenFF leads to accuracy comparable to OPLS3e. While Parsley and Sage are performing comparably based on aggregated statistics across the whole dataset, there are differences in terms of outliers. Analysis of the force field reveals that improved parameters lead to significant improvement in the accuracy of affinity predictions on subsets of the dataset involving those parameters. Lower accuracy can not only be attributed to the force field parameters but is also dependent on input preparation and sampling convergence of the calculations. Especially large perturbations and nonconverged simulations lead to less accurate predictions. The input structures, Gromacs force field files, as well as the analysis Python notebooks are available on GitHub.


Subject(s)
Molecular Dynamics Simulation , Protein Binding , Proteins , Thermodynamics , Ligands , Proteins/chemistry , Proteins/metabolism , Drug Discovery/methods , Protein Conformation
5.
J Chem Inf Model ; 64(12): 4661-4672, 2024 Jun 24.
Article in English | MEDLINE | ID: mdl-38860710

ABSTRACT

DNA-encoded library technology grants access to nearly infinite opportunities to explore the chemical structure space for drug discovery. Successful navigation depends on the design and synthesis of libraries with appropriate physicochemical properties (PCPs) and structural diversity while aligning with practical considerations. To this end, we analyze combinatorial library design constraints including the number of chemistry cycles, bond construction strategies, and building block (BB) class selection in pursuit of ideal library designs. We compare two-cycle library designs (amino acid + carboxylic acid, primary amine + carboxylic acid) in the context of PCPs and chemical space coverage, given different BB selection strategies and constraints. We find that broad availability of amines and acids is essential for enabling the widest exploration of chemical space. Surprisingly, cost is not a driving factor, and virtually, the same chemical space can be explored with "budget" BBs.


Subject(s)
DNA , Small Molecule Libraries , DNA/chemistry , Small Molecule Libraries/chemistry , Drug Discovery/methods , Combinatorial Chemistry Techniques , Drug Design , Amines/chemistry , Carboxylic Acids/chemistry , Gene Library
6.
J Comput Chem ; 45(23): 2024-2033, 2024 Sep 05.
Article in English | MEDLINE | ID: mdl-38725239

ABSTRACT

In binding free energy calculations, simulations must sample all relevant conformations of the system in order to obtain unbiased results. For instance, different ligands can bind to different metastable states of a protein, and if these protein conformational changes are not sampled in relative binding free energy calculations, the contribution of these states to binding is not accounted for and thus calculated binding free energies are inaccurate. In this work, we investigate the impact of different beta-secretase 1 (BACE1) protein conformations obtained from x-ray crystallography on the binding of BACE1 inhibitors. We highlight how these conformational changes are not adequately sampled in typical molecular dynamics simulations. Furthermore, we show that insufficient sampling of relevant conformations induces substantial error in relative binding free energy calculations, as judged by a variation in calculated relative binding free energies up to 2 kcal/mol depending on the starting protein conformation. These results emphasize the importance of protein conformational sampling and pose this BACE1 system as a challenge case for further method development in the area of enhanced protein conformational sampling, either in combination with binding calculations or as an endpoint correction.


Subject(s)
Amyloid Precursor Protein Secretases , Aspartic Acid Endopeptidases , Molecular Dynamics Simulation , Protein Binding , Protein Conformation , Thermodynamics , Amyloid Precursor Protein Secretases/chemistry , Amyloid Precursor Protein Secretases/metabolism , Amyloid Precursor Protein Secretases/antagonists & inhibitors , Aspartic Acid Endopeptidases/chemistry , Aspartic Acid Endopeptidases/antagonists & inhibitors , Aspartic Acid Endopeptidases/metabolism , Humans , Crystallography, X-Ray , Ligands
7.
Phys Chem Chem Phys ; 26(12): 9207-9225, 2024 Mar 20.
Article in English | MEDLINE | ID: mdl-38444308

ABSTRACT

We report the results of the SAMPL9 host-guest blind challenge for predicting binding free energies. The challenge focused on macrocycles from pillar[n]-arene and cyclodextrin host families, including WP6, and bCD and HbCD. A variety of methods were used by participants to submit binding free energy predictions. A machine learning approach based on molecular descriptors achieved the highest accuracy (RMSE of 2.04 kcal mol⁻¹) among the ranked methods in the WP6 dataset. Interestingly, predictions for WP6 obtained via docking tended to outperform all methods (RMSE of 1.70 kcal mol⁻¹), most of which are MD based and computationally more expensive. In general, methods applying force fields achieved better correlation with experiments for WP6 as opposed to the machine learning and docking models. In the cyclodextrin-phenothiazine challenge, the ATM approach emerged as the top performing method with RMSE less than 1.86 kcal mol⁻¹. Correlation metrics of ranked methods in this dataset were relatively poor compared to WP6. We also highlight several lessons learned to guide future work and help improve studies on the systems discussed. For example, WP6 may be present in microstates other than its -12 state in the presence of certain guests. Machine learning approaches can be used to fine-tune or help train force fields for certain chemistry (i.e., WP6-G4). Certain phenothiazines occupy distinct primary and secondary orientations, some of which were considered individually for accurate binding free energies. The accuracy of predictions from certain methods while starting from a single binding pose/orientation demonstrates the sensitivity of calculated binding free energies to the orientation, and in some cases the likely dominant orientation for the system. 
Computational and experimental results suggest that guest phenothiazine core traverses both the secondary and primary faces of the cyclodextrin hosts, a bulky cationic side chain will primarily occupy the primary face, and the phenothiazine core substituent resides at the larger secondary face.

8.
J Chem Theory Comput ; 20(3): 1036-1050, 2024 Feb 13.
Article in English | MEDLINE | ID: mdl-38291966

ABSTRACT

Obtaining accurate binding free energies from in silico screens has been a long-standing goal for the computational chemistry community. However, accuracy and computational cost are at odds with one another, limiting the utility of methods that perform this type of calculation. Many methods achieve massive scale by explicitly or implicitly assuming that the target protein adopts a single structure, or undergoes limited fluctuations around that structure, to minimize computational cost. Others simulate each protein-ligand complex of interest, accepting lower throughput in exchange for better predictions of binding affinities. Here, we present the PopShift framework for accounting for the ensemble of structures a protein adopts and their relative probabilities. Protein degrees of freedom are enumerated once, and then arbitrarily many molecules can be screened against this ensemble. Specifically, we use Markov state models (MSMs) as a compressed representation of a protein's thermodynamic ensemble. We start with a ligand-free MSM and then calculate how addition of a ligand shifts the populations of each protein conformational state based on the strength of the interaction between that protein conformation and the ligand. In this work we use docking to estimate the affinity between a given protein structure and ligand, but any estimator of binding affinities could be used in the PopShift framework. We test PopShift on the classic benchmark pocket T4 Lysozyme L99A. We find that PopShift is more accurate than common strategies, such as docking to a single structure and traditional ensemble docking (producing results that compare favorably with alchemical binding free energy calculations in terms of RMSE but not correlation), and may have a more favorable computational cost profile in some applications. 
In addition to predicting binding free energies and ligand poses, PopShift also provides insight into how the probability of different protein structures is shifted upon addition of various concentrations of ligand, providing a platform for predicting affinities and allosteric effects of ligand binding. Therefore, we expect PopShift will be valuable for hit finding and for providing insight into phenomena like allostery.
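
The population shift described in this abstract can be illustrated with a small sketch. It assumes the textbook binding-polynomial picture of conformational selection, in which each state i has a ligand-free probability p_i and its own dissociation constant Kd_i; this is an assumption made for illustration and may not match the exact equations used by PopShift. The function names and the two-state numbers are hypothetical.

```python
def shifted_populations(apo_populations, kds, ligand_conc):
    """Ligand-dependent state probabilities under a simple binding polynomial.

    apo_populations: ligand-free probabilities of each state (should sum to 1)
    kds: per-state dissociation constants, in the same units as ligand_conc
    Assumption: each state binds one ligand independently, so its statistical
    weight is scaled by (1 + [L] / Kd_i).
    """
    weights = [p * (1.0 + ligand_conc / kd) for p, kd in zip(apo_populations, kds)]
    total = sum(weights)
    return [w / total for w in weights]


def effective_kd(apo_populations, kds):
    """Ensemble-level Kd: the apo-population-weighted average of 1/Kd_i, inverted."""
    return 1.0 / sum(p / kd for p, kd in zip(apo_populations, kds))


# Hypothetical two-state protein: a dominant weak-binding state and a rare
# tight-binding state (Kd values in micromolar, chosen only for illustration).
apo = [0.9, 0.1]
kds = [100.0, 1.0]
shifted = shifted_populations(apo, kds, ligand_conc=10.0)
print(shifted, effective_kd(apo, kds))
```

In this toy case the rare tight-binding state gains population once ligand is added, which is the qualitative effect the abstract describes.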


Subject(s)
Proteins , Protein Binding , Ligands , Proteins/chemistry , Entropy , Protein Conformation , Thermodynamics , Binding Sites
9.
J Chem Theory Comput ; 20(3): 1293-1305, 2024 Feb 13.
Article in English | MEDLINE | ID: mdl-38240687

ABSTRACT

We present an efficient polarizable electrostatic model, utilizing typed, atom-centered polarizabilities and the fast direct approximation, designed for efficient use in molecular dynamics (MD) simulations. The model provides two convenient approaches for assigning partial charges in the context of atomic polarizabilities. One is a generalization of RESP, called RESP-dPol, and the other, AM1-BCC-dPol, is an adaptation of the widely used AM1-BCC method. Both are designed to accurately replicate gas-phase quantum mechanical electrostatic potentials. Benchmarks of this polarizable electrostatic model against gas-phase dipole moments, molecular polarizabilities, bulk liquid densities, and static dielectric constants of organic liquids show good agreement with the reference values. Of note, the model yields markedly more accurate dielectric constants of organic liquids, relative to a matched nonpolarizable force field. MD simulations with this method, which is currently parametrized for molecules containing elements C, N, O, and H, run only about 3.6-fold slower than fixed charge force fields, while simulations with the self-consistent mutual polarization average 4.5-fold slower. Our results suggest that RESP-dPol and AM1-BCC-dPol afford improved accuracy relative to fixed charge force fields and are good starting points for developing general, affordable, and transferable polarizable force fields. The software implementing these approaches has been designed to utilize the force field fitting frameworks developed and maintained by the Open Force Field Initiative, setting the stage for further exploration of this approach to polarizable force field development.

10.
Digit Discov ; 2(4): 1178-1187, 2023 Aug 08.
Article in English | MEDLINE | ID: mdl-38013814

ABSTRACT

The Lennard-Jones potential is the most widely-used function for the description of non-bonded interactions in transferable force fields for the condensed phase. This is not because it has an optimal functional form, but rather it is a legacy resulting from when computational expense was a major consideration and this potential was particularly convenient numerically. At present, it persists because the effort that would be required to re-write molecular modelling software and train new force fields has, until now, been prohibitive. Here, we present Smirnoff-plugins as a flexible framework to extend the Open Force Field software stack to allow custom force field functional forms. We deploy Smirnoff-plugins with the automated Open Force Field infrastructure to train a transferable, small molecule force field based on the recently-proposed double exponential functional form, on over 1000 experimental condensed phase properties. Extensive testing of the resulting force field shows improvements in transfer free energies, with acceptable conformational energetics, run times and convergence properties compared to state-of-the-art Lennard-Jones based force fields.
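
For readers unfamiliar with the two functional forms this abstract contrasts, the sketch below evaluates a 12-6 Lennard-Jones potential next to one commonly written double exponential form. The abstract does not give the equation, so the expression and the `alpha` and `beta` defaults here are illustrative assumptions rather than the parameters of the published force field. Both functions are written so that the minimum sits at `r_min` with depth `-epsilon`.

```python
import math


def lennard_jones(r, epsilon, r_min):
    """12-6 Lennard-Jones energy with its minimum (-epsilon) at r = r_min."""
    ratio = r_min / r
    return epsilon * (ratio**12 - 2.0 * ratio**6)


def double_exponential(r, epsilon, r_min, alpha=16.0, beta=5.0):
    """Double exponential energy (assumed form; alpha and beta are placeholders).

    alpha sets the steepness of the repulsive wall and beta the decay of the
    attractive tail; the prefactors are chosen so that the minimum is at
    r = r_min with depth -epsilon.
    """
    x = r / r_min
    repulsion = beta * math.exp(alpha) / (alpha - beta) * math.exp(-alpha * x)
    attraction = alpha * math.exp(beta) / (alpha - beta) * math.exp(-beta * x)
    return epsilon * (repulsion - attraction)


# Hypothetical parameters (kcal/mol and angstrom) used only for comparison.
eps, rm = 0.1, 3.5
for r in (3.0, 3.5, 4.5):
    print(r, lennard_jones(r, eps, rm), double_exponential(r, eps, rm))
```

Unlike the Lennard-Jones form, the double exponential stays finite at r = 0, which is one practical difference between the two.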

11.
J Chem Inf Model ; 63(16): 5120-5132, 2023 Aug 28.
Article in English | MEDLINE | ID: mdl-37578123

ABSTRACT

DNA-encoded libraries (DELs) provide the means to make and screen millions of diverse compounds against a target of interest in a single experiment. However, despite producing large volumes of binding data at a relatively low cost, the DEL selection process is susceptible to noise, necessitating computational follow-up to increase signal-to-noise ratios. In this work, we present a set of informatics tools to employ data from prior DEL screen(s) to gain information about which building blocks are most likely to be productive when designing new DELs for the same target. We demonstrate that similar building blocks have similar probabilities of forming compounds that bind. We then build a model from the inference that the combined behavior of individual building blocks is predictive of whether an overall compound binds. We illustrate our approach on a set of three-cycle OpenDEL libraries screened against soluble epoxide hydrolase (sEH) and report performance of more than an order of magnitude greater than random guessing on a holdout set, demonstrating that our model can serve as a baseline for comparison against other machine learning models on DEL data. Lastly, we provide a discussion on how we believe this informatics workflow could be applied to benefit researchers in their specific DEL campaigns.


Subject(s)
Drug Discovery , Small Molecule Libraries , Small Molecule Libraries/chemistry , DNA/chemistry , Machine Learning
12.
bioRxiv ; 2023 Aug 08.
Article in English | MEDLINE | ID: mdl-37503302

ABSTRACT

Obtaining accurate binding free energies from in silico screens has been a longstanding goal for the computational chemistry community. However, accuracy and computational cost are at odds with one another, limiting the utility of methods that perform this type of calculation. Many methods achieve massive scale by explicitly or implicitly assuming that the target protein adopts a single structure, or undergoes limited fluctuations around that structure, to minimize computational cost. Others simulate each protein-ligand complex of interest, accepting lower throughput in exchange for better predictions of binding affinities. Here, we present the PopShift framework for accounting for the ensemble of structures a protein adopts and their relative probabilities. Protein degrees of freedom are enumerated once, and then arbitrarily many molecules can be screened against this ensemble. Specifically, we use Markov state models (MSMs) as a compressed representation of a protein's thermodynamic ensemble. We start with a ligand-free MSM and then calculate how addition of a ligand shifts the populations of each protein conformational state based on the strength of the interaction between that protein conformation and the ligand. In this work we use docking to estimate the affinity between a given protein structure and ligand, but any estimator of binding affinities could be used in the PopShift framework. We test PopShift on the classic benchmark pocket T4 Lysozyme L99A. We find that PopShift is more accurate than common strategies, such as docking to a single structure and traditional ensemble docking (producing results that compare favorably with alchemical binding free energy calculations in terms of RMSE but not correlation), and may have a more favorable computational cost profile in some applications. 
In addition to predicting binding free energies and ligand poses, PopShift also provides insight into how the probability of different protein structures is shifted upon addition of various concentrations of ligand, providing a platform for predicting affinities and allosteric effects of ligand binding. Therefore, we expect PopShift will be valuable for hit finding and for providing insight into phenomena like allostery.

13.
J Chem Theory Comput ; 19(15): 5058-5076, 2023 Aug 08.
Article in English | MEDLINE | ID: mdl-37487138

ABSTRACT

Binding free energy calculations predict the potency of compounds to protein binding sites in a physically rigorous manner and see broad application in prioritizing the synthesis of novel drug candidates. Relative binding free energy (RBFE) calculations have emerged as an industry-standard approach to achieve highly accurate rank-order predictions of the potency of related compounds; however, this approach requires that the ligands share a common scaffold and a common binding mode, restricting the methods' domain of applicability. This is a critical limitation since complex modifications to the ligands, especially core hopping, are very common in drug design. Absolute binding free energy (ABFE) calculations are an alternate method that can be used for ligands that are not congeneric. However, ABFE suffers from a known problem of long convergence times due to the need to sample additional degrees of freedom within each system, such as sampling rearrangements necessary to open and close the binding site. Here, we report on an alternative method for RBFE, called Separated Topologies (SepTop), which overcomes the issues in both of the aforementioned methods by enabling large scaffold changes between ligands with a convergence time comparable to traditional RBFE. Instead of only mutating atoms that vary between two ligands, this approach performs two absolute free energy calculations at the same time in opposite directions, one for each ligand. Defining the two ligands independently allows the comparison of the binding of diverse ligands without the artificial constraints of identical poses or a suitable atom-atom mapping. This approach also avoids the need to sample the unbound state of the protein, making it more efficient than absolute binding free energy calculations. Here, we introduce an implementation of SepTop. We developed a general and efficient protocol for running SepTop, and we demonstrated the method on four diverse, pharmaceutically relevant systems. 
We report the performance of the method, as well as our practical insights into the strengths, weaknesses, and challenges of applying this method in an industrial drug design setting. We find that the accuracy of the approach is sufficiently high to rank order ligands with an accuracy comparable to traditional RBFE calculations while maintaining the additional flexibility of SepTop.

14.
J Chem Theory Comput ; 19(11): 3251-3275, 2023 Jun 13.
Article in English | MEDLINE | ID: mdl-37167319

ABSTRACT

We introduce the Open Force Field (OpenFF) 2.0.0 small molecule force field for drug-like molecules, code-named Sage, which builds upon our previous iteration, Parsley. OpenFF force fields are based on direct chemical perception, which generalizes easily to highly diverse sets of chemistries based on substructure queries. Like the previous OpenFF iterations, the Sage generation of OpenFF force fields was validated in protein-ligand simulations to be compatible with AMBER biopolymer force fields. In this work, we detail the methodology used to develop this force field, as well as the innovations and improvements introduced since the release of Parsley 1.0.0. One particularly significant feature of Sage is a set of improved Lennard-Jones (LJ) parameters retrained against condensed phase mixture data, the first refit of LJ parameters in the OpenFF small molecule force field line. Sage also includes valence parameters refit to a larger database of quantum chemical calculations than previous versions, as well as improvements in how this fitting is performed. Force field benchmarks show improvements in general metrics of performance against quantum chemistry reference data such as root-mean-square deviations (RMSD) of optimized conformer geometries, torsion fingerprint deviations (TFD), and improved relative conformer energetics (ΔΔE). We present a variety of benchmarks for these metrics against our previous force fields as well as in some cases other small molecule force fields. Sage also demonstrates improved performance in estimating physical properties, including comparison against experimental data from various thermodynamic databases for small molecule properties such as ΔHmix, ρ(x), ΔGsolv, and ΔGtrans. Additionally, we benchmarked against protein-ligand binding free energies (ΔGbind), where Sage yields results statistically similar to previous force fields. 
All the data is made publicly available along with complete details on how to reproduce the training results at https://github.com/openforcefield/openff-sage.


Subject(s)
Benchmarking , Proteins , Ligands , Proteins/chemistry , Thermodynamics , Entropy
15.
J Chem Inf Model ; 63(6): 1776-1793, 2023 Mar 27.
Article in English | MEDLINE | ID: mdl-36878475

ABSTRACT

Drug discovery is accelerated with computational methods such as alchemical simulations to estimate ligand affinities. In particular, relative binding free energy (RBFE) simulations are beneficial for lead optimization. To use RBFE simulations to compare prospective ligands in silico, researchers first plan the simulation experiment, using graphs where nodes represent ligands and graph edges represent alchemical transformations between ligands. Recent work demonstrated that optimizing the statistical architecture of these perturbation graphs improves the accuracy of the predicted changes in the free energy of ligand binding. Therefore, to improve the success rate of computational drug discovery, we present the open-source software package High Information Mapper (HiMap), a new take on its predecessor, Lead Optimization Mapper (LOMAP). HiMap removes heuristic decisions from design selection and instead finds statistically optimal graphs over ligands clustered with machine learning. Beyond optimal design generation, we present theoretical insights for designing alchemical perturbation maps. One such result is that, for n nodes, the precision of perturbation maps is stable at n·ln(n) edges. This result indicates that even an "optimal" graph can result in unexpectedly high errors if a plan includes too few alchemical transformations for the given number of ligands and edges. And, as a study compares more ligands, the performance of even optimal graphs will deteriorate with linear scaling of the edge count. In this sense, ensuring an A- or D-optimal topology is not enough to produce robust errors. We additionally find that optimal designs will converge more rapidly than radial and LOMAP designs. Moreover, we derive bounds for how clustering reduces cost for designs with a constant expected relative error per cluster, invariant of the size of the design. 
These results inform how to best design perturbation maps for computational drug discovery and have broader implications for experimental design.
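
The n·ln(n) edge count mentioned in this abstract can be put next to the two obvious limits of a perturbation graph: a spanning tree (n - 1 edges, the fewest that connect all ligands) and the complete graph (n(n - 1)/2 edges). The sketch below is a rule-of-thumb calculator under the assumption that ln is the natural logarithm, as written in the abstract; the function name and return format are hypothetical and are not part of HiMap.

```python
import math


def edge_budget(n_ligands):
    """Edge counts for a perturbation graph over n_ligands nodes.

    Returns the spanning-tree minimum, the n*ln(n) target (rounded up and
    clamped between the minimum and maximum), and the complete-graph maximum.
    """
    if n_ligands < 1:
        raise ValueError("need at least one ligand")
    minimum = n_ligands - 1
    maximum = n_ligands * (n_ligands - 1) // 2
    target = math.ceil(n_ligands * math.log(n_ligands))
    return {
        "minimum": minimum,
        "n_ln_n": min(max(target, minimum), maximum),
        "maximum": maximum,
    }


# A radial (star) design uses only the minimum number of edges.
print(edge_budget(20))
```

For a 20-ligand series this gives a target well above the 19 edges of a radial design but far below the 190 edges of the complete graph.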


Subject(s)
Molecular Dynamics Simulation , Thermodynamics , Ligands , Prospective Studies , Entropy , Protein Binding
16.
Acta Crystallogr D Struct Biol ; 79(Pt 1): 50-65, 2023 Jan 01.
Article in English | MEDLINE | ID: mdl-36601807

ABSTRACT

It is investigated whether molecular-dynamics (MD) simulations can be used to enhance macromolecular crystallography (MX) studies. Historically, protein crystal structures have been described using a single set of atomic coordinates. Because conformational variation is important for protein function, researchers now often build models that contain multiple structures. Methods for building such models can fail, however, in regions where the crystallographic density is difficult to interpret, for example at the protein-solvent interface. To address this limitation, a set of MD-MX methods that combine MD simulations of protein crystals with conventional modeling and refinement tools have been developed. In an application to a cyclic adenosine monophosphate-dependent protein kinase at room temperature, the procedure improved the interpretation of ambiguous density, yielding an alternative water model and a revised protein model including multiple conformations. The revised model provides mechanistic insights into the catalytic and regulatory interactions of the enzyme. The same methods may be used in other MX studies to seek mechanistic insights.


Subject(s)
Molecular Dynamics Simulation , Proteins , Protein Conformation , Proteins/chemistry , Solvents/chemistry , Crystallography, X-Ray
17.
J Chem Theory Comput ; 19(3): 1050-1062, 2023 Feb 14.
Article in English | MEDLINE | ID: mdl-36692215

ABSTRACT

Water molecules play a key role in many biomolecular systems, particularly when bound at protein-ligand interfaces. However, molecular simulation studies on such systems are hampered by the relatively long time scales over which water exchange between a protein and solvent takes place. Grand canonical Monte Carlo (GCMC) is a simulation technique that avoids this issue by attempting the insertion and deletion of water molecules within a given structure. The approach is constrained by low acceptance probabilities for insertions in congested systems, however. To address this issue, here, we combine GCMC with nonequilibrium candidate Monte Carlo (NCMC) to yield a method that we refer to as grand canonical nonequilibrium candidate Monte Carlo (GCNCMC), in which the water insertions and deletions are carried out in a gradual, nonequilibrium fashion. We validate this new approach by comparing GCNCMC and GCMC simulations of bulk water and three protein binding sites. We find that not only is the efficiency of the water sampling improved by GCNCMC but that it also results in increased sampling of ligand conformations in a protein binding site, revealing new water-mediated ligand-binding geometries that are not observed using alternative enhanced sampling techniques.

18.
ChemMedChem ; 18(1): e202200425, 2023 Jan 03.
Article in English | MEDLINE | ID: mdl-36240514

ABSTRACT

Prioritizing molecules for synthesis is a key role of computational methods within medicinal chemistry. Multiple tools exist for ranking molecules, from the cheap and popular molecular docking methods to more computationally expensive molecular-dynamics (MD)-based methods. It is often questioned whether the accuracy of the more rigorous methods justifies the higher computational cost and associated calculation time. Here, we compared the performance on ranking the binding of small molecules for seven scoring functions from five docking programs, one end-point method (MM/GBSA), and two MD-based free energy methods (PMX, FEP+). We investigated 16 pharmaceutically relevant targets with a total of 423 known binders. The performance of docking methods for ligand ranking was strongly system dependent. We observed that MD-based methods predominantly outperformed docking algorithms and MM/GBSA calculations. Based on our results, we recommend the application of MD-based free energy methods for prioritization of molecules for synthesis in lead optimization, whenever feasible.


Subject(s)
Algorithms , Proteins , Proteins/chemistry , Molecular Docking Simulation , Protein Binding , Thermodynamics , Ligands , Molecular Dynamics Simulation
19.
J Chem Inf Model ; 62(23): 6094-6104, 2022 Dec 12.
Article in English | MEDLINE | ID: mdl-36433835

ABSTRACT

Force fields form the basis for classical molecular simulations, and their accuracy is crucial for the quality of, for instance, protein-ligand binding simulations in drug discovery. The huge diversity of small-molecule chemistry makes it a challenge to build and parameterize a suitable force field. The Open Force Field Initiative is a combined industry and academic consortium developing a state-of-the-art small-molecule force field. In this report, industry members of the consortium worked together to objectively evaluate the performance of the force fields (referred to here as OpenFF) produced by the initiative on a combined public and proprietary dataset of 19,653 relevant molecules selected from their internal research and compound collections. This evaluation was important because it was completely blind; at most partners, none of the molecules or data were used in force field development or testing prior to this work. We compare the Open Force Field "Sage" version 2.0.0 and "Parsley" version 1.3.0 with GAFF-2.11-AM1BCC, OPLS4, and SMIRNOFF99Frosst. We analyzed force-field-optimized geometries and conformer energies compared to reference quantum mechanical data. We show that OPLS4 performs best, and the latest Open Force Field release shows a clear improvement compared to its predecessors. The performance of established force fields such as GAFF-2.11 was generally worse. While OpenFF researchers were involved in building the benchmarking infrastructure used in this work, benchmarking was done entirely in-house within industrial organizations and the resulting assessment is reported here. This work assesses the force field performance using separate benchmarking steps, external datasets, and involving external research groups. This effort may also be unique in terms of the number of different industrial partners involved, with 10 different companies participating in the benchmark efforts.


Subject(s)
Proteins , Thermodynamics , Ligands , Proteins/chemistry , Physical Phenomena
20.
J Chem Inf Model ; 62(22): 5622-5633, 2022 Nov 28.
Article in English | MEDLINE | ID: mdl-36351167

ABSTRACT

The development of accurate transferable force fields is key to realizing the full potential of atomistic modeling in the study of biological processes such as protein-ligand binding for drug discovery. State-of-the-art transferable force fields, such as those produced by the Open Force Field Initiative, use modern software engineering and automation techniques to yield accuracy improvements. However, force field torsion parameters, which must account for many stereoelectronic and steric effects, are considered to be less transferable than other force field parameters and are therefore often targets for bespoke parametrization. Here, we present the Open Force Field QCSubmit and BespokeFit software packages that, when combined, facilitate the fitting of torsion parameters to quantum mechanical reference data at scale. We demonstrate the use of QCSubmit for simplifying the process of creating and archiving large numbers of quantum chemical calculations, by generating a dataset of 671 torsion scans for druglike fragments. We use BespokeFit to derive individual torsion parameters for each of these molecules, thereby reducing the root-mean-square error in the potential energy surface from 1.1 kcal/mol, using the original transferable force field, to 0.4 kcal/mol using the bespoke version. Furthermore, we employ the bespoke force fields to compute the relative binding free energies of a congeneric series of inhibitors of the TYK2 protein, and demonstrate further improvements in accuracy, compared to the base force field (MUE reduced from 0.56 [0.39, 0.77] to 0.42 [0.28, 0.59] kcal/mol and R² correlation improved from 0.72 [0.35, 0.87] to 0.93 [0.84, 0.97]; bracketed values are the lower and upper bounds reported with each estimate).


Subject(s)
Proteins , Software , Ligands , Proteins/chemistry , Entropy , Protein Binding