ABSTRACT
The use of fast in silico prediction methods for protein-ligand binding free energies holds significant promise for the initial phases of drug development. Numerous traditional physics-based models (e.g., implicit solvent models), however, tend to either neglect or heavily approximate entropic contributions to binding due to their computational complexity. Consequently, such methods often yield imprecise assessments of binding strength. Machine learning models provide accurate predictions and can often outperform physics-based models. They are, however, often prone to overfitting, and their results can be difficult to interpret. Physics-guided machine learning models combine the consistency of physics-based models with the accuracy of modern data-driven algorithms. This work integrates conformational entropies from a physics-based model into a graph convolutional network. We introduce a new neural network architecture (a rule-based graph convolutional network) that generates molecular fingerprints according to predefined rules specifically optimized for binding free energy calculations. Our results on 100 small host-guest systems demonstrate significant improvements in convergence and in preventing overfitting. We additionally demonstrate the transferability of our proposed hybrid model by training it on the aforementioned host-guest systems and then testing it on six unrelated protein-ligand systems. Our new model shows little difference in training set accuracy compared to a previous model but an order-of-magnitude improvement in test set accuracy. Finally, we show how the results of our hybrid model can be interpreted in a straightforward fashion.
Subjects
Entropy , Protein Binding , Proteins , Ligands , Proteins/chemistry , Proteins/metabolism , Neural Networks, Computer , Machine Learning
ABSTRACT
AmberTools is a free and open-source collection of programs used to set up, run, and analyze molecular simulations. The newer features contained within AmberTools23 are briefly described in this Application note.
ABSTRACT
The binding free energy calculation of protein-ligand complexes is necessary for research into virus-host interactions and the relevant applications in drug discovery. However, many current computational methods for such calculations are either inefficient or inaccurate in practice. Utilizing implicit solvent models in the molecular mechanics generalized Born surface area (MM/GBSA) framework allows for efficient calculations without significant loss of accuracy. Here, GBNSR6, a new flavor of the generalized Born model, is employed in the MM/GBSA framework to estimate the binding affinity between the SARS-CoV-2 spike protein and the human ACE2 receptor. A computational protocol is developed based on the widely studied Ras-Raf complex, which has a binding free energy similar to that of SARS-CoV-2/ACE2. Two options for representing the dielectric boundary of the complexes are evaluated: one based on the standard Bondi radii and the other based on a newly developed set of atomic radii (OPT1), optimized specifically for protein-ligand binding. Predictions based on the two radii sets provide lower and upper bounds on the experimental reference: ΔGbind(Bondi) = -14.7 < ΔGbind(Exp.) = -10.6 < ΔGbind(OPT1) = -4.1 kcal/mol. The consensus estimates of the two bounds show quantitative agreement with the experimental values. This work also presents a novel truncation method and computational strategies for efficient entropy calculations with normal mode analysis. Interestingly, it is observed that a significant decrease in the number of snapshots does not affect the accuracy of the entropy calculation, while it does lower computation time appreciably. The proposed MM/GBSA protocol can be used to study the binding mechanism of new variants of SARS-CoV-2, as well as other relevant structures.
Subjects
Angiotensin-Converting Enzyme 2/metabolism , SARS-CoV-2/metabolism , Spike Glycoprotein, Coronavirus/metabolism , Algorithms , Angiotensin-Converting Enzyme 2/chemistry , COVID-19/pathology , COVID-19/virology , Entropy , Humans , Ligands , Molecular Dynamics Simulation , Protein Binding , SARS-CoV-2/isolation & purification , Spike Glycoprotein, Coronavirus/chemistry , raf Kinases/chemistry , raf Kinases/metabolism , ras Proteins/chemistry , ras Proteins/metabolism
ABSTRACT
Fast and accurate calculation of solvation free energies is central to many applications, such as rational drug design. In this study, we present a grid-based molecular surface implementation of the "R6" flavor of the generalized Born (GB) implicit solvent model, named GBNSR6. The speed, accuracy relative to numerical Poisson-Boltzmann treatment, and sensitivity to grid surface parameters are tested on a set of 15 small protein-ligand complexes and a set of biomolecules in the range of 268 to 25099 atoms. Our results demonstrate that the proposed model provides a relatively successful compromise between the speed and accuracy of computing polar components of the solvation free energies (ΔGpol) and binding free energies (ΔΔGpol). The model tolerates a relatively coarse grid size h = 0.5 Å, where the grid artifact error in computing ΔΔGpol remains in the range of kBT ∼ 0.6 kcal/mol. The estimated ΔΔGpol values are well correlated (r² = 0.97) with the numerical Poisson-Boltzmann reference, while showing virtually no systematic bias and RMSE = 1.43 kcal/mol. The grid-based GBNSR6 model is available in the Amber (AmberTools) package of molecular simulation programs.
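The thermal energy scale quoted in this abstract can be checked with a one-line calculation. The sketch below assumes a temperature of 298.15 K, which the abstract does not state, and uses the molar gas constant in kcal/(mol·K); the function name is illustrative only.

```python
# Molar gas constant in kcal/(mol*K), i.e. k_B expressed per mole of particles.
R_KCAL_PER_MOL_K = 1.987204e-3


def thermal_energy(temperature_k: float) -> float:
    """Return k_B*T per mole, in kcal/mol, at the given temperature in kelvin."""
    return R_KCAL_PER_MOL_K * temperature_k


# Assumed temperature (room temperature); not stated in the abstract.
kT = thermal_energy(298.15)
print(f"kBT at 298.15 K = {kT:.2f} kcal/mol")
```

Under that assumption the result rounds to roughly 0.6 kcal/mol, consistent with the grid artifact error scale cited above.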
Subjects
Coordination Complexes/chemistry , Models, Chemical , Proteins/chemistry , Static Electricity , Thermodynamics , Ligands , Solubility , Solvents/chemistry
ABSTRACT
Adaptive steered molecular dynamics (ASMD) is a computational biophysics method in which an external force is applied to a selected set of atoms or a specific reaction coordinate to induce a particular molecular motion. Virtual reality (VR) based methods for protein-ligand docking are beneficial for visualizing on-the-fly interactive molecular dynamics and performing promising docking trajectories. In this paper, we propose a novel method to guide ASMD with optimal trajectories collected from human experiences using interactive molecular dynamics in virtual reality (iMD-VR). We also explain the benefits of using VR as a tool for expediting the process of ligand binding, outlining an experimental protocol that enables iMD-VR users to guide Amprenavir into and out of the binding pockets of HIV-1 protease and recreate their respective crystallographic binding poses within 5 minutes. We then analyze the results of the iMD-VR-assisted ASMD simulation and assess its performance compared to a standard ASMD simulation. In terms of accuracy, our proposed method consistently calculates higher potential of mean force (PMF) values relative to a standard ASMD simulation, with an almost twofold increase in all the experiments. Finally, we describe the novelty of the research and discuss results showcasing a faster and more effective convergence of the ligand to the protein's binding site as compared to a standard molecular dynamics simulation, proving the effectiveness of VR in the field of drug discovery. Future work includes the development of an artificial intelligence algorithm capable of predicting optimal binding trajectories for many protein-ligand pairs, as well as the force needed to steer the ligand along that trajectory.
Subjects
Artificial Intelligence , Virtual Reality , Humans , Molecular Docking Simulation , Ligands , Computer Graphics , Proteins , Perception
ABSTRACT
The accuracy of computational models of water is key to atomistic simulations of biomolecules. We propose a computationally efficient way to improve the accuracy of the prediction of hydration free energies (HFEs) of small molecules: the remaining errors of the physics-based models relative to the experiment are predicted and mitigated by machine learning (ML) as a postprocessing step. Specifically, the trained graph convolutional neural network attempts to identify the "blind spots" in the physics-based model predictions, where the complex physics of aqueous solvation is poorly accounted for, and partially corrects for them. The strategy is explored for five classical solvent models representing various accuracy/speed trade-offs, from the fast analytical generalized Born (GB) to the popular TIP3P explicit solvent model; experimental HFEs of small neutral molecules from the FreeSolv set are used for the training and testing. For all of the models, the ML correction reduces the resulting root-mean-square error relative to the experiment for HFEs of small molecules, without significant overfitting and with negligible computational overhead. For example, on the test set, the relative accuracy improvement is 47% for the fast analytical GB, making it, after the ML correction, almost as accurate as uncorrected TIP3P. For the TIP3P model, the accuracy improvement is about 39%, bringing the ML-corrected model's error below the 1 kcal/mol threshold. In general, the relative benefit of the ML corrections is smaller for more accurate physics-based models, reaching the lower limit of about 20% relative accuracy gain compared with that of the physics-based treatment alone. The proposed strategy of using ML to learn the remaining error of physics-based models offers a distinct advantage over training ML alone directly on reference HFEs: it preserves the correct overall trend, even well outside of the training set.
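The postprocessing idea in this abstract, learning the residual error of a physics-based prediction and adding it back as a correction, can be illustrated with a deliberately simplified sketch. Everything below is an assumption made for illustration: the data are synthetic (not FreeSolv), a single hypothetical molecular descriptor stands in for the molecular graph, and an ordinary least-squares line replaces the graph convolutional network used in the study.

```python
import math
import random


def rmse(pred, ref):
    """Root-mean-square error between predictions and reference values."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(ref))


def fit_line(x, y):
    """Ordinary least-squares fit of y ~ a*x + b; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    a = sxy / sxx
    return a, my - a * mx


random.seed(0)

# Synthetic toy data: a hypothetical descriptor, a stand-in "experimental" HFE,
# and a "physics" prediction whose error depends systematically on the descriptor.
n = 200
descriptor = [random.uniform(0.0, 1.0) for _ in range(n)]
experiment = [-8.0 * d + random.gauss(0.0, 0.3) for d in descriptor]
physics = [e + 2.0 * d - 0.5 + random.gauss(0.0, 0.3)
           for e, d in zip(experiment, descriptor)]

split = 150  # first 150 points for training, remaining 50 for testing

# Train the correction model on the residual (experiment - physics), not on the HFE itself.
residual_train = [e - p for e, p in zip(experiment[:split], physics[:split])]
a, b = fit_line(descriptor[:split], residual_train)

# Apply the learned correction as a postprocessing step on the held-out test set.
corrected_test = [p + a * d + b for p, d in zip(physics[split:], descriptor[split:])]

rmse_before = rmse(physics[split:], experiment[split:])
rmse_after = rmse(corrected_test, experiment[split:])
improvement = (rmse_before - rmse_after) / rmse_before
print(f"test RMSE before: {rmse_before:.2f}, after: {rmse_after:.2f}, "
      f"relative improvement: {improvement:.0%}")
```

Because the corrected prediction is the physics-based value plus a learned offset, it retains the trend of the physics-based model wherever the correction is small, which is the design rationale the abstract highlights. The relative improvement computed here follows the natural definition (RMSE reduction divided by the uncorrected RMSE), which is assumed rather than taken from the paper.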
ABSTRACT
Computational structural biology has demonstrated a key role in improving human health [...].
Subjects
Computational Biology , Proteins , Humans , Computer Simulation
ABSTRACT
Structure-based drug discovery aims to identify small molecules that can attach to a specific target protein and change its functionality. Recently, deep learning has shown great promise in generating drug-like molecules with specific biochemical features and conditioned on structural features. However, such models usually fail to incorporate an essential factor: the underlying physics which guides molecular formation and binding in real-world scenarios. In this work, we describe a physics-guided deep generative model for new ligand discovery, conditioned not only on the binding site but also on physics-based features that describe the binding mechanism between a receptor and a ligand. The proposed hybrid model has been tested on large protein-ligand complexes and small host-guest systems. Using the top-N methodology, on average more than 75% of the structures generated by our hybrid model were stronger binders than the original reference ligand. All of them had higher ΔGbind (affinity) values than the ones generated by the previous state-of-the-art method by an average margin of 1.88 kcal/mol. The visualization of the top-5 ligands generated by the proposed physics-guided model and the reference deep learning model demonstrates more feasible conformations and orientations by the former. Future directions include training and testing the hybrid model on larger datasets, adding more relevant physics-based features, and interpreting the deep learning outcomes from biophysical perspectives.
ABSTRACT
Calculation of protein-ligand binding affinity is a cornerstone of drug discovery. Classic implicit solvent models, which have been widely used to accomplish this task, lack accuracy compared to experimental references. Emerging data-driven models, on the other hand, are often accurate yet not fully interpretable and also likely to be overfitted. In this research, we explore the application of Theory-Guided Data Science in studying protein-ligand binding. A hybrid model is introduced by integrating a Graph Convolutional Network (data-driven model) with the GBNSR6 implicit solvent (physics-based model). The proposed physics-data model is tested on a dataset of 368 complexes from the PDBbind refined set and 72 host-guest systems. Results demonstrate that the proposed Physics-Guided Neural Network can successfully improve the "accuracy" of the pure data-driven model. In addition, the "interpretability" and "transferability" of our model have improved compared to the purely data-driven model. Further analyses include evaluating model robustness and understanding relationships between the physical features.
Subjects
Neural Networks, Computer , Proteins , Ligands , Physics , Protein Binding , Proteins/chemistry , Solvents/chemistry , Thermodynamics
ABSTRACT
The ability to estimate protein-protein binding free energy in a computationally efficient manner via a physics-based approach is beneficial to research focused on the mechanism of viruses binding to their target proteins. Implicit solvation methodology may be particularly useful in the early stages of such research, as it can quickly offer valuable insights into the binding process. Here we evaluate the potential of the related molecular mechanics generalized Born surface area (MMGB/SA) approach to estimate the binding free energy ΔGbind between the SARS-CoV-2 spike receptor-binding domain and the human ACE2 receptor. The calculations are based on a recent flavor of the generalized Born model, GBNSR6. Two estimates of ΔGbind are computed: one based on standard Bondi radii, and the other based on a newly developed set of atomic radii (OPT1), optimized specifically for protein-ligand binding. We take the average of the resulting two ΔGbind values as the consensus estimate. For the well-studied Ras-Raf protein-protein complex, which has a binding free energy similar to that of the SARS-CoV-2/ACE2 complex, the consensus ΔGbind = -11.8 ± 1 kcal/mol, vs. experimental -9.7 ± 0.2 kcal/mol. The consensus estimate for the SARS-CoV-2/ACE2 complex is ΔGbind = -9.4 ± 1.5 kcal/mol, which is in near quantitative agreement with experiment (-10.6 kcal/mol). The availability of a conceptually simple MMGB/SA-based protocol for analysis of the SARS-CoV-2/ACE2 binding may be beneficial in light of the need to move forward fast.
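The consensus estimate described here is a plain average of the two radii-specific values. As a minimal arithmetic sketch, the snippet below uses the Bondi and OPT1 estimates for SARS-CoV-2/ACE2 quoted in the MM/GBSA protocol abstract earlier in this listing (-14.7 and -4.1 kcal/mol); treating those two values as the inputs to this abstract's consensus is an assumption, and the function name is illustrative.

```python
def consensus_estimate(dg_bondi: float, dg_opt1: float) -> float:
    """Average of the two binding free energy estimates, in kcal/mol."""
    return 0.5 * (dg_bondi + dg_opt1)


dg_bondi = -14.7  # estimate with Bondi radii, kcal/mol
dg_opt1 = -4.1    # estimate with OPT1 radii, kcal/mol
dg_exp = -10.6    # experimental reference, kcal/mol

consensus = consensus_estimate(dg_bondi, dg_opt1)
deviation = consensus - dg_exp
print(f"consensus = {consensus:.1f} kcal/mol, "
      f"deviation from experiment = {deviation:+.1f} kcal/mol")
```

The average comes out at -9.4 kcal/mol, matching the consensus value quoted above and lying about 1.2 kcal/mol from the experimental reference.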
ABSTRACT
The accuracy of protein-ligand binding free energy calculations utilizing implicit solvent models is critically affected by parameters of the underlying dielectric boundary, specifically, the atomic and water probe radii. Here, a global multidimensional optimization pipeline is developed to find optimal atomic radii specifically for protein-ligand binding calculations in implicit solvent. The computational pipeline has three key components: (1) a massively parallel implementation of a deterministic global optimization algorithm (VTDIRECT95), (2) an accurate yet reasonably fast generalized Born implicit solvent model (GBNSR6), and (3) a novel robustness metric that helps distinguish between nearly degenerate local minima via a postprocessing step of the optimization. A graph-based "kT-connectivity" approach to explore and visualize the multidimensional energy landscape is proposed: local minima that can be reached from the global minimum without exceeding a given energy threshold (kT) are considered to be connected. As an illustration of the capabilities of the optimization pipeline, we apply it to find a global optimum in the space of just five radii: four atomic (O, H, N, and C) radii and the water probe radius. The optimized radii, ρW = 1.37 Å, ρC = 1.40 Å, ρH = 1.55 Å, ρN = 2.35 Å, and ρO = 1.28 Å, lead to a closer agreement of electrostatic binding free energies with the explicit solvent reference than two commonly used sets of radii previously optimized for small molecules. At the same time, the ability of the optimizer to find the global optimum reveals fundamental limits of the common two-dielectric implicit solvation model: the computed electrostatic binding free energies are still almost 4 kcal/mol away from the explicit solvent reference. The proposed computational approach opens the possibility to further improve the accuracy of practical computational protocols for binding free energy calculations.
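One way to read the "kT-connectivity" idea is as a reachability problem on a graph of local minima. The sketch below is an interpretation, not the authors' implementation: it assumes each pair of neighboring minima has a known barrier value (the highest objective value along a path between them), and it treats the threshold as an offset above the global minimum. The minima labels, energies, and barriers are hypothetical toy values.

```python
from collections import deque


def kt_connected(energies, barriers, threshold):
    """Return the set of minima reachable from the global minimum.

    energies:  dict mapping minimum label -> objective value
    barriers:  dict mapping (label_a, label_b) -> highest value along the path
    threshold: allowed rise above the global minimum (the "kT" of the sketch)
    """
    start = min(energies, key=energies.get)
    limit = energies[start] + threshold

    # Keep only the edges whose barrier stays within the allowed limit.
    adjacency = {name: [] for name in energies}
    for (a, b), barrier in barriers.items():
        if barrier <= limit:
            adjacency[a].append(b)
            adjacency[b].append(a)

    # Breadth-first search from the global minimum over the retained edges.
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


# Hypothetical toy landscape with four local minima.
energies = {"A": 0.0, "B": 0.3, "C": 0.4, "D": 0.2}
barriers = {("A", "B"): 0.5, ("B", "C"): 0.55, ("A", "D"): 1.2, ("C", "D"): 0.9}

connected = kt_connected(energies, barriers, threshold=0.6)
print(sorted(connected))
```

In this toy landscape, minimum D has a low objective value but is separated from the rest by barriers above the limit, so it is not counted as connected; this is the kind of distinction between nearly degenerate minima that a connectivity view is meant to expose.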