1.
J Chem Inf Model ; 64(7): 2496-2507, 2024 Apr 08.
Article in English | MEDLINE | ID: mdl-37983381

ABSTRACT

Accurate in silico prediction of protein-ligand binding affinity is important in the early stages of drug discovery. Deep learning-based methods exist but have yet to overtake more conventional methods such as giga-docking largely due to their lack of generalizability. To improve generalizability, we need to understand what these models learn from input protein and ligand data. We systematically investigated a sequence-based deep learning framework to assess the impact of protein and ligand encodings on predicting binding affinities for commonly used kinase data sets. The role of proteins is studied using convolutional neural network-based encodings obtained from sequences and graph neural network-based encodings enriched with structural information from contact maps. Ligand-based encodings are generated from graph neural networks. We test different ligand perturbations by randomizing node and edge properties. For proteins, we make use of 3 different protein contact generation methods (AlphaFold2, Pconsc4, and ESM-1b) and compare these with a random control. Our investigation shows that protein encodings do not substantially impact the binding predictions, with no statistically significant difference in binding affinity for KIBA in the investigated metrics (concordance index, Pearson's R, Spearman's rank, and RMSE). Significant differences are seen for ligand encodings with random ligands and random ligand node properties, suggesting a much bigger reliance on ligand data for the learning tasks. Using different ways to combine protein and ligand encodings did not show a significant change in performance.
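
The evaluation metrics named in this abstract can be illustrated with a minimal Python sketch. It is not taken from the article: the function names and the toy affinity values are assumptions chosen for illustration, and Spearman's rank correlation is omitted for brevity. The concordance index is computed here in its common pairwise form, counting prediction ties as half-concordant.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between measured and predicted affinities."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def concordance_index(y_true, y_pred):
    """Fraction of comparable pairs whose predicted order matches the true order.

    Pairs with identical true values are skipped; pairs with identical
    predictions count as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    n = len(y_true)
    for i in range(n):
        for j in range(i + 1, n):
            d_true = y_true[i] - y_true[j]
            if d_true == 0:
                continue
            comparable += 1
            d_pred = y_pred[i] - y_pred[j]
            if d_pred == 0:
                concordant += 0.5
            elif (d_true > 0) == (d_pred > 0):
                concordant += 1.0
    return concordant / comparable


# Hypothetical affinities (arbitrary units), not data from the study.
measured = [5.0, 6.0, 7.0, 8.0]
predicted = [5.1, 6.2, 6.9, 8.3]
print(concordance_index(measured, predicted), pearson_r(measured, predicted), rmse(measured, predicted))
```

A concordance index of 0.5 corresponds to random ranking, which is why it is a natural metric for comparing a trained encoding against a randomized control.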


Subject(s)
Deep Learning , Ligands , Proteins/chemistry , Neural Networks, Computer , Protein Binding
2.
J Chem Inf Model ; 64(6): 1955-1965, 2024 Mar 25.
Article in English | MEDLINE | ID: mdl-38446131

ABSTRACT

Active learning (AL) has become a powerful tool in computational drug discovery, enabling the identification of top binders from vast molecular libraries. To design a robust AL protocol, it is important to understand the influence of AL parameters, as well as the features of the data sets on the outcomes. We use four affinity data sets for different targets (TYK2, USP7, D2R, Mpro) to systematically evaluate the performance of machine learning models [Gaussian process (GP) model and Chemprop model], sample selection protocols, and the batch size based on metrics describing the overall predictive power of the model (R2, Spearman rank, root-mean-square error) as well as the accurate identification of top 2%/5% binders (Recall, F1 score). Both models have a comparable Recall of top binders on large data sets, but the GP model surpasses the Chemprop model when training data are sparse. A larger initial batch size, especially on diverse data sets, increased the Recall of both models as well as overall correlation metrics. However, for subsequent cycles, smaller batch sizes of 20 or 30 compounds proved to be desirable. Furthermore, adding artificial Gaussian noise to the data up to a certain threshold still allowed the model to identify clusters with top-scoring compounds. However, excessive noise (>1σ) did impact the model's predictive and exploitative capabilities.
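
The "Recall of top 2%/5% binders" metric can be sketched as follows. This is an illustrative reading of the metric, not the authors' code: the function name, the convention that lower values mean stronger binding, and the toy data are all assumptions.

```python
def top_fraction_recall(true_affinity, selected_indices, fraction=0.02, lower_is_better=True):
    """Share of the true top-`fraction` binders that appear among the selected compounds.

    `true_affinity` holds one value per library compound; `selected_indices`
    are the compounds picked by the active-learning cycles so far.
    """
    n = len(true_affinity)
    k = max(1, round(n * fraction))  # size of the true top set, at least one compound
    ranked = sorted(range(n), key=lambda i: true_affinity[i], reverse=not lower_is_better)
    top_set = set(ranked[:k])
    return len(top_set & set(selected_indices)) / k


# Hypothetical binding free energies in kcal/mol (more negative = stronger binder).
library = [-9.1, -7.0, -10.2, -6.5, -8.0, -5.0, -7.5, -6.0, -9.8, -4.0]
picked = [2, 4, 0]  # compounds acquired so far
print(top_fraction_recall(library, picked, fraction=0.2))
```

With these toy values the true top 20% are compounds 2 and 8, so acquiring compound 2 but not compound 8 gives a Recall of 0.5.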


Subject(s)
Benchmarking , Machine Learning , Ligands , Drug Discovery/methods
3.
J Chem Phys ; 160(20)2024 May 28.
Article in English | MEDLINE | ID: mdl-38814008

ABSTRACT

Sire is a Python/C++ library that is used both to prototype new algorithms and as an interoperability engine for exchanging information between molecular simulation programs. It provides a collection of file parsers and information converters that together make it easier to combine and leverage the functionality of many other programs and libraries. This empowers researchers to use sire to write a single script that can, for example, load a molecule from a PDBx/mmCIF file via Gemmi, perform SMARTS searches via RDKit, parameterize molecules using BioSimSpace, run GPU-accelerated molecular dynamics via OpenMM, and then display the resulting dynamics trajectory in a NGLView Jupyter notebook 3D molecular viewer. This functionality is built on by BioSimSpace, which uses sire's molecular information engine to interconvert with programs such as GROMACS, NAMD, Amber, and AmberTools for automated molecular parameterization and the running of molecular dynamics, metadynamics, and alchemical free energy workflows. Sire comes complete with a powerful molecular information search engine, plus trajectory loading and editing, analysis, and energy evaluation engines. This, when combined with an in-built computer algebra system, gives substantial flexibility to researchers to load, search for, edit, and combine molecular information from multiple sources and use that to drive novel algorithms by combining functionality from other programs. Sire is open source (GPL3) and is available via conda and at a free Jupyter notebook server at https://try.openbiosim.org. Sire is supported by the not-for-profit OpenBioSim community interest company.

4.
Phys Biol ; 20(4)2023 05 26.
Article in English | MEDLINE | ID: mdl-37184431

ABSTRACT

The mechanisms by which a protein's 3D structure can be determined based on its amino acid sequence have long been one of the key mysteries of biophysics. Often simplistic models, such as those derived from geometric constraints, capture bulk real-world 3D protein-protein properties well. One approach is using protein contact maps (PCMs) to better understand proteins' properties. In this study, we explore the emergent behaviour of contact maps for different geometrically constrained models and compare them to real-world protein systems. Specifically, we derive an analytical approximation for the distribution of amino acid distances, denoted as P(s), using a mean-field approach based on a geometric constraint model. This approximation is then validated for amino acid distance distributions generated from a 2D and 3D version of the geometrically constrained random interaction model. For real protein data, we show how the analytical approximation can be used to fit amino acid distance distributions of protein chain lengths of L ≈ 100, L ≈ 200, and L ≈ 300 generated from two different methods of evaluating a PCM, a simple cutoff based method and a shadow map based method. We present evidence that geometric constraints are sufficient to model the amino acid distance distributions of protein chains in bulk and amino acid sequences only play a secondary role, regardless of the definition of the PCM.


Subject(s)
Protein Folding , Proteins , Protein Conformation , Proteins/chemistry , Amino Acids/chemistry , Amino Acid Sequence
5.
J Chem Inf Model ; 63(19): 5996-6005, 2023 Oct 09.
Article in English | MEDLINE | ID: mdl-37724771

ABSTRACT

Computationally generating new synthetically accessible compounds with high affinity and low toxicity is a great challenge in drug design. Machine learning models beyond conventional pharmacophoric methods have shown promise in the generation of novel small-molecule compounds but require significant tuning for a specific protein target. Here, we introduce a method called selective iterative latent variable refinement (SILVR) for conditioning an existing diffusion-based equivariant generative model without retraining. The model allows the generation of new molecules that fit into a binding site of a protein based on fragment hits. We use the SARS-CoV-2 main protease fragments from Diamond XChem that form part of the COVID Moonshot project as a reference dataset for conditioning the molecule generation. The SILVR rate controls the extent of conditioning, and we show that moderate SILVR rates make it possible to generate new molecules of similar shape to the original fragments, meaning that the new molecules fit the binding site without knowledge of the protein. We can also merge up to 3 fragments into a new molecule without affecting the quality of molecules generated by the underlying generative model. Our method is generalizable to any protein target with known fragments and any diffusion-based model for molecule generation.

6.
J Chem Inf Model ; 61(5): 2124-2130, 2021 05 24.
Article in English | MEDLINE | ID: mdl-33886305

ABSTRACT

The quantum mechanical bespoke (QUBE) force-field approach has been developed to facilitate the automated derivation of potential energy function parameters for modeling protein-ligand binding. To date, the approach has been validated in the context of Monte Carlo simulations of protein-ligand complexes. We describe here the implementation of the QUBE force field in the alchemical free-energy calculation molecular dynamics simulation package SOMD. The implementation is validated by demonstrating the reproducibility of absolute hydration free energies computed with the QUBE force field across the SOMD and GROMACS software packages. We further demonstrate, by way of a case study involving two series of non-nucleoside inhibitors of HIV-1 reverse transcriptase, that the availability of QUBE in a modern simulation package that makes efficient use of graphics processing unit acceleration will facilitate high-throughput alchemical free-energy calculations.


Subject(s)
Molecular Dynamics Simulation , Entropy , Ligands , Reproducibility of Results , Thermodynamics
7.
J Chem Inf Model ; 61(6): 3058-3073, 2021 06 28.
Article in English | MEDLINE | ID: mdl-34124899

ABSTRACT

β-coronaviruses (CoVs) alone have been responsible for three major global outbreaks in the 21st century. The current crisis has led to an urgent requirement to develop therapeutics. Even though a number of vaccines are available, alternative strategies targeting essential viral components are required as a backup against the emergence of lethal viral variants. One such target is the main protease (Mpro) that plays an indispensable role in viral replication. The availability of over 270 Mpro X-ray structures in complex with inhibitors provides unique insights into ligand-protein interactions. Herein, we provide a comprehensive comparison of all nonredundant ligand-binding sites available for SARS-CoV-2, SARS-CoV, and MERS-CoV Mpro. Extensive adaptive sampling has been used to investigate structural conservation of ligand-binding sites using Markov state models (MSMs) and compare conformational dynamics employing convolutional variational auto-encoder-based deep learning. Our results indicate that not all ligand-binding sites are dynamically conserved despite high sequence and structural conservation across β-CoV homologs. This highlights the complexity in targeting all three Mpro enzymes with a single pan inhibitor.


Subject(s)
COVID-19 , Peptide Hydrolases , Antiviral Agents , Binding Sites , Humans , Ligands , Protease Inhibitors , RNA, Viral , SARS-CoV-2
8.
J Chem Inf Model ; 60(11): 5331-5339, 2020 11 23.
Article in English | MEDLINE | ID: mdl-32639733

ABSTRACT

A methodology that combines alchemical free energy calculations (FEP) with machine learning (ML) has been developed to compute accurate absolute hydration free energies. The hybrid FEP/ML methodology was trained on a subset of the FreeSolv database and retrospectively shown to outperform most submissions from the SAMPL4 competition. Compared to pure machine-learning approaches, FEP/ML yields more precise estimates of free energies of hydration and requires a fraction of the training set size to outperform standalone FEP calculations. The ML-derived correction terms are further shown to be transferable to a range of related FEP simulation protocols. The approach may be used to inexpensively improve the accuracy of FEP calculations and to flag molecules which will benefit the most from bespoke force field parametrization efforts.


Subject(s)
Machine Learning , Computer Simulation , Entropy , Retrospective Studies , Thermodynamics
9.
J Chem Inf Model ; 60(6): 3120-3130, 2020 06 22.
Article in English | MEDLINE | ID: mdl-32437145

ABSTRACT

Free-energy calculations have seen increased usage in structure-based drug design. Despite the rising interest, automation of the complex calculations and subsequent analysis of their results are still hampered by the restricted choice of available tools. In this work, an application for automated setup and processing of free-energy calculations is presented. Several sanity checks for assessing the reliability of the calculations were implemented, constituting a distinct advantage over existing open-source tools. The underlying workflow is built on top of the software Sire, SOMD, BioSimSpace, and OpenMM and uses the AMBER 14SB and GAFF2.1 force fields. It was validated on two datasets originally composed by Schrödinger, consisting of 14 protein structures and 220 ligands. Predicted binding affinities were in good agreement with experimental values. For the larger dataset, the average correlation coefficient Rp was 0.70 ± 0.05 and average Kendall's τ was 0.53 ± 0.05, which are broadly comparable to or better than previously reported results using other methods.


Subject(s)
Drug Design , Software , Ligands , Protein Binding , Reproducibility of Results , Thermodynamics
10.
J Comput Aided Mol Des ; 32(1): 199-210, 2018 01.
Article in English | MEDLINE | ID: mdl-29134431

ABSTRACT

The Drug Design Data Resource (D3R) consortium organises blinded challenges to address the latest advances in computational methods for ligand pose prediction, affinity ranking, and free energy calculations. Within the context of the second D3R Grand Challenge, several blinded binding free energy predictions were made for two congeneric series of Farnesoid X Receptor (FXR) inhibitors with a semi-automated alchemical free energy calculation workflow featuring the FESetup and SOMD software tools. Reasonable performance was observed in retrospective analyses of literature datasets. Nevertheless, blinded predictions on the full D3R datasets were poor due to difficulties encountered with the ranking of compounds that vary in their net charge. Performance increased for predictions that were restricted to subsets of compounds carrying the same net charge. Disclosure of X-ray crystallography derived binding modes maintained or improved the correlation with experiment in a subsequent round of predictions. The best performing protocols on D3R set1 and set2 were comparable or superior to predictions made on the basis of analysis of literature structure-activity relationships (SARs) only, and comparable or slightly inferior to the best submissions from other groups.


Subject(s)
Computer-Aided Design , Drug Design , Molecular Docking Simulation , Receptors, Cytoplasmic and Nuclear/metabolism , Thermodynamics , Binding Sites , Crystallography, X-Ray , Databases, Protein , Humans , Ligands , Protein Binding , Protein Conformation , Receptors, Cytoplasmic and Nuclear/chemistry
11.
J Comput Aided Mol Des ; 31(1): 61-70, 2017 01.
Article in English | MEDLINE | ID: mdl-27503495

ABSTRACT

In the context of the SAMPL5 blinded challenge, standard free energies of binding were predicted for a dataset of 22 small guest molecules and three different host molecules: two octa-acids (OAH and OAMe) and a cucurbituril (CBC). Three sets of predictions were submitted, each based on different variations of classical molecular dynamics alchemical free energy calculation protocols based on the double annihilation method. The first model (model A) yields a free energy of binding based on computed free energy changes in solvated and host-guest complex phases; the second (model B) adds long range dispersion corrections to the previous result; the third (model C) uses an additional standard state correction term to account for the use of distance restraints during the molecular dynamics simulations. Model C performs the best in terms of mean unsigned error for all guests (MUE [Formula: see text]-95 % confidence interval) for the whole data set and in particular for the octa-acid systems (MUE [Formula: see text]). The overall correlation with experimental data for all models is encouraging ([Formula: see text]). The correlation between experimental and computational free energy of binding ranks as one of the highest with respect to other entries in the challenge. Nonetheless, the large MUE for the best performing model highlights systematic errors, and submissions from other groups fared better with respect to this metric.


Subject(s)
Ligands , Molecular Dynamics Simulation , Proteins/chemistry , Thermodynamics , Hydrophobic and Hydrophilic Interactions , Macrocyclic Compounds/chemistry , Molecular Conformation , Molecular Structure , Protein Binding , Solvents/chemistry
12.
J Comput Aided Mol Des ; 30(11): 1101-1114, 2016 11.
Article in English | MEDLINE | ID: mdl-27677751

ABSTRACT

In the context of the SAMPL5 challenge, water-cyclohexane distribution coefficients for 53 drug-like molecules were predicted. Four different models based on molecular dynamics free energy calculations were tested. All models initially assumed only one chemical state present in aqueous or organic phases. Model A is based on results from an alchemical annihilation scheme; model B adds a long range correction for the Lennard-Jones potentials to model A; model C adds charging free energy corrections; model D applies the charging correction from model C to ionizable species only. Models A and B perform better in terms of mean unsigned error ([Formula: see text] D units - 95 % confidence interval) and determination coefficient [Formula: see text], while charging corrections lead to poorer results with model D ([Formula: see text] and [Formula: see text]). Because overall errors were large, a retrospective analysis that allowed co-existence of ionisable and neutral species of a molecule in aqueous phase was investigated. This considerably reduced systematic errors ([Formula: see text] and [Formula: see text]). Overall, accurate [Formula: see text] predictions for drug-like molecules that may adopt multiple tautomers and charge states proved difficult, indicating a need for methodological advances to enable satisfactory treatment by explicit-solvent molecular simulations.
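
The link between solvation free energies and a water-cyclohexane distribution coefficient can be sketched under the single-chemical-state assumption mentioned in the abstract, in which case log D equals the partition coefficient log P. The standard thermodynamic relation is log D = (ΔG_water − ΔG_cyclohexane) / (RT ln 10), where both ΔG values are solvation free energies of the same species and the coefficient is defined as cyclohexane over water. The function name, the temperature, and the example energies below are assumptions for illustration, not values from the article.

```python
import math

GAS_CONSTANT = 1.987204e-3  # kcal / (mol K)


def log_d_single_state(dg_solv_water, dg_solv_cyclohexane, temperature=298.15):
    """log10 of the cyclohexane/water distribution coefficient for one chemical state.

    Both arguments are solvation free energies in kcal/mol (gas phase to solvent),
    as obtained for example from alchemical annihilation in each solvent.
    """
    rt_ln10 = GAS_CONSTANT * temperature * math.log(10.0)
    return (dg_solv_water - dg_solv_cyclohexane) / rt_ln10


# Hypothetical solvation free energies: the solute is 2 kcal/mol more stable in cyclohexane.
print(round(log_d_single_state(-5.0, -7.0), 3))
```

A solute that is better solvated in cyclohexane than in water gives a positive log D; treating ionised and neutral species together, as in the retrospective analysis, would require additional terms that this sketch does not include.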


Subject(s)
Computer Simulation , Pharmaceutical Preparations/chemistry , Solvents/chemistry , Cyclohexanes/chemistry , Databases, Chemical , Models, Chemical , Molecular Structure , Quantum Theory , Solubility , Thermodynamics , Water/chemistry
13.
Bioorg Med Chem ; 24(20): 4890-4899, 2016 10 15.
Article in English | MEDLINE | ID: mdl-27485604

ABSTRACT

In the framework of the 2015 D3R inaugural grand challenge, blind binding pose and affinity predictions were performed for a set of 180 ligands of the Heat Shock Protein HSP90-α protein, a relevant cancer target. Spectral clustering was used to rapidly identify alternative binding site conformations in publicly available crystallographic HSP90-α structures. Subsequently, multiple docking and scoring protocols employing the software Autodock Vina and rDock were applied to predict binding modes and rank order ligands. Alchemical free energy calculations were performed with the software FESetup and Sire/OpenMM to predict binding affinities for three congeneric series subsets. Some of the protocols used here were ranked among the top submissions according to most of the evaluation metrics. Docking performance was excellent, but the scoring results were disappointing. A critical assessment of the results is reported, as well as suggestions for future similar competitions.


Subject(s)
HSP90 Heat-Shock Proteins/chemistry , Thermodynamics , Binding Sites , Databases, Factual , Ligands , Molecular Docking Simulation , Protein Conformation
14.
J Chem Phys ; 141(21): 214106, 2014 Dec 07.
Article in English | MEDLINE | ID: mdl-25481128

ABSTRACT

We propose a discrete transition-based reweighting analysis method (dTRAM) for analyzing configuration-space-discretized simulation trajectories produced at different thermodynamic states (temperatures, Hamiltonians, etc.). dTRAM provides maximum-likelihood estimates of stationary quantities (probabilities, free energies, expectation values) at any thermodynamic state. In contrast to the weighted histogram analysis method (WHAM), dTRAM does not require data to be sampled from global equilibrium, and can thus produce superior estimates for enhanced sampling data such as parallel/simulated tempering, replica exchange, umbrella sampling, or metadynamics. In addition, dTRAM provides optimal estimates of Markov state models (MSMs) from the discretized state-space trajectories at all thermodynamic states. Under suitable conditions, these MSMs can be used to calculate kinetic quantities (e.g., rates, timescales). In the limit of a single thermodynamic state, dTRAM estimates a maximum likelihood reversible MSM, while in the limit of uncorrelated sampling data, dTRAM is identical to WHAM. dTRAM is thus a generalization of both estimators.
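
For orientation, the input that estimators of this kind work from, a transition count matrix built from a discretized trajectory, can be sketched in a few lines. The code below is deliberately not dTRAM: it shows only the simplest single-state, non-reversible maximum-likelihood MSM estimate (row-normalized counts), without the reversibility constraint or the multi-state reweighting described in the abstract. Function names and the toy trajectory are assumptions.

```python
def count_matrix(dtraj, n_states, lag=1):
    """Count transitions i -> j separated by `lag` steps in a discrete trajectory."""
    counts = [[0] * n_states for _ in range(n_states)]
    for start, end in zip(dtraj[:-lag], dtraj[lag:]):
        counts[start][end] += 1
    return counts


def transition_matrix(counts):
    """Non-reversible maximum-likelihood estimate: normalize each row of counts."""
    matrix = []
    for row in counts:
        total = sum(row)
        if total == 0:
            raise ValueError("state with no outgoing transitions; cannot normalize")
        matrix.append([c / total for c in row])
    return matrix


# Hypothetical two-state discretized trajectory.
dtraj = [0, 0, 1, 1, 0, 1, 0, 0]
counts = count_matrix(dtraj, n_states=2)
print(counts)
print(transition_matrix(counts))
```

Each row of the resulting matrix sums to one and gives the estimated probability of moving from that state to every other state after one lag time.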


Subject(s)
Proteins/chemistry , Thermodynamics , Ion Channels/chemistry , Likelihood Functions , Markov Chains , Molecular Dynamics Simulation
15.
J Chem Theory Comput ; 20(2): 977-988, 2024 Jan 23.
Article in English | MEDLINE | ID: mdl-38163961

ABSTRACT

Markov state models (MSMs) are a popular statistical method for analyzing the conformational dynamics of proteins including protein folding. With all statistical and machine learning (ML) models, choices must be made about the modeling pipeline that cannot be directly learned from the data. These choices, or hyperparameters, are often evaluated by expert judgment or, in the case of MSMs, by maximizing variational scores such as the VAMP-2 score. Modern ML and statistical pipelines often use automatic hyperparameter selection techniques ranging from the simple, choosing the best score from a random selection of hyperparameters, to the complex, optimization via, e.g., Bayesian optimization. In this work, we ask whether it is possible to automatically select MSMs this way by estimating and analyzing over 16,000,000 observations from over 280,000 estimated MSMs. We find that differences in hyperparameters can change the physical interpretation of the optimization objective, making automatic selection difficult. In addition, we find that enforcing conditions of equilibrium in the VAMP scores can result in inconsistent model selection. However, other parameters that specify the VAMP-2 score (lag time and number of relaxation processes scored) have only a negligible influence on model selection. We suggest that model observables and variational scores should be only a guide to model selection and that a full investigation of the MSM properties should be undertaken when selecting hyperparameters.


Subject(s)
Proteins , Vesicle-Associated Membrane Protein 2 , Bayes Theorem , Protein Folding , Machine Learning , Markov Chains
16.
bioRxiv ; 2023 Feb 18.
Article in English | MEDLINE | ID: mdl-36824771

ABSTRACT

The cytoplasm is compartmentalized into different translation environments. mRNAs use their 3'UTRs to localize to distinct cytoplasmic compartments, including TIS granules (TGs). Many transcription factors, including MYC, are translated in TGs. It was shown that translation of proteins in TGs enables the formation of protein complexes that cannot be established when these proteins are translated in the cytosol, but the mechanism is poorly understood. Here we show that MYC protein complexes that involve binding to the intrinsically disordered region (IDR) of MYC are only formed when MYC is translated in TGs. TG-dependent protein complexes require TG-enriched mRNAs for assembly. These mRNAs bind to a new and widespread RNA-binding domain in neutral or negatively charged IDRs in several transcription factors, including MYC. RNA-IDR interaction changes the conformational ensemble of the IDR, enabling the formation of MYC protein complexes that act in the nucleus and control functions that cannot be accomplished by cytosolically-translated MYC. We propose that certain mRNAs have IDR chaperone activity as they control IDR conformations. In addition to post-translational modifications, we found a novel mode of protein activity regulation. Since RNA-IDR interactions are prevalent, we suggest that mRNA-dependent control of protein functional states is widespread.

17.
Viruses ; 15(3)2023 03 07.
Article in English | MEDLINE | ID: mdl-36992405

ABSTRACT

The cowpea chlorotic mottle virus (CCMV) is a plant virus explored as a nanotechnological platform. The robust self-assembly mechanism of its capsid protein allows for drug encapsulation and targeted delivery. Additionally, the capsid nanoparticle can be used as a programmable platform to display different molecular moieties. In view of future applications, efficient production and purification of plant viruses are key steps. In established protocols, the need for ultracentrifugation is a significant limitation due to cost, difficult scalability, and safety issues. In addition, the purity of the final virus isolate often remains unclear. Here, an advanced protocol for the purification of the CCMV from infected plant tissue was developed, focusing on efficiency, economy, and final purity. The protocol involves precipitation with PEG 8000, followed by affinity extraction using a novel peptide aptamer. The efficiency of the protocol was validated using size exclusion chromatography, MALDI-TOF mass spectrometry, reversed-phase HPLC, and sandwich immunoassay. Furthermore, it was demonstrated that the final eluate of the affinity column is of exceptional purity (98.4%) determined by HPLC and detection at 220 nm. The scale-up of our proposed method seems to be straightforward, which opens the way to the large-scale production of such nanomaterials. This highly improved protocol may facilitate the use and implementation of plant viruses as nanotechnological platforms for in vitro and in vivo applications.


Subject(s)
Aptamers, Peptide , Bromovirus , Nanoparticles , Aptamers, Peptide/analysis , Aptamers, Peptide/metabolism , Capsid Proteins/metabolism , Capsid/metabolism
18.
Article in English | MEDLINE | ID: mdl-36382113

ABSTRACT

Free energy calculations are rapidly becoming indispensable in structure-enabled drug discovery programs. As new methods, force fields, and implementations are developed, assessing their expected accuracy on real-world systems (benchmarking) becomes critical to provide users with an assessment of the accuracy expected when these methods are applied within their domain of applicability, and developers with a way to assess the expected impact of new methodologies. These assessments require construction of a benchmark: a set of well-prepared, high quality systems with corresponding experimental measurements designed to ensure the resulting calculations provide a realistic assessment of expected performance when these methods are deployed within their domains of applicability. To date, the community has not yet adopted a common standardized benchmark, and existing benchmark reports suffer from a myriad of issues, including poor data quality, limited statistical power, and statistically deficient analyses, all of which can conspire to produce benchmarks that are poorly predictive of real-world performance. Here, we address these issues by presenting guidelines for (1) curating experimental data to develop meaningful benchmark sets, (2) preparing benchmark inputs according to best practices to facilitate widespread adoption, and (3) analysis of the resulting predictions to enable statistically meaningful comparisons among methods and force fields. We highlight challenges and open questions that remain to be solved in these areas, as well as recommendations for the collection of new datasets that might optimally serve to measure progress as methods become systematically more reliable. Finally, we provide a curated, versioned, open, standardized benchmark set adherent to these standards (PLBenchmarks) and an open source toolkit for implementing standardized best practices assessments (arsenic) for the community to use as a standardized assessment tool.
While our main focus is free energy methods based on molecular simulations, these guidelines should prove useful for assessment of the rapidly growing field of machine learning methods for affinity prediction as well.

19.
PLoS One ; 15(2): e0229230, 2020.
Article in English | MEDLINE | ID: mdl-32106258

ABSTRACT

The intricate three-dimensional geometries of protein tertiary structures underlie protein function and emerge through a folding process from one-dimensional chains of amino acids. The exact spatial sequence and configuration of amino acids, the biochemical environment and the temporal sequence of distinct interactions yield a complex folding process that cannot yet be easily tracked for all proteins. To gain qualitative insights into the fundamental mechanisms behind the folding dynamics and generic features of the folded structure, we propose a simple model of structure formation that takes into account only fundamental geometric constraints and otherwise assumes randomly paired connections. We find that despite its simplicity, the model results in a network ensemble consistent with key overall features of the ensemble of Protein Residue Networks we obtained from more than 1000 biological protein geometries as available through the Protein Data Bank. Specifically, the distribution of the number of interaction neighbors a unit (amino acid) has, the scaling of the structure's spatial extent with chain length, the eigenvalue spectrum and the scaling of the smallest relaxation time with chain length are all consistent between model and real proteins. These results indicate that geometric constraints alone may already account for a number of generic features of protein tertiary structures.


Subject(s)
Amino Acids/chemistry , Protein Conformation , Protein Interaction Domains and Motifs , Proteins/chemistry , Algorithms , Amino Acids/metabolism , Humans , Models, Molecular , Protein Folding , Proteins/metabolism
20.
Chem Sci ; 11(10): 2670-2680, 2020 Jan 15.
Article in English | MEDLINE | ID: mdl-34084326

ABSTRACT

Proteins need to interconvert between many conformations in order to function, many of which are formed transiently, and sparsely populated. Particularly when the lifetimes of these states approach the millisecond timescale, identifying the relevant structures and the mechanism by which they interconvert remains a tremendous challenge. Here we introduce a novel combination of accelerated MD (aMD) simulations and Markov state modelling (MSM) to explore these 'excited' conformational states. Applying this to the highly dynamic protein CypA, a protein involved in immune response and associated with HIV infection, we identify five principally populated conformational states and the atomistic mechanism by which they interconvert. A rational design strategy predicted that the mutant D66A should stabilise the minor conformations and substantially alter the dynamics, whereas the similar mutant H70A should leave the landscape broadly unchanged. These predictions are confirmed using CPMG and R1ρ solution state NMR measurements. By efficiently exploring functionally relevant, but sparsely populated conformations with millisecond lifetimes in silico, our aMD/MSM method has tremendous promise for the design of dynamic protein free energy landscapes for both protein engineering and drug discovery.
