ABSTRACT
The structure-based design of antigens holds promise for developing vaccines with higher efficacy and improved safety profiles. We postulate that abrogation of host receptor interaction bears potential for the improvement of vaccines by preventing antigen-induced modification of receptor function as well as the displacement or masking of the immunogen. Antigen modifications may yet destroy epitopes crucial for antibody neutralization. Here, we present a methodology that integrates deep mutational scans to identify and score SARS-CoV-2 receptor binding domain variants that maintain immunogenicity, but lack interaction with the widely expressed host receptor. Single point mutations were scored in silico, validated in vitro, and applied in vivo. Our top-scoring variant, receptor binding domain-G502E, prevented spike-induced cell-to-cell fusion and receptor internalization, and improved neutralizing antibody responses 3.3-fold in rabbit immunizations. We name our strategy BIBAX for body-inert, B-cell-activating vaccines, which in the future may be applied beyond SARS-CoV-2 for the improvement of vaccines by design.
Subject(s)
COVID-19 Vaccines , COVID-19 , Animals , Rabbits , Antibodies, Neutralizing , Angiotensin-Converting Enzyme 2/genetics , SARS-CoV-2 , COVID-19/prevention & control , Antibodies, Viral
ABSTRACT
Bromodomains (BDs) are small protein modules that interact with acetylated marks in histones. These posttranslational modifications are pivotal to regulate gene expression, making BDs promising targets to treat several diseases. While the general structure of BDs is well known, their dynamical features and their interplay with other macromolecules are poorly understood, hampering the rational design of potent and selective inhibitors. Here, we combine extensive molecular dynamics simulations, Markov state modeling, and available structural data to reveal a transiently formed state that is conserved across all BD families. It involves the breaking of two backbone hydrogen bonds that anchor the ZA-loop with the αA helix, opening a cryptic pocket that partially occludes the one associated to histone binding. By analyzing more than 1,900 experimental structures, we unveil just two adopting the hidden state, explaining why it has been previously unnoticed and providing direct structural evidence for its existence. Our results suggest that this state is an allosteric regulatory switch for BDs, potentially related to a recently unveiled BD-DNA-binding mode.
Subject(s)
Cell Cycle Proteins/chemistry , Co-Repressor Proteins/chemistry , DNA-Binding Proteins/chemistry , Histone Acetyltransferases/chemistry , Intracellular Signaling Peptides and Proteins/chemistry , Transcription Factors, General/chemistry , Transcription Factors/chemistry , Tripartite Motif-Containing Protein 28/chemistry , Amino Acid Sequence , Binding Sites , Cell Cycle Proteins/genetics , Cell Cycle Proteins/metabolism , Co-Repressor Proteins/genetics , Co-Repressor Proteins/metabolism , Crystallography, X-Ray , DNA/chemistry , DNA/genetics , DNA/metabolism , DNA-Binding Proteins/genetics , DNA-Binding Proteins/metabolism , Gene Expression Regulation , Histone Acetyltransferases/genetics , Histone Acetyltransferases/metabolism , Humans , Intracellular Signaling Peptides and Proteins/genetics , Intracellular Signaling Peptides and Proteins/metabolism , Markov Chains , Molecular Dynamics Simulation , Protein Binding , Protein Conformation, alpha-Helical , Protein Conformation, beta-Strand , Protein Interaction Domains and Motifs , Sequence Alignment , Sequence Homology, Amino Acid , Thermodynamics , Transcription Factors/genetics , Transcription Factors/metabolism , Transcription Factors, General/genetics , Transcription Factors, General/metabolism , Tripartite Motif-Containing Protein 28/genetics , Tripartite Motif-Containing Protein 28/metabolism
ABSTRACT
Molecular dynamics (MD) is the method of choice for understanding the structure, function, and interactions of molecules. However, MD simulations are limited by the strong metastability of many molecules, which traps them in a single conformation basin for an extended amount of time. Enhanced sampling techniques, such as metadynamics and replica exchange, have been developed to overcome this limitation and accelerate the exploration of complex free energy landscapes. In this paper, we propose Vendi Sampling, a replica-based algorithm for increasing the efficiency and efficacy of the exploration of molecular conformation spaces. In Vendi sampling, replicas are simulated in parallel and coupled via a global statistical measure, the Vendi Score, to enhance diversity. Vendi sampling allows for the recovery of unbiased sampling statistics and dramatically improves sampling efficiency. We demonstrate the effectiveness of Vendi sampling in improving molecular dynamics simulations by showing significant improvements in coverage and mixing between metastable states and convergence of free energy estimates for four common benchmarks, including Alanine Dipeptide and Chignolin.
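The diversity measure named in this abstract can be illustrated with a short sketch. The code below uses the published definition of the Vendi Score as the exponential of the Shannon entropy of the eigenvalues of a normalized similarity matrix; the Gaussian kernel, the function names, and the toy inputs are assumptions made for illustration only, and the way the score is coupled to the replica dynamics in Vendi Sampling is not shown.

```python
import numpy as np

def rbf_kernel(X, length_scale=1.0):
    """Gaussian similarity matrix with unit diagonal (an assumed kernel choice)."""
    X = np.asarray(X, dtype=float)
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dist / (2.0 * length_scale ** 2))

def vendi_score(K):
    """Exponential of the Shannon entropy of the eigenvalues of K / n.

    K is expected to be a symmetric positive semidefinite similarity
    matrix with ones on the diagonal, so the eigenvalues of K / n sum to 1.
    """
    K = np.asarray(K, dtype=float)
    n = K.shape[0]
    eigvals = np.linalg.eigvalsh(K / n)
    eigvals = eigvals[eigvals > 1e-12]  # drop numerically zero eigenvalues
    entropy = -np.sum(eigvals * np.log(eigvals))
    return float(np.exp(entropy))
```

Under this definition the score behaves like an effective number of distinct samples: n mutually dissimilar replicas (identity similarity matrix) give a score of n, while n identical replicas give a score of 1.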
ABSTRACT
Machine learning provides effective computational tools for exploring the chemical space via deep generative models. Here, we propose a new reinforcement learning scheme to fine-tune graph-based deep generative models for de novo molecular design tasks. We show how our computational framework can successfully guide a pretrained generative model toward the generation of molecules with a specific property profile, even when such molecules are not present in the training set and unlikely to be generated by the pretrained model. We explored the following tasks: generating molecules of decreasing/increasing size, increasing drug-likeness, and increasing bioactivity. Using the proposed approach, we achieve a model which generates diverse compounds with predicted DRD2 activity for 95% of sampled molecules, outperforming previously reported methods on this metric.
Subject(s)
Drug Design , Machine Learning
ABSTRACT
Most current molecular dynamics simulation and analysis methods rely on the idea that the molecular system can be represented by a single global state (e.g., a Markov state in a Markov state model [MSM]). In this approach, molecules can be extensively sampled and analyzed when they only possess a few metastable states, such as small- to medium-sized proteins. However, this approach breaks down in frustrated systems and in large protein assemblies, where the number of global metastable states may grow exponentially with the system size. To address this problem, we here introduce dynamic graphical models (DGMs) that describe molecules as assemblies of coupled subsystems, akin to how spins interact in the Ising model. The change of each subsystem state is only governed by the states of itself and its neighbors. DGMs require fewer parameters than MSMs or other global state models; in particular, we do not need to observe all global system configurations to characterize them. Therefore, DGMs can predict previously unobserved molecular configurations. As a proof of concept, we demonstrate that DGMs can faithfully describe molecular thermodynamics and kinetics and predict previously unobserved metastable states for Ising models and protein simulations.
Subject(s)
Molecular Dynamics Simulation , Proteins/chemistry , Kinetics , Markov Chains , Protein Conformation , Thermodynamics
ABSTRACT
The use of coarse-grained (CG) models is a popular approach to study complex biomolecular systems. By reducing the number of degrees of freedom, a CG model can explore long time- and length-scales inaccessible to computational models at higher resolution. If a CG model is designed by formally integrating out some of the system's degrees of freedom, one expects multi-body interactions to emerge in the effective CG model's energy function. In practice, it has been shown that the inclusion of multi-body terms indeed improves the accuracy of a CG model. However, no general approach has been proposed to systematically construct a CG effective energy that includes arbitrary orders of multi-body terms. In this work, we propose a neural network based approach to address this point and construct a CG model as a multi-body expansion. By applying this approach to a small protein, we evaluate the relative importance of the different multi-body terms in the definition of an accurate model. We observe a slow convergence in the multi-body expansion, where up to five-body interactions are needed to reproduce the free energy of an atomistic model.
Subject(s)
Oligopeptides/chemistry , Molecular Dynamics Simulation , Neural Networks, Computer , Thermodynamics
ABSTRACT
BACKGROUND: The prevalence of knee osteoarthritis is rising both in Sweden and globally due to increased age and obesity in the population. This has subsequently led to an increasing demand for knee arthroplasties. Correct diagnosis and classification of knee osteoarthritis (OA) are therefore of great interest for follow-up and for planning either conservative or operative management. Most orthopedic surgeons rely on standard weight-bearing radiographs of the knee. Improving the reliability and reproducibility of these interpretations could thus be hugely beneficial. Recently, deep learning, a form of artificial intelligence (AI), has shown promising results in interpreting radiographic images. In this study, we aim to evaluate how well an AI can classify the severity of knee OA, using entire image series and not excluding common visual disturbances such as implants, casts, and non-degenerative pathologies. METHODS: We selected 6103 radiographic exams of the knee taken at Danderyd University Hospital between the years 2002-2016 and manually categorized them according to the Kellgren & Lawrence grading scale (KL). We then trained a convolutional neural network (CNN) of ResNet architecture using PyTorch. We evaluated the results against a test set of 300 exams that had been reviewed independently by two senior orthopedic surgeons who settled any interobserver disagreements through consensus sessions. RESULTS: The CNN yielded an overall AUC of more than 0.87 for all KL grades except KL grade 2, which yielded an AUC of 0.8; the mean AUC was 0.92. When merging adjacent KL grades, all but one group showed near perfect results with AUC > 0.95, indicating excellent performance. CONCLUSION: We have found that we could teach a CNN to correctly diagnose and classify the severity of knee OA using the KL grading system without cleaning the input data from major visual disturbances such as implants and other pathologies.
Subject(s)
Deep Learning , Osteoarthritis, Knee , Adult , Artificial Intelligence , Humans , Knee Joint , Osteoarthritis, Knee/diagnostic imaging , Osteoarthritis, Knee/epidemiology , Osteoarthritis, Knee/surgery , Reproducibility of Results
ABSTRACT
Coarse graining enables the investigation of molecular dynamics for larger systems and at longer timescales than is possible at an atomic resolution. However, a coarse graining model must be formulated such that the conclusions we draw from it are consistent with the conclusions we would draw from a model at a finer level of detail. It has been proved that a force matching scheme defines a thermodynamically consistent coarse-grained model for an atomistic system in the variational limit. Wang et al. [ACS Cent. Sci. 5, 755 (2019)] demonstrated that the existence of such a variational limit enables the use of a supervised machine learning framework to generate a coarse-grained force field, which can then be used for simulation in the coarse-grained space. Their framework, however, requires the manual input of molecular features to machine learn the force field. In the present contribution, we build upon the advance of Wang et al. and introduce a hybrid architecture for the machine learning of coarse-grained force fields that learn their own features via a subnetwork that leverages continuous filter convolutions on a graph neural network architecture. We demonstrate that this framework succeeds at reproducing the thermodynamics for small biomolecular systems. Since the learned molecular representations are inherently transferable, the architecture presented here sets the stage for the development of machine-learned, coarse-grained force fields that are transferable across molecular systems.
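The force-matching objective referred to in this abstract can be sketched as a mean squared deviation between predicted coarse-grained forces and atomistic forces mapped onto the coarse-grained beads. The snippet below is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the function name, the array shapes, and the linear `force_map` are hypothetical, and the neural network that would produce the predicted forces is omitted.

```python
import numpy as np

def force_matching_loss(predicted_cg_forces, atomistic_forces, force_map):
    """Mean over frames of the squared error between CG and mapped forces.

    predicted_cg_forces: array of shape (n_frames, n_beads, 3)
    atomistic_forces:    array of shape (n_frames, n_atoms, 3)
    force_map:           array of shape (n_beads, n_atoms), an assumed
                         linear map from atomistic to bead forces
    """
    predicted = np.asarray(predicted_cg_forces, dtype=float)
    atomistic = np.asarray(atomistic_forces, dtype=float)
    force_map = np.asarray(force_map, dtype=float)
    # Project atomistic forces onto the coarse-grained beads.
    mapped = np.einsum("ba,fad->fbd", force_map, atomistic)
    # Sum squared errors over beads and Cartesian components, average over frames.
    per_frame = np.sum((predicted - mapped) ** 2, axis=(1, 2))
    return float(np.mean(per_frame))
```

In a supervised setting, a loss of this form would be minimized with respect to the parameters of the model that generates `predicted_cg_forces`, typically as the negative gradient of a learned coarse-grained energy.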
ABSTRACT
Accurate mechanistic description of structural changes in biomolecules is an increasingly important topic in structural and chemical biology. Markov models have emerged as a powerful way to approximate the molecular kinetics of large biomolecules while keeping full structural resolution in a divide-and-conquer fashion. However, the accuracy of these models is limited by that of the force fields used to generate the underlying molecular dynamics (MD) simulation data. Whereas the quality of classical MD force fields has improved significantly in recent years, remaining errors in the Boltzmann weights are still on the order of a few [Formula: see text], which may lead to significant discrepancies when comparing to experimentally measured rates or state populations. Here we take the view that simulations using a sufficiently good force-field sample conformations that are valid but have inaccurate weights, yet these weights may be made accurate by incorporating experimental data a posteriori. To do so, we propose augmented Markov models (AMMs), an approach that combines concepts from probability theory and information theory to consistently treat systematic force-field error and statistical errors in simulation and experiment. Our results demonstrate that AMMs can reconcile conflicting results for protein mechanisms obtained by different force fields and correct for a wide range of stationary and dynamical observables even when only equilibrium measurements are incorporated into the estimation process. This approach constitutes a unique avenue to combine experiment and computation into integrative models of biomolecular structure and dynamics.
Subject(s)
Markov Chains , Models, Molecular , Molecular Dynamics Simulation , Ubiquitin/metabolism , Protein Folding , Protein Structure, Secondary/physiology , Thermodynamics
ABSTRACT
Protein allostery is a phenomenon involving the long-range coupling between two distal sites in a protein. In order to elucidate allostery at atomic resolution on the ligand-binding WW domain of the enzyme Pin1, multistate structures were calculated from exact nuclear Overhauser effect (eNOE). In its free form, the protein undergoes a microsecond exchange between two states, one of which is predisposed to interact with its parent catalytic domain. In the presence of the positive allosteric ligand, the equilibrium between the two states is shifted towards domain-domain interaction, suggesting a population shift model. In contrast, the allostery-suppressing ligand decouples the side-chain arrangement at the inter-domain interface, thereby reducing the inter-domain interaction. As such, this mechanism is an example of dynamic allostery. The presented distinct modes of action highlight the power of the interplay between dynamics and function in the biological activity of proteins.
Subject(s)
NIMA-Interacting Peptidylprolyl Isomerase/metabolism , Allosteric Regulation , Humans , Models, Molecular , NIMA-Interacting Peptidylprolyl Isomerase/chemistry
ABSTRACT
Long-lived conformational states and their interconversion rates critically determine protein function and regulation. When these states have distinct chemical shifts, the measurement of relaxation by NMR may provide us with useful information about their structure, kinetics, and thermodynamics at atomic resolution. However, as these experimental data are sensitive to many structural and dynamic effects, their interpretation with phenomenological models is challenging, even if only a few metastable states are involved. Consequently, approximations and simplifications must often be used which increase the risk of missing important microscopic features hidden in the data. Here, we show how molecular dynamics simulations analyzed through Markov state models and the related hidden Markov state models may be used to establish mechanistic models that provide a microscopic interpretation of NMR relaxation data. Using ubiquitin and BPTI as examples, we demonstrate how the approach allows us to dissect experimental data into a number of dynamic processes between metastable states. Such a microscopic view may greatly facilitate the mechanistic interpretation of experimental data and serve as a next-generation method for the validation of molecular mechanics force fields and chemical shift prediction algorithms.
Subject(s)
Molecular Dynamics Simulation , Nuclear Magnetic Resonance, Biomolecular , Proteins/chemistry , Algorithms , Proteins/metabolism
ABSTRACT
Although often depicted as rigid structures, proteins are highly dynamic systems, whose motions are essential to their functions. Despite this, it is difficult to investigate protein dynamics due to the rapid timescale at which they sample their conformational space, leading most NMR-determined structures to represent only an averaged snapshot of the dynamic picture. While NMR relaxation measurements can help to determine local dynamics, it is difficult to detect translational or concerted motion, and only recently have significant advances been made to make it possible to acquire a more holistic representation of the dynamics and structural landscapes of proteins. Here, we briefly revisit our most recent progress in the theory and use of exact nuclear Overhauser enhancements (eNOEs) for the calculation of structural ensembles that describe their conformational space. New developments are primarily targeted at increasing the number and improving the quality of extracted eNOE distance restraints, such that the multi-state structure calculation can be applied to proteins of higher molecular weights. We then review the implications of the exact NOE to the protein dynamics and function of cyclophilin A and the WW domain of Pin1, and finally discuss our current research and future directions.
Subject(s)
Cyclophilin A/chemistry , NIMA-Interacting Peptidylprolyl Isomerase/chemistry , Nuclear Magnetic Resonance, Biomolecular/methods , Amino Acid Sequence , Humans , Kinetics , Models, Molecular , Molecular Dynamics Simulation , Molecular Structure , Motion , Protein Conformation , Structure-Activity Relationship
ABSTRACT
The structure-function paradigm is increasingly replaced by the structure-dynamics-function paradigm. All protein activity is steered by the interplay between enthalpy and entropy. Conformational dynamics serves as a proxy of conformational entropy. Therefore, it is essential to study not only the average conformation but also the spatial sampling of a protein on all timescales. To this end, we have established a protocol for determining multiple-state ensembles of proteins based on exact nuclear Overhauser effects (eNOEs). We have recently extended our previously reported eNOE data set for the protein GB3 by a very large set of backbone and side-chain residual dipolar couplings and three-bond J couplings. Here, we demonstrate that at least four structural states are required to represent the complete data set by dissecting the contributions to the CYANA target function, which quantifies restraint violations in structure calculation. We present a four-state ensemble of GB3, which largely preserves the characteristics obtained from eNOEs only. Due to the abundance of the input data, the ensemble and χ1 angles in particular are well suited for cross-validation of the input data and comparison to X-ray structures. Principal component analysis is used to automatically identify and validate relevant states of the ensembles. Overall, our findings suggest that eNOEs are a valuable alternative to traditional NMR probes in spatial elucidation of proteins.
Subject(s)
Nuclear Magnetic Resonance, Biomolecular/methods , Proteins/chemistry , Models, Molecular , Principal Component Analysis , Protein Conformation
ABSTRACT
Signal transducer and activator of transcription factors (STATs) are proteins that can translocate into the nucleus, bind DNA, and activate gene transcription. STAT proteins play a crucial role in cell proliferation, apoptosis, and differentiation. The prevalent view is that STAT proteins are able to form dimers and bind DNA only upon phosphorylation of specific tyrosine residues in the transactivation domain. However, this paradigm has been questioned recently by the observation of dimers of unphosphorylated STATs (USTATs) by X-ray, Förster resonance energy transfer, and site-directed mutagenesis. A more complex picture of the dimerization process and of the role of the dimers is, thus, emerging. Here we present an integrated modeling study of STAT3, a member of the STAT family of utmost importance in cancer development and therapy, in which we combine available experimental data with several computational methodologies such as homology modeling, protein-protein docking, and molecular dynamics to build reliable atomistic models of USTAT3 dimers. The models generated with the integrative approach presented here were then validated by performing computational alanine scanning for all the residues in the protein-protein interface. These results confirmed the experimental observation of the importance of some of these residues (in particular Leu78 and Asp19) in the USTAT3 dimerization process. Given the growing importance of USTAT3 dimers in several cellular pathways, our models provide an important tool for studying the effects of pathological mutations at the molecular and/or atomistic level, and in the rational design of new inhibitors of dimerization.
Subject(s)
Models, Molecular , Protein Multimerization , STAT3 Transcription Factor/chemistry , STAT3 Transcription Factor/genetics , Amino Acid Sequence , Animals , Mice , Molecular Sequence Data , Phosphorylation/physiology , Protein Multimerization/physiology , Protein Structure, Secondary , Protein Structure, Tertiary , STAT3 Transcription Factor/metabolism
ABSTRACT
The study of the spatial sampling of biomolecules is essential to understanding the structure-dynamics-function relationship. We have established a protocol for the determination of multiple-state ensembles based on exact measurements of the nuclear Overhauser effect (eNOE). The protocol is practical since it does not require any additional data, while all other NMR data sets must be supplemented by NOE restraints. The question arises as to how much structural and dynamics information is shared between the eNOEs and other NMR probes. We compile one of the largest and most diverse NMR data sets of a protein to date consisting of eNOEs, RDCs and J couplings for GB3. We show that the eNOEs improve the back-prediction of RDCs and J couplings, either upon use of more than one state, or in comparison to conventional NOEs. Our findings indicate that the eNOE data is self-consistent, consistent with other data, and that the structural representation with multiple states is warranted.
Subject(s)
Nuclear Magnetic Resonance, Biomolecular/methods , Proteins/chemistry , Protein Conformation
ABSTRACT
Residual dipolar couplings (RDCs) are important probes in structural biology, but their analysis is often complicated by the determination of an alignment tensor or its associated assumptions. We here apply the maximum entropy principle to derive a tensor-free formalism which allows for direct, dynamic analysis of RDCs and holds the classic tensor formalism as a special case. Specifically, the framework enables us to robustly analyze data regardless of whether a clear separation of internal and overall dynamics is possible. Such a separation is often difficult in the core subjects of current structural biology, which include multidomain and intrinsically disordered proteins as well as nucleic acids. We demonstrate the method is tractable and self-consistent and generalizes to data sets comprised of observations from multiple different alignment conditions.
Subject(s)
Escherichia coli Proteins/chemistry , Escherichia coli/chemistry , Membrane Proteins/chemistry , Molecular Dynamics Simulation , Muramidase/chemistry , Peptidylprolyl Isomerase/chemistry , Animals , Chickens , Entropy , Nuclear Magnetic Resonance, Biomolecular , Protein Conformation
ABSTRACT
The COVID-19 pandemic continues to pose a substantial threat to human lives and is likely to do so for years to come. Despite the availability of vaccines, searching for efficient small-molecule drugs that are widely available, including in low- and middle-income countries, is an ongoing challenge. In this work, we report the results of an open science community effort, the "Billion molecules against COVID-19 challenge", to identify small-molecule inhibitors against SARS-CoV-2 or relevant human receptors. Participating teams used a wide variety of computational methods to screen a minimum of 1 billion virtual molecules against 6 protein targets. Overall, 31 teams participated, and they suggested a total of 639,024 molecules, which were subsequently ranked to find 'consensus compounds'. The organizing team coordinated with various contract research organizations (CROs) and collaborating institutions to synthesize and test 878 compounds for biological activity against proteases (Nsp5, Nsp3, TMPRSS2), nucleocapsid N, RdRP (only the Nsp12 domain), and (alpha) spike protein S. Overall, 27 compounds with weak inhibition/binding were experimentally identified by binding-, cleavage-, and/or viral suppression assays and are presented here. Open science approaches such as the one presented here contribute to the knowledge base of future drug discovery efforts in finding better SARS-CoV-2 treatments.
Subject(s)
COVID-19 , SARS-CoV-2 , Humans , Pandemics , Biological Assay , Drug Discovery
ABSTRACT
We present a new software framework for Markov chain Monte Carlo sampling for simulation, prediction, and inference of protein structure. The software package contains implementations of recent advances in Monte Carlo methodology, such as efficient local updates and sampling from probabilistic models of local protein structure. These models form a probabilistic alternative to the widely used fragment and rotamer libraries. Combined with an easily extendible software architecture, this makes PHAISTOS well suited for Bayesian inference of protein structure from sequence and/or experimental data. Currently, two force fields are available within the framework: PROFASI and OPLS-AA/L, the latter including the generalized Born surface area solvent model. A flexible command-line and configuration-file interface allows users to quickly set up simulations with the desired configuration. PHAISTOS is released under the GNU General Public License v3.0. Source code and documentation are freely available from http://phaistos.sourceforge.net. The software is implemented in C++ and has been tested on Linux and OSX platforms.
Subject(s)
Markov Chains , Monte Carlo Method , Proteins/chemistry , Software , Bayes Theorem , Computer Simulation , Models, Chemical , Protein Conformation
ABSTRACT
With recent advances in structural biology, including experimental techniques and deep learning-enabled high-precision structure predictions, molecular dynamics methods that scale up to large biomolecular systems are required. Current state-of-the-art approaches in molecular dynamics modeling focus on encoding global configurations of molecular systems as distinct states. This paradigm commands us to map out all possible structures and sample transitions between them, a task that becomes impossible for large-scale systems such as biomolecular complexes. To arrive at scalable molecular models, we suggest moving away from global state descriptions to a set of coupled models that each describe the dynamics of local domains or sites of the molecular system. We describe limitations in the current state-of-the-art global-state Markovian modeling approaches and then introduce Markov field models as an umbrella term that includes models from various scientific communities, including Independent Markov decomposition, Ising and Potts models, and (dynamic) graphical models, and evaluate their use for computational molecular biology. Finally, we give a few examples of early adoptions of these ideas for modeling molecular kinetics and thermodynamics.
Subject(s)
Molecular Dynamics Simulation , Physics , Markov Chains , Kinetics , Thermodynamics
ABSTRACT
We present machine learning models for predicting the chemical context for Buchwald-Hartwig coupling reactions, i.e., what chemicals to add to the reactants to give a productive reaction. Using reaction data from in-house electronic lab notebooks, we train two models: one based on single-label data and one based on multi-label data. Both models show excellent top-3 accuracy of approximately 90%, which suggests strong predictivity. Furthermore, there seems to be an advantage of including multi-label data because the multi-label model shows higher accuracy and better sensitivity for the individual contexts than the single-label model. Although the models are performant, we also show that such models need to be re-trained periodically as there is a strong temporal characteristic to the usage of different contexts. Therefore, a model trained on historical data will decrease in usefulness with time as newer and better contexts emerge and replace older ones. We hypothesize that such significant transitions in the context-usage will likely affect any model predicting chemical contexts trained on historical data. Consequently, training context prediction models warrants careful planning of what data is used for training and how often the model needs to be re-trained.
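The top-3 accuracy quoted in this abstract can be made concrete with a small sketch. The code below is only one plausible reading of the metric and not the authors' evaluation code: it assumes each reaction has a vector of predicted scores over candidate contexts and a set of one or more acceptable contexts, and it counts a prediction as correct when any of the k highest-scoring contexts is acceptable. The function name and the data layout are hypothetical.

```python
import numpy as np

def top_k_accuracy(scores, true_label_sets, k=3):
    """Fraction of samples with at least one acceptable label in the top k.

    scores:          array of shape (n_samples, n_contexts), higher is better
    true_label_sets: one set of acceptable context indices per sample
                     (a single-element set in the single-label case)
    """
    scores = np.asarray(scores, dtype=float)
    # Indices of the k highest-scoring contexts for every sample.
    top_k = np.argsort(-scores, axis=1)[:, :k]
    hits = [
        len(set(row.tolist()) & set(labels)) > 0
        for row, labels in zip(top_k, true_label_sets)
    ]
    return float(np.mean(hits))
```

With single-element label sets this reduces to the usual single-label top-k accuracy, so the same function could be used to compare a single-label and a multi-label model on equal footing.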