Results 1 - 20 of 93
1.
Brief Bioinform ; 24(4)2023 07 20.
Article in English | MEDLINE | ID: mdl-37418278

ABSTRACT

Proteins are dynamic macromolecules that perform vital functions in cells. A protein structure determines its function, but this structure is not static, as proteins change their conformation to achieve various functions. Understanding the conformational landscapes of proteins is essential to understand their mechanism of action. Sets of carefully chosen conformations can summarize such complex landscapes and provide better insights into protein function than single conformations. We refer to these sets as representative conformational ensembles. Recent advances in computational methods have led to an increase in the number of available structural datasets spanning conformational landscapes. However, extracting representative conformational ensembles from such datasets is not an easy task and many methods have been developed to tackle it. Our new approach, EnGens (short for ensemble generation), collects these methods into a unified framework for generating and analyzing representative protein conformational ensembles. In this work, we: (1) provide an overview of existing methods and tools for representative protein structural ensemble generation and analysis; (2) unify existing approaches in an open-source Python package, and a portable Docker image, providing interactive visualizations within a Jupyter Notebook pipeline; (3) test our pipeline on a few canonical examples from the literature. Representative ensembles produced by EnGens can be used for many downstream tasks such as protein-ligand ensemble docking, Markov state modeling of protein dynamics and analysis of the effect of single-point mutations.


Subjects
Molecular Dynamics Simulation , Proteins , Protein Conformation , Proteins/chemistry
2.
Chem Rev ; 121(16): 9722-9758, 2021 08 25.
Article in English | MEDLINE | ID: mdl-33945269

ABSTRACT

Unsupervised learning is becoming an essential tool to analyze the increasingly large amounts of data produced by atomistic and molecular simulations, in material science, solid state physics, biophysics, and biochemistry. In this Review, we provide a comprehensive overview of the methods of unsupervised learning that have been most commonly used to investigate simulation data and indicate likely directions for further developments in the field. In particular, we discuss feature representation of molecular systems and present state-of-the-art algorithms of dimensionality reduction, density estimation, and clustering, and kinetic models. We divide our discussion into self-contained sections, each discussing a specific method. In each section, we briefly touch upon the mathematical and algorithmic foundations of the method, highlight its strengths and limitations, and describe the specific ways in which it has been used, or can be used, to analyze molecular simulation data.

3.
Proc Natl Acad Sci U S A ; 117(48): 30610-30618, 2020 12 01.
Article in English | MEDLINE | ID: mdl-33184174

ABSTRACT

Peptide binding to major histocompatibility complexes (MHCs) is a central component of the immune system, and understanding the mechanism behind stable peptide-MHC binding will aid the development of immunotherapies. While MHC binding is mostly influenced by the identity of the so-called anchor positions of the peptide, secondary interactions from nonanchor positions are known to play a role in complex stability. However, current MHC-binding prediction methods lack an analysis of the major conformational states and might underestimate the impact of secondary interactions. In this work, we present an atomically detailed analysis of peptide-MHC binding that can reveal the contributions of any interaction toward stability. We propose a simulation framework that uses both umbrella sampling and adaptive sampling to generate a Markov state model (MSM) for a coronavirus-derived peptide (QFKDNVILL), bound to one of the most prevalent MHC receptors in humans (HLA-A*24:02). While our model reaffirms the importance of the anchor positions of the peptide in establishing stable interactions, our model also reveals the underestimated importance of position 4 (p4), a nonanchor position. We confirmed our results by simulating the impact of specific peptide mutations and validated these predictions through competitive binding assays. By comparing the MSM of the wild-type system with those of the D4A and D4P mutations, our modeling reveals stark differences in unbinding pathways. The analysis presented here can be applied to any peptide-MHC complex of interest with a structural model as input, representing an important step toward comprehensive modeling of the MHC class I pathway.


Subjects
Major Histocompatibility Complex , Markov Chains , Molecular Models , Peptides/metabolism , Alanine/genetics , Competitive Binding , Computer Simulation , DNA Mutational Analysis , Mutation/genetics , Proline/metabolism , Protein Binding
4.
J Chem Phys ; 157(18): 181102, 2022 Nov 14.
Article in English | MEDLINE | ID: mdl-36379765

ABSTRACT

The vibrational spectra of condensed and gas-phase systems are influenced by the quantum-mechanical behavior of light nuclei. Full-dimensional simulations of approximate quantum dynamics are possible thanks to the imaginary time path-integral (PI) formulation of quantum statistical mechanics, albeit at a high computational cost which increases sharply with decreasing temperature. By leveraging advances in machine-learned coarse-graining, we develop a PI method with the reduced computational cost of a classical simulation. We also propose a simple temperature elevation scheme to significantly attenuate the artifacts of standard PI approaches as well as eliminate the unfavorable temperature scaling of the computational cost. We illustrate the approach by calculating vibrational spectra using standard models of water molecules and bulk water, demonstrating significant computational savings and dramatically improved accuracy compared to more expensive reference approaches. Our simple, efficient, and accurate method has prospects for routine calculations of vibrational spectra for a wide range of molecular systems, with an explicit treatment of the quantum nature of nuclei.

5.
Annu Rev Phys Chem ; 71: 361-390, 2020 04 20.
Article in English | MEDLINE | ID: mdl-32092281

ABSTRACT

Machine learning (ML) is transforming all areas of science. The complex and time-consuming calculations in molecular simulations are particularly suitable for an ML revolution and have already been profoundly affected by the application of existing ML methods. Here we review recent ML methods for molecular simulation, with particular focus on (deep) neural networks for the prediction of quantum-mechanical energies and forces, on coarse-grained molecular dynamics, on the extraction of free energy surfaces and kinetics, and on generative network approaches to sample molecular equilibrium structures and compute thermodynamics. To explain these methods and illustrate open methodological problems, we review some important principles of molecular physics and describe how they can be incorporated into ML structures. Finally, we identify and describe a list of open challenges for the interface between ML and molecular simulation.

6.
J Chem Phys ; 154(16): 160401, 2021 Apr 28.
Article in English | MEDLINE | ID: mdl-33940847

ABSTRACT

Over recent years, the use of statistical learning techniques applied to chemical problems has gained substantial momentum. This is particularly apparent in the realm of physical chemistry, where the balance between empiricism and physics-based theory has traditionally been rather in favor of the latter. In this guest Editorial for the special topic issue on "Machine Learning Meets Chemical Physics," a brief rationale is provided, followed by an overview of the topics covered. We conclude by making some general remarks.

7.
J Chem Phys ; 154(16): 164113, 2021 Apr 28.
Article in English | MEDLINE | ID: mdl-33940848

ABSTRACT

The use of coarse-grained (CG) models is a popular approach to study complex biomolecular systems. By reducing the number of degrees of freedom, a CG model can explore long time- and length-scales inaccessible to computational models at higher resolution. If a CG model is designed by formally integrating out some of the system's degrees of freedom, one expects multi-body interactions to emerge in the effective CG model's energy function. In practice, it has been shown that the inclusion of multi-body terms indeed improves the accuracy of a CG model. However, no general approach has been proposed to systematically construct a CG effective energy that includes arbitrary orders of multi-body terms. In this work, we propose a neural network based approach to address this point and construct a CG model as a multi-body expansion. By applying this approach to a small protein, we evaluate the relative importance of the different multi-body terms in the definition of an accurate model. We observe a slow convergence in the multi-body expansion, where up to five-body interactions are needed to reproduce the free energy of an atomistic model.


Subjects
Oligopeptides/chemistry , Molecular Dynamics Simulation , Computer Neural Networks , Thermodynamics
8.
J Chem Phys ; 155(8): 084101, 2021 Aug 28.
Article in English | MEDLINE | ID: mdl-34470360

ABSTRACT

Accurate modeling of the solvent environment for biological molecules is crucial for computational biology and drug design. A popular approach to achieve long simulation time scales for large system sizes is to incorporate the effect of the solvent in a mean-field fashion with implicit solvent models. However, a challenge with existing implicit solvent models is that they often lack accuracy or certain physical properties compared to explicit solvent models as the many-body effects of the neglected solvent molecules are difficult to model as a mean field. Here, we leverage machine learning (ML) and multi-scale coarse graining (CG) in order to learn implicit solvent models that can approximate the energetic and thermodynamic properties of a given explicit solvent model with arbitrary accuracy, given enough training data. Following the previous ML-CG models CGnet and CGSchnet, we introduce ISSNet, a graph neural network, to model the implicit solvent potential of mean force. ISSNet can learn from explicit solvent simulation data and be readily applied to molecular dynamics simulations. We compare the solute conformational distributions under different solvation treatments for two peptide systems. The results indicate that ISSNet models can outperform widely used generalized Born and surface area models in reproducing the thermodynamics of small protein systems with respect to explicit solvent. The success of this novel method demonstrates the potential benefit of applying machine learning methods in accurate modeling of solvent effects for in silico research and biomedical applications.

9.
Proc Natl Acad Sci U S A ; 115(37): 9234-9239, 2018 09 11.
Article in English | MEDLINE | ID: mdl-30150375

ABSTRACT

The presence of conflicting interactions, or frustration, determines how fast biomolecules can explore their configurational landscapes. Recent experiments have provided cases of systems with slow reconfiguration dynamics, perhaps arising from frustration. While it is well known that protein folding speed and mechanism are strongly affected by the protein native structure, it is still unknown how the response to frustration is modulated by the protein topology. We explore the effects of nonnative interactions in the reconfigurational and folding dynamics of proteins with different sizes and topologies. We find that structural correlations related to the folded state size and topology play an important role in determining the folding kinetics of proteins that otherwise have the same amount of nonnative interactions. In particular, we find that the reconfiguration dynamics of α-helical proteins are more susceptible to frustration than β-sheet proteins of the same size. Our results may explain recent experimental findings and suggest that attempts to measure the degree of frustration due to nonnative interactions might be more successful with α-helical proteins.


Subjects
Chemical Models , Protein Folding , Proteins/chemistry , Protein Secondary Structure
10.
Entropy (Basel) ; 23(2)2021 Jan 21.
Article in English | MEDLINE | ID: mdl-33494443

ABSTRACT

The reduction of high-dimensional systems to effective models on a smaller set of variables is an essential task in many areas of science. For stochastic dynamics governed by diffusion processes, a general procedure to find effective equations is the conditioning approach. In this paper, we are interested in the spectrum of the generator of the resulting effective dynamics, and how it compares to the spectrum of the full generator. We prove a new relative error bound in terms of the eigenfunction approximation error for reversible systems. We also present numerical examples indicating that, if Kramers-Moyal (KM) type approximations are used to compute the spectrum of the reduced generator, it seems largely insensitive to the time window used for the KM estimators. We analyze the implications of these observations for systems driven by underdamped Langevin dynamics, and show how meaningful effective dynamics can be defined in this setting.

11.
J Chem Phys ; 152(19): 194106, 2020 May 21.
Article in English | MEDLINE | ID: mdl-33687259

ABSTRACT

Gradient-domain machine learning (GDML) is an accurate and efficient approach to learn a molecular potential and associated force field based on the kernel ridge regression algorithm. Here, we demonstrate its application to learn an effective coarse-grained (CG) model from all-atom simulation data in a sample efficient manner. The CG force field is learned by following the thermodynamic consistency principle, here by minimizing the error between the predicted CG force and the all-atom mean force in the CG coordinates. Solving this problem by GDML directly is impossible because coarse-graining requires averaging over many training data points, resulting in impractical memory requirements for storing the kernel matrices. In this work, we propose a data-efficient and memory-saving alternative. Using ensemble learning and stratified sampling, we propose a 2-layer training scheme that enables GDML to learn an effective CG model. We illustrate our method on a simple biomolecular system, alanine dipeptide, by reconstructing the free energy landscape of a CG variant of this molecule. Our novel GDML training scheme yields a smaller free energy error than neural networks when the training set is small, and a comparably high accuracy when the training set is sufficiently large.
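
The force-matching objective mentioned in this abstract can be illustrated with a minimal NumPy sketch. It is not the authors' GDML training scheme. It assumes, for illustration only, that each CG bead is a single selected atom, so the same hypothetical selection matrix `mapping` maps both coordinates and forces; `cg_force_fn` stands in for any candidate CG force model.

```python
import numpy as np

def force_matching_loss(cg_force_fn, coords, forces, mapping):
    """Mean squared error between predicted CG forces and mapped all-atom forces.

    coords, forces: arrays of shape (n_frames, n_atoms, 3)
    mapping: array of shape (n_beads, n_atoms); assumed here to select one
             atom per bead, so it maps coordinates and forces alike
    cg_force_fn: callable taking CG coordinates (n_frames, n_beads, 3) and
                 returning predicted CG forces of the same shape
    """
    # Project all-atom coordinates and forces onto the CG beads.
    cg_coords = np.einsum("ba,fad->fbd", mapping, coords)
    cg_ref_forces = np.einsum("ba,fad->fbd", mapping, forces)
    # Compare model prediction with the instantaneous mapped forces.
    pred = cg_force_fn(cg_coords)
    per_frame = np.sum((pred - cg_ref_forces) ** 2, axis=(1, 2))
    return float(np.mean(per_frame))
```

Minimizing this loss over the parameters of `cg_force_fn` drives the prediction toward the conditional mean of the mapped forces, which is the mean force the abstract refers to.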

12.
J Chem Phys ; 153(19): 194101, 2020 Nov 21.
Article in English | MEDLINE | ID: mdl-33218238

ABSTRACT

Coarse graining enables the investigation of molecular dynamics for larger systems and at longer timescales than is possible at an atomic resolution. However, a coarse graining model must be formulated such that the conclusions we draw from it are consistent with the conclusions we would draw from a model at a finer level of detail. It has been proved that a force matching scheme defines a thermodynamically consistent coarse-grained model for an atomistic system in the variational limit. Wang et al. [ACS Cent. Sci. 5, 755 (2019)] demonstrated that the existence of such a variational limit enables the use of a supervised machine learning framework to generate a coarse-grained force field, which can then be used for simulation in the coarse-grained space. Their framework, however, requires the manual input of molecular features to machine learn the force field. In the present contribution, we build upon the advance of Wang et al. and introduce a hybrid architecture for the machine learning of coarse-grained force fields that learn their own features via a subnetwork that leverages continuous filter convolutions on a graph neural network architecture. We demonstrate that this framework succeeds at reproducing the thermodynamics for small biomolecular systems. Since the learned molecular representations are inherently transferable, the architecture presented here sets the stage for the development of machine-learned, coarse-grained force fields that are transferable across molecular systems.

13.
Proc Natl Acad Sci U S A ; 114(31): 8265-8270, 2017 08 01.
Article in English | MEDLINE | ID: mdl-28716931

ABSTRACT

Accurate mechanistic description of structural changes in biomolecules is an increasingly important topic in structural and chemical biology. Markov models have emerged as a powerful way to approximate the molecular kinetics of large biomolecules while keeping full structural resolution in a divide-and-conquer fashion. However, the accuracy of these models is limited by that of the force fields used to generate the underlying molecular dynamics (MD) simulation data. Whereas the quality of classical MD force fields has improved significantly in recent years, remaining errors in the Boltzmann weights are still on the order of a few [Formula: see text], which may lead to significant discrepancies when comparing to experimentally measured rates or state populations. Here we take the view that simulations using a sufficiently good force field sample conformations that are valid but have inaccurate weights, yet these weights may be made accurate by incorporating experimental data a posteriori. To do so, we propose augmented Markov models (AMMs), an approach that combines concepts from probability theory and information theory to consistently treat systematic force-field error and statistical errors in simulation and experiment. Our results demonstrate that AMMs can reconcile conflicting results for protein mechanisms obtained by different force fields and correct for a wide range of stationary and dynamical observables even when only equilibrium measurements are incorporated into the estimation process. This approach constitutes a unique avenue to combine experiment and computation into integrative models of biomolecular structure and dynamics.


Subjects
Markov Chains , Molecular Models , Molecular Dynamics Simulation , Ubiquitin/metabolism , Protein Folding , Protein Secondary Structure/physiology , Thermodynamics
14.
J Chem Phys ; 151(4): 044116, 2019 Jul 28.
Article in English | MEDLINE | ID: mdl-31370528

ABSTRACT

Coarse-graining has become an area of tremendous importance within many different research fields. For molecular simulation, coarse-graining bears the promise of finding simplified models such that long-time simulations of large-scale systems become computationally tractable. While significant progress has been made in tuning thermodynamic properties of reduced models, it remains a key challenge to ensure that relevant kinetic properties are retained by coarse-grained dynamical systems. In this study, we focus on data-driven methods to preserve the rare-event kinetics of the original system and make use of their close connection to the low-lying spectrum of the system's generator. Building on work by Crommelin and Vanden-Eijnden [Multiscale Model. Simul. 9, 1588 (2011)], we present a general framework, called spectral matching, which directly targets the generator's leading eigenvalue equations when learning parameters for coarse-grained models. We discuss different parametric models for effective dynamics and derive the resulting data-based regression problems. We show that spectral matching can be used to learn effective potentials which retain the slow dynamics but also to correct the dynamics induced by existing techniques, such as force matching.

15.
Molecules ; 24(5)2019 Mar 02.
Article in English | MEDLINE | ID: mdl-30832312

ABSTRACT

The Class I Major Histocompatibility Complex (MHC) is a central protein in immunology as it binds to intracellular peptides and displays them at the cell surface for recognition by T-cells. The structural analysis of bound peptide-MHC complexes (pMHCs) holds the promise of interpretable and general binding prediction (i.e., testing whether a given peptide binds to a given MHC). However, structural analysis is limited in part by the difficulty in modelling pMHCs given the size and flexibility of the peptides that can be presented by MHCs. This article describes APE-Gen (Anchored Peptide-MHC Ensemble Generator), a fast method for generating ensembles of bound pMHC conformations. APE-Gen generates an ensemble of bound conformations by iterated rounds of (i) anchoring the ends of a given peptide near known pockets in the binding site of the MHC, (ii) sampling peptide backbone conformations with loop modelling, and then (iii) performing energy minimization to fix steric clashes, accumulating conformations at each round. APE-Gen takes only minutes on a standard desktop to generate tens of bound conformations, and we show the ability of APE-Gen to sample conformations found in X-ray crystallography even when only sequence information is used as input. APE-Gen has the potential to be useful for its scalability (i.e., modelling thousands of pMHCs or even non-canonical longer peptides) and for its use as a flexible search tool. We demonstrate an example for studying cross-reactivity.


Subjects
Histocompatibility Antigens Class I/chemistry , Multiprotein Complexes/chemistry , Peptides/chemistry , T-Lymphocytes/chemistry , Binding Sites , X-Ray Crystallography , Histocompatibility Antigens Class I/immunology , Molecular Models , Multiprotein Complexes/immunology , Peptides/immunology , Protein Binding , Protein Conformation , T-Lymphocytes/immunology
16.
Biophys J ; 115(8): 1470-1480, 2018 10 16.
Article in English | MEDLINE | ID: mdl-30268539

ABSTRACT

The assembly of the soluble N-ethylmaleimide-sensitive factor attachment protein receptor (SNARE) protein complex is a fundamental step in neuronal exocytosis, and it has been extensively studied in the last two decades. Yet, many details of this process remain inaccessible with the current experimental space and time resolution. Here, we study the zipping mechanism of the SNARE complex computationally by using a coarse-grained model. We explore the different pathways available and analyze their dependence on the computational model employed. We reveal and characterize multiple intermediate states, in agreement with previous experimental findings. We use our model to analyze the influence of single-residue mutations on the thermodynamics of the folding process.


Subjects
Cell Membrane/metabolism , Computer Simulation , Exocytosis , Neurons/metabolism , Protein Folding , SNARE Proteins/chemistry , Energy Metabolism , Humans , Mutation , SNARE Proteins/genetics , SNARE Proteins/metabolism , Thermodynamics
17.
Chem Rev ; 121(16): 9719-9721, 2021 08 25.
Article in English | MEDLINE | ID: mdl-34428897
18.
J Chem Phys ; 148(24): 241723, 2018 Jun 28.
Article in English | MEDLINE | ID: mdl-29960307

ABSTRACT

With the rapid increase of available data for complex systems, there is great interest in the extraction of physically relevant information from massive datasets. Recently, a framework called Sparse Identification of Nonlinear Dynamics (SINDy) has been introduced to identify the governing equations of dynamical systems from simulation data. In this study, we extend SINDy to stochastic dynamical systems which are frequently used to model biophysical processes. We prove the asymptotic correctness of stochastic SINDy in the infinite data limit, both in the original and projected variables. We discuss algorithms to solve the sparse regression problem arising from the practical implementation of SINDy and show that cross validation is an essential tool to determine the right level of sparsity. We demonstrate the proposed methodology on two test systems, namely, the diffusion in a one-dimensional potential and the projected dynamics of a two-dimensional diffusion process.
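
The sparse regression step described in this abstract can be sketched with sequentially thresholded least squares, a commonly used solver for SINDy-type problems. This is an illustrative sketch, not the paper's code: the function name `stlsq`, the default threshold, and the polynomial library in the usage note are assumptions. In the stochastic setting the regression target would be a finite-difference estimate of the drift built from trajectory increments; selecting the threshold by cross validation, as the abstract recommends, is not shown.

```python
import numpy as np

def stlsq(theta, y, threshold=0.1, n_iter=10):
    """Sequentially thresholded least squares.

    theta: (n_samples, n_features) library of candidate functions
    y:     (n_samples,) regression target, e.g. an estimated drift
    Returns a sparse coefficient vector of length n_features.
    """
    xi = np.linalg.lstsq(theta, y, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(xi) < threshold
        xi[small] = 0.0          # prune weak terms
        active = ~small
        if not active.any():
            break
        # Refit using only the surviving library functions.
        xi[active] = np.linalg.lstsq(theta[:, active], y, rcond=None)[0]
    return xi
```

As a usage example, with a library of monomials `1, x, x**2, x**3` and a noise-free target `x - x**3` (the drift of a diffusion in a double-well potential), `stlsq` keeps only the linear and cubic terms.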

19.
J Chem Phys ; 149(24): 244119, 2018 Dec 28.
Article in English | MEDLINE | ID: mdl-30599712

ABSTRACT

Adaptive sampling methods, often used in combination with Markov state models, are becoming increasingly popular for speeding up rare events in simulation such as molecular dynamics (MD) without biasing the system dynamics. Several adaptive sampling strategies have been proposed, but it is not clear which methods perform better for different physical systems. In this work, we present a systematic evaluation of selected adaptive sampling strategies on a wide selection of fast folding proteins. The adaptive sampling strategies were emulated using models constructed on already existing MD trajectories. We provide theoretical limits for the sampling speed-up and compare the performance of different strategies with and without using some a priori knowledge of the system. The results show that for different goals, different adaptive sampling strategies are optimal. In order to sample slow dynamical processes such as protein folding without a priori knowledge of the system, a strategy based on the identification of a set of metastable regions is consistently the most efficient, while a strategy based on the identification of microstates performs better if the goal is to explore newer regions of the conformational space. Interestingly, the maximum speed-up achievable for the adaptive sampling of slow processes increases for proteins with longer folding times, encouraging the application of these methods for the characterization of slower processes, beyond the fast-folding proteins considered here.
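
To make the combination of Markov state models and adaptive sampling more concrete, here is a minimal sketch under stated assumptions. It estimates a row-normalized transition matrix from a discretized trajectory and picks the least-visited microstates as restart candidates. The least-counts rule is one common microstate-based heuristic and is assumed here for illustration; it is not necessarily one of the strategies evaluated in the paper, and all function names are hypothetical.

```python
import numpy as np

def count_matrix(dtraj, n_states, lag=1):
    """Count transitions i -> j separated by `lag` steps in a discrete trajectory."""
    counts = np.zeros((n_states, n_states))
    for i, j in zip(dtraj[:-lag], dtraj[lag:]):
        counts[i, j] += 1
    return counts

def transition_matrix(counts):
    """Row-normalize counts; rows with no observed transitions stay zero."""
    rows = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, rows, out=np.zeros_like(counts), where=rows > 0)

def least_visited(dtraj, n_states, k=1):
    """Return the k microstates with the fewest visits (restart candidates)."""
    visits = np.bincount(dtraj, minlength=n_states)
    return np.argsort(visits, kind="stable")[:k]
```

In an adaptive sampling loop, new short simulations would be started from structures belonging to the states returned by `least_visited`, and the model would be re-estimated after each round.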


Subjects
Molecular Dynamics Simulation , Proteins/chemistry , Protein Conformation , Protein Folding
20.
J Chem Phys ; 149(18): 180901, 2018 Nov 14.
Article in English | MEDLINE | ID: mdl-30441927

ABSTRACT

The field of computational molecular sciences (CMSs) has made innumerable contributions to the understanding of the molecular phenomena that underlie and control chemical processes, which is manifested in a large number of community software projects and codes. The CMS community is now poised to take the next transformative steps of better training in modern software design and engineering methods and tools, increasing interoperability through more systematic adoption of agreed-upon standards and accepted best practices, overcoming unnecessary redundancy in software effort along with greater reproducibility, and increasing the deployment of new software onto hardware platforms from in-house clusters to mid-range computing systems through to modern supercomputers. This in turn will have future impact on the software that will be created to address grand challenge science that we illustrate here: the formulation of diverse catalysts, descriptions of long-range charge and excitation transfer, and development of structural ensembles for intrinsically disordered proteins.
