ABSTRACT
Multiplexed cellular imaging typically relies on the sequential application of detection probes, such as antibodies or DNA barcodes, which is complex and time-consuming. To address this, we developed protein nanobarcodes, composed of combinations of epitopes recognized by specific sets of nanobodies. The nanobarcodes are read in a single imaging step, relying on nanobodies conjugated to distinct fluorophores, which enables a precise analysis of large numbers of protein combinations. Fluorescence images of the nanobarcodes were used as input for a deep neural network, which identified the proteins with high precision. We thus present an efficient and straightforward protein identification method that is applicable to relatively complex biological assays. We demonstrate this with a multicell competition assay, in which we used our nanobarcoded proteins together with neurexin and neuroligin isoforms to test, in parallel, the preferred binding combinations of multiple isoforms.
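The combinatorial idea behind the nanobarcodes can be illustrated with a small sketch. With n distinct epitopes, each read by a nanobody carrying its own fluorophore, there are 2^n - 1 non-empty epitope combinations, so four epitopes already give 15 distinguishable barcodes. The sketch below assumes a hypothetical set of four epitope tags and decodes a cell by thresholding its per-channel intensities and looking up the resulting binary pattern. It is a simple baseline for illustration, not the deep neural network used in the study; the tag names, protein names, and threshold are placeholders.

```python
from itertools import combinations

# Hypothetical epitope tags; each is detected by a nanobody with a distinct fluorophore.
EPITOPES = ("tagA", "tagB", "tagC", "tagD")


def all_barcodes(n_epitopes):
    """Enumerate every non-empty epitope combination as a binary tuple."""
    codes = []
    for k in range(1, n_epitopes + 1):
        for subset in combinations(range(n_epitopes), k):
            codes.append(tuple(1 if i in subset else 0 for i in range(n_epitopes)))
    return codes


def decode(intensities, threshold, codebook):
    """Binarize per-channel intensities and return the matching protein, or None."""
    pattern = tuple(int(value > threshold) for value in intensities)
    return codebook.get(pattern)


codes = all_barcodes(len(EPITOPES))
# Hypothetical assignment: one single-epitope code and one two-epitope code.
codebook = {codes[0]: "protein_1", codes[4]: "protein_2"}

print(len(codes))                                    # 15
print(decode([0.9, 0.8, 0.1, 0.05], 0.5, codebook))  # protein_2
```

A fixed threshold breaks down with uneven labeling or spectral bleed-through, which is one reason a learned classifier on the raw images is attractive.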
Subject(s)
Single-Domain Antibodies, DNA, Antibodies, Optical Imaging, Protein Isoforms
ABSTRACT
Cryo-soft X-ray tomography (cryo-SXT) is a powerful method to investigate the ultrastructure of cells, offering resolution in the range of tens of nanometers and strong contrast for membranous structures without requiring labeling or chemical fixation. The short acquisition time and relatively large field of view lead to fast acquisition of large amounts of tomographic data. Segmenting these data into accessible features is a necessary step in extracting biologically relevant information from cryo-soft X-ray tomograms. However, manual image segmentation still requires several orders of magnitude more time than data acquisition. To address this challenge, we have developed an end-to-end automated 3D segmentation pipeline based on semisupervised deep learning. Our approach is suitable for high-throughput analysis of large amounts of tomographic data, while remaining robust to limited manual annotations and to variations in tomographic conditions. We validate our approach by extracting three-dimensional information on cellular ultrastructure and by quantifying nanoscopic morphological parameters of filopodia in mammalian cells.
Subject(s)
Deep Learning, Animals, X-Rays, X-Ray Tomography/methods, Fluorescence Microscopy/methods, Computer-Assisted Image Processing/methods, Cryoelectron Microscopy, Mammals
ABSTRACT
Presentation of antigenic peptides by major histocompatibility complex class II (MHC-II) proteins determines T helper cell reactivity. The MHC-II genetic locus displays a large degree of allelic polymorphism influencing the peptide repertoire presented by the resulting MHC-II protein allotypes. During antigen processing, the human leukocyte antigen (HLA) molecule HLA-DM (DM) encounters these distinct allotypes and catalyzes exchange of the placeholder peptide CLIP by exploiting dynamic features of MHC-II. Here, we investigate 12 highly abundant CLIP-bound HLA-DRB1 allotypes and correlate their dynamics with catalysis by DM. Despite large differences in thermodynamic stability, peptide exchange rates fall into a target range that maintains DM responsiveness. A DM-susceptible conformation is conserved in MHC-II molecules, and allosteric coupling between polymorphic sites affects dynamic states that influence DM catalysis. As exemplified for rheumatoid arthritis, we postulate that intrinsic dynamic features of peptide-MHC-II complexes contribute to the association of individual MHC-II allotypes with autoimmune disease.
Subject(s)
HLA-D Antigens, HLA-DR Antigens, Humans, HLA-D Antigens/metabolism, HLA-DR Antigens/metabolism, Peptides/chemistry, Antigen Presentation, Catalysis, Protein Binding
ABSTRACT
Structure prediction of protein complexes has improved significantly with AlphaFold2 and AlphaFold-multimer (AFM), but only 60% of dimers are accurately predicted. Here, we learn a bias to the MSA representation that improves the predictions by performing gradient descent through the AFM network. We demonstrate the performance on seven difficult targets from CASP15 and increase the average MMscore to 0.76 compared to 0.63 with AFM. We evaluate the procedure on 487 protein complexes where AFM fails and obtain an increased success rate (MMscore>0.75) of 33% on these difficult targets. Our protocol, AFProfile, provides a way to direct predictions towards a defined target function guided by the MSA. We expect gradient descent over the MSA to be useful for different tasks.
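The core idea of AFProfile, learning an additive bias on the MSA representation by gradient descent through a frozen network, can be sketched with a toy model. In the sketch below a fixed linear map `W` stands in for AFM, `msa_repr` is a hypothetical frozen input, and the loss is a squared distance to a hypothetical target; AFProfile itself backpropagates through AFM toward the target function defined in the study, so everything except the "train only the bias" update rule is an assumption.

```python
import numpy as np


def optimize_bias(W, x, target, n_steps=2000):
    """Gradient descent on an additive bias b applied to a frozen input x.

    The 'network' is a fixed linear map W (a stand-in for AFM); only b is trained."""
    b = np.zeros_like(x)
    lr = 1.0 / (2.0 * np.linalg.norm(W, 2) ** 2)  # 1/L for the quadratic loss below
    losses = []
    for _ in range(n_steps):
        residual = W @ (x + b) - target
        losses.append(float(residual @ residual))
        grad = 2.0 * W.T @ residual  # gradient of ||W(x + b) - target||^2 with respect to b
        b -= lr * grad
    return b, losses


rng = np.random.default_rng(0)
W = rng.normal(size=(5, 8))    # frozen "network" weights (hypothetical)
msa_repr = rng.normal(size=8)  # frozen input representation (hypothetical)
target = rng.normal(size=5)    # desired output (hypothetical)
bias, losses = optimize_bias(W, msa_repr, target)
print(losses[0], losses[-1])
```

The input representation and the network weights never change; only the bias does, which mirrors the idea of steering a pretrained predictor without retraining it.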
Subject(s)
Computational Biology, Proteins, Computational Biology/methods, Proteins/chemistry, Proteins/metabolism, Molecular Models, Algorithms, Protein Folding, Protein Conformation, Protein Multimerization, Multiprotein Complexes/chemistry, Multiprotein Complexes/metabolism
ABSTRACT
Balanced fusion and fission are key for the proper function and physiology of mitochondria [1,2]. Remodelling of the mitochondrial inner membrane is mediated by the dynamin-like protein mitochondrial genome maintenance 1 (Mgm1) in fungi or the related protein optic atrophy 1 (OPA1) in animals [3-5]. Mgm1 is required for the preservation of mitochondrial DNA in yeast [6], whereas mutations in the OPA1 gene in humans are a common cause of autosomal dominant optic atrophy, a genetic disorder that affects the optic nerve [7,8]. Mgm1 and OPA1 are present in mitochondria as a membrane-integral long form and a short form that is soluble in the intermembrane space. Yeast strains that express temperature-sensitive mutants of Mgm1 [9,10] or mammalian cells that lack OPA1 display fragmented mitochondria [11,12], which suggests that Mgm1 and OPA1 have an important role in inner-membrane fusion. Consistently, only the mitochondrial outer membrane, not the inner membrane, fuses in the absence of functional Mgm1 [13]. Mgm1 and OPA1 have also been shown to maintain proper cristae architecture [10,14]; for example, OPA1 prevents the release of pro-apoptotic factors by tightening crista junctions [15]. Finally, the short form of OPA1 localizes to mitochondrial constriction sites, where it presumably promotes mitochondrial fission [16]. How Mgm1 and OPA1 perform their diverse functions in membrane fusion, scission and cristae organization is at present unknown. Here we present crystal and electron cryo-tomography structures of Mgm1 from Chaetomium thermophilum. Mgm1 consists of a GTPase (G) domain, a bundle signalling element domain, a stalk, and a paddle domain that contains a membrane-binding site. Biochemical and cell-based experiments demonstrate that the Mgm1 stalk mediates the assembly of bent tetramers into helical filaments. Electron cryo-tomography studies of Mgm1-decorated lipid tubes and fluorescence microscopy experiments on reconstituted membrane tubes indicate how the tetramers assemble on positively or negatively curved membranes. Our findings convey how Mgm1 and OPA1 filaments dynamically remodel the mitochondrial inner membrane.
Subject(s)
Chaetomium/chemistry, Cryoelectron Microscopy, Fungal Proteins/chemistry, Fungal Proteins/metabolism, GTP-Binding Proteins/chemistry, Mitochondrial Membranes/metabolism, Mitochondrial Proteins/chemistry, X-Ray Crystallography, Fungal Proteins/ultrastructure, GTP-Binding Proteins/metabolism, GTP-Binding Proteins/ultrastructure, Galactosylceramides/metabolism, Mitochondrial Proteins/metabolism, Mitochondrial Proteins/ultrastructure, Molecular Models, Protein Domains, Protein Multimerization
ABSTRACT
In this work, we introduce a flow-based machine learning approach called reaction coordinate (RC) flow for the discovery of low-dimensional kinetic models of molecular systems. RC flow uses a normalizing flow to design the coordinate transformation and a Brownian dynamics model to approximate the kinetics of the RC, where all model parameters can be estimated in a data-driven manner. In contrast to existing model reduction methods for molecular kinetics, RC flow offers a trainable and tractable model of reduced kinetics in continuous time and space due to the invertibility of the normalizing flow. Furthermore, the Brownian dynamics-based reduced kinetic model investigated in this work yields a readily discernible representation of metastable states within the phase space of the molecular system. Numerical experiments demonstrate how effectively the proposed method discovers interpretable and accurate low-dimensional representations of given full-state kinetics from simulations.
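Invertibility is what lets a normalizing flow map reduced coordinates back to full configurations. A common way to build an invertible layer is an affine coupling layer (as in RealNVP), sketched below for 2D inputs. The scale and shift functions here are fixed, hypothetical closed-form functions rather than trained networks, and this is not necessarily the architecture used in RC flow; the point is that both the inverse and the log-determinant of the Jacobian are available in closed form.

```python
import numpy as np


class AffineCoupling:
    """Affine coupling layer on 2D inputs: x1 passes through, x2 is scaled and shifted by functions of x1."""

    def __init__(self, a=0.5, c=1.0):
        # Hypothetical fixed parameters; in a real flow, scale and shift come from trained networks.
        self.a = a
        self.c = c

    def _scale_shift(self, x1):
        log_scale = np.tanh(self.a * x1)  # bounded log-scale keeps the map well conditioned
        shift = self.c * x1
        return log_scale, shift

    def forward(self, x):
        x1, x2 = x[:, 0], x[:, 1]
        log_scale, shift = self._scale_shift(x1)
        y2 = x2 * np.exp(log_scale) + shift
        # The Jacobian is triangular with diagonal (1, exp(log_scale)), so log|det J| = log_scale.
        return np.stack([x1, y2], axis=1), log_scale

    def inverse(self, y):
        y1, y2 = y[:, 0], y[:, 1]
        log_scale, shift = self._scale_shift(y1)  # y1 equals x1, so scale and shift are recovered exactly
        x2 = (y2 - shift) * np.exp(-log_scale)
        return np.stack([y1, x2], axis=1)


rng = np.random.default_rng(0)
x = rng.normal(size=(5, 2))
layer = AffineCoupling()
y, log_det = layer.forward(x)
x_back = layer.inverse(y)
print(np.allclose(x, x_back))  # True
```

Stacking several such layers with alternating roles for the two coordinates gives an expressive map whose inverse is still exact.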
ABSTRACT
Bromodomains (BDs) are small protein modules that interact with acetylated marks in histones. These posttranslational modifications are pivotal in regulating gene expression, making BDs promising targets to treat several diseases. While the general structure of BDs is well known, their dynamical features and their interplay with other macromolecules are poorly understood, hampering the rational design of potent and selective inhibitors. Here, we combine extensive molecular dynamics simulations, Markov state modeling, and available structural data to reveal a transiently formed state that is conserved across all BD families. It involves the breaking of two backbone hydrogen bonds that anchor the ZA-loop to the αA helix, opening a cryptic pocket that partially occludes the one associated with histone binding. By analyzing more than 1,900 experimental structures, we found only two that adopt the hidden state, explaining why it has previously gone unnoticed and providing direct structural evidence for its existence. Our results suggest that this state is an allosteric regulatory switch for BDs, potentially related to a recently unveiled BD-DNA-binding mode.
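Markov state modeling starts from a simulation trajectory discretized into states and estimates a transition matrix at a chosen lag time. A minimal sketch of that step is shown below, using a short hypothetical discrete trajectory and the simple row-normalized count estimator. Production MSM workflows add featurization, clustering, lag-time validation, and usually a reversible estimator, so this only illustrates the core calculation.

```python
import numpy as np


def estimate_msm(dtraj, n_states, lag=1):
    """Count transitions at the given lag and row-normalize (non-reversible maximum likelihood)."""
    counts = np.zeros((n_states, n_states))
    for i, j in zip(dtraj[:-lag], dtraj[lag:]):
        counts[i, j] += 1
    return counts / counts.sum(axis=1, keepdims=True)


def stationary_distribution(T):
    """Left eigenvector of T for the eigenvalue 1, normalized to sum to one."""
    evals, evecs = np.linalg.eig(T.T)
    pi = np.real(evecs[:, np.argmax(np.real(evals))])
    return pi / pi.sum()


dtraj = np.array([0, 0, 1, 1, 1, 0, 0, 2, 2, 1, 0, 0])  # hypothetical discretized trajectory
T = estimate_msm(dtraj, n_states=3)
pi = stationary_distribution(T)
print(T)
print(pi)  # [5/11, 4/11, 2/11]
```

Rare states, such as a hidden conformation visited only transiently, show up as small stationary weights and slow relaxation timescales in such a model.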
Subject(s)
Cell Cycle Proteins/chemistry, Co-Repressor Proteins/chemistry, DNA-Binding Proteins/chemistry, Histone Acetyltransferases/chemistry, Intracellular Signaling Peptides and Proteins/chemistry, General Transcription Factors/chemistry, Transcription Factors/chemistry, Tripartite Motif-Containing Protein 28/chemistry, Amino Acid Sequence, Binding Sites, Cell Cycle Proteins/genetics, Cell Cycle Proteins/metabolism, Co-Repressor Proteins/genetics, Co-Repressor Proteins/metabolism, X-Ray Crystallography, DNA/chemistry, DNA/genetics, DNA/metabolism, DNA-Binding Proteins/genetics, DNA-Binding Proteins/metabolism, Gene Expression Regulation, Histone Acetyltransferases/genetics, Histone Acetyltransferases/metabolism, Humans, Intracellular Signaling Peptides and Proteins/genetics, Intracellular Signaling Peptides and Proteins/metabolism, Markov Chains, Molecular Dynamics Simulation, Protein Binding, Alpha-Helical Protein Conformation, Beta-Strand Protein Conformation, Protein Interaction Domains and Motifs, Sequence Alignment, Amino Acid Sequence Homology, Thermodynamics, Transcription Factors/genetics, Transcription Factors/metabolism, General Transcription Factors/genetics, General Transcription Factors/metabolism, Tripartite Motif-Containing Protein 28/genetics, Tripartite Motif-Containing Protein 28/metabolism
ABSTRACT
To advance the mission of in silico cell biology, modeling the interactions of large and complex biological systems is becoming increasingly relevant. The combination of molecular dynamics (MD) simulations and Markov state models (MSMs) has enabled the construction of simplified models of molecular kinetics on long timescales. Despite its success, this approach is inherently limited by the size of the molecular system. With increasing size of macromolecular complexes, the number of independent or weakly coupled subsystems increases, and the number of global system states increases exponentially, making the sampling of all distinct global states infeasible. In this work, we present a technique called independent Markov decomposition (IMD) that leverages weak coupling between subsystems to compute a global kinetic model without requiring the sampling of all combinatorial states of subsystems. We give a theoretical basis for IMD and propose an approach for finding and validating such a decomposition. Using empirical few-state MSMs of ion channel models that are well established in electrophysiology, we demonstrate that IMD models can reproduce experimental conductance measurements with a major reduction in sampling compared with a standard MSM approach. We further show how to find the optimal partition of all-atom protein simulations into weakly coupled subunits.
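Why a decomposition saves sampling can be seen in the idealized limit of fully independent subsystems: the global transition matrix is then the Kronecker product of the subsystem matrices, and the global stationary distribution is the product of the subsystem ones. Ten two-state subsystems thus define 2^10 = 1024 global states, while the decomposed model needs only ten 2 x 2 matrices. The sketch below uses two hypothetical two-state transition matrices; IMD itself targets the weakly coupled case and includes criteria for finding and validating the partition, which this sketch does not attempt.

```python
from functools import reduce

import numpy as np


def global_transition_matrix(subsystem_matrices):
    """Global transition matrix of statistically independent subsystems (Kronecker product)."""
    return reduce(np.kron, subsystem_matrices)


def global_stationary(subsystem_pis):
    """Global stationary distribution of independent subsystems (flattened outer product)."""
    return reduce(np.kron, subsystem_pis)


# Hypothetical two-state subsystems with their stationary distributions.
T_a = np.array([[0.9, 0.1], [0.2, 0.8]])
pi_a = np.array([2 / 3, 1 / 3])
T_b = np.array([[0.7, 0.3], [0.4, 0.6]])
pi_b = np.array([4 / 7, 3 / 7])

T_global = global_transition_matrix([T_a, T_b])
pi_global = global_stationary([pi_a, pi_b])
print(T_global.shape)                                # (4, 4)
print(np.allclose(pi_global @ T_global, pi_global))  # True
```

Sampling each subsystem separately is enough to fill every entry of the global matrix, including combinations of subsystem states that never co-occurred in the data.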
Subject(s)
Markov Chains, Proteins/metabolism, Computer Simulation, Kinetics, Molecular Models, Molecular Dynamics Simulation, Protein Conformation, Proteins/chemistry
ABSTRACT
The heat capacity of a material is a fundamental property of great practical importance. For example, in a carbon capture process, the heat required to regenerate a solid sorbent is directly related to the heat capacity of the material. However, for most materials suitable for carbon capture applications, the heat capacity is not known, and thus the standard procedure is to assume the same value for all materials. In this work, we developed a machine learning approach, trained on density functional theory simulations, to accurately predict the heat capacity of these materials, that is, zeolites, metal-organic frameworks and covalent-organic frameworks. The accuracy of our prediction is confirmed with experimental data. Finally, for a temperature swing adsorption process that captures carbon from the flue gas of a coal-fired power plant, we show that for some materials, the heat requirement is reduced by as much as a factor of two using the correct heat capacity.
Subject(s)
Metal-Organic Frameworks, Nanopores, Coal, Hot Temperature, Power Plants, Carbon
ABSTRACT
Unsupervised learning is becoming an essential tool to analyze the increasingly large amounts of data produced by atomistic and molecular simulations in materials science, solid-state physics, biophysics, and biochemistry. In this Review, we provide a comprehensive overview of the methods of unsupervised learning that have been most commonly used to investigate simulation data and indicate likely directions for further developments in the field. In particular, we discuss feature representation of molecular systems and present state-of-the-art algorithms for dimensionality reduction, density estimation, and clustering, as well as kinetic models. We divide our discussion into self-contained sections, each discussing a specific method. In each section, we briefly touch upon the mathematical and algorithmic foundations of the method, highlight its strengths and limitations, and describe the specific ways in which it has been used, or can be used, to analyze molecular simulation data.
ABSTRACT
Early-stage drug discovery projects often focus on equilibrium binding affinity to the target alongside selectivity and other pharmaceutical properties. The kinetics of drug binding are often ignored but can have a significant influence on drug efficacy. Therefore, increasing attention has been paid to evaluating drug-binding kinetics early in the drug discovery process. Simulating drug-binding kinetics at the atomic level is challenging because of the long timescales involved. Here, we used the transition-based reweighting analysis method (TRAM) with Markov state models to study the dissociation of a ligand from the protein kinase PYK2. TRAM combines biased and unbiased simulations to reduce computational costs; in this work, the biased simulations used umbrella sampling. Although applying transition-state theory to the potential of mean force from the umbrella sampling simulations overestimated the dissociation rate by three orders of magnitude, TRAM gave a dissociation rate within an order of magnitude of the experimental value.
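To put the reported discrepancy in perspective, the Eyring form of transition-state theory, k = (k_B T / h) exp(-ΔG‡ / RT), shows that a rate error of a factor of 1000 corresponds to an error of RT ln 1000 ≈ 4.1 kcal/mol in the effective barrier at 300 K, and a factor of 10 to about 1.4 kcal/mol. The sketch below assumes a transmission coefficient of one and a temperature of 300 K; the study's own transition-state-theory treatment of the umbrella-sampling potential of mean force may use a different prefactor.

```python
import math

KB = 1.380649e-23     # Boltzmann constant, J/K
H = 6.62607015e-34    # Planck constant, J s
R_KCAL = 1.987204e-3  # gas constant, kcal/(mol K)


def eyring_rate(barrier_kcal, temperature=300.0):
    """Eyring transition-state-theory rate constant (s^-1), transmission coefficient set to 1."""
    prefactor = KB * temperature / H
    return prefactor * math.exp(-barrier_kcal / (R_KCAL * temperature))


def barrier_shift_for_rate_factor(factor, temperature=300.0):
    """Barrier change (kcal/mol) that changes the Eyring rate by the given factor."""
    return R_KCAL * temperature * math.log(factor)


print(round(barrier_shift_for_rate_factor(1000.0), 2))  # 4.12
print(round(barrier_shift_for_rate_factor(10.0), 2))    # 1.37
```

Because the rate depends exponentially on the barrier, a free-energy error of only a few kcal/mol is enough to shift a predicted dissociation rate by orders of magnitude.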
Subject(s)
Focal Adhesion Kinase 2, Protein Kinases, Kinetics, Ligands, Molecular Dynamics Simulation, Pharmaceutical Preparations, Protein Binding
ABSTRACT
The vibrational spectra of condensed and gas-phase systems are influenced by the quantum-mechanical behavior of light nuclei. Full-dimensional simulations of approximate quantum dynamics are possible thanks to the imaginary time path-integral (PI) formulation of quantum statistical mechanics, albeit at a high computational cost which increases sharply with decreasing temperature. By leveraging advances in machine-learned coarse-graining, we develop a PI method with the reduced computational cost of a classical simulation. We also propose a simple temperature elevation scheme to significantly attenuate the artifacts of standard PI approaches as well as eliminate the unfavorable temperature scaling of the computational cost. We illustrate the approach by calculating vibrational spectra using standard models of water molecules and bulk water, demonstrating significant computational savings and dramatically improved accuracy compared to more expensive reference approaches. Our simple, efficient, and accurate method has prospects for routine calculations of vibrational spectra for a wide range of molecular systems, with an explicit treatment of the quantum nature of nuclei.
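The unfavorable temperature scaling of standard path-integral simulations can be made concrete. A ring polymer of P beads has free normal-mode frequencies ω_k = 2 ω_P sin(kπ/P) with ω_P = P k_B T / ħ, and a common rule of thumb is to choose P at least as large as βħω_max for the stiffest vibration. The sketch below evaluates both; the 3700 cm^-1 stretch frequency is an illustrative, approximate O-H value, and the bead-count rule is a heuristic rather than a convergence guarantee. Halving the temperature doubles the required number of beads, and with it the cost relative to a classical simulation.

```python
import math

HC_OVER_KB = 1.438777  # second radiation constant h*c/k_B, in cm*K


def beta_hbar_omega(wavenumber_cm, temperature):
    """Dimensionless quantumness parameter beta*hbar*omega for a mode given in cm^-1."""
    return HC_OVER_KB * wavenumber_cm / temperature


def rule_of_thumb_beads(wavenumber_cm, temperature):
    """Heuristic bead count P >= beta*hbar*omega_max (not a strict convergence criterion)."""
    return math.ceil(beta_hbar_omega(wavenumber_cm, temperature))


def free_ring_polymer_frequencies(n_beads):
    """Normal-mode frequencies 2*omega_P*sin(k*pi/P) of a free ring polymer, in units of k_B*T/hbar."""
    omega_p = n_beads  # omega_P = P/(beta*hbar), which equals P in units of k_B*T/hbar
    return [2.0 * omega_p * math.sin(k * math.pi / n_beads) for k in range(n_beads)]


print(rule_of_thumb_beads(3700.0, 300.0))  # 18
print(rule_of_thumb_beads(3700.0, 150.0))  # 36
```

The highest ring-polymer frequency also grows with P, which forces a smaller integration time step on top of the larger number of beads.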
ABSTRACT
Most current molecular dynamics simulation and analysis methods rely on the idea that the molecular system can be represented by a single global state (e.g., a Markov state in a Markov state model [MSM]). In this approach, molecules can be extensively sampled and analyzed only when they possess a few metastable states, such as small- to medium-sized proteins. However, this approach breaks down in frustrated systems and in large protein assemblies, where the number of global metastable states may grow exponentially with the system size. To address this problem, we introduce dynamic graphical models (DGMs) that describe molecules as assemblies of coupled subsystems, akin to how spins interact in the Ising model. The state change of each subsystem is governed only by its own state and the states of its neighbors. DGMs require fewer parameters than MSMs or other global state models; in particular, we do not need to observe all global system configurations to characterize them. Therefore, DGMs can predict previously unobserved molecular configurations. As a proof of concept, we demonstrate that DGMs can faithfully describe molecular thermodynamics and kinetics and predict previously unobserved metastable states for Ising models and protein simulations.
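The locality assumption can be illustrated with a chain of two-state subsystems in which each subsystem's next state is drawn from a probability that depends only on its own current state and those of its two neighbors. In the sketch below that probability is a hand-set logistic function with hypothetical coupling parameters `h` (self) and `J` (neighbors); in the DGM approach these local rules are instead learned from simulation data. The model has two parameters for any chain length, whereas a global-state model of N subsystems would need to resolve up to 2^N configurations.

```python
import numpy as np


def local_update(states, h, J, rng):
    """Synchronously update a periodic chain of +1/-1 subsystems.

    The probability that a subsystem is +1 at the next step depends only on its own
    current state (weight h) and on its two neighbors (weight J)."""
    neighbors = np.roll(states, 1) + np.roll(states, -1)
    local_field = h * states + J * neighbors
    p_plus = 1.0 / (1.0 + np.exp(-2.0 * local_field))
    return np.where(rng.random(states.size) < p_plus, 1, -1)


def simulate(n_subsystems, n_steps, h, J, seed=0):
    """Return a trajectory of shape (n_steps + 1, n_subsystems) starting from all +1."""
    rng = np.random.default_rng(seed)
    states = np.ones(n_subsystems, dtype=int)
    trajectory = [states]
    for _ in range(n_steps):
        states = local_update(states, h, J, rng)
        trajectory.append(states)
    return np.array(trajectory)


traj = simulate(n_subsystems=20, n_steps=100, h=0.5, J=0.3)
print(traj.shape)  # (101, 20)
```

Because the same local rule applies everywhere, the model can generate global configurations that never appeared in the data used to fit it.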
Subject(s)
Molecular Dynamics Simulation, Proteins/chemistry, Kinetics, Markov Chains, Protein Conformation, Thermodynamics
ABSTRACT
SUMMARY: Optimizing small molecules in a drug discovery project is a notoriously difficult task, as multiple molecular properties have to be considered and balanced at the same time. In this work, we present grünifai, our novel interactive in silico compound optimization platform, to support the ideation of the next generation of compounds under the constraints of a multiparameter objective. grünifai integrates adjustable in silico models, a continuous representation of chemical space, a scalable particle swarm optimization algorithm and the possibility to actively steer the compound optimization by providing feedback on generated intermediate structures. AVAILABILITY AND IMPLEMENTATION: Source code and documentation are freely available on GitHub (https://github.com/jrwnter/gruenifai) under an MIT license. The backend, including the optimization method and its distribution across multiple GPU nodes, is written in Python 3. The frontend is written in ReactJS.
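Particle swarm optimization moves a population of candidate points through a continuous space, pulling each toward its own best position and toward the swarm's best position. The sketch below is a generic global-best PSO with commonly used constriction-style coefficients, applied to a hypothetical two-objective score in a 3D space; in grünifai the space is a learned continuous representation of chemical structures and the objective combines adjustable in silico property models, neither of which is reproduced here.

```python
import numpy as np


def pso_minimize(objective, dim, n_particles=30, n_iters=200, bounds=(-5.0, 5.0),
                 inertia=0.7298, c_personal=1.49618, c_global=1.49618, seed=0):
    """Minimize `objective` over a box with a basic global-best particle swarm."""
    rng = np.random.default_rng(seed)
    low, high = bounds
    pos = rng.uniform(low, high, size=(n_particles, dim))
    vel = np.zeros_like(pos)
    best_pos = pos.copy()                                # each particle's best position so far
    best_val = np.array([objective(p) for p in pos])
    swarm_best = best_pos[np.argmin(best_val)].copy()   # best position found by any particle
    for _ in range(n_iters):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        vel = (inertia * vel
               + c_personal * r1 * (best_pos - pos)
               + c_global * r2 * (swarm_best - pos))
        pos = np.clip(pos + vel, low, high)
        vals = np.array([objective(p) for p in pos])
        improved = vals < best_val
        best_pos[improved] = pos[improved]
        best_val[improved] = vals[improved]
        swarm_best = best_pos[np.argmin(best_val)].copy()
    return swarm_best, float(best_val.min())


# Hypothetical multiparameter score: penalties for distance to two "property optima".
target_a = np.array([1.0, 0.0, -1.0])
target_b = np.array([0.0, 2.0, 1.0])


def score(z):
    return 0.5 * np.sum((z - target_a) ** 2) + 0.5 * np.sum((z - target_b) ** 2)


best_z, best_score = pso_minimize(score, dim=3)
print(best_z, best_score)
```

For this toy score the analytic optimum is the midpoint (a + b) / 2 = [0.5, 1, 0] with a score of 2.25, which makes it easy to check the optimizer. PSO needs only objective evaluations, not gradients, so it also works with non-differentiable property predictors.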
Subject(s)
Algorithms, Software, Computer Simulation, Documentation, Research Design
ABSTRACT
Machine learning (ML) is transforming all areas of science. The complex and time-consuming calculations in molecular simulations are particularly suitable for an ML revolution and have already been profoundly affected by the application of existing ML methods. Here we review recent ML methods for molecular simulation, with particular focus on (deep) neural networks for the prediction of quantum-mechanical energies and forces, on coarse-grained molecular dynamics, on the extraction of free energy surfaces and kinetics, and on generative network approaches to sample molecular equilibrium structures and compute thermodynamics. To explain these methods and illustrate open methodological problems, we review some important principles of molecular physics and describe how they can be incorporated into ML structures. Finally, we identify and describe a list of open challenges for the interface between ML and molecular simulation.
ABSTRACT
The mechanochemical protein dynamin is the prototype of the dynamin superfamily of large GTPases, which shape and remodel membranes in diverse cellular processes. Dynamin forms predominantly tetramers in the cytosol, which oligomerize at the neck of clathrin-coated vesicles to mediate constriction and subsequent scission of the membrane. Previous studies have described the architecture of dynamin dimers, but the molecular determinants for dynamin assembly and its regulation have remained unclear. Here we present the crystal structure of the human dynamin tetramer in the nucleotide-free state. Combining structural data with mutational studies, oligomerization measurements and Markov state models of molecular dynamics simulations, we suggest a mechanism by which oligomerization of dynamin is linked to the release of intramolecular autoinhibitory interactions. We elucidate how mutations that interfere with tetramer formation and autoinhibition can lead to the congenital muscle disorders Charcot-Marie-Tooth neuropathy and centronuclear myopathy, respectively. Notably, the bent shape of the tetramer explains how dynamin assembles into a right-handed helical oligomer of defined diameter, which has direct implications for its function in membrane constriction.
Subject(s)
Dynamins/antagonists & inhibitors, Dynamins/chemistry, Protein Multimerization, Charcot-Marie-Tooth Disease, X-Ray Crystallography, Dynamins/genetics, Dynamins/metabolism, Humans, Markov Chains, Molecular Models, Molecular Dynamics Simulation, Mutant Proteins/antagonists & inhibitors, Mutant Proteins/chemistry, Mutant Proteins/genetics, Mutant Proteins/metabolism, Mutation/genetics, Congenital Structural Myopathies, Nucleotides, Protein Multimerization/genetics, Structure-Activity Relationship
ABSTRACT
Recent advances in deep learning frameworks have established valuable tools for analyzing the long-timescale behavior of complex systems, such as proteins. In particular, the inclusion of physical constraints, e.g., time-reversibility, was a crucial step in making these methods applicable to biophysical systems. Here, we further advance the method by incorporating experimental observables into the model estimation, showing that biases in simulation data can be compensated for. We also develop a new neural network layer to build a hierarchical model that allows different levels of detail to be studied. Finally, we propose an attention mechanism that highlights residues important for classification into different states. We demonstrate the new methodology on an ultralong molecular dynamics simulation of the Villin headpiece miniprotein.
Subject(s)
Markov Chains, Microfilament Proteins/chemistry, Molecular Dynamics Simulation, Neural Networks (Computer), Biophysics
ABSTRACT
The great challenge with biological membrane systems is the wide range of scales involved, from nanometers and picoseconds for individual lipids to micrometers and beyond milliseconds for cellular signaling processes. While solvent-free coarse-grained membrane models are convenient for large-scale simulations and promise insight into slow processes involving membranes, these models usually have unrealistic kinetics. One major obstacle is the lack of an equally convenient way of introducing hydrodynamic coupling without significantly increasing the computational cost of the model. To address this, we introduce a framework based on anisotropic Langevin dynamics, in which the major in-plane and out-of-plane hydrodynamic effects are modeled via friction and diffusion tensors obtained from analytical or semi-analytical solutions to the Stokes equations. Using this framework in conjunction with our recently developed membrane model, we obtain accurate dispersion relations for planar membrane patches, both free-standing and in the vicinity of a wall. We briefly discuss how non-equilibrium dynamics are affected by hydrodynamic interactions. We also measure the surface viscosity of the model membrane and discuss the dissipative mechanisms that affect it.
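The essential ingredient of anisotropic Langevin dynamics is a diffusion tensor D, rather than a scalar, in the overdamped equation of motion. A minimal Euler-Maruyama sketch for a constant tensor is shown below: the drift is D F Δt / k_BT and the random displacement has covariance 2 D Δt, generated from the Cholesky factor of D. The 2 x 2 tensor and the constant force are hypothetical; in the framework described here the tensors come from solutions of the Stokes equations, and a position-dependent tensor would additionally require a divergence (spurious-drift) term, which this sketch omits.

```python
import numpy as np


def anisotropic_bd_step(x, force, D, dt, kT, rng):
    """One Euler-Maruyama step of overdamped Langevin dynamics with a constant diffusion tensor D.

    x: (n_particles, dim) positions; force: callable returning forces of the same shape."""
    L = np.linalg.cholesky(D)                                   # L @ L.T == D
    drift = (force(x) @ D.T) * (dt / kT)                        # row-wise D @ F * dt / kT
    noise = (np.sqrt(2.0 * dt) * rng.standard_normal(x.shape)) @ L.T  # covariance 2 * D * dt
    return x + drift + noise


def constant_force(x):
    """Hypothetical uniform force along the first axis."""
    return np.tile([1.0, 0.0], (x.shape[0], 1))


# Hypothetical anisotropic tensor coupling two directions (e.g. in-plane and out-of-plane).
D = np.array([[1.0, 0.3],
              [0.3, 0.5]])
rng = np.random.default_rng(1)
x0 = np.zeros((200_000, 2))
x1 = anisotropic_bd_step(x0, constant_force, D, dt=0.5, kT=1.0, rng=rng)

print(x1.mean(axis=0))  # close to D @ F * dt / kT = [0.5, 0.15]
print(np.cov(x1.T))     # close to 2 * D * dt, which equals D here
```

Note that a force along one axis produces drift along both axes: the off-diagonal entry of D is exactly the kind of hydrodynamic coupling a scalar friction cannot represent.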
Subject(s)
Cell Membrane, Hydrodynamics, Cell Membrane/chemistry, Diffusion, Solvents, Viscosity
ABSTRACT
Markov chain Monte Carlo methods are a powerful tool for sampling equilibrium configurations in complex systems. One problem these methods often face is slow convergence over large energy barriers. In this work, we propose a novel method that accelerates convergence in systems composed of many metastable states. The method uses generative neural networks to propose new configurations in the Markov chain that connect metastable regions directly, and it optimizes the acceptance probability of large jumps between modes in the configuration space. We provide a comprehensive theory as well as a training scheme for the network, and demonstrate the method on example systems.
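The key ingredient is a Metropolis-Hastings step whose proposals come from a generative model rather than from a small local perturbation. For a proposal density q that does not depend on the current state, detailed balance requires accepting a move from x to y with probability min(1, p(y) q(x) / (p(x) q(y))). The sketch below applies this to a hypothetical 1D double well whose barrier is 10 k_BT high; a fixed two-component Gaussian mixture stands in for the trained generative network, so the chain jumps directly between the two modes.

```python
import numpy as np

MODES = np.array([-1.0, 1.0])
SIGMA = 0.3  # width of each proposal component (hypothetical)


def log_target(x):
    """Unnormalized log-density of a double well with minima at x = -1 and x = +1."""
    return -((x ** 2 - 1.0) ** 2) / 0.1


def log_proposal(x):
    """Log-density of an equal-weight Gaussian mixture centred on the two modes."""
    dens = np.exp(-0.5 * ((x - MODES) / SIGMA) ** 2) / (SIGMA * np.sqrt(2.0 * np.pi))
    return float(np.log(0.5 * dens.sum()))


def sample_proposal(rng):
    """Draw from the mixture: pick a mode, then add Gaussian noise."""
    return rng.choice(MODES) + SIGMA * rng.standard_normal()


def independent_mh(n_steps, x0=-1.0, seed=0):
    """Metropolis-Hastings with state-independent (e.g. generative) proposals."""
    rng = np.random.default_rng(seed)
    x = x0
    samples = np.empty(n_steps)
    n_accepted = 0
    for i in range(n_steps):
        y = sample_proposal(rng)
        # log of p(y) q(x) / (p(x) q(y))
        log_ratio = log_target(y) + log_proposal(x) - log_target(x) - log_proposal(y)
        if np.log(rng.random()) < log_ratio:
            x = y
            n_accepted += 1
        samples[i] = x
    return samples, n_accepted / n_steps


samples, acceptance = independent_mh(5000)
print(acceptance, np.mean(samples > 0))  # fraction in the right-hand well should be near 0.5
```

A local random-walk sampler started in the left well would need to climb the 10 k_BT barrier to reach the right one; here the chain switches wells whenever a proposal from the other mixture component is accepted, while the acceptance rule keeps the sampled distribution exact.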
ABSTRACT
The use of coarse-grained (CG) models is a popular approach to study complex biomolecular systems. By reducing the number of degrees of freedom, a CG model can explore long time- and length-scales inaccessible to computational models at higher resolution. If a CG model is designed by formally integrating out some of the system's degrees of freedom, one expects multi-body interactions to emerge in the effective CG model's energy function. In practice, it has been shown that the inclusion of multi-body terms indeed improves the accuracy of a CG model. However, no general approach has been proposed to systematically construct a CG effective energy that includes arbitrary orders of multi-body terms. In this work, we propose a neural network based approach to address this point and construct a CG model as a multi-body expansion. By applying this approach to a small protein, we evaluate the relative importance of the different multi-body terms in the definition of an accurate model. We observe a slow convergence in the multi-body expansion, where up to five-body interactions are needed to reproduce the free energy of an atomistic model.
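A multi-body expansion writes the coarse-grained energy as a sum of terms over all pairs, triplets, and higher-order tuples of beads, U = (sum over pairs of u2) + (sum over triplets of u3) + ..., so the number of n-body terms grows as C(N, n) for N beads. The sketch below evaluates such an expansion for hypothetical toy 1D bead positions and hand-written two- and three-body functions; in the work described here the individual terms are represented by neural networks and trained, which is not reproduced.

```python
from itertools import combinations


def multibody_energy(positions, terms):
    """Evaluate U as the sum, over each body order n, of terms[n] applied to every n-tuple of beads.

    `terms` maps a body order n to a function of an n-tuple of bead positions."""
    total = 0.0
    for order, term in terms.items():
        for beads in combinations(positions, order):
            total += term(beads)
    return total


def pair_term(beads):
    """Hypothetical two-body term: harmonic in the separation."""
    a, b = beads
    return (a - b) ** 2


def triplet_term(beads):
    """Hypothetical three-body term: scaled product of the three pairwise separations."""
    a, b, c = beads
    return 0.01 * abs(a - b) * abs(b - c) * abs(a - c)


positions = [0.0, 1.0, 3.0]  # toy 1D bead positions
energy_2body = multibody_energy(positions, {2: pair_term})
energy_3body = multibody_energy(positions, {2: pair_term, 3: triplet_term})
print(energy_2body, energy_3body)
```

Truncating `terms` at a given order is how one would compare, for example, a pairwise-only model against models with three-, four-, or five-body contributions.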