ABSTRACT
Variational autoencoders are unsupervised learning models with generative capabilities. When applied to protein data, they classify sequences by phylogeny and generate de novo sequences that preserve statistical properties of protein composition. While previous studies focus on clustering and generative features, here we evaluate the underlying latent manifold in which sequence information is embedded. To investigate properties of the latent manifold, we utilize direct coupling analysis and a Potts Hamiltonian model to construct a latent generative landscape. We showcase how this landscape captures phylogenetic groupings and functional and fitness properties of several systems, including globins, β-lactamases, ion channels, and transcription factors. We provide evidence that the landscape helps us understand the effects of sequence variability observed in experimental data and provides insights into directed and natural protein evolution. We propose that combining the generative properties and functional predictive power of variational autoencoders with coevolutionary analysis could be beneficial in applications for protein engineering and design.
Subject(s)
Globins , Transcription Factors , Phylogeny , Amino Acid Sequence , beta-Lactamases/genetics
ABSTRACT
In October 2020, 62 scientists from nine nations worked together remotely in the Second Baylor College of Medicine & DNAnexus hackathon, focusing on related topics in structural variation, pan-genomes, and SARS-CoV-2 research. The overarching goals were to assess the current status of the field, identify the remaining challenges, and determine how to combine the strengths of the different interests to drive research and method development forward. Over the four days, eight groups each designed and developed new open-source methods to improve the identification and analysis of variations among species, including humans and SARS-CoV-2. These included improvements in SV calling, genotyping, annotation, and filtering, together with advancements in benchmarking existing methods. Furthermore, groups focused on the diversity of SARS-CoV-2. Daily discussion summaries and methods are publicly available at https://github.com/collaborativebioinformatics and provide valuable insights for both participants and the research community.
Subject(s)
COVID-19 , SARS-CoV-2 , Animals , Viral Genome , Humans , Vertebrates
ABSTRACT
Two-component systems (TCS) are signaling machinery consisting of a histidine kinase (HK) and a response regulator (RR). When an environmental change is detected, the HK phosphorylates its cognate RR. While cognate interactions were once considered orthogonal, experimental evidence shows that crosstalk interactions between non-cognate HK-RR pairs are prevalent. To date, crosstalk interactions have been demonstrated for TCS proteins in only a limited number of organisms. By providing specificity predictions across entire TCS networks for a large variety of organisms, the ELIHKSIR web server assists users in identifying interactions for TCS proteins and their mutants. To generate specificity scores, a global probabilistic model was used to identify interfacial couplings and local fields from sequence information. These couplings and local fields were then used to construct Hamiltonian scores for positions with encoded specificity, resulting in the specificity score. These methods were applied to the 6676 organisms available on the ELIHKSIR web server. Because proteins can be mutated and the resulting network changes displayed, there are nearly endless combinations of TCS networks to analyze using ELIHKSIR. The functionality of ELIHKSIR allows users to perform a variety of TCS network analyses and visualizations to support TCS research efforts.
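The scoring idea described above (couplings plus local fields summed over specificity-encoding positions) can be sketched as follows. This is only an illustration under stated assumptions: the function name `specificity_score`, the dictionary layout, the sign convention, and all parameter values are hypothetical and are not taken from ELIHKSIR.

```python
def specificity_score(hk_seq, rr_seq, couplings, fields_hk, fields_rr, interface_pairs):
    """Return a Hamiltonian-style score summed over interface positions.

    Assumed sign convention: H = -sum(J) - sum(h), so a lower score is
    more favourable. Parameters that are not listed default to 0.0.
    """
    score = 0.0
    # Pairwise couplings between HK position i and RR position j.
    for i, j in interface_pairs:
        score -= couplings.get((i, j, hk_seq[i], rr_seq[j]), 0.0)
    # Local fields, counted once per distinct interface position.
    for i in sorted({i for i, _ in interface_pairs}):
        score -= fields_hk.get((i, hk_seq[i]), 0.0)
    for j in sorted({j for _, j in interface_pairs}):
        score -= fields_rr.get((j, rr_seq[j]), 0.0)
    return score


# Toy, made-up parameters (hypothetical, for illustration only).
couplings = {(0, 1, "A", "D"): 1.5, (2, 0, "K", "E"): 0.8}
fields_hk = {(0, "A"): 0.2}
fields_rr = {(1, "D"): 0.1}
interface_pairs = [(0, 1), (2, 0)]

wild_type = specificity_score("AGK", "EDL", couplings, fields_hk, fields_rr, interface_pairs)
mutant = specificity_score("GGK", "EDL", couplings, fields_hk, fields_rr, interface_pairs)
print(f"wild type: {wild_type:.2f}, A0G mutant: {mutant:.2f}")
```

In this toy case the mutation removes one favourable coupling and one field term, so the mutant scores less favourably than the wild type. Comparing such scores across all HK-RR pairs of an organism is one way a predicted interaction network could be assembled.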
ABSTRACT
During embryogenesis, morphogens form a concentration gradient in responsive tissue, which is then translated into a spatial cellular pattern. The mechanisms by which morphogens spread through a tissue to establish such a morphogenetic field remain elusive. Here, we investigate by mutually complementary simulations and in vivo experiments how Wnt morphogen transport by cytonemes differs from the typically assumed diffusion-based transport for patterning of highly dynamic tissue such as the neural plate in zebrafish. Stochasticity strongly influences fate acquisition at the single-cell level and results in fluctuating boundaries between pattern regions. Stable patterning can be achieved by sorting through concentration-dependent cell migration and apoptosis, independent of the morphogen transport mechanism. We show that Wnt transport by cytonemes achieves distinct Wnt thresholds for the brain primordia earlier than diffusion-based transport. We conclude that cytoneme-mediated morphogen transport together with directed cell sorting is a potentially favored mechanism to establish morphogen gradients in rapidly expanding developmental systems.
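How a concentration gradient can be translated into a pattern, and how noise blurs the boundaries, can be illustrated with a minimal threshold model. This is a generic sketch and not the authors' simulation: the exponential gradient shape, the thresholds, the noise level, and the fate labels are all assumptions chosen for illustration.

```python
import math
import random


def morphogen_concentration(x, c0=1.0, decay_length=5.0):
    """Exponentially decaying gradient (assumed shape), source at x = 0."""
    return c0 * math.exp(-x / decay_length)


def assign_fate(concentration, high=0.5, low=0.2):
    """Map a concentration to one of three hypothetical fates via two thresholds."""
    if concentration >= high:
        return "high"
    if concentration >= low:
        return "middle"
    return "low"


positions = range(15)

# Deterministic readout: sharp boundaries between the three regions.
fates = [assign_fate(morphogen_concentration(x)) for x in positions]

# Noisy readout: each cell senses the concentration with multiplicative noise,
# so cells near a threshold may switch fate and the boundary fluctuates.
rng = random.Random(0)
noisy_fates = [
    assign_fate(morphogen_concentration(x) * math.exp(rng.gauss(0.0, 0.3)))
    for x in positions
]

print("".join(f[0] for f in fates))
print("".join(f[0] for f in noisy_fates))
```

Rerunning the noisy readout with different seeds shows boundary positions shifting from run to run, which is the kind of fluctuation that a subsequent sorting step would need to correct.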
Subject(s)
Body Patterning/physiology , Developmental Gene Expression Regulation , Vertebrates/embryology , Wnt Proteins/physiology , Animals , Apoptosis , Brain/embryology , Cell Lineage , Cell Movement , Computational Biology , Computer Simulation , Embryonic Development , Neural Crest/embryology , Neural Plate/embryology , Protein Transport , Signal Transduction , Software , Stochastic Processes , Zebrafish/embryology , beta Catenin/physiology
ABSTRACT
The notochord defines the axial structure of all vertebrates during development. Notogenesis is the result of major cell reorganization in the mesoderm: the convergence and the extension of the axial cells. However, it is currently not fully understood how these processes act together in a coordinated way during notochord formation. The prechordal plate is an actively migrating cell population in the central mesoderm anterior to the trailing notochordal plate cells. We show that prechordal plate cells express Protocadherin 18a (Pcdh18a), a member of the cadherin superfamily. We find that Pcdh18a-mediated recycling of E-cadherin adhesion complexes transforms prechordal plate cells into a cohesive and fast-migrating cell group. In turn, the prechordal plate cells subsequently instruct the trailing mesoderm. We simulated cell migration during early mesoderm formation using a lattice-based mathematical framework and predicted that an anterior, local motile cell cluster could guide the intercalation and extension of the posterior, axial cells. Indeed, a grafting experiment validated the prediction, and local Pcdh18a expression induced an ectopic prechordal plate-like cell group migrating towards the animal pole. Our findings indicate that Pcdh18a is important for prechordal plate formation, which influences the trailing mesodermal cell sheet by orchestrating the morphogenesis of the notochord.
Subject(s)
Cadherins/metabolism , Mesoderm/metabolism , Zebrafish/embryology , Animals , Cadherins/genetics , Endocytosis , HeLa Cells , Humans , Mesoderm/cytology , Mutation , Cultured Tumor Cells
ABSTRACT
Fully understanding biomolecular function requires detailed insight into the systems' structural dynamics. Powerful experimental techniques such as single-molecule Förster Resonance Energy Transfer (FRET) provide access to such dynamic information yet have to be carefully interpreted. Molecular simulations can complement these experiments but typically face limits in accessing slow time scales and large or unstructured systems. Here, we introduce a coarse-grained simulation technique that tackles these challenges. While requiring only a few parameters, we maintain full protein flexibility and include all heavy atoms of proteins, linkers, and dyes. We are able to reduce computational demands sufficiently to simulate large or heterogeneous structural dynamics and ensembles on slow time scales found in, e.g., protein folding. The simulations allow for calculating FRET efficiencies that quantitatively agree with experimentally determined values. By providing atomically resolved trajectories, this work supports the planning and microscopic interpretation of experiments. Overall, these results highlight how simulations and experiments can complement each other, leading to new insights into biomolecular dynamics and function.
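To make the link between simulated dye distances and FRET efficiencies concrete, the standard Förster relation E = 1 / (1 + (r / R0)^6) can be applied to each frame of a trajectory. The sketch below is an illustration only: the function names, the distances, and the Förster radius are hypothetical, and the plain frame average assumes every frame contributes equally, which is not necessarily how the authors compute efficiencies.

```python
def fret_efficiency(distance, forster_radius):
    """Förster relation: transfer efficiency for one donor-acceptor distance."""
    return 1.0 / (1.0 + (distance / forster_radius) ** 6)


def mean_fret_efficiency(distances, forster_radius):
    """Average the per-frame efficiencies over a set of dye-dye distances."""
    efficiencies = [fret_efficiency(r, forster_radius) for r in distances]
    return sum(efficiencies) / len(efficiencies)


# Made-up dye-dye distances (nm) from a hypothetical trajectory.
R0 = 5.4  # assumed Förster radius in nm
trajectory_distances = [4.0, 5.4, 7.0]

for r in trajectory_distances:
    print(f"r = {r:.1f} nm -> E = {fret_efficiency(r, R0):.3f}")
print(f"mean E = {mean_fret_efficiency(trajectory_distances, R0):.3f}")
```

Because of the sixth-power dependence, the efficiency is exactly 0.5 at r = R0 and changes steeply around it, which is why explicit dye and linker geometry matters when comparing simulation with experiment.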
Subject(s)
Coloring Agents/chemistry , Fluorescence Resonance Energy Transfer/methods , Proteins/chemistry , Computer Simulation , Protein Folding
ABSTRACT
Paracrine Wnt/β-catenin signalling is important during developmental processes, tissue regeneration and stem cell regulation. Wnt proteins are morphogens, which form concentration gradients across responsive tissues. Little is known about the transport mechanism for these lipid-modified signalling proteins in vertebrates. Here we show that Wnt8a is transported on actin-based filopodia to contact responding cells and activate signalling during neural plate formation in zebrafish. Cdc42/N-Wasp regulates the formation of these Wnt-positive filopodia. Enhanced formation of filopodia increases the effective signalling range of Wnt by facilitating spreading. Consistently, reduction in filopodia leads to a restricted distribution of the ligand and a limited signalling range. Using a simulation, we provide evidence that such a short-range transport system for Wnt has a long-range signalling function. Indeed, we show that a filopodia-based transport system for Wnt8a controls anteroposterior patterning of the neural plate during vertebrate gastrulation.
Subject(s)
Body Patterning/physiology , Cytoskeletal Proteins/metabolism , Neural Plate/embryology , Pseudopodia/physiology , Signal Transduction/physiology , Wnt Proteins/metabolism , Zebrafish Proteins/metabolism , Zebrafish/embryology , Animals , Computer Simulation , Fibroblasts/metabolism , HEK293 Cells , Humans , In Situ Hybridization , Mice , Confocal Microscopy , Antisense Oligonucleotides/genetics , Plasmids/genetics , Protein Transport/physiology , Real-Time Polymerase Chain Reaction , cdc42 GTP-Binding Protein/metabolism
ABSTRACT
The full characterization of protein folding is a remarkable, long-standing challenge for both experiment and simulation. Working towards a complete understanding of this process, one needs to cover the full diversity of existing folds and identify the general principles driving the process. Here, we want to understand and quantify the diversity in folding routes for a large and representative set of protein topologies, covering the full range from all-alpha-helical topologies to beta barrels, guided by the key question: Do most of the observed routes contribute to the folding process, or only a particular route? We identified a set of two-state folders among non-homologous proteins with a sequence length of 40-120 residues. For each of these proteins, we ran native structure-based simulations with both homogeneous and heterogeneous contact potentials. For each protein, we simulated dozens of folding transitions in continuous uninterrupted simulations and constructed a large database of kinetic parameters. We investigate folding routes by tracking the formation of tertiary structure interfaces and discuss whether a single specific route exists for a topology or whether all routes are equiprobable. These results permit us to characterize the complete folding space for small proteins in terms of folding barrier ΔG‡, number of routes, and the route specificity RT.
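A folding barrier for a two-state folder is commonly estimated from the sampled distribution of a reaction coordinate via F(Q) = -kT ln P(Q). The following is a minimal sketch under stated assumptions: the fraction of native contacts Q as the coordinate, the bin count, the helper names, and the toy samples are all hypothetical and are not the procedure or data of the study.

```python
import math
from collections import Counter


def free_energy_profile(q_samples, n_bins=10, kT=1.0):
    """Histogram Q in [0, 1] and return {bin: F} with F = -kT * ln(P).

    Empty bins are omitted (their free energy would be infinite).
    """
    counts = Counter(min(int(q * n_bins), n_bins - 1) for q in q_samples)
    total = len(q_samples)
    return {b: -kT * math.log(c / total) for b, c in sorted(counts.items())}


def folding_barrier(profile, n_bins=10):
    """Barrier height measured from the unfolded basin, in units of kT.

    The unfolded and folded basins are taken as the lowest-F bins in the
    lower and upper half of the Q range (a simplifying assumption).
    """
    bins = sorted(profile)
    unfolded = min((b for b in bins if b < n_bins // 2), key=profile.get)
    folded = min((b for b in bins if b >= n_bins // 2), key=profile.get)
    top = max((b for b in bins if unfolded <= b <= folded), key=profile.get)
    return profile[top] - profile[unfolded]


# Made-up samples: two populated basins and a sparsely visited transition region.
q_samples = [0.15] * 40 + [0.45] * 4 + [0.55] * 2 + [0.85] * 54

profile = free_energy_profile(q_samples)
barrier = folding_barrier(profile)
print(f"barrier from unfolded basin: {barrier:.2f} kT")
```

With real trajectories the transition region is visited rarely, so many folding and unfolding transitions are needed before the barrier estimate stabilizes.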
Subject(s)
Protein Folding , Proteins/chemistry , Kinetics , Molecular Dynamics Simulation , Tertiary Protein Structure , Thermodynamics
ABSTRACT
BACKGROUND: Molecular dynamics (MD) simulations provide valuable insight into biomolecular systems at the atomic level. Notwithstanding the ever-increasing power of high-performance computers, current MD simulations face several challenges: the fastest atomic movements require time steps of a few femtoseconds, which are small compared to biomolecularly relevant timescales of milliseconds or even seconds for large conformational motions. At the same time, scalability to a large number of cores is limited, mostly due to long-range interactions. An appealing alternative to atomic-level simulations is coarse-graining the resolution of the system or reducing the complexity of the Hamiltonian to improve sampling while decreasing computational costs. Native structure-based models, also called Go-type models, are based on energy landscape theory and the principle of minimal frustration. They have been tremendously successful in explaining fundamental questions of, e.g., protein folding, RNA folding or protein function. At the same time, they are computationally inexpensive enough to run complex simulations on smaller computing systems or even commodity hardware. Still, their setup and evaluation are quite complex, even though sophisticated software packages support their realization. RESULTS: Here, we establish an efficient infrastructure for native structure-based models to support the community and enable high-throughput simulations on remote computing resources via GridBeans and UNICORE middleware. This infrastructure organizes the setup of such simulations, resulting in increased comparability of simulation results. At the same time, complete workflows for advanced simulation protocols can be established and managed on remote resources through a graphical interface, which increases the reusability of protocols and lowers the entry barrier into such simulations for, e.g., experimental scientists who want to compare their results against simulations. We demonstrate the power of this approach by illustrating it for protein folding simulations for a range of proteins. CONCLUSIONS: We present software enhancing the entire workflow for native structure-based simulations, including exception handling and evaluations. By extending the capability and improving the accessibility of existing simulation packages, the software goes beyond the state of the art in the domain of biomolecular simulations. Thus, we expect that it will encourage more individuals from the community to employ modeling in their research with greater confidence.
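The core of a Go-type model is an attractive potential placed only on contacts present in the native structure. A functional form often used for C-alpha structure-based models is the 12-10 Lennard-Jones-like term V(r) = eps * [5 (r0/r)^12 - 6 (r0/r)^10], whose minimum of -eps lies at the native distance r0. The abstract does not state which form the software uses, so the sketch below, including its function names and toy distances, is an assumption for illustration.

```python
def contact_energy(r, r_native, epsilon=1.0):
    """12-10 native-contact potential with its minimum (-epsilon) at r = r_native."""
    ratio = r_native / r
    return epsilon * (5.0 * ratio ** 12 - 6.0 * ratio ** 10)


def native_contact_energy(current, native, epsilon=1.0):
    """Sum the contact term over all native contacts.

    `native` and `current` map residue pairs (i, j) to distances; only pairs
    listed in `native` contribute, which is what makes the model structure-based.
    """
    return sum(contact_energy(current[pair], r0, epsilon) for pair, r0 in native.items())


# Made-up native contact map (distances in nm) for a hypothetical small protein.
native = {(0, 4): 0.60, (1, 5): 0.55}
folded = dict(native)                      # every contact at its native distance
stretched = {(0, 4): 0.90, (1, 5): 0.55}   # one contact pulled apart

print(native_contact_energy(folded, native))
print(native_contact_energy(stretched, native))
```

Giving every contact the same `epsilon` corresponds to a homogeneous contact potential; passing a per-contact value instead would be one way to express a heterogeneous one.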
Subject(s)
Molecular Dynamics Simulation , Proteins/chemistry , Software , Computer Graphics , Protein Folding , User-Computer Interface
ABSTRACT
MOTIVATION: Molecular dynamics simulations provide detailed insights into the structure and function of biomolecular systems. Thus, they complement experimental measurements by giving access to experimentally inaccessible regimes. Among the different molecular dynamics techniques, native structure-based models (SBMs) are based on energy landscape theory and the principle of minimal frustration. Typically used in protein and RNA folding simulations, they coarse-grain the biomolecular system and/or simplify the Hamiltonian, resulting in modest computational requirements while achieving high agreement with experimental data. eSBMTools streamlines running and evaluating SBMs in a comprehensive package and offers high flexibility in adding experimental or bioinformatics-derived restraints. RESULTS: We present a software package that allows setting up, modifying and evaluating SBMs for both RNA and proteins. The implemented workflows include predicting protein complexes based on bioinformatics-derived inter-protein contact information, a standardized setup of protein folding simulations based on the common PDB format, calculating reaction coordinates, and evaluating the simulation by free-energy calculations with the weighted histogram analysis method or by phi-values. The modules interface with the molecular dynamics simulation program GROMACS. The package is open source and written in architecture-independent Python 2. AVAILABILITY: http://sourceforge.net/projects/esbmtools/. CONTACT: alexander.schug@kit.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.