Search | BVS Aleitamento Materno

1.

A Generative Angular Model of Protein Structure Evolution.

Golden, Michael; García-Portugués, Eduardo; Sørensen, Michael; Mardia, Kanti V; Hamelryck, Thomas; Hein, Jotun.

Mol Biol Evol ; 34(8): 2085-2100, 2017 Aug 01.

Article in English | MEDLINE | ID: mdl-28453724

ABSTRACT

Recently described stochastic models of protein evolution have demonstrated that the inclusion of structural information in addition to amino acid sequences leads to a more reliable estimation of evolutionary parameters. We present a generative, evolutionary model of protein structure and sequence that is valid on a local length scale. The model concerns the local dependencies between sequence and structure evolution in a pair of homologous proteins. The evolutionary trajectory between the two structures in the protein pair is treated as a random walk in dihedral angle space, which is modeled using a novel angular diffusion process on the two-dimensional torus. Coupling sequence and structure evolution in our model allows for modeling both "smooth" conformational changes and "catastrophic" conformational jumps, conditioned on the amino acid changes. The model has interpretable parameters and is comparatively more realistic than previous stochastic models, providing new insights into the relationship between sequence and structure evolution. For example, using the trained model we were able to identify an apparent sequence-structure evolutionary motif present in a large number of homologous protein pairs. The generative nature of our model enables us to evaluate its validity and its ability to simulate aspects of protein evolution conditioned on an amino acid sequence, a related amino acid sequence, a related structure or any combination thereof.
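To make the idea of a random walk in dihedral angle space concrete, here is a minimal sketch of an unbiased Gaussian random walk on the two-dimensional torus of (phi, psi) angles. It is not the angular diffusion process of the paper, which is not specified in the abstract; the function names, the step size `sigma` and the starting angles are illustrative assumptions.

```python
import math
import random


def wrap(angle):
    """Map an angle in radians onto the interval [-pi, pi)."""
    return (angle + math.pi) % (2.0 * math.pi) - math.pi


def torus_random_walk(phi, psi, n_steps, sigma=0.1, seed=0):
    """Unbiased Gaussian random walk of a (phi, psi) pair on the torus.

    sigma is a hypothetical step size in radians, not a fitted parameter.
    """
    rng = random.Random(seed)
    path = [(phi, psi)]
    for _ in range(n_steps):
        # Perturb each dihedral angle, then wrap it back onto the circle.
        phi = wrap(phi + rng.gauss(0.0, sigma))
        psi = wrap(psi + rng.gauss(0.0, sigma))
        path.append((phi, psi))
    return path


# Start near a helical region of the Ramachandran plot (illustrative values).
path = torus_random_walk(math.radians(-60.0), math.radians(-45.0), n_steps=1000)
```

Wrapping after every step is what keeps the walk on the torus: an angle that drifts past pi re-enters at -pi instead of leaving the domain.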

Subjects

Proteins/genetics , Sequence Alignment/methods , Protein Sequence Analysis/methods , Amino Acid Sequence , Computer Simulation , Molecular Evolution , Genetic Models , Molecular Models , Protein Conformation , Protein Structural Elements/genetics , Proteins/metabolism , Protein Sequence Analysis/statistics & numerical data

2.

Equilibrium simulations of proteins using molecular fragment replacement and NMR chemical shifts.

Boomsma, Wouter; Tian, Pengfei; Frellsen, Jes; Ferkinghoff-Borg, Jesper; Hamelryck, Thomas; Lindorff-Larsen, Kresten; Vendruscolo, Michele.

Proc Natl Acad Sci U S A ; 111(38): 13852-7, 2014 Sep 23.

Article in English | MEDLINE | ID: mdl-25192938

ABSTRACT

Methods of protein structure determination based on NMR chemical shifts are becoming increasingly common. The most widely used approaches adopt the molecular fragment replacement strategy, in which structural fragments are repeatedly reassembled into different complete conformations in molecular simulations. Although these approaches are effective in generating individual structures consistent with the chemical shift data, they do not enable the sampling of the conformational space of proteins with correct statistical weights. Here, we present a method of molecular fragment replacement that makes it possible to perform equilibrium simulations of proteins, and hence to determine their free energy landscapes. This strategy is based on the encoding of the chemical shift information in a probabilistic model in Markov chain Monte Carlo simulations. First, we demonstrate that with this approach it is possible to fold proteins to their native states starting from extended structures. Second, we show that the method satisfies the detailed balance condition and hence it can be used to carry out an equilibrium sampling from the Boltzmann distribution corresponding to the force field used in the simulations. Third, by comparing the results of simulations carried out with and without chemical shift restraints we describe quantitatively the effects that these restraints have on the free energy landscapes of proteins. Taken together, these results demonstrate that the molecular fragment replacement strategy can be used in combination with chemical shift information to characterize not only the native structures of proteins but also their conformational fluctuations.
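The detailed balance condition mentioned in the abstract can be illustrated with a plain Metropolis sampler. This is a generic sketch on a one-dimensional toy energy, not the fragment replacement move of the paper; `energy`, `beta` and `step` are assumptions chosen only for illustration.

```python
import math
import random


def energy(x):
    """Toy double-well energy, a stand-in for a real force field."""
    return (x * x - 1.0) ** 2


def acceptance(e_old, e_new, beta=1.0):
    """Metropolis acceptance probability min(1, exp(-beta * (e_new - e_old)))."""
    delta = e_new - e_old
    return 1.0 if delta <= 0.0 else math.exp(-beta * delta)


def metropolis(n_steps, beta=1.0, step=0.5, seed=0):
    """Sample x from the Boltzmann distribution exp(-beta * energy(x))."""
    rng = random.Random(seed)
    x = 0.0
    samples = []
    for _ in range(n_steps):
        proposal = x + rng.uniform(-step, step)  # symmetric proposal
        if rng.random() < acceptance(energy(x), energy(proposal), beta):
            x = proposal
        samples.append(x)
    return samples


samples = metropolis(20000)
```

Because the proposal is symmetric, detailed balance reduces to checking that exp(-beta E(a)) A(a to b) equals exp(-beta E(b)) A(b to a) for any pair of states, which the acceptance rule above satisfies by construction.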

Subjects

Computer Simulation , Molecular Models , Biomolecular Nuclear Magnetic Resonance/methods , Proteins/chemistry , Markov Chains

3.

Formulation of probabilistic models of protein structure in atomic detail using the reference ratio method.

Valentin, Jan B; Andreetta, Christian; Boomsma, Wouter; Bottaro, Sandro; Ferkinghoff-Borg, Jesper; Frellsen, Jes; Mardia, Kanti V; Tian, Pengfei; Hamelryck, Thomas.

Proteins ; 82(2): 288-99, 2014 Feb.

Article in English | MEDLINE | ID: mdl-23934827

ABSTRACT

We propose a method to formulate probabilistic models of protein structure in atomic detail, for a given amino acid sequence, based on Bayesian principles, while retaining a close link to physics. We start from two previously developed probabilistic models of protein structure on a local length scale, which concern the dihedral angles in main chain and side chains, respectively. Conceptually, this constitutes a probabilistic and continuous alternative to the use of discrete fragment and rotamer libraries. The local model is combined with a nonlocal model that involves a small number of energy terms according to a physical force field, and some information on the overall secondary structure content. In this initial study we focus on the formulation of the joint model and the evaluation of the use of an energy vector as a descriptor of a protein's nonlocal structure; hence, we derive the parameters of the nonlocal model from the native structure without loss of generality. The local and nonlocal models are combined using the reference ratio method, which is a well-justified probabilistic construction. For evaluation, we use the resulting joint models to predict the structure of four proteins. The results indicate that the proposed method and the probabilistic models show considerable promise for probabilistic protein structure prediction and related applications.

Subjects

Molecular Models , Statistical Models , Algorithms , Amino Acid Sequence , Bacterial Proteins/chemistry , Bayes Theorem , Hydrogen Bonding , Protein Secondary Structure , Protein Tertiary Structure , Protein Structural Homology , Thermodynamics

4.

A simple probabilistic model of multibody interactions in proteins.

Johansson, Kristoffer Enøe; Hamelryck, Thomas.

Proteins ; 81(8): 1340-50, 2013 Aug.

Article in English | MEDLINE | ID: mdl-23468247

ABSTRACT

Protein structure prediction methods typically use statistical potentials, which rely on statistics derived from a database of known protein structures. In the vast majority of cases, these potentials involve pairwise distances or contacts between amino acids or atoms. Although some potentials beyond pairwise interactions have been described, the formulation of a general multibody potential is seen as intractable due to the perceived limited amount of data. In this article, we show that it is possible to formulate a probabilistic model of higher order interactions in proteins, without arbitrarily limiting the number of contacts. The success of this approach is based on replacing a naive table-based approach with a simple hierarchical model involving suitable probability distributions and conditional independence assumptions. The model captures the joint probability distribution of an amino acid and its neighbors, local structure and solvent exposure. We show that this model can be used to approximate the conditional probability distribution of an amino acid sequence given a structure using a pseudo-likelihood approach. We verify the model by decoy recognition and site-specific amino acid predictions. Our coarse-grained model is compared to state-of-the-art methods that use full atomic detail. This article illustrates how the use of simple probabilistic models can lead to new opportunities in the treatment of nonlocal interactions in knowledge-based protein structure prediction and design.

Subjects

Proteins/metabolism , Bayes Theorem , Humans , Biological Models , Molecular Models , Statistical Models , Protein Conformation , Protein Interaction Maps , Proteins/chemistry , Ubiquitin/chemistry , Ubiquitin/metabolism

5.

PHAISTOS: a framework for Markov chain Monte Carlo simulation and inference of protein structure.

Boomsma, Wouter; Frellsen, Jes; Harder, Tim; Bottaro, Sandro; Johansson, Kristoffer E; Tian, Pengfei; Stovgaard, Kasper; Andreetta, Christian; Olsson, Simon; Valentin, Jan B; Antonov, Lubomir D; Christensen, Anders S; Borg, Mikael; Jensen, Jan H; Lindorff-Larsen, Kresten; Ferkinghoff-Borg, Jesper; Hamelryck, Thomas.

J Comput Chem ; 34(19): 1697-705, 2013 Jul 15.

Article in English | MEDLINE | ID: mdl-23619610

ABSTRACT

We present a new software framework for Markov chain Monte Carlo sampling for simulation, prediction, and inference of protein structure. The software package contains implementations of recent advances in Monte Carlo methodology, such as efficient local updates and sampling from probabilistic models of local protein structure. These models form a probabilistic alternative to the widely used fragment and rotamer libraries. Combined with an easily extendible software architecture, this makes PHAISTOS well suited for Bayesian inference of protein structure from sequence and/or experimental data. Currently, two force-fields are available within the framework: PROFASI and OPLS-AA/L, the latter including the generalized Born surface area solvent model. A flexible command-line and configuration-file interface allows users quickly to set up simulations with the desired configuration. PHAISTOS is released under the GNU General Public License v3.0. Source code and documentation are freely available from http://phaistos.sourceforge.net. The software is implemented in C++ and has been tested on Linux and OSX platforms.

Subjects

Markov Chains , Monte Carlo Method , Proteins/chemistry , Software , Bayes Theorem , Computer Simulation , Chemical Models , Protein Conformation

6.

Fast large-scale clustering of protein structures using Gauss integrals.

Harder, Tim; Borg, Mikael; Boomsma, Wouter; Røgen, Peter; Hamelryck, Thomas.

Bioinformatics ; 28(4): 510-5, 2012 Feb 15.

Article in English | MEDLINE | ID: mdl-22199383

ABSTRACT

MOTIVATION: Clustering protein structures is an important task in structural bioinformatics. De novo structure prediction, for example, often involves a clustering step for finding the best prediction. Other applications include assigning proteins to fold families and analyzing molecular dynamics trajectories. RESULTS: We present Pleiades, a novel approach to clustering protein structures with a rigorous mathematical underpinning. The method approximates clustering based on the root mean square deviation by first mapping structures to Gauss integral vectors--which were introduced by Røgen and co-workers--and subsequently performing K-means clustering. CONCLUSIONS: Compared to current methods, Pleiades dramatically improves on the time needed to perform clustering, and can cluster a significantly larger number of structures, while providing state-of-the-art results. The number of low energy structures generated in a typical folding study, which is in the order of 50,000 structures, can be clustered within seconds to minutes.
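The second stage of the approach, K-means clustering of fixed-length descriptor vectors, can be sketched as follows. This is a generic Lloyd-style K-means and not the Pleiades implementation; computing Gauss integral vectors is out of scope here, so the input vectors are made-up toy data, and the deterministic farthest-first initialisation is an assumption chosen to keep the example reproducible.

```python
import math


def farthest_first(vectors, k):
    """Deterministic initialisation: start from the first vector, then
    repeatedly add the vector farthest from all chosen centers."""
    centers = [list(vectors[0])]
    while len(centers) < k:
        nxt = max(vectors, key=lambda v: min(math.dist(v, c) for c in centers))
        centers.append(list(nxt))
    return centers


def kmeans(vectors, k, max_iter=100):
    """Lloyd's algorithm on a list of equal-length numeric vectors."""
    centers = farthest_first(vectors, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each vector joins its nearest center.
        new_labels = [
            min(range(k), key=lambda c: math.dist(v, centers[c])) for v in vectors
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centers are too
        labels = new_labels
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Toy stand-ins for descriptor vectors (hypothetical values).
toy_vectors = [[0.0, 0.1], [0.1, 0.0], [0.0, 0.0], [5.0, 5.1], [5.1, 5.0], [5.0, 5.0]]
labels, centers = kmeans(toy_vectors, k=2)
```

Working in a fixed-length vector space is what makes this fast: each distance evaluation is a cheap Euclidean distance rather than a pairwise structural superposition.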

Subjects

Cluster Analysis , Computational Biology/methods , Proteins/chemistry , Adenylate Kinase/chemistry , Candida/chemistry , Escherichia coli/enzymology , Fungal Proteins/chemistry , Molecular Dynamics Simulation

7.

A generative, probabilistic model of local protein structure.

Boomsma, Wouter; Mardia, Kanti V; Taylor, Charles C; Ferkinghoff-Borg, Jesper; Krogh, Anders; Hamelryck, Thomas.

Proc Natl Acad Sci U S A ; 105(26): 8932-7, 2008 Jul 01.

Article in English | MEDLINE | ID: mdl-18579771

ABSTRACT

Despite significant progress in recent years, protein structure prediction maintains its status as one of the prime unsolved problems in computational biology. One of the key remaining challenges is an efficient probabilistic exploration of the structural space that correctly reflects the relative conformational stabilities. Here, we present a fully probabilistic, continuous model of local protein structure in atomic detail. The generative model makes efficient conformational sampling possible and provides a framework for the rigorous analysis of local sequence-structure correlations in the native state. Our method represents a significant theoretical and practical improvement over the widely used fragment assembly technique by avoiding the drawbacks associated with a discrete and nonprobabilistic approach.

Subjects

Molecular Models , Statistical Models , Proteins/chemistry , Amino Acid Motifs

8.

Mocapy++--a toolkit for inference and learning in dynamic Bayesian networks.

Paluszewski, Martin; Hamelryck, Thomas.

BMC Bioinformatics ; 11: 126, 2010 Mar 12.

Article in English | MEDLINE | ID: mdl-20226024

ABSTRACT

BACKGROUND: Mocapy++ is a toolkit for parameter learning and inference in dynamic Bayesian networks (DBNs). It supports a wide range of DBN architectures and probability distributions, including distributions from directional statistics (the statistics of angles, directions and orientations). RESULTS: The program package is freely available under the GNU General Public Licence (GPL) from SourceForge http://sourceforge.net/projects/mocapy. The package contains the source for building the Mocapy++ library, several usage examples and the user manual. CONCLUSIONS: Mocapy++ is especially suitable for constructing probabilistic models of biomolecular structure, due to its support for directional statistics. In particular, it supports the Kent distribution on the sphere and the bivariate von Mises distribution on the torus. These distributions have proven useful to formulate probabilistic models of protein and RNA structure in atomic detail.

Subjects

Bayes Theorem , Software , Statistical Models , Protein Conformation , Proteins/chemistry , RNA/chemistry

9.

Calculation of accurate small angle X-ray scattering curves from coarse-grained protein models.

Stovgaard, Kasper; Andreetta, Christian; Ferkinghoff-Borg, Jesper; Hamelryck, Thomas.

BMC Bioinformatics ; 11: 429, 2010 Aug 18.

Article in English | MEDLINE | ID: mdl-20718956

ABSTRACT

BACKGROUND: Genome sequencing projects have expanded the gap between the amount of known protein sequences and structures. The limitations of current high resolution structure determination methods make it unlikely that this gap will disappear in the near future. Small angle X-ray scattering (SAXS) is an established low resolution method for routinely determining the structure of proteins in solution. The purpose of this study is to develop a method for the efficient calculation of accurate SAXS curves from coarse-grained protein models. Such a method can for example be used to construct a likelihood function, which is paramount for structure determination based on statistical inference. RESULTS: We present a method for the efficient calculation of accurate SAXS curves based on the Debye formula and a set of scattering form factors for dummy atom representations of amino acids. Such a method avoids the computationally costly iteration over all atoms. We estimated the form factors using generated data from a set of high quality protein structures. No ad hoc scaling or correction factors are applied in the calculation of the curves. Two coarse-grained representations of protein structure were investigated; two scattering bodies per amino acid led to significantly better results than a single scattering body. CONCLUSION: We show that the obtained point estimates allow the calculation of accurate SAXS curves from coarse-grained protein models. The resulting curves are on par with the current state-of-the-art program CRYSOL, which requires full atomic detail. Our method was also comparable to CRYSOL in recognizing native structures among native-like decoys. As a proof-of-concept, we combined the coarse-grained Debye calculation with a previously described probabilistic model of protein structure, TorusDBN. This resulted in a significant improvement in the decoy recognition performance. 
In conclusion, the presented method shows great promise for use in statistical inference of protein structures from SAXS data.
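The Debye formula named in the abstract, I(q) = sum_i sum_j F_i(q) F_j(q) sin(q r_ij) / (q r_ij), can be sketched for a handful of scattering bodies as below. The coordinates and form factor values are invented for illustration; the estimated form factors of the paper are not reproduced here, and the function name is an assumption.

```python
import math


def debye_intensity(q, coords, form_factors):
    """Scattering intensity at momentum transfer q from the Debye formula.

    coords       -- list of (x, y, z) positions of the scattering bodies
    form_factors -- form factor of each body, already evaluated at this q
    """
    total = 0.0
    for i, (ri, fi) in enumerate(zip(coords, form_factors)):
        for j, (rj, fj) in enumerate(zip(coords, form_factors)):
            x = q * math.dist(ri, rj)
            # sin(x)/x tends to 1 as x tends to 0 (self terms and q = 0).
            sinc = 1.0 if x == 0.0 else math.sin(x) / x
            total += fi * fj * sinc
    return total


# Two dummy bodies one length unit apart, with made-up form factors.
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
form_factors = [1.0, 2.0]
intensity_at_zero = debye_intensity(0.0, coords, form_factors)
```

The double sum scales quadratically with the number of scattering bodies, which is why replacing all atoms of a residue by one or two dummy bodies reduces the cost so sharply.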

Subjects

Proteins/chemistry , Small Angle Scattering , Amino Acid Sequence , Molecular Models , Statistical Models , Protein Conformation , Solutions , X-Ray Diffraction

10.

Beyond rotamers: a generative, probabilistic model of side chains in proteins.

Harder, Tim; Boomsma, Wouter; Paluszewski, Martin; Frellsen, Jes; Johansson, Kristoffer E; Hamelryck, Thomas.

BMC Bioinformatics ; 11: 306, 2010 Jun 05.

Article in English | MEDLINE | ID: mdl-20525384

ABSTRACT

BACKGROUND: Accurately covering the conformational space of amino acid side chains is essential for important applications such as protein design, docking and high resolution structure prediction. Today, the most common way to capture this conformational space is through rotamer libraries - discrete collections of side chain conformations derived from experimentally determined protein structures. The discretization can be exploited to efficiently search the conformational space. However, discretizing this naturally continuous space comes at the cost of losing detailed information that is crucial for certain applications. For example, rigorously combining rotamers with physical force fields is associated with numerous problems. RESULTS: In this work we present BASILISK: a generative, probabilistic model of the conformational space of side chains that makes it possible to sample in continuous space. In addition, sampling can be conditional upon the protein's detailed backbone conformation, again in continuous space - without involving discretization. CONCLUSIONS: A careful analysis of the model and a comparison with various rotamer libraries indicate that the model forms an excellent, fully continuous model of side chain conformational space. We also illustrate how the model can be used for rigorous, unbiased sampling with a physical force field, and how it improves side chain prediction when used as a pseudo-energy term. In conclusion, BASILISK is an important step forward on the way to a rigorous probabilistic description of protein structure in continuous space and in atomic detail.

Subjects

Statistical Models , Proteins/chemistry , Molecular Models , Protein Conformation

11.

Biopython: freely available Python tools for computational molecular biology and bioinformatics.

Cock, Peter J A; Antao, Tiago; Chang, Jeffrey T; Chapman, Brad A; Cox, Cymon J; Dalke, Andrew; Friedberg, Iddo; Hamelryck, Thomas; Kauff, Frank; Wilczynski, Bartek; de Hoon, Michiel J L.

Bioinformatics ; 25(11): 1422-3, 2009 Jun 01.

Article in English | MEDLINE | ID: mdl-19304878

ABSTRACT

SUMMARY: The Biopython project is a mature open source international collaboration of volunteer developers, providing Python libraries for a wide range of bioinformatics problems. Biopython includes modules for reading and writing different sequence file formats and multiple sequence alignments, dealing with 3D macromolecular structures, interacting with common tools such as BLAST, ClustalW and EMBOSS, accessing key online databases, as well as providing numerical methods for statistical learning. AVAILABILITY: Biopython is freely available, with documentation and source code at (www.biopython.org) under the Biopython license.

Subjects

Computational Biology/methods , Software , Factual Databases , Internet , Programming Languages

12.

A probabilistic model of RNA conformational space.

Frellsen, Jes; Moltke, Ida; Thiim, Martin; Mardia, Kanti V; Ferkinghoff-Borg, Jesper; Hamelryck, Thomas.

PLoS Comput Biol ; 5(6): e1000406, 2009 Jun.

Article in English | MEDLINE | ID: mdl-19543381

ABSTRACT

The increasing importance of non-coding RNA in biology and medicine has led to a growing interest in the problem of RNA 3-D structure prediction. As is the case for proteins, RNA 3-D structure prediction methods require two key ingredients: an accurate energy function and a conformational sampling procedure. Both are only partly solved problems. Here, we focus on the problem of conformational sampling. The current state of the art solution is based on fragment assembly methods, which construct plausible conformations by stringing together short fragments obtained from experimental structures. However, the discrete nature of the fragments necessitates the use of carefully tuned, unphysical energy functions, and their non-probabilistic nature impairs unbiased sampling. We offer a solution to the sampling problem that removes these important limitations: a probabilistic model of RNA structure that allows efficient sampling of RNA conformations in continuous space, and with associated probabilities. We show that the model captures several key features of RNA structure, such as its rotameric nature and the distribution of the helix lengths. Furthermore, the model readily generates native-like 3-D conformations for 9 out of 10 test structures, solely using coarse-grained base-pairing information. In conclusion, the method provides a theoretical and practical solution for a major bottleneck on the way to routine prediction and simulation of RNA structure and dynamics in atomic detail.

Subjects

Statistical Models , Nucleic Acid Conformation , RNA/chemistry , Algorithms , Bayes Theorem , Computer Simulation , Nucleic Acid Databases , Three-Dimensional Imaging/methods , Markov Chains , Molecular Models , Monte Carlo Method , Software

13.

GISA: using Gauss Integrals to identify rare conformations in protein structures.

Grønbæk, Christian; Hamelryck, Thomas; Røgen, Peter.

PeerJ ; 8: e9159, 2020.

Article in English | MEDLINE | ID: mdl-32566389

ABSTRACT

The native structure of a protein is important for its function, and therefore methods for exploring protein structures have attracted much research. However, rather few methods are sensitive to topological-geometric features, examples being knots, slipknots, lassos, links, and pokes, and each method is aimed only at a specific set of such configurations. We here propose a general method which transforms a structure into a "fingerprint of topological-geometric values" consisting of a series of real-valued descriptors from mathematical knot theory. The extent to which a structure contains unusual configurations can then be judged from this fingerprint. The method is not confined to a particular pre-defined topology or geometry (like a knot or a poke), and so, unlike existing methods, it is general. To achieve this, our new algorithm, GISA, produces as a key novelty the descriptors, so-called Gauss integrals, not only for the full chains of a protein but for all its sub-chains. This allows fingerprinting on any scale from local to global. The Gauss integrals are known to be effective descriptors of global protein folds. Applying GISA to sets of several thousand high resolution structures, we first show how the most basic Gauss integral, the writhe, enables swift identification of pre-defined geometries such as pokes and links. We then apply GISA with no restrictions on geometry, to show how it allows identifying rare conformations by finding rare invariant values only. In this unrestricted search, pokes and links are still found, but also knotted conformations, as well as more highly entangled configurations not previously described. Thus, an application of the basic scan method in GISA's tool-box revealed 10 known cases of knots as the top positive writhe cases, while placing at the top of the negative writhe 14 cases in cis-trans isomerases sharing a spatial motif of little secondary structure content, which possibly has gone unnoticed.
Possible general applications of GISA are fold classification and structural alignment based on local Gauss integrals. Others include finding errors in protein models and identifying unusual conformations that might be important for protein folding and function. By its broad potential, we believe that GISA will be of general benefit to the structural bioinformatics community. GISA is coded in C and comes as a command line tool. Source and compiled code for GISA plus read-me and examples are publicly available at GitHub (https://github.com).

14.

A community proposal to integrate structural bioinformatics activities in ELIXIR (3D-Bioinfo Community).

Orengo, Christine; Velankar, Sameer; Wodak, Shoshana; Zoete, Vincent; Bonvin, Alexandre M J J; Elofsson, Arne; Feenstra, K Anton; Gerloff, Dietland L; Hamelryck, Thomas; Hancock, John M; Helmer-Citterich, Manuela; Hospital, Adam; Orozco, Modesto; Perrakis, Anastassis; Rarey, Matthias; Soares, Claudio; Sussman, Joel L; Thornton, Janet M; Tuffery, Pierre; Tusnady, Gabor; Wierenga, Rikkert; Salminen, Tiina; Schneider, Bohdan.

F1000Res ; 9, 2020.

Article in English | MEDLINE | ID: mdl-32566135

ABSTRACT

Structural bioinformatics provides the scientific methods and tools to analyse, archive, validate, and present the biomolecular structure data generated by the structural biology community. It also provides an important link with the genomics community, as structural bioinformaticians also use the extensive sequence data to predict protein structures and their functional sites. A very broad and active community of structural bioinformaticians exists across Europe, and 3D-Bioinfo will establish formal platforms to address their needs and better integrate their activities and initiatives. Our mission will be to strengthen the ties with the structural biology research communities in Europe covering life sciences, as well as chemistry and physics and to bridge the gap between these researchers in order to fully realize the potential of structural bioinformatics. Our Community will also undertake dedicated educational, training and outreach efforts to facilitate this, bringing new insights and thus facilitating the development of much needed innovative applications e.g. for human health, drug and protein design. Our combined efforts will be of critical importance to keep the European research efforts competitive in this respect. Here we highlight the major European contributions to the field of structural bioinformatics, the most pressing challenges remaining and how Europe-wide interactions, enabled by ELIXIR and its platforms, will help in addressing these challenges and in coordinating structural bioinformatics resources across Europe. In particular, we present recent activities and future plans to consolidate an ELIXIR 3D-Bioinfo Community in structural bioinformatics and propose means to develop better links across the community. These include building new consortia, organising workshops to establish data standards and seeking community agreement on benchmark data sets and strategies. 
We also highlight existing and planned collaborations with other ELIXIR Communities and other European infrastructures, such as the structural biology community supported by Instruct-ERIC, with whom we have synergies and overlapping common interests.

Subjects

Biological Science Disciplines , Computational Biology/organization & administration , Europe , Genomics , Humans , Proteins

15.

Probabilistic models and machine learning in structural bioinformatics.

Hamelryck, Thomas.

Stat Methods Med Res ; 18(5): 505-26, 2009 Oct.

Article in English | MEDLINE | ID: mdl-19153168

ABSTRACT

Structural bioinformatics is concerned with the molecular structure of biomacromolecules on a genomic scale, using computational methods. Classic problems in structural bioinformatics include the prediction of protein and RNA structure from sequence, the design of artificial proteins or enzymes, and the automated analysis and comparison of biomacromolecules in atomic detail. The determination of macromolecular structure from experimental data (for example coming from nuclear magnetic resonance, X-ray crystallography or small angle X-ray scattering) has close ties with the field of structural bioinformatics. Recently, probabilistic models and machine learning methods based on Bayesian principles are providing efficient and rigorous solutions to challenging problems that were long regarded as intractable. In this review, I will highlight some important recent developments in the prediction, analysis and experimental determination of macromolecular structure that are based on such methods. These developments include generative models of protein structure, the estimation of the parameters of energy functions that are used in structure prediction, the superposition of macromolecules and structure determination methods that are based on inference. Although this review is not exhaustive, I believe the selected topics give a good impression of the exciting new, probabilistic road the field of structural bioinformatics is taking.

Subjects

Artificial Intelligence , Computational Biology , Statistical Models , Proteins/chemistry , Humans , Molecular Models

16.

A Probabilistic Programming Approach to Protein Structure Superposition.

Moreta, Lys Sanz; Al-Sibahi, Ahmad Salim; Theobald, Douglas; Bullock, William; Rommes, Basile Nicolas; Manoukian, Andreas; Hamelryck, Thomas.

Proc IEEE Symp Comput Intell Bioinforma Comput Biol ; 2019, 2019 Jul.

Article in English | MEDLINE | ID: mdl-34661202

ABSTRACT

Optimal superposition of protein structures or other biological molecules is crucial for understanding their structure, function, dynamics and evolution. Here, we investigate the use of probabilistic programming to superimpose protein structures guided by a Bayesian model. Our model THESEUS-PP is based on the THESEUS model, a probabilistic model of protein superposition based on rotation, translation and perturbation of an underlying, latent mean structure. The model was implemented in the probabilistic programming language Pyro. Unlike conventional methods that minimize the sum of the squared distances, THESEUS takes into account correlated atom positions and heteroscedasticity (i.e., atom positions can feature different variances). THESEUS performs maximum likelihood estimation using iterative expectation-maximization. In contrast, THESEUS-PP allows automated maximum a-posteriori (MAP) estimation using suitable priors over rotation, translation, variances and latent mean structure. The results indicate that probabilistic programming is a powerful new paradigm for the formulation of Bayesian probabilistic models concerning biomolecular structure. Specifically, we envision the use of the THESEUS-PP model as a suitable error model or likelihood in Bayesian protein structure prediction using deep probabilistic programming.

17.

MyPMFs: a simple tool for creating statistical potentials to assess protein structural models.

Postic, Guillaume; Hamelryck, Thomas; Chomilier, Jacques; Stratmann, Dirk.

Biochimie ; 151: 37-41, 2018 Aug.

Article in English | MEDLINE | ID: mdl-29857183

ABSTRACT

Evaluating the model quality of protein structures that evolve in environments with particular physicochemical properties requires scoring functions that are adapted to their specific residue compositions and/or structural characteristics. Thus, computational methods developed for structures from the cytosol cannot work properly on membrane or secreted proteins. Here, we present MyPMFs, an easy-to-use tool that allows users to train statistical potentials of mean force (PMFs) on the protein structures of their choice, with all parameters being adjustable. We demonstrate its use by creating an accurate statistical potential for transmembrane protein domains. We also show its usefulness to study the influence of the physical environment on residue interactions within protein structures. Our open-source software is freely available for download at https://github.com/bibip-impmc/mypmfs.
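A statistical potential of mean force is commonly derived from observed and reference frequencies by the inverse Boltzmann relation E(r) = -kT ln(f_obs(r) / f_ref(r)). The sketch below applies that textbook relation to binned distance counts; it is not taken from the MyPMFs source code, and the function name, the `kT` parameter and the count values are illustrative assumptions.

```python
import math


def inverse_boltzmann(observed_counts, reference_counts, kT=1.0):
    """Per-bin pseudo-energy from observed and reference distance counts.

    Bins with a zero count are returned as None, since the logarithm is
    undefined there without pseudocounts (an assumption of this sketch).
    """
    n_obs = sum(observed_counts)
    n_ref = sum(reference_counts)
    energies = []
    for obs, ref in zip(observed_counts, reference_counts):
        if obs == 0 or ref == 0:
            energies.append(None)
            continue
        ratio = (obs / n_obs) / (ref / n_ref)
        energies.append(-kT * math.log(ratio))
    return energies


# Made-up counts for three distance bins of one residue pair type.
pmf = inverse_boltzmann([10, 30, 60], [20, 30, 50])
```

Bins observed less often than the reference state receive a positive (unfavourable) energy, and bins observed more often receive a negative one.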

Subjects

Computational Biology/methods , Molecular Models , Protein Conformation , Proteins/chemistry , Amino Acids/chemistry

18.

An evolutionary method for learning HMM structure: prediction of protein secondary structure.

Won, Kyoung-Jae; Hamelryck, Thomas; Prügel-Bennett, Adam; Krogh, Anders.

BMC Bioinformatics ; 8: 357, 2007 Sep 21.

Article in English | MEDLINE | ID: mdl-17888163

ABSTRACT

BACKGROUND: The prediction of the secondary structure of proteins is one of the most studied problems in bioinformatics. Despite their success in many problems of biological sequence analysis, Hidden Markov Models (HMMs) have not been used much for this problem, as the complexity of the task makes manual design of HMMs difficult. Therefore, we have developed a method for evolving the structure of HMMs automatically, using Genetic Algorithms (GAs). RESULTS: In the GA procedure, populations of HMMs are assembled from biologically meaningful building blocks. Mutation and crossover operators were designed to explore the space of such Block-HMMs. After each step of the GA, the standard HMM estimation algorithm (the Baum-Welch algorithm) was used to update model parameters. The final HMM captures several features of protein sequence and structure, with its own HMM grammar. In contrast to neural network based predictors, the evolved HMM also calculates the probabilities associated with the predictions. We carefully examined the performance of the HMM-based predictor under both the multiple- and single-sequence conditions. CONCLUSION: We have shown that the proposed evolutionary method can automatically design the topology of HMMs. The method reads the grammar of protein sequences and converts it into the grammar of an HMM. It improved on previously suggested evolutionary methods and increased the prediction quality. In particular, it shows good performance under the single-sequence condition and provides probabilistic information on the prediction result. The protein secondary structure predictor using HMMs (P.S.HMM) is available online at http://www.binf.ku.dk/~won/pshmm.htm. It runs under the single-sequence condition.
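
The probabilities that an HMM attaches to its predictions can be illustrated with the forward-backward algorithm, which yields the posterior probability of each hidden state at each sequence position. The sketch below is a generic textbook version, not the P.S.HMM predictor or its evolved Block-HMM topology; the function name and the dictionary-based model format are assumptions, and any parameters passed in are toy values rather than estimates from the paper.

```python
def posterior_state_probabilities(obs, states, start, trans, emit):
    """Per-position posterior state probabilities via forward-backward.

    start[s], trans[s][t] and emit[s][symbol] hold the model parameters
    (hypothetical layout chosen for this example).
    """
    if not obs:
        return []
    n = len(obs)

    # Forward pass, normalized at each step to avoid numerical underflow.
    fwd = []
    for t, symbol in enumerate(obs):
        row = {}
        for s in states:
            if t == 0:
                prior = start[s]
            else:
                prior = sum(fwd[t - 1][p] * trans[p][s] for p in states)
            row[s] = prior * emit[s][symbol]
        norm = sum(row.values())
        fwd.append({s: v / norm for s, v in row.items()})

    # Backward pass, normalized in the same way.
    bwd = [None] * n
    bwd[n - 1] = {s: 1.0 for s in states}
    for t in range(n - 2, -1, -1):
        row = {
            s: sum(trans[s][nxt] * emit[nxt][obs[t + 1]] * bwd[t + 1][nxt]
                   for nxt in states)
            for s in states
        }
        norm = sum(row.values())
        bwd[t] = {s: v / norm for s, v in row.items()}

    # Combine both passes and renormalize at every position.
    posteriors = []
    for t in range(n):
        row = {s: fwd[t][s] * bwd[t][s] for s in states}
        norm = sum(row.values())
        posteriors.append({s: v / norm for s, v in row.items()})
    return posteriors
```

For secondary structure prediction, the hidden states would stand for structural classes such as helix, strand and coil, and the posterior at each residue gives a confidence for the predicted class.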

Subjects

Algorithms , Chemical Models , Molecular Models , Protein Secondary Structure , Proteins/chemistry , Proteins/ultrastructure , Protein Sequence Analysis/methods , Amino Acid Sequence , Computer Simulation , Markov Chains , Molecular Sequence Data

19.

Sampling realistic protein conformations using local structural bias.

Hamelryck, Thomas; Kent, John T; Krogh, Anders.

PLoS Comput Biol ; 2(9): e131, 2006 Sep 22.

Article in English | MEDLINE | ID: mdl-17002495

ABSTRACT

The prediction of protein structure from sequence remains a major unsolved problem in biology. The most successful protein structure prediction methods make use of a divide-and-conquer strategy to attack the problem: a conformational sampling method generates plausible candidate structures, which are subsequently accepted or rejected using an energy function. Conceptually, this often corresponds to separating local structural bias from the long-range interactions that stabilize the compact, native state. However, sampling protein conformations that are compatible with the local structural bias encoded in a given protein sequence is a long-standing open problem, especially in continuous space. We describe an elegant and mathematically rigorous method to do this, and show that it readily generates native-like protein conformations simply by enforcing compactness. Our results have far-reaching implications for protein structure prediction, determination, simulation, and design.

Subjects

Computational Biology , Proteins/chemistry , Amino Acid Sequence , Computer Simulation , Molecular Models , Probability , Protein Secondary Structure , Protein Tertiary Structure

20.

Computational Redesign of Thioredoxin Is Hypersensitive toward Minor Conformational Changes in the Backbone Template.

Johansson, Kristoffer E; Tidemand Johansen, Nicolai; Christensen, Signe; Horowitz, Scott; Bardwell, James C A; Olsen, Johan G; Willemoës, Martin; Lindorff-Larsen, Kresten; Ferkinghoff-Borg, Jesper; Hamelryck, Thomas; Winther, Jakob R.

J Mol Biol ; 428(21): 4361-4377, 2016 10 23.

Article in English | MEDLINE | ID: mdl-27659562

ABSTRACT

Despite the development of powerful computational tools, the full-sequence design of proteins remains a challenging task. To investigate the limits and capabilities of computational tools, we conducted a study of the ability of the program Rosetta to predict sequences that recreate the authentic fold of thioredoxin. Focusing on the influence of conformational details in the template structures, we based our study on 8 experimentally determined template structures and generated 120 designs from each. For experimental evaluation, we chose six sequences from each of the eight templates by objective criteria. The 48 selected sequences were evaluated based on their progressive ability to (1) produce soluble protein in Escherichia coli and (2) yield stable monomeric protein, and (3) on the ability of the stable, soluble proteins to adopt the target fold. Of the 48 designs, we were able to synthesize 32, 20 of which resulted in soluble protein. Of these, only two were sufficiently stable to be purified. An X-ray crystal structure was solved for one of the designs, revealing a close resemblance to the target structure. Despite their high structural similarity, the eight template structures differed significantly in how well their designs met the above three criteria. Thus, in order to improve the success rate of computational full-sequence design methods, we recommend that multiple template structures be used. Furthermore, this study shows that special care should be taken when optimizing the geometry of a structure prior to computational design when using a method that is based on rigid conformations.

Subjects

Protein Folding , Thioredoxins/chemistry , Thioredoxins/metabolism , Computational Biology , X-Ray Crystallography , Protein Conformation , Protein Stability , Solubility , Thioredoxins/genetics
