Results 1 - 20 of 35
1.
F1000Res ; 9, 2020.
Article in English | MEDLINE | ID: mdl-32566135

ABSTRACT

Structural bioinformatics provides the scientific methods and tools to analyse, archive, validate, and present the biomolecular structure data generated by the structural biology community. It also provides an important link with the genomics community, as structural bioinformaticians also use the extensive sequence data to predict protein structures and their functional sites. A very broad and active community of structural bioinformaticians exists across Europe, and 3D-Bioinfo will establish formal platforms to address their needs and better integrate their activities and initiatives. Our mission will be to strengthen the ties with the structural biology research communities in Europe covering life sciences, as well as chemistry and physics and to bridge the gap between these researchers in order to fully realize the potential of structural bioinformatics. Our Community will also undertake dedicated educational, training and outreach efforts to facilitate this, bringing new insights and thus facilitating the development of much needed innovative applications e.g. for human health, drug and protein design. Our combined efforts will be of critical importance to keep the European research efforts competitive in this respect. Here we highlight the major European contributions to the field of structural bioinformatics, the most pressing challenges remaining and how Europe-wide interactions, enabled by ELIXIR and its platforms, will help in addressing these challenges and in coordinating structural bioinformatics resources across Europe. In particular, we present recent activities and future plans to consolidate an ELIXIR 3D-Bioinfo Community in structural bioinformatics and propose means to develop better links across the community. These include building new consortia, organising workshops to establish data standards and seeking community agreement on benchmark data sets and strategies. 
We also highlight existing and planned collaborations with other ELIXIR Communities and other European infrastructures, such as the structural biology community supported by Instruct-ERIC, with whom we have synergies and overlapping common interests.


Subject(s)
Biological Science Disciplines , Computational Biology/organization & administration , Europe , Genomics , Humans , Proteins
2.
PeerJ ; 8: e9159, 2020.
Article in English | MEDLINE | ID: mdl-32566389

ABSTRACT

The native structure of a protein is important for its function, and therefore methods for exploring protein structures have attracted much research. However, rather few methods are sensitive to topologic-geometric features, the examples being knots, slipknots, lassos, links, and pokes, and each method is aimed only at a specific set of such configurations. We here propose a general method which transforms a structure into a "fingerprint of topological-geometric values" consisting of a series of real-valued descriptors from mathematical Knot Theory. The extent to which a structure contains unusual configurations can then be judged from this fingerprint. The method is not confined to a particular pre-defined topology or geometry (like a knot or a poke), and so, unlike existing methods, it is general. To achieve this, our new algorithm, GISA, produces the descriptors, so-called Gauss integrals, not only for the full chains of a protein but, as a key novelty, for all its sub-chains. This allows fingerprinting on any scale from local to global. The Gauss integrals are known to be effective descriptors of global protein folds. Applying GISA to sets of several thousand high resolution structures, we first show how the most basic Gauss integral, the writhe, enables swift identification of pre-defined geometries such as pokes and links. We then apply GISA with no restrictions on geometry, to show how it allows identifying rare conformations by finding rare invariant values only. In this unrestricted search, pokes and links are still found, but also knotted conformations, as well as more highly entangled configurations not previously described. Thus, an application of the basic scan method in GISA's tool-box revealed 10 known cases of knots as the top positive writhe cases, while placing at the top of the negative writhe 14 cases in cis-trans isomerases sharing a spatial motif of little secondary structure content, which possibly has gone unnoticed.
Possible general applications of GISA are fold classification and structural alignment based on local Gauss integrals. Others include finding errors in protein models and identifying unusual conformations that might be important for protein folding and function. By its broad potential, we believe that GISA will be of general benefit to the structural bioinformatics community. GISA is coded in C and comes as a command line tool. Source and compiled code for GISA plus read-me and examples are publicly available at GitHub (https://github.com).
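
The writhe mentioned in this abstract is a Gauss double integral over pairs of points on the chain. The following is only a rough sketch of that idea, not GISA's algorithm: it assumes the chain is given as a list of 3D points (for example C-alpha coordinates) and estimates the integral with a simple midpoint rule over non-adjacent segments. The function name `approx_writhe` and the discretisation are illustrative assumptions.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def approx_writhe(points):
    """Midpoint-rule estimate of the Gauss double integral for an open polyline.

    Hypothetical helper: Wr ~ 1/(2*pi) * sum over i<j of
    (d_i x d_j) . (m_i - m_j) / |m_i - m_j|^3,
    where d are segment vectors and m are segment midpoints.
    """
    n = len(points) - 1
    segs = [_sub(points[i + 1], points[i]) for i in range(n)]
    mids = [tuple((points[i][k] + points[i + 1][k]) / 2.0 for k in range(3))
            for i in range(n)]
    total = 0.0
    for i in range(n):
        for j in range(i + 2, n):  # adjacent segments are skipped
            r = _sub(mids[i], mids[j])
            dist = math.sqrt(_dot(r, r))
            total += _dot(_cross(segs[i], segs[j]), r) / dist ** 3
    return total / (2.0 * math.pi)
```

Under this estimate a planar chain has zero writhe and mirroring a chain flips the sign, which is the kind of handedness information a writhe-based scan relies on. Restricting `points` to a slice of the chain gives a value for a sub-chain.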

3.
Article in English | MEDLINE | ID: mdl-34661202

ABSTRACT

Optimal superposition of protein structures or other biological molecules is crucial for understanding their structure, function, dynamics and evolution. Here, we investigate the use of probabilistic programming to superimpose protein structures guided by a Bayesian model. Our model THESEUS-PP is based on the THESEUS model, a probabilistic model of protein superposition based on rotation, translation and perturbation of an underlying, latent mean structure. The model was implemented in the probabilistic programming language Pyro. Unlike conventional methods that minimize the sum of the squared distances, THESEUS takes into account correlated atom positions and heteroscedasticity (i.e., atom positions can feature different variances). THESEUS performs maximum likelihood estimation using iterative expectation-maximization. In contrast, THESEUS-PP allows automated maximum a posteriori (MAP) estimation using suitable priors over rotation, translation, variances and latent mean structure. The results indicate that probabilistic programming is a powerful new paradigm for the formulation of Bayesian probabilistic models concerning biomolecular structure. Specifically, we envision the use of the THESEUS-PP model as a suitable error model or likelihood in Bayesian protein structure prediction using deep probabilistic programming.

4.
Biochimie ; 151: 37-41, 2018 Aug.
Article in English | MEDLINE | ID: mdl-29857183

ABSTRACT

Evaluating the model quality of protein structures that evolve in environments with particular physicochemical properties requires scoring functions that are adapted to their specific residue compositions and/or structural characteristics. Thus, computational methods developed for structures from the cytosol cannot work properly on membrane or secreted proteins. Here, we present MyPMFs, an easy-to-use tool that allows users to train statistical potentials of mean force (PMFs) on the protein structures of their choice, with all parameters being adjustable. We demonstrate its use by creating an accurate statistical potential for transmembrane protein domains. We also show its usefulness to study the influence of the physical environment on residue interactions within protein structures. Our open-source software is freely available for download at https://github.com/bibip-impmc/mypmfs.
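
A statistical potential of mean force is commonly obtained by inverse Boltzmann conversion of observed versus reference frequencies. The sketch below illustrates that general recipe on binned distance counts. It does not use or reproduce the MyPMFs interface, and the function name, the `pseudocount` smoothing and the unit choice `kT=1.0` are assumptions made for illustration.

```python
import math

def statistical_potential(observed_counts, reference_counts, kT=1.0, pseudocount=1.0):
    """Return -kT * ln(P_obs / P_ref) for each distance bin.

    Hypothetical helper: counts are per-bin tallies of a residue-pair distance
    in a training set (observed) and in a chosen reference state.
    """
    n_bins = len(observed_counts)
    obs_total = sum(observed_counts) + pseudocount * n_bins
    ref_total = sum(reference_counts) + pseudocount * n_bins
    energies = []
    for obs, ref in zip(observed_counts, reference_counts):
        p_obs = (obs + pseudocount) / obs_total  # smoothed observed frequency
        p_ref = (ref + pseudocount) / ref_total  # smoothed reference frequency
        energies.append(-kT * math.log(p_obs / p_ref))
    return energies
```

Bins that are enriched relative to the reference receive negative (favourable) energies, and bins that are depleted receive positive ones. Training on a different structure set, such as transmembrane domains, simply changes the observed counts.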


Subject(s)
Computational Biology/methods , Models, Molecular , Protein Conformation , Proteins/chemistry , Amino Acids/chemistry
5.
Mol Biol Evol ; 34(8): 2085-2100, 2017 Aug 01.
Article in English | MEDLINE | ID: mdl-28453724

ABSTRACT

Recently described stochastic models of protein evolution have demonstrated that the inclusion of structural information in addition to amino acid sequences leads to a more reliable estimation of evolutionary parameters. We present a generative, evolutionary model of protein structure and sequence that is valid on a local length scale. The model concerns the local dependencies between sequence and structure evolution in a pair of homologous proteins. The evolutionary trajectory between the two structures in the protein pair is treated as a random walk in dihedral angle space, which is modeled using a novel angular diffusion process on the two-dimensional torus. Coupling sequence and structure evolution in our model allows for modeling both "smooth" conformational changes and "catastrophic" conformational jumps, conditioned on the amino acid changes. The model has interpretable parameters and is comparatively more realistic than previous stochastic models, providing new insights into the relationship between sequence and structure evolution. For example, using the trained model we were able to identify an apparent sequence-structure evolutionary motif present in a large number of homologous protein pairs. The generative nature of our model enables us to evaluate its validity and its ability to simulate aspects of protein evolution conditioned on an amino acid sequence, a related amino acid sequence, a related structure or any combination thereof.
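
To make the idea of a random walk in dihedral angle space concrete, here is a deliberately simplified sketch: a mean-reverting walk for a (phi, psi) pair in which every update is wrapped back onto the torus. It is a stand-in for illustration only, not the angular diffusion process of the paper, and the parameters `mu`, `alpha`, `sigma` and `dt` are hypothetical.

```python
import math
import random

def wrap(angle):
    """Map an angle in radians onto the interval [-pi, pi)."""
    return (angle + math.pi) % (2.0 * math.pi) - math.pi

def torus_walk(phi, psi, mu=(-1.0, -0.8), alpha=0.5, sigma=0.3,
               dt=0.1, steps=100, rng=None):
    """Simulate a wrapped, mean-reverting random walk for one (phi, psi) pair."""
    rng = rng or random.Random(0)  # fixed seed keeps the sketch reproducible
    path = [(phi, psi)]
    for _ in range(steps):
        new = []
        for angle, centre in zip((phi, psi), mu):
            # Drift follows the shortest angular direction towards the centre.
            drift = alpha * wrap(centre - angle) * dt
            noise = sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
            new.append(wrap(angle + drift + noise))
        phi, psi = new
        path.append((phi, psi))
    return path
```

Small `sigma` gives the "smooth" drift-like changes, while a model of "catastrophic" jumps would need an additional mechanism that this sketch does not include.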


Subject(s)
Proteins/genetics , Sequence Alignment/methods , Sequence Analysis, Protein/methods , Amino Acid Sequence , Computer Simulation , Evolution, Molecular , Models, Genetic , Models, Molecular , Protein Conformation , Protein Structural Elements/genetics , Proteins/metabolism , Sequence Analysis, Protein/statistics & numerical data
6.
J Mol Biol ; 428(21): 4361-4377, 2016 Oct 23.
Article in English | MEDLINE | ID: mdl-27659562

ABSTRACT

Despite the development of powerful computational tools, the full-sequence design of proteins still remains a challenging task. To investigate the limits and capabilities of computational tools, we conducted a study of the ability of the program Rosetta to predict sequences that recreate the authentic fold of thioredoxin. Focusing on the influence of conformational details in the template structures, we based our study on 8 experimentally determined template structures and generated 120 designs from each. For experimental evaluation, we chose six sequences from each of the eight templates by objective criteria. The 48 selected sequences were evaluated based on their progressive ability to (1) produce soluble protein in Escherichia coli and (2) yield stable monomeric protein, and (3) on the ability of the stable, soluble proteins to adopt the target fold. Of the 48 designs, we were able to synthesize 32, 20 of which resulted in soluble protein. Of these, only two were sufficiently stable to be purified. An X-ray crystal structure was solved for one of the designs, revealing a close resemblance to the target structure. We found a significant difference among the eight template structures to realize the above three criteria despite their high structural similarity. Thus, in order to improve the success rate of computational full-sequence design methods, we recommend that multiple template structures are used. Furthermore, this study shows that special care should be taken when optimizing the geometry of a structure prior to computational design when using a method that is based on rigid conformations.


Subject(s)
Protein Folding , Thioredoxins/chemistry , Thioredoxins/metabolism , Computational Biology , Crystallography, X-Ray , Protein Conformation , Protein Stability , Solubility , Thioredoxins/genetics
7.
PeerJ ; 3: e861, 2015.
Article in English | MEDLINE | ID: mdl-25825683

ABSTRACT

Protein chemical shifts are routinely used to augment molecular mechanics force fields in protein structure simulations, with weights of the chemical shift restraints determined empirically. These weights, however, might not be an optimal descriptor of a given protein structure and predictive model, and a bias is introduced which might result in incorrect structures. In the inferential structure determination framework, both the unknown structure and the disagreement between experimental and back-calculated data are formulated as a joint probability distribution, thus utilizing the full information content of the data. Here, we present the formulation of such a probability distribution where the error in chemical shift prediction is described by either a Gaussian or Cauchy distribution. The methodology is demonstrated and compared to a set of empirically weighted potentials through Markov chain Monte Carlo simulations of three small proteins (ENHD, Protein G and the SMN Tudor Domain) using the PROFASI force field and the chemical shift predictor CamShift. Using a clustering criterion for identifying the best structure, together with the addition of a solvent exposure scoring term, the simulations suggest that sampling both the structure and the uncertainties in chemical shift prediction leads to more accurate structures compared to conventional methods using empirically determined weights. The Cauchy distribution, using either sampled uncertainties or predetermined weights, did, however, result in overall better convergence to the native fold, suggesting that both types of distribution might be useful in different aspects of protein structure prediction.
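
The practical difference between the two error models can be seen from their log-likelihoods. The sketch below uses the textbook Gaussian and Cauchy densities for residuals between experimental and back-calculated chemical shifts. The function names and the treatment of a single shared scale parameter are assumptions for illustration, not the paper's exact formulation.

```python
import math

def gaussian_log_likelihood(residuals, sigma):
    """Sum of log N(r | 0, sigma^2) over residuals (experimental minus predicted)."""
    return sum(-0.5 * math.log(2.0 * math.pi * sigma ** 2) - 0.5 * (r / sigma) ** 2
               for r in residuals)

def cauchy_log_likelihood(residuals, gamma):
    """Sum of log Cauchy(r | 0, gamma) over residuals."""
    return sum(-math.log(math.pi * gamma * (1.0 + (r / gamma) ** 2))
               for r in residuals)
```

For a residual near zero the Gaussian assigns the higher likelihood, but for a large outlier the heavy-tailed Cauchy penalises far less. Treating `sigma` or `gamma` as a sampled quantity rather than a fixed weight corresponds to the "sampled uncertainties" described in the abstract.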

8.
Proc Natl Acad Sci U S A ; 111(38): 13852-7, 2014 Sep 23.
Article in English | MEDLINE | ID: mdl-25192938

ABSTRACT

Methods of protein structure determination based on NMR chemical shifts are becoming increasingly common. The most widely used approaches adopt the molecular fragment replacement strategy, in which structural fragments are repeatedly reassembled into different complete conformations in molecular simulations. Although these approaches are effective in generating individual structures consistent with the chemical shift data, they do not enable the sampling of the conformational space of proteins with correct statistical weights. Here, we present a method of molecular fragment replacement that makes it possible to perform equilibrium simulations of proteins, and hence to determine their free energy landscapes. This strategy is based on the encoding of the chemical shift information in a probabilistic model in Markov chain Monte Carlo simulations. First, we demonstrate that with this approach it is possible to fold proteins to their native states starting from extended structures. Second, we show that the method satisfies the detailed balance condition and hence it can be used to carry out an equilibrium sampling from the Boltzmann distribution corresponding to the force field used in the simulations. Third, by comparing the results of simulations carried out with and without chemical shift restraints we describe quantitatively the effects that these restraints have on the free energy landscapes of proteins. Taken together, these results demonstrate that the molecular fragment replacement strategy can be used in combination with chemical shift information to characterize not only the native structures of proteins but also their conformational fluctuations.
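
The detailed balance condition referred to here requires that, for every pair of states, the equilibrium probability flow in one direction equals the flow in the other. As a minimal sketch, the following checks that condition for a plain Metropolis move with uniform symmetric proposals on a few discrete states. It is not the fragment replacement move of the paper, and all names are hypothetical.

```python
import math

def metropolis_acceptance(energy_old, energy_new, kT=1.0):
    """Metropolis acceptance probability for a symmetric proposal."""
    if energy_new <= energy_old:
        return 1.0
    return math.exp(-(energy_new - energy_old) / kT)

def boltzmann_weights(energies, kT=1.0):
    """Normalised Boltzmann probabilities for a list of state energies."""
    weights = [math.exp(-e / kT) for e in energies]
    total = sum(weights)
    return [w / total for w in weights]

def detailed_balance_gap(energies, kT=1.0):
    """Largest |pi_i P(i->j) - pi_j P(j->i)| under uniform symmetric proposals."""
    pi = boltzmann_weights(energies, kT)
    n = len(energies)
    propose = 1.0 / (n - 1)  # every other state is proposed with equal chance
    gap = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            forward = pi[i] * propose * metropolis_acceptance(energies[i], energies[j], kT)
            backward = pi[j] * propose * metropolis_acceptance(energies[j], energies[i], kT)
            gap = max(gap, abs(forward - backward))
    return gap
```

A gap of zero (up to rounding) means the Boltzmann distribution is stationary for the move set, which is what allows equilibrium sampling rather than only the generation of individual low-energy structures.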


Subject(s)
Computer Simulation , Models, Molecular , Nuclear Magnetic Resonance, Biomolecular/methods , Proteins/chemistry , Markov Chains
9.
PeerJ ; 2: e277, 2014.
Article in English | MEDLINE | ID: mdl-24688855

ABSTRACT

We present a powerful Python library to quickly and efficiently generate realistic peptide model structures. The library makes it possible to quickly set up quantum mechanical calculations on model peptide structures. It is possible to manually specify a specific conformation of the peptide. Additionally, the library offers sampling of backbone conformations and side chain rotamer conformations from continuous distributions. The generated peptides can then be geometry optimized by the MMFF94 molecular mechanics force field via convenient functions inside the library. Finally, it is possible to output the resulting structures directly to files in a variety of useful formats, such as XYZ or PDB formats, or directly as input files for a quantum chemistry program. FragBuilder is freely available at https://github.com/jensengroup/fragbuilder/ under the terms of the BSD open source license.

10.
J Chem Theory Comput ; 10(8): 3484-91, 2014 Aug 12.
Article in English | MEDLINE | ID: mdl-26588313

ABSTRACT

The motions of biological macromolecules are tightly coupled to their functions. However, while the study of fast motions has become increasingly feasible in recent years, the study of slower, biologically important motions remains difficult. Here, we present a method to construct native state ensembles of proteins by the combination of physical force fields and experimental data through modern statistical methodology. As an example, we use NMR residual dipolar couplings to determine a native state ensemble of the extensively studied third immunoglobulin binding domain of protein G (GB3). The ensemble accurately describes both local and nonlocal backbone fluctuations as judged by its reproduction of complementary experimental data. While it is difficult to assess precise time-scales of the observed motions, our results suggest that it is possible to construct realistic conformational ensembles of biomolecules very efficiently. The approach may allow for a dramatic reduction in the computational as well as experimental resources needed to obtain accurate conformational ensembles of biological macromolecules in a statistically sound manner.

11.
Proteins ; 82(2): 288-99, 2014 Feb.
Article in English | MEDLINE | ID: mdl-23934827

ABSTRACT

We propose a method to formulate probabilistic models of protein structure in atomic detail, for a given amino acid sequence, based on Bayesian principles, while retaining a close link to physics. We start from two previously developed probabilistic models of protein structure on a local length scale, which concern the dihedral angles in main chain and side chains, respectively. Conceptually, this constitutes a probabilistic and continuous alternative to the use of discrete fragment and rotamer libraries. The local model is combined with a nonlocal model that involves a small number of energy terms according to a physical force field, and some information on the overall secondary structure content. In this initial study we focus on the formulation of the joint model and the evaluation of the use of an energy vector as a descriptor of a protein's nonlocal structure; hence, we derive the parameters of the nonlocal model from the native structure without loss of generality. The local and nonlocal models are combined using the reference ratio method, which is a well-justified probabilistic construction. For evaluation, we use the resulting joint models to predict the structure of four proteins. The results indicate that the proposed method and the probabilistic models show considerable promise for probabilistic protein structure prediction and related applications.


Subject(s)
Models, Molecular , Models, Statistical , Algorithms , Amino Acid Sequence , Bacterial Proteins/chemistry , Bayes Theorem , Hydrogen Bonding , Protein Structure, Secondary , Protein Structure, Tertiary , Structural Homology, Protein , Thermodynamics
12.
PLoS One ; 8(11): e79439, 2013.
Article in English | MEDLINE | ID: mdl-24244505

ABSTRACT

We present the theoretical foundations of a general principle to infer structure ensembles of flexible biomolecules from spatially and temporally averaged data obtained in biophysical experiments. The central idea is to compute the Kullback-Leibler optimal modification of a given prior distribution τ(x) with respect to the experimental data and its uncertainty. This principle generalizes the successful inferential structure determination method and recently proposed maximum entropy methods. Tractability of the protocol is demonstrated through the analysis of simulated nuclear magnetic resonance spectroscopy data of a small peptide.
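
One special case that this principle is said to generalize is maximum entropy reweighting: among all distributions that reproduce an experimental average exactly, the one closest to the prior in the Kullback-Leibler sense has the form prior times exp(-lambda * f(x)). The sketch below illustrates only that special case on a small discrete ensemble and ignores experimental uncertainty. The function names and the bisection bounds are assumptions.

```python
import math

def reweight(prior, values, lam):
    """Return the distribution proportional to prior[i] * exp(-lam * values[i])."""
    weights = [p * math.exp(-lam * v) for p, v in zip(prior, values)]
    total = sum(weights)
    return [w / total for w in weights]

def average(dist, values):
    """Ensemble average of `values` under `dist`."""
    return sum(p * v for p, v in zip(dist, values))

def fit_multiplier(prior, values, target, lo=-50.0, hi=50.0, iterations=200):
    """Bisect for the multiplier so the reweighted average equals `target`."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if average(reweight(prior, values, mid), values) > target:
            lo = mid  # the average decreases as lam grows, so lam must increase
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Here `prior` plays the role of τ(x) evaluated on a set of conformations and `values` holds the back-calculated observable for each of them. `target` must lie between the smallest and largest entry of `values` for a solution to exist.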


Subject(s)
Biophysics , Models, Theoretical , Algorithms , Computer Simulation
13.
J Comput Chem ; 34(19): 1697-705, 2013 Jul 15.
Article in English | MEDLINE | ID: mdl-23619610

ABSTRACT

We present a new software framework for Markov chain Monte Carlo sampling for simulation, prediction, and inference of protein structure. The software package contains implementations of recent advances in Monte Carlo methodology, such as efficient local updates and sampling from probabilistic models of local protein structure. These models form a probabilistic alternative to the widely used fragment and rotamer libraries. Combined with an easily extendible software architecture, this makes PHAISTOS well suited for Bayesian inference of protein structure from sequence and/or experimental data. Currently, two force fields are available within the framework: PROFASI and OPLS-AA/L, the latter including the generalized Born surface area solvent model. A flexible command-line and configuration-file interface allows users to quickly set up simulations with the desired configuration. PHAISTOS is released under the GNU General Public License v3.0. Source code and documentation are freely available from http://phaistos.sourceforge.net. The software is implemented in C++ and has been tested on Linux and OSX platforms.


Subject(s)
Markov Chains , Monte Carlo Method , Proteins/chemistry , Software , Bayes Theorem , Computer Simulation , Models, Chemical , Protein Conformation
14.
Proteins ; 81(8): 1340-50, 2013 Aug.
Article in English | MEDLINE | ID: mdl-23468247

ABSTRACT

Protein structure prediction methods typically use statistical potentials, which rely on statistics derived from a database of known protein structures. In the vast majority of cases, these potentials involve pairwise distances or contacts between amino acids or atoms. Although some potentials beyond pairwise interactions have been described, the formulation of a general multibody potential is seen as intractable due to the perceived limited amount of data. In this article, we show that it is possible to formulate a probabilistic model of higher order interactions in proteins, without arbitrarily limiting the number of contacts. The success of this approach is based on replacing a naive table-based approach with a simple hierarchical model involving suitable probability distributions and conditional independence assumptions. The model captures the joint probability distribution of an amino acid and its neighbors, local structure and solvent exposure. We show that this model can be used to approximate the conditional probability distribution of an amino acid sequence given a structure using a pseudo-likelihood approach. We verify the model by decoy recognition and site-specific amino acid predictions. Our coarse-grained model is compared to state-of-the-art methods that use full atomic detail. This article illustrates how the use of simple probabilistic models can lead to new opportunities in the treatment of nonlocal interactions in knowledge-based protein structure prediction and design.


Subject(s)
Proteins/metabolism , Bayes Theorem , Humans , Models, Biological , Models, Molecular , Models, Statistical , Protein Conformation , Protein Interaction Maps , Proteins/chemistry , Ubiquitin/chemistry , Ubiquitin/metabolism
15.
PLoS One ; 8(12): e84123, 2013.
Article in English | MEDLINE | ID: mdl-24391900

ABSTRACT

We present the ProCS method for the rapid and accurate prediction of protein backbone amide proton chemical shifts--sensitive probes of the geometry of key hydrogen bonds that determine protein structure. ProCS is parameterized against quantum mechanical (QM) calculations and reproduces high level QM results obtained for a small protein with an RMSD of 0.25 ppm (r = 0.94). ProCS is interfaced with the PHAISTOS protein simulation program and is used to infer statistical protein ensembles that reflect experimentally measured amide proton chemical shift values. Such chemical shift-based structural refinements, starting from high-resolution X-ray structures of Protein G, ubiquitin, and SMN Tudor Domain, result in average chemical shifts, hydrogen bond geometries, and trans-hydrogen bond ($^{h3}J_{NC'}$) spin-spin coupling constants that are in excellent agreement with experiment. We show that the structural sensitivity of the QM-based amide proton chemical shift predictions is needed to obtain this agreement. The ProCS method thus offers a powerful new tool for refining the structures of hydrogen bonding networks to high accuracy with many potential applications such as protein flexibility in ligand binding.


Subject(s)
Amides/chemistry , Bacterial Proteins/chemistry , Nuclear Magnetic Resonance, Biomolecular , Protons , Quantum Theory , SMN Complex Proteins/chemistry , Ubiquitin/chemistry , Crystallography, X-Ray , Humans , Hydrogen Bonding , Magnetic Resonance Imaging , Models, Molecular , Molecular Dynamics Simulation , Monte Carlo Method , Protein Conformation
16.
Structure ; 20(6): 1028-39, 2012 Jun 06.
Article in English | MEDLINE | ID: mdl-22578545

ABSTRACT

Protein dynamics play a crucial role in function, catalytic activity, and pathogenesis. Consequently, there is great interest in computational methods that probe the conformational fluctuations of a protein. However, molecular dynamics simulations are computationally costly and therefore are often limited to comparatively short timescales. TYPHON is a probabilistic method to explore the conformational space of proteins under the guidance of a sophisticated probabilistic model of local structure and a given set of restraints that represent nonlocal interactions, such as hydrogen bonds or disulfide bridges. The choice of the restraints themselves is heuristic, but the resulting probabilistic model is well-defined and rigorous. Conceptually, TYPHON constitutes a null model of conformational fluctuations under a given set of restraints. We demonstrate that TYPHON can provide information on conformational fluctuations that is in correspondence with experimental measurements. TYPHON provides a flexible, yet computationally efficient, method to explore possible conformational fluctuations in proteins.


Subject(s)
Computer Simulation , Models, Molecular , Software , Algorithms , Amino Acid Motifs , Animals , Cattle , Cystine/chemistry , Fungal Proteins/chemistry , Humans , Hydrogen Bonding , Lipase/chemistry , Models, Statistical , Protein Structure, Tertiary , Proto-Oncogene Proteins/chemistry , Ribonuclease, Pancreatic/chemistry , Superoxide Dismutase/chemistry , Ubiquitin/chemistry
17.
J Chem Theory Comput ; 8(2): 695-702, 2012 Feb 14.
Article in English | MEDLINE | ID: mdl-26596617

ABSTRACT

Although Markov chain Monte Carlo (MC) simulation is a potentially powerful approach for exploring conformational space, it has been unable to compete with molecular dynamics (MD) in the analysis of high density structural states, such as the native state of globular proteins. Here, we introduce a kinetic algorithm, CRISP, that greatly enhances the sampling efficiency in all-atom MC simulations of dense systems. The algorithm is based on an exact analytical solution to the classic chain-closure problem, making it possible to express the interdependencies among degrees of freedom in the molecule as correlations in a multivariate Gaussian distribution. We demonstrate that our method reproduces structural variation in proteins with greater efficiency than current state-of-the-art Monte Carlo methods and has real-time simulation performance on par with molecular dynamics simulations. The presented results suggest our method as a valuable tool in the study of molecules in atomic detail, offering a potential alternative to molecular dynamics for probing long time-scale conformational transitions.

18.
Bioinformatics ; 28(4): 510-5, 2012 Feb 15.
Article in English | MEDLINE | ID: mdl-22199383

ABSTRACT

MOTIVATION: Clustering protein structures is an important task in structural bioinformatics. De novo structure prediction, for example, often involves a clustering step for finding the best prediction. Other applications include assigning proteins to fold families and analyzing molecular dynamics trajectories. RESULTS: We present Pleiades, a novel approach to clustering protein structures with a rigorous mathematical underpinning. The method approximates clustering based on the root mean square deviation by first mapping structures to Gauss integral vectors--which were introduced by Røgen and co-workers--and subsequently performing K-means clustering. CONCLUSIONS: Compared to current methods, Pleiades dramatically improves on the time needed to perform clustering, and can cluster a significantly larger number of structures, while providing state-of-the-art results. The number of low energy structures generated in a typical folding study, which is in the order of 50,000 structures, can be clustered within seconds to minutes.
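
The second stage described in the RESULTS, K-means clustering of fixed-length descriptor vectors, can be sketched in a few lines. The example below assumes each structure has already been mapped to a numeric vector (the Gauss integral computation itself is not shown) and uses a naive deterministic initialisation. It is not the Pleiades implementation, and the names are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def k_means(vectors, k, iterations=50):
    """Plain Lloyd-style K-means; the first k vectors seed the centroids."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid.
        labels = [min(range(k), key=lambda c: squared_distance(v, centroids[c]))
                  for v in vectors]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

Because every structure is reduced to one vector, each iteration costs time proportional to the number of structures times `k`, with no pairwise superposition. This is one plausible reading of why vector-based clustering can scale to the tens of thousands of structures mentioned in the abstract.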


Subject(s)
Cluster Analysis , Computational Biology/methods , Proteins/chemistry , Adenylate Kinase/chemistry , Candida/chemistry , Escherichia coli/enzymology , Fungal Proteins/chemistry , Molecular Dynamics Simulation
19.
J Magn Reson ; 213(1): 182-6, 2011 Dec.
Article in English | MEDLINE | ID: mdl-21993764

ABSTRACT

Conventional methods for protein structure determination from NMR data rely on the ad hoc combination of physical forcefields and experimental data, along with heuristic determination of free parameters such as weight of experimental data relative to a physical forcefield. Recently, a theoretically rigorous approach was developed which treats structure determination as a problem of Bayesian inference. In this case, the forcefields are brought in as a prior distribution in the form of a Boltzmann factor. Due to high computational cost, the approach has been only sparsely applied in practice. Here, we demonstrate that the use of generative probabilistic models instead of physical forcefields in the Bayesian formalism is not only conceptually attractive, but also improves precision and efficiency. Our results open new vistas for the use of sophisticated probabilistic models of biomolecular structure in structure determination from experimental data.


Subject(s)
Models, Statistical , Nuclear Magnetic Resonance, Biomolecular/methods , Protein Conformation , Proteins/chemistry , Algorithms , Bayes Theorem , Electromagnetic Fields , Models, Molecular , Protein Structure, Tertiary , Temperature
20.
PLoS One ; 5(11): e13714, 2010 Nov 10.
Article in English | MEDLINE | ID: mdl-21103041

ABSTRACT

Understanding protein structure is of crucial importance in science, medicine and biotechnology. For about two decades, knowledge-based potentials based on pairwise distances--so-called "potentials of mean force" (PMFs)--have been center stage in the prediction and design of protein structure and the simulation of protein folding. However, the validity, scope and limitations of these potentials are still vigorously debated and disputed, and the optimal choice of the reference state--a necessary component of these potentials--is an unsolved problem. PMFs are loosely justified by analogy to the reversible work theorem in statistical physics, or by a statistical argument based on a likelihood function. Both justifications are insightful but leave many questions unanswered. Here, we show for the first time that PMFs can be seen as approximations to quantities that do have a rigorous probabilistic justification: they naturally arise when probability distributions over different features of proteins need to be combined. We call these quantities "reference ratio distributions" deriving from the application of the "reference ratio method." This new view is not only of theoretical relevance but leads to many insights that are of direct practical use: the reference state is uniquely defined and does not require external physical insights; the approach can be generalized beyond pairwise distances to arbitrary features of protein structure; and it becomes clear for which purposes the use of these quantities is justified. We illustrate these insights with two applications, involving the radius of gyration and hydrogen bonding. In the latter case, we also show how the reference ratio method can be iteratively applied to sculpt an energy funnel. Our results considerably increase the understanding and scope of energy functions derived from known biomolecular structures.
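
As a toy illustration of combining a distribution over fine-grained states with a desired distribution over a coarse-grained feature, the sketch below assumes the construction takes the form p(x) = q(x) * p(y) / q(y), where y is the feature computed from x and q(y) is the feature distribution implied by q(x). This form, the discrete setting and all names are assumptions made for illustration. The abstract itself does not spell out the formula.

```python
def reference_ratio(q_fine, coarse_of, p_coarse):
    """Return p(x) = q(x) * p(y) / q(y) with y = coarse_of(x).

    q_fine:    dict mapping each fine-grained state x to its probability q(x)
    coarse_of: function mapping x to its coarse-grained feature y
    p_coarse:  dict mapping each feature value y to its target probability p(y)
    """
    # q(y) is the reference: the feature distribution implied by q(x).
    q_coarse = {}
    for x, prob in q_fine.items():
        y = coarse_of(x)
        q_coarse[y] = q_coarse.get(y, 0.0) + prob
    return {x: prob * p_coarse[coarse_of(x)] / q_coarse[coarse_of(x)]
            for x, prob in q_fine.items()}
```

In this toy setting the result is normalised, its feature marginal equals `p_coarse`, and within each feature value the relative probabilities of `q_fine` are preserved. The reference term comes from `q_fine` itself rather than from outside physical insight.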


Subject(s)
Algorithms , Computational Biology/methods , Protein Conformation , Protein Folding , Hydrogen Bonding , Models, Molecular , Reproducibility of Results , Thermodynamics