Results 1 - 12 of 12
1.
Bioinformatics ; 32(24): 3774-3781, 2016 Dec 15.
Article in English | MEDLINE | ID: mdl-27559156

ABSTRACT

MOTIVATION: By simplifying the many-bodied complexity of residue packing into patterns of simple pairwise secondary structure interactions between a single knob residue with a three-residue socket, the knob-socket construct allows a more direct incorporation of structural information into the prediction of residue contacts. By modeling the preferences between the amino acid composition of a socket and knob, we undertake an investigation of the knob-socket construct's ability to improve the prediction of residue contacts. The statistical model considers three priors and two posterior estimations to better understand how the input data affects predictions. This produces six implementations of KScons that are tested on three sets: PSICOV, CASP10 and CASP11. We compare against the current leading contact prediction methods. RESULTS: The results demonstrate the usefulness as well as the limits of knob-socket based structural modeling of protein contacts. The construct is able to extract good predictions from known structural homologs, while its performance degrades when no homologs exist. Among our six implementations, KScons MST-MP (which uses the multiple structure alignment prior and marginal posterior incorporating structural homolog information) performs the best in all three prediction sets. An analysis of recall and precision finds that KScons MST-MP improves accuracy not only by improving identification of true positives, but also by decreasing the number of false positives. Over the CASP10 and CASP11 sets, KScons MST-MP performs better than the leading methods using only evolutionary coupling data, but not quite as well as the supervised learning methods of MetaPSICOV and CoinDCA-NN that incorporate a large set of structural features. CONTACT: qiwei.li@rice.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
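
The recall and precision analysis mentioned in this abstract can be illustrated with a minimal Python sketch. It assumes that contacts are stored as unordered residue-index pairs; the function name `precision_recall` and the toy data are hypothetical and are not taken from KScons.

```python
def precision_recall(predicted, true):
    """Return (precision, recall) for two collections of residue-index pairs."""
    # Normalise each pair so that (i, j) and (j, i) count as the same contact.
    pred = {tuple(sorted(p)) for p in predicted}
    real = {tuple(sorted(t)) for t in true}
    true_pos = len(pred & real)
    # Precision falls as false positives are added; recall rises with true positives.
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(real) if real else 0.0
    return precision, recall


# Hypothetical toy data: residue index pairs, not results from the paper.
true_contacts = [(1, 5), (2, 8), (3, 9), (4, 12)]
predicted_contacts = [(5, 1), (2, 8), (6, 10)]
precision, recall = precision_recall(predicted_contacts, true_contacts)
```

In this toy case two of the three predicted pairs are real contacts and two of the four real contacts are recovered, so removing the false positive (6, 10) would raise precision without changing recall.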


Subject(s)
Computational Biology/methods, Statistical Models, Protein Tertiary Structure, Proteins/chemistry, Algorithms, Amino Acids/chemistry, Bayes Theorem, Molecular Models, Protein Secondary Structure
2.
Bioinformatics ; 26(24): 3059-66, 2010 Dec 15.
Article in English | MEDLINE | ID: mdl-21047817

ABSTRACT

MOTIVATION: While protein secondary structure is well understood, representing the repetitive nature of tertiary packing in proteins remains difficult. We have developed a construct called the relative packing group (RPG) that applies the clique concept from graph theory as a natural basis for defining the packing motifs in proteins. An RPG is defined as a clique of residues, where every member contacts all others as determined by the Delaunay tessellation. Geometrically similar RPGs define a regular element of tertiary structure or tertiary motif (TerMo). This intuitive construct provides a simple approach to characterize general repetitive elements of tertiary structure. RESULTS: A dataset of over 4 million tetrahedral RPGs was clustered using different criteria to characterize the various aspects of regular tertiary structure in TerMos. Grouping this data within the SCOP classification levels of Family, Superfamily, Fold, Class and PDB showed that similar packing is shared across different folds. Classification of RPGs based on residue sequence locality reveals topological preferences according to protein sizes and secondary structure. We find that larger proteins favor RPGs with three local residues packed against a non-local residue. Classifying by secondary structure, helices prefer mostly local residues, sheets favor at least two local residues, while turns and coil populate with more local residues. To depict these TerMos, we have developed 2 complementary and intuitive representations: (i) Dirichlet process mixture density estimation of the torsion angle distributions and (ii) kernel density estimation of the Cartesian coordinate distribution. The TerMo library and representations software are available upon request.
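
The clique definition of an RPG can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the contact list is hypothetical (in the paper, contacts are determined by the Delaunay tessellation, which is not computed here), the function name `four_residue_cliques` is invented for the example, and the brute-force search is suitable only for small inputs.

```python
from itertools import combinations


def four_residue_cliques(contacts):
    """Return 4-tuples of residues in which every pair is in contact."""
    edges = {frozenset(pair) for pair in contacts}
    residues = sorted({r for pair in contacts for r in pair})
    cliques = []
    for group in combinations(residues, 4):
        # A clique requires all six pairwise contacts within the group.
        if all(frozenset(pair) in edges for pair in combinations(group, 2)):
            cliques.append(group)
    return cliques


# Hypothetical contact list (residue numbers), standing in for tessellation output.
contacts = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4), (4, 5), (3, 5)]
rpgs = four_residue_cliques(contacts)
```

Here residues 1 to 4 form the only tetrahedral group, because residue 5 contacts only residues 3 and 4.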


Subject(s)
Protein Tertiary Structure, Amino Acid Motifs, Molecular Models, Statistical Models, Protein Secondary Structure, Proteins/chemistry
3.
Proteins ; 74(3): 701-11, 2009 Feb 15.
Article in English | MEDLINE | ID: mdl-18704942

ABSTRACT

Protein structure prediction has a number of important ad hoc similarity measures for evaluating predictions, but would benefit from a measure that is able to provide a common framework for a broad range of comparisons. Here we show that a mutual information-like measure can provide a comprehensive framework for evaluating protein structure prediction of all types. We discuss the concept of information, its application to secondary structure, and the obstacle to applying it to 3D structure. On the basis of the insights from the secondary structure case, we present an approach to work around the 3D difficulties, and develop a method to measure the mutual information provided by a 3D structure prediction. We integrate the evaluation of all types of protein structure prediction into a single framework, and compare the amount of information provided by various prediction methods, including secondary structure prediction. Within this broadened framework, the idea that structure is better preserved than sequence during evolution is evaluated quantitatively for the globin family. A nearly perfect sequence match in the globin family corresponds to about 300 bits of information, whereas a nearly perfect structural match for the same two proteins corresponds to about 2500 bits of information, where bits of information describes the probability of obtaining a match of similar closeness by chance. Mutual information provides both a theoretical basis for evaluating structure similarity and an explanatory surround for existing similarity measures.
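
For the secondary structure case, the idea of measuring a prediction in bits can be sketched as follows. This is a minimal illustration using the standard empirical mutual information between two label strings, not the paper's exact measure; the function names are hypothetical. The second helper assumes the usual reading of "bits" as the negative base-2 logarithm of a chance probability.

```python
from collections import Counter
from math import log2


def mutual_information_bits(actual, predicted):
    """Empirical mutual information (bits per residue) between two label strings."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty strings of equal length")
    n = len(actual)
    joint = Counter(zip(actual, predicted))
    n_actual = Counter(actual)
    n_pred = Counter(predicted)
    total = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        # Compare the joint frequency with what independence would give.
        total += p_ab * log2(p_ab / ((n_actual[a] / n) * (n_pred[b] / n)))
    return total


def bits_from_chance_probability(p):
    """Bits corresponding to an event that occurs by chance with probability p."""
    return -log2(p)


perfect = mutual_information_bits("HHHHEEEE", "HHHHEEEE")
uninformative = mutual_information_bits("HHEE", "HEHE")
```

Under this reading, a match quoted at about 300 bits corresponds to a chance probability of about 2^-300.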


Subject(s)
Information Theory, Protein Conformation, Protein Databases, Entropy, Proteins/chemistry, Protein Sequence Analysis, Structure-Activity Relationship
4.
J Comput Biol ; 15(1): 65-79, 2008.
Article in English | MEDLINE | ID: mdl-18199024

ABSTRACT

We describe an information-theory-based measure of the quality of secondary structure prediction (RELINFO). RELINFO has a simple yet intuitive interpretation: it represents the factor by which secondary structure choice at a residue has been restricted by a prediction scheme. As an alternative interpretation of secondary structure prediction, RELINFO complements currently used methods by providing an information-based view as to why a prediction succeeds or fails. To demonstrate this score's capabilities, we applied RELINFO to an analysis of a large set of secondary structure predictions obtained from the first five rounds of the Critical Assessment of Structure Prediction (CASP) experiment. RELINFO is compared with two other common measures: percent correct (Q3) and secondary structure overlap (SOV). While the correlation between Q3 and RELINFO is approximately 0.85, RELINFO avoids certain disadvantages of Q3, including overestimating the quality of a prediction. The correlation between SOV and RELINFO is approximately 0.75. The valuable SOV measure unfortunately suffers from a saturation problem, and perhaps has unfairly given the general impression that secondary structure prediction has reached its limit, since SOV has not improved much over the recent rounds of CASP. Although not a replacement for SOV, RELINFO has greater dispersion. Over the five rounds of CASP assessed here, RELINFO shows that prediction targets have been more difficult in successive CASP experiments, yet prediction quality has continued to improve measurably over each round. In terms of information, the secondary structure prediction quality has almost doubled from CASP1 to CASP5. Therefore, as a different perspective on accuracy, RELINFO can help to improve prediction of protein secondary structure by providing a measure of difficulty as well as final quality of a prediction.
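
The Q3 measure that RELINFO is compared against is simply the percentage of residues whose three-state label is predicted correctly. The sketch below shows that calculation only; it does not implement RELINFO, whose formula is not given in the abstract. The function name `q3` and the example strings are hypothetical.

```python
def q3(observed, predicted):
    """Percent of residues whose predicted state (H, E or C) matches the observed state."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty strings of equal length")
    matches = sum(o == p for o, p in zip(observed, predicted))
    return 100.0 * matches / len(observed)


# Hypothetical 10-residue examples.
score = q3("CHHHHCEEEC", "CHHHCCEEEE")
# An all-coil guess on a mostly-coil chain also scores well while predicting no structure.
trivial_score = q3("CCCCCCCCHH", "CCCCCCCCCC")
```

The second call illustrates one way a percent-correct score can overstate quality: a prediction that never commits to a helix or strand still reaches the same score as the first, more informative prediction.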


Subject(s)
Computational Biology/methods, Protein Secondary Structure, Proteins/chemistry, Algorithms, Software
5.
J Am Stat Assoc ; 112(518): 721-732, 2017.
Article in English | MEDLINE | ID: mdl-29276318

ABSTRACT

We propose a random partition distribution indexed by pairwise similarity information such that partitions compatible with the similarities are given more probability. The use of pairwise similarities, in the form of distances, is common in some clustering algorithms (e.g., hierarchical clustering), but we show how to use this type of information to define a prior partition distribution for flexible Bayesian modeling. A defining feature of the distribution is that it allocates probability among partitions within a given number of subsets, but it does not shift probability among sets of partitions with different numbers of subsets. Our distribution places more probability on partitions that group similar items yet keeps the total probability of partitions with a given number of subsets constant. The distribution of the number of subsets (and its moments) is available in closed-form and is not a function of the similarities. Our formulation has an explicit probability mass function (with a tractable normalizing constant) so the full suite of MCMC methods may be used for posterior inference. We compare our distribution with several existing partition distributions, showing that our formulation has attractive properties. We provide three demonstrations to highlight the features and relative performance of our distribution.

6.
Gene ; 598: 50-62, 2017 Jan 20.
Article in English | MEDLINE | ID: mdl-27984193

ABSTRACT

The methylotrophic yeast Pichia pastoris has been used extensively for expressing recombinant proteins because it combines the ease of genetic manipulation, the ability to provide complex posttranslational modifications and the capacity for efficient protein secretion. The most successful and commonly used secretion signal leader in Pichia pastoris has been the alpha mating factor (MATα) prepro secretion signal. However, limitations exist as some proteins cannot be secreted efficiently, leading to strategies to enhance secretion efficiency by modifying the secretion signal leader. Based on a Jpred secondary structure prediction and knob-socket modeling of tertiary structure, numerous deletions and duplications of the MATα prepro leader were engineered to evaluate the correlation between predicted secondary structure and the secretion level of the reporters horseradish peroxidase (HRP) and Candida antarctica lipase B. In addition, circular dichroism analyses were completed for the wild type and several mutant pro-peptides to evaluate actual differences in secondary structure. The results lead to a new model of MATα pro-peptide signal leader, which suggests that the N and C-termini of MATα pro-peptide need to be presented in a specific orientation for proper interaction with the cellular secretion machinery and for efficient protein secretion.


Subject(s)
Fungal Proteins/genetics, Mating Factor/genetics, Peptides/genetics, Pichia/genetics, Recombinant Fusion Proteins/genetics, Amino Acid Sequence, Circular Dichroism, Polyacrylamide Gel Electrophoresis, Fungal Proteins/chemistry, Fungal Proteins/metabolism, Horseradish Peroxidase/genetics, Horseradish Peroxidase/metabolism, Lipase/genetics, Lipase/metabolism, Mating Factor/chemistry, Mating Factor/metabolism, Molecular Models, Mutation, Peptides/chemistry, Peptides/metabolism, Pichia/metabolism, Protein Precursors/chemistry, Protein Precursors/genetics, Protein Precursors/metabolism, Protein Sorting Signals/genetics, Protein Secondary Structure, Recombinant Fusion Proteins/metabolism, Sequence Deletion
7.
PLoS One ; 9(10): e109832, 2014.
Article in English | MEDLINE | ID: mdl-25314659

ABSTRACT

Determining the primary structure (i.e., amino acid sequence) of a protein has become cheaper, faster, and more accurate. Higher order protein structure provides insight into a protein's function in the cell. Understanding a protein's secondary structure is a first step towards this goal. Therefore, a number of computational prediction methods have been developed to predict secondary structure from just the primary amino acid sequence. The most successful methods use machine learning approaches that are quite accurate, but do not directly incorporate structural information. As a step towards improving secondary structure prediction given the primary structure, we propose a Bayesian model based on the knob-socket model of protein packing in secondary structure. The method considers the packing influence of residues on the secondary structure determination, including those packed close in space but distant in sequence. By performing an assessment of our method on two test sets, we show how incorporation of multiple sequence alignment data, similarly to PSIPRED, provides balance and improves the accuracy of the predictions. Software implementing the methods is provided as a web application and a stand-alone implementation.


Subject(s)
Molecular Models, Proteins/chemistry, Amino Acid Sequence, Bayes Theorem, Computer Simulation, Protein Secondary Structure, Software
8.
Comput Biol Chem ; 42: 40-8, 2013 Feb.
Article in English | MEDLINE | ID: mdl-23266765

ABSTRACT

As an alternative to the common template-based protein structure prediction methods based on main-chain position, a novel side-chain centric approach has been developed. Together with a Bayesian loop modeling procedure and a combination scoring function, the Stone Soup algorithm was applied to the CASP9 set of template-based modeling targets. Although the method did not generate perturbations to the template structures as large as necessary, the analysis of the results gives unique insights into the differences in packing between the target structures and their templates. Considerable variation in packing is found between target and template structures even when the structures are close, and this variation is found to be due to 2- and 3-body packing interactions. Outside the inherent restrictions in packing representation of the PDB, the first steps in correctly defining those regions of variable packing have been mapped primarily to local interactions, as the packing at the secondary and tertiary structure levels is largely conserved. Of the scoring functions used, a loop scoring function based on water structure exhibited some promise for discrimination. These results present a clear structural path for further development of a side-chain centered approach to template-based modeling.


Subject(s)
Algorithms, Caspase 9/chemistry, Molecular Models, Protein Folding
9.
Gene ; 519(2): 311-7, 2013 May 01.
Article in English | MEDLINE | ID: mdl-23454485

ABSTRACT

The methylotrophic yeast, Pichia pastoris, has been genetically engineered to produce many heterologous proteins for industrial and research purposes. In order to secrete proteins for easier purification from the extracellular medium, the coding sequence of recombinant proteins is initially fused to the Saccharomyces cerevisiae α-mating factor secretion signal leader. Extensive site-directed mutagenesis of the prepro-region of the α-mating factor secretion signal sequence was performed in order to determine the effects of various deletions and substitutions on expression. Though some mutations clearly dampened protein expression, deletion of amino acids 57-70, corresponding to the predicted 3rd alpha helix of α-mating factor secretion signal, increased secretion of reporter proteins horseradish peroxidase and lipase at least 50% in small-scale cultures. These findings raise the possibility that the secretory efficiency of the leader can be further enhanced in the future.


Subject(s)
Fungal Gene Expression Regulation, Mutation, Peptides/metabolism, Pichia/genetics, Recombinant Proteins/biosynthesis, Amino Acid Sequence, Western Blotting, Gene Deletion, Reporter Genes, Horseradish Peroxidase/genetics, Horseradish Peroxidase/metabolism, Lipase/genetics, Lipase/metabolism, Mating Factor, Molecular Sequence Data, Site-Directed Mutagenesis, Peptides/genetics, Pichia/metabolism, Plasmids, Real-Time Polymerase Chain Reaction, Recombinant Proteins/genetics, Saccharomyces cerevisiae/genetics, Saccharomyces cerevisiae/metabolism
10.
Ann Appl Stat ; 4(2): 916-942, 2010 Jun 01.
Article in English | MEDLINE | ID: mdl-21031154

ABSTRACT

By providing new insights into the distribution of a protein's torsion angles, recent statistical models for this data have pointed the way to more efficient methods for protein structure prediction. Most current approaches have concentrated on bivariate models at a single sequence position. There is, however, considerable value in simultaneously modeling angle pairs at multiple sequence positions in a protein. One area of application for such models is in structure prediction for the highly variable loop and turn regions. Such modeling is difficult due to the fact that the number of known protein structures available to estimate these torsion angle distributions is typically small. Furthermore, the data is "sparse" in that not all proteins have angle pairs at each sequence position. We propose a new semiparametric model for the joint distributions of angle pairs at multiple sequence positions. Our model accommodates sparse data by leveraging known information about the behavior of protein secondary structure. We demonstrate our technique by predicting the torsion angles in a loop from the globin fold family. Our results show that a template-based approach can now be successfully extended to modeling the notoriously difficult loop and turn regions.

11.
Protein Sci ; 18(1): 101-7, 2009 Jan.
Article in English | MEDLINE | ID: mdl-19177355

ABSTRACT

We examine the contribution of residues at the dimer interface of the transcriptional regulator OxyR to oligomerization. Residues in contact across the dimer interface of OxyR were identified using the program Quaternary Contacts (QContacts). Site-directed mutagenesis was performed on the non-alanine or glycine residues identified in the resultant contact profile, and the oligomerization ability of the mutant proteins was tested using the lambda cI repressor system to identify residues that are hot spots in OxyR. We compared the properties of these hot spots to those described in the literature from other systems. The hot spots identified in this study are not especially conserved amongst a set of OxyR orthologs.


Subject(s)
Escherichia coli Proteins/chemistry, Escherichia coli Proteins/metabolism, Protein Binding/physiology, Protein Multimerization/physiology, Repressor Proteins/chemistry, Repressor Proteins/metabolism, Alanine/genetics, Alanine/metabolism, Amino Acid Sequence, Conserved Sequence, Escherichia coli/genetics, Escherichia coli/metabolism, Escherichia coli Proteins/genetics, Molecular Models, Site-Directed Mutagenesis, Protein Binding/genetics, Protein Multimerization/genetics, Protein Quaternary Structure/genetics, Protein Quaternary Structure/physiology, Repressor Proteins/genetics, Software
12.
J Am Stat Assoc ; 104(486): 586-596, 2009 Jun 01.
Article in English | MEDLINE | ID: mdl-20221312

ABSTRACT

Interest in predicting protein backbone conformational angles has prompted the development of modeling and inference procedures for bivariate angular distributions. We present a Bayesian approach to density estimation for bivariate angular data that uses a Dirichlet process mixture model and a bivariate von Mises distribution. We derive the necessary full conditional distributions to fit the model, as well as the details for sampling from the posterior predictive distribution. We show how our density estimation method makes it possible to improve current approaches for protein structure prediction by comparing the performance of the so-called "whole" and "half" position distributions. Current methods in the field are based on whole position distributions, as density estimation for the half positions requires techniques, such as ours, that can provide good estimates for small datasets. With our method we are able to demonstrate that half position data provides a better approximation for the distribution of conformational angles at a given sequence position, therefore providing increased efficiency and accuracy in structure prediction.
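
As a rough illustration of the kind of density involved, the sketch below evaluates the unnormalised log-density of a bivariate von Mises distribution for a pair of backbone angles. It assumes the commonly used sine variant; the abstract does not say which variant the authors use. The function name and all parameter values are hypothetical, and the normalising constant and the Dirichlet process mixture are omitted.

```python
from math import cos, pi, sin


def bvm_sine_log_kernel(phi, psi, mu, nu, kappa1, kappa2, lam):
    """Unnormalised log-density of the sine-variant bivariate von Mises distribution.

    phi, psi: angles in radians; mu, nu: mean angles;
    kappa1, kappa2: concentrations; lam: dependence between the two angles.
    """
    return (kappa1 * cos(phi - mu)
            + kappa2 * cos(psi - nu)
            + lam * sin(phi - mu) * sin(psi - nu))


# Illustrative parameter values only (radians), not estimates from the paper.
MU, NU = -1.1, -0.8
at_mean = bvm_sine_log_kernel(MU, NU, MU, NU, 2.0, 3.0, 1.0)
wrapped = bvm_sine_log_kernel(MU + 2 * pi, NU, MU, NU, 2.0, 3.0, 1.0)
```

Because the kernel depends on the angles only through sines and cosines, shifting either angle by a full turn leaves the value unchanged, which is what makes this family suitable for angular data.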
