Search | BVS CLAP/SMR-OPS/OMS

Synthetic protein alignments by CCMgen quantify noise in residue-residue contact prediction.

Vorberg, Susann; Seemayer, Stefan; Söding, Johannes.

PLoS Comput Biol ; 14(11): e1006526, 2018 Nov.

Article in English | MEDLINE | ID: mdl-30395601

ABSTRACT

Compensatory mutations between protein residues in physical contact can manifest themselves as statistical couplings between the corresponding columns in a multiple sequence alignment (MSA) of the protein family. Conversely, large coupling coefficients predict residue contacts. Methods for de-novo protein structure prediction based on this approach are becoming increasingly reliable. Their main limitation is the strong systematic and statistical noise in the estimation of coupling coefficients, which has so far limited their application to very large protein families. While most research has focused on improving predictions by adding external information, little progress has been made to improve the statistical procedure at the core, because our lack of understanding of the sources of noise poses a major obstacle. First, we show theoretically that the expectation value of the coupling score assuming no coupling is proportional to the product of the square roots of the column entropies, and we propose a simple entropy bias correction (EntC) that subtracts out this expectation value. Second, we show that the average product correction (APC) includes the correction of the entropy bias, partly explaining its success. Third, we have developed CCMgen, the first method for simulating protein evolution and generating realistic synthetic MSAs with pairwise statistical residue couplings. Fourth, to learn exact statistical models that reliably reproduce observed alignment statistics, we developed CCMpredPy, an implementation of the persistent contrastive divergence (PCD) method for exact inference. Fifth, we demonstrate how CCMgen and CCMpredPy can facilitate the development of contact prediction methods by analysing the systematic noise contributions from phylogeny and entropy. Using the entropy bias correction, we can disentangle both sources of noise and find that entropy contributes roughly twice as much noise as phylogeny.
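The two corrections named in this abstract can be sketched in a few lines of NumPy. This is a minimal illustration, not CCMgen or CCMpredPy code: it assumes a symmetric (L × L) matrix of raw coupling scores and an integer-encoded MSA, and fitting the proportionality constant of EntC by least squares is an assumption made here for illustration. APC is written in its usual form, S_ij - mean(S_i.) * mean(S_.j) / mean(S).

```python
import numpy as np


def column_entropies(msa: np.ndarray, n_states: int = 21) -> np.ndarray:
    """Shannon entropy (natural log) of each column of an integer-encoded MSA.

    msa has shape (N, L) with states 0..n_states-1 (e.g. 20 amino acids + gap).
    """
    n_seq, length = msa.shape
    entropies = np.zeros(length)
    for j in range(length):
        counts = np.bincount(msa[:, j], minlength=n_states)
        p = counts[counts > 0] / n_seq
        entropies[j] = -np.sum(p * np.log(p))
    return entropies


def apc(scores: np.ndarray) -> np.ndarray:
    """Average product correction for a symmetric score matrix."""
    s = np.array(scores, dtype=float)
    np.fill_diagonal(s, 0.0)
    length = s.shape[0]
    row_mean = s.sum(axis=1) / (length - 1)          # off-diagonal row means
    total_mean = s.sum() / (length * (length - 1))   # off-diagonal overall mean
    corrected = s - np.outer(row_mean, row_mean) / total_mean
    np.fill_diagonal(corrected, 0.0)
    return corrected


def entropy_correction(scores: np.ndarray, entropies: np.ndarray) -> np.ndarray:
    """EntC sketch: subtract alpha * sqrt(H_i) * sqrt(H_j) from each score.

    ASSUMPTION: alpha is fitted here by least squares over off-diagonal pairs.
    """
    s = np.array(scores, dtype=float)
    expected = np.sqrt(np.outer(entropies, entropies))  # sqrt(H_i) * sqrt(H_j)
    off_diag = ~np.eye(len(entropies), dtype=bool)
    denom = np.sum(expected[off_diag] ** 2)
    alpha = np.sum(s[off_diag] * expected[off_diag]) / denom if denom > 0 else 0.0
    corrected = s - alpha * expected
    np.fill_diagonal(corrected, 0.0)
    return corrected
```

Because the expected no-coupling score is a product of one term per column, it has the same product shape as APC's correction term, which offers one intuition for why APC already absorbs much of the entropy bias.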

Subject(s)

Proteins/chemistry , Sequence Alignment , Algorithms , Amino Acid Sequence , Binding Sites , Entropy , Noise , Sequence Homology, Amino Acid

A large-scale evaluation of computational protein function prediction.

Radivojac, Predrag; Clark, Wyatt T; Oron, Tal Ronnen; Schnoes, Alexandra M; Wittkop, Tobias; Sokolov, Artem; Graim, Kiley; Funk, Christopher; Verspoor, Karin; Ben-Hur, Asa; Pandey, Gaurav; Yunes, Jeffrey M; Talwalkar, Ameet S; Repo, Susanna; Souza, Michael L; Piovesan, Damiano; Casadio, Rita; Wang, Zheng; Cheng, Jianlin; Fang, Hai; Gough, Julian; Koskinen, Patrik; Törönen, Petri; Nokso-Koivisto, Jussi; Holm, Liisa; Cozzetto, Domenico; Buchan, Daniel W A; Bryson, Kevin; Jones, David T; Limaye, Bhakti; Inamdar, Harshal; Datta, Avik; Manjari, Sunitha K; Joshi, Rajendra; Chitale, Meghana; Kihara, Daisuke; Lisewski, Andreas M; Erdin, Serkan; Venner, Eric; Lichtarge, Olivier; Rentzsch, Robert; Yang, Haixuan; Romero, Alfonso E; Bhat, Prajwal; Paccanaro, Alberto; Hamp, Tobias; Kaßner, Rebecca; Seemayer, Stefan; Vicedo, Esmeralda; Schaefer, Christian.

Nat Methods ; 10(3): 221-7, 2013 Mar.

Article in English | MEDLINE | ID: mdl-23353650

ABSTRACT

Automated annotation of protein function is challenging. As the number of sequenced genomes rapidly grows, the overwhelming majority of protein products can only be annotated computationally. If computational predictions are to be relied upon, it is crucial that the accuracy of these methods be high. Here we report the results from the first large-scale community-based critical assessment of protein function annotation (CAFA) experiment. Fifty-four methods representing the state of the art for protein function prediction were evaluated on a target set of 866 proteins from 11 organisms. Two findings stand out: (i) today's best protein function prediction algorithms substantially outperform widely used first-generation methods, with large gains on all types of targets; and (ii) although the top methods perform well enough to guide experiments, there is considerable need for improvement of currently available tools.
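As a hedged illustration of how predictions in an assessment like this can be scored, the sketch below computes a simplified protein-centric maximum F-measure (Fmax), a measure of the kind used in CAFA. For each score threshold, precision is averaged over proteins with at least one prediction at or above the threshold, recall is averaged over all benchmark proteins, and the best F1 across thresholds is returned. It omits ontology propagation and other details of the actual CAFA protocol; the dictionary layout is an assumption.

```python
def fmax(predictions, truths, thresholds=None):
    """Simplified protein-centric Fmax.

    predictions: dict protein -> dict term -> score in [0, 1]
    truths:      dict protein -> set of experimentally supported terms
    """
    if thresholds is None:
        thresholds = [t / 100 for t in range(1, 101)]
    best = 0.0
    for tau in thresholds:
        precisions, recalls = [], []
        for protein, true_terms in truths.items():
            predicted = {term for term, score in predictions.get(protein, {}).items()
                         if score >= tau}
            tp = len(predicted & true_terms)
            if predicted:  # precision only over proteins with a prediction
                precisions.append(tp / len(predicted))
            recalls.append(tp / len(true_terms) if true_terms else 0.0)
        if not precisions:
            continue
        pr = sum(precisions) / len(precisions)
        rc = sum(recalls) / len(recalls)
        if pr + rc > 0:
            best = max(best, 2 * pr * rc / (pr + rc))
    return best
```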

Subject(s)

Computational Biology/methods , Molecular Biology/methods , Molecular Sequence Annotation , Proteins/physiology , Algorithms , Animals , Databases, Protein , Exoribonucleases/classification , Exoribonucleases/genetics , Exoribonucleases/physiology , Forecasting , Humans , Proteins/chemistry , Proteins/classification , Proteins/genetics , Species Specificity

CCMpred--fast and precise prediction of protein residue-residue contacts from correlated mutations.

Seemayer, Stefan; Gruber, Markus; Söding, Johannes.

Bioinformatics ; 30(21): 3128-30, 2014 Nov 01.

Article in English | MEDLINE | ID: mdl-25064567

ABSTRACT

MOTIVATION: Recent breakthroughs in protein residue-residue contact prediction have made reliable de novo prediction of protein structures possible. The key was to apply statistical methods that can distinguish direct couplings between pairs of columns in a multiple sequence alignment from merely correlated pairs, i.e., to separate direct from indirect effects. Two classes of such methods exist, either relying on regularized inversion of the covariance matrix or on pseudo-likelihood maximization (PLM). Although PLM-based methods offer clearly higher precision, available tools are not sufficiently optimized and are written in interpreted languages that introduce additional overheads. This increases runtime and hampers large-scale contact prediction for larger protein families, multi-domain proteins and protein-protein interactions. RESULTS: Here we introduce CCMpred, our performance-optimized PLM implementation in C and CUDA C. Using graphics cards in the price range of current six-core processors, CCMpred can predict contacts for typical alignments 35-113 times faster and with the same precision as the most accurate published methods. For users without a CUDA-capable graphics card, CCMpred can also run in a CPU mode that is still 4-14 times faster. Thanks to our speed-ups (http://dictionary.cambridge.org/dictionary/british/speed-up), contacts for typical protein families can be predicted in 15-60 s on a consumer-grade GPU and 1-6 min on a six-core CPU. AVAILABILITY AND IMPLEMENTATION: CCMpred is free and open-source software under the GNU Affero General Public License v3 (or later), available at https://bitbucket.org/soedinglab/ccmpred.
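To make pseudo-likelihood maximization concrete, the sketch below evaluates the negative pseudo-log-likelihood of a pairwise Potts model on an integer-encoded MSA; a PLM method minimizes this quantity (plus regularization) over the fields v and couplings w. It is a slow, loop-based NumPy illustration with assumed array shapes, not CCMpred's C/CUDA implementation, and the Frobenius-norm contact score at the end is a common convention assumed here rather than something stated in the abstract.

```python
import numpy as np


def neg_pseudo_log_likelihood(msa: np.ndarray, v: np.ndarray, w: np.ndarray) -> float:
    """Negative pseudo-log-likelihood of a Potts model.

    msa: (N, L) integer states in [0, q)
    v:   (L, q) single-site fields
    w:   (L, L, q, q) couplings with w[i, j, a, b] == w[j, i, b, a]
    """
    n_seq, length = msa.shape
    total = 0.0
    for seq in msa:
        for i in range(length):
            # Energy of every candidate state at column i, given all other residues.
            energy = np.array(v[i], dtype=float)
            for j in range(length):
                if j != i:
                    energy += w[i, j, :, seq[j]]
            # log P(x_i = seq[i] | rest) via a numerically stable log-sum-exp.
            m = energy.max()
            log_z = m + np.log(np.sum(np.exp(energy - m)))
            total -= energy[seq[i]] - log_z
    return float(total)


def frobenius_scores(w: np.ndarray) -> np.ndarray:
    """Contact score per column pair: Frobenius norm of its q x q coupling block."""
    scores = np.sqrt(np.sum(w ** 2, axis=(2, 3)))
    np.fill_diagonal(scores, 0.0)
    return scores
```

With all parameters at zero every conditional probability is 1/q, so the objective equals N · L · log q, which is a handy sanity check before plugging the function into an optimizer.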

Subject(s)

Proteins/chemistry , Sequence Alignment/methods , Sequence Analysis, Protein , Software , Mutation , Proteins/classification , Proteins/genetics , Sequence Homology, Amino Acid

Solvent concentration at 50% protein unfolding may reform enzyme stability ranking and process window identification.

Sorgenfrei, Frieda A; Sloan, Jeremy J; Weissensteiner, Florian; Zechner, Marco; Mehner, Niklas A; Ellinghaus, Thomas L; Schachtschabel, Doreen; Seemayer, Stefan; Kroutil, Wolfgang.

Nat Commun ; 15(1): 5420, 2024 Jun 26.

Article in English | MEDLINE | ID: mdl-38926341

ABSTRACT

As water-miscible organic co-solvents are often required in enzyme reactions, e.g., to improve the solubility of the substrate in the aqueous medium, an enzyme is needed that displays high stability in the presence of the co-solvent. Consequently, it is of utmost importance to identify the most suitable enzyme or the appropriate reaction conditions. Until now, the melting temperature has generally been used as a measure of enzyme stability. The experiments here show that the melting temperature does not correlate with the activity observed in the presence of the solvent. As an alternative parameter, the concentration of the co-solvent at the point of 50% protein unfolding at a specific temperature T, in short $c_{U50}^{T}$, is introduced. Analysis of a set of ene reductases shows that $c_{U50}^{T}$ indicates the co-solvent concentration at which the activity of the enzyme also drops fastest. Comparing possible rankings of enzymes according to melting temperature and $c_{U50}^{T}$ reveals clearly diverging outcomes, which also depend on the specific solvent used. Additionally, plots of $c_{U50}$ versus temperature enable fast identification of possible reaction windows, from which tolerated solvent concentrations and temperatures can be deduced.
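The definition of $c_{U50}^{T}$ can be turned into a small calculation. The sketch assumes unfolding has already been measured as a fraction unfolded (0 to 1) at several co-solvent concentrations at one temperature T, and it locates the 50% point by linear interpolation between the two bracketing measurements. The function name and the interpolation scheme are illustrative assumptions, not the authors' fitting procedure.

```python
import numpy as np


def c_u50(concentrations, fraction_unfolded, level=0.5):
    """Co-solvent concentration at which the unfolded fraction first reaches `level`.

    Uses linear interpolation between the two measurements that bracket `level`.
    Returns None if the curve never reaches `level` in the measured range.
    """
    c = np.asarray(concentrations, dtype=float)
    f = np.asarray(fraction_unfolded, dtype=float)
    order = np.argsort(c)           # measurements may arrive in any order
    c, f = c[order], f[order]
    for k in range(len(c) - 1):
        f0, f1 = f[k], f[k + 1]
        if f0 == level:
            return float(c[k])
        if (f0 - level) * (f1 - level) < 0:  # level lies strictly between f0 and f1
            return float(c[k] + (level - f0) * (c[k + 1] - c[k]) / (f1 - f0))
    if len(f) and f[-1] == level:
        return float(c[-1])
    return None
```

Repeating this at several temperatures gives the $c_{U50}$-versus-temperature curve described above; concentrations below that curve at a given temperature are candidates for the reaction window.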

Subject(s)

Enzyme Stability , Protein Unfolding , Solvents , Solvents/chemistry , Temperature , Transition Temperature , Oxidoreductases/chemistry , Oxidoreductases/metabolism

Homology-based inference sets the bar high for protein function prediction.

Hamp, Tobias; Kassner, Rebecca; Seemayer, Stefan; Vicedo, Esmeralda; Schaefer, Christian; Achten, Dominik; Auer, Florian; Boehm, Ariane; Braun, Tatjana; Hecht, Maximilian; Heron, Mark; Hönigschmid, Peter; Hopf, Thomas A; Kaufmann, Stefanie; Kiening, Michael; Krompass, Denis; Landerer, Cedric; Mahlich, Yannick; Roos, Manfred; Rost, Burkhard.

BMC Bioinformatics ; 14 Suppl 3: S7, 2013.

Article in English | MEDLINE | ID: mdl-23514582

ABSTRACT

BACKGROUND: Any method that de novo predicts protein function should do better than random. More challenging, it also ought to outperform simple homology-based inference. METHODS: Here, we describe a few methods that predict protein function exclusively through homology. Together, they set the bar or lower limit for future improvements. RESULTS AND CONCLUSIONS: During the development of these methods, we faced two surprises. Firstly, our most successful implementation for the baseline ranked very high at CAFA1. In fact, our best combination of homology-based methods fared only slightly worse than the top-of-the-line prediction method from the Jones group. Secondly, although the concept of homology-based inference is simple, this work revealed that the precise details of the implementation are crucial: not only did the methods span from top to bottom performers at CAFA, but also the reasons for these differences were unexpected. In this work, we also propose a new rigorous measure to compare predicted and experimental annotations. It puts more emphasis on the details of protein function than the other measures employed by CAFA and may best reflect the expectations of users. Clearly, the definition of proper goals remains one major objective for CAFA.
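The core idea of homology-based inference, transferring annotations from sequence-similar proteins of known function, fits in a few lines. The sketch below is hypothetical: it assumes hits from a similarity search (for example BLAST) are already available with percent identity and E-value, scores each transferred term by the highest identity of any significant hit carrying it, and uses an arbitrary E-value cutoff. It is not one of the specific methods described in the paper.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One database hit for the query (hypothetical record layout)."""
    subject_id: str
    identity: float  # percent sequence identity to the query, 0-100
    evalue: float


def transfer_annotations(hits, annotations, max_evalue=1e-3):
    """Score each function term by the best identity among significant hits carrying it.

    annotations maps subject_id -> set of function terms (e.g. GO term IDs).
    Returns a dict term -> score in [0, 1], ordered by descending score.
    """
    scores = {}
    for hit in hits:
        if hit.evalue > max_evalue:
            continue  # skip hits that are not significant under the assumed cutoff
        for term in annotations.get(hit.subject_id, ()):
            scores[term] = max(scores.get(term, 0.0), hit.identity / 100.0)
    return dict(sorted(scores.items(), key=lambda item: -item[1]))
```

As the abstract stresses, details such as the cutoff and the scoring rule can move a method of this kind from the top to the bottom of a ranking, so both are worth treating as tunable choices rather than fixed constants.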

Subject(s)

Proteins/physiology , Sequence Homology, Amino Acid , Algorithms , Proteins/genetics

A structural model of the active ribosome-bound membrane protein insertase YidC.

Wickles, Stephan; Singharoy, Abhishek; Andreani, Jessica; Seemayer, Stefan; Bischoff, Lukas; Berninghausen, Otto; Soeding, Johannes; Schulten, Klaus; van der Sluis, Eli O; Beckmann, Roland.

Elife ; 3: e03035, 2014 Jul 10.

Article in English | MEDLINE | ID: mdl-25012291

ABSTRACT

The integration of most membrane proteins into the cytoplasmic membrane of bacteria occurs co-translationally. The universally conserved YidC protein mediates this process either individually as a membrane protein insertase, or in concert with the SecY complex. Here, we present a structural model of YidC based on evolutionary co-variation analysis, lipid-versus-protein-exposure and molecular dynamics simulations. The model suggests a distinctive arrangement of the five conserved transmembrane domains and a helical hairpin between transmembrane segment 2 (TM2) and TM3 on the cytoplasmic membrane surface. The model was used for docking into a cryo-electron microscopy reconstruction of a translating YidC-ribosome complex carrying the YidC substrate FOc. This structure reveals how a single copy of YidC interacts with the ribosome at the ribosomal tunnel exit and identifies a site for membrane protein insertion at the YidC protein-lipid interface. Together, these data suggest a mechanism for the co-translational mode of YidC-mediated membrane protein insertion.

Subject(s)

Cell Membrane/chemistry , Escherichia coli Proteins/chemistry , Escherichia coli/chemistry , Gene Expression Regulation, Bacterial , Membrane Transport Proteins/chemistry , Ribosomes/chemistry , Amino Acid Sequence , Cell Membrane/genetics , Cell Membrane/metabolism , Escherichia coli/genetics , Escherichia coli/metabolism , Escherichia coli Proteins/genetics , Escherichia coli Proteins/metabolism , Hydrogen Bonding , Kinetics , Lipids/chemistry , Membrane Transport Proteins/genetics , Membrane Transport Proteins/metabolism , Molecular Docking Simulation , Molecular Dynamics Simulation , Molecular Sequence Data , Protein Biosynthesis , Protein Structure, Secondary , Protein Structure, Tertiary , Ribosomes/metabolism , SEC Translocation Channels , Sequence Alignment , Thermodynamics
