ABSTRACT
Assembly of the mitochondrial respiratory chain requires the coordinated synthesis of mitochondrial and nuclear encoded subunits, redox co-factor acquisition, and correct joining of the subunits to form functional complexes. The conserved Cbp3-Cbp6 chaperone complex binds newly synthesized cytochrome b and supports the ordered acquisition of the heme co-factors. Moreover, it functions as a translational activator by interacting with the mitoribosome. Cbp3 consists of two distinct domains: an N-terminal domain present in mitochondrial Cbp3 homologs and a highly conserved C-terminal domain comprising a ubiquinol-cytochrome c chaperone region. Here, we solved the crystal structure of this C-terminal domain from a bacterial homolog at 1.4 Å resolution, revealing a unique all-helical fold. This structure allowed mapping of the interaction sites of yeast Cbp3 with Cbp6 and cytochrome b via site-specific photo-cross-linking. We propose that mitochondrial Cbp3 homologs carry an N-terminal extension that positions the conserved C-terminal domain at the ribosomal tunnel exit for an efficient interaction with its substrate, the newly synthesized cytochrome b protein.
Subjects
Cytochromes b/metabolism , Membrane Proteins/metabolism , Mitochondria/metabolism , Molecular Chaperones/metabolism , Saccharomyces cerevisiae Proteins/metabolism , Amino Acid Sequence , Bacterial Proteins/chemistry , Bacterial Proteins/genetics , Bacterial Proteins/metabolism , Brucella abortus/metabolism , Crystallography, X-Ray , Cytochromes b/chemistry , Cytochromes b/genetics , Electron Transport Chain Complex Proteins/chemistry , Electron Transport Chain Complex Proteins/metabolism , Membrane Proteins/chemistry , Membrane Proteins/genetics , Mitochondrial Proteins/genetics , Mitochondrial Proteins/metabolism , Molecular Chaperones/chemistry , Molecular Chaperones/genetics , Protein Domains , Protein Interaction Domains and Motifs , Protein Structure, Tertiary , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae Proteins/chemistry , Saccharomyces cerevisiae Proteins/genetics , Sequence Alignment
ABSTRACT
MOTIVATION: Residue contact prediction was revolutionized recently by the introduction of direct coupling analysis (DCA). Further improvements, in particular for small families, have been obtained by the combination of DCA and deep learning methods. However, existing deep learning contact prediction methods often rely on a number of external programs and are therefore computationally expensive. RESULTS: Here, we introduce a novel contact predictor, PconsC4, which performs on par with state of the art methods. PconsC4 is heavily optimized, does not use any external programs and therefore is significantly faster and easier to use than other methods. AVAILABILITY AND IMPLEMENTATION: PconsC4 is freely available under the GPL license from https://github.com/ElofssonLab/PconsC4. Installation is easy using the pip command and works on any system with Python 3.5 or later and a GCC compiler. It does not require a GPU nor special hardware. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subjects
Deep Learning , Software
ABSTRACT
MOTIVATION: Accurate contact predictions can be used for predicting the structure of proteins. Until recently these methods were limited to very large protein families, decreasing their utility. However, recent progress by combining direct coupling analysis with machine learning methods has made it possible to predict accurate contact maps for smaller families. To what extent these predictions can be used to produce accurate models of the families is not known. RESULTS: We present the PconsFold2 pipeline that uses contact predictions from PconsC3, the CONFOLD folding algorithm and model quality estimations to predict the structure of a protein. We show that the model quality estimation significantly increases the number of models that can be reliably identified. Finally, we apply PconsFold2 to 6379 Pfam families of unknown structure and find that PconsFold2 can, with an estimated 90% specificity, predict the structure of up to 558 Pfam families of unknown structure. Out of these, 415 have not been reported before. AVAILABILITY AND IMPLEMENTATION: Datasets as well as models of all the 558 Pfam families are available at http://c3.pcons.net/ . All programs used here are freely available. CONTACT: arne@bioinfo.se.
Subjects
Computational Biology/methods , Models, Molecular , Protein Conformation , Software , Machine Learning , Sensitivity and Specificity
ABSTRACT
MOTIVATION: A few years ago it was shown that by using a maximum entropy approach to describe couplings between columns in a multiple sequence alignment it is possible to significantly increase the accuracy of residue contact predictions. For very large protein families with more than 1000 effective sequences the accuracy is sufficient to produce accurate models of proteins as well as complexes. Today, for about half of all Pfam domain families no structure is known, but unfortunately most of these families have at most a few hundred members, i.e. are too small for such contact prediction methods. RESULTS: To extend accurate contact predictions to the thousands of smaller protein families we present PconsC3, a fast and improved method for protein contact predictions that can be used for families with as few as 100 effective sequence members. PconsC3 significantly outperforms direct coupling analysis (DCA) methods, independent of family size, secondary structure content, contact range, or the number of selected contacts. AVAILABILITY AND IMPLEMENTATION: PconsC3 is available as a web server and downloadable version at http://c3.pcons.net . The downloadable version is free for all to use and licensed under the GNU General Public License, version 2. At this site contact predictions for most Pfam families are also available. We estimate that more than 4000 contact maps for Pfam families of unknown structure have more than 50% of the top-ranked contacts predicted correctly. CONTACT: arne@bioinfo.se. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
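The abstract above measures quality as the fraction of top-ranked contacts predicted correctly. A minimal sketch of such a precision measure follows; it is not taken from PconsC3. The function name, the input layout (residue pairs with a score) and the default minimum sequence separation of 5 are all assumptions made for illustration.

```python
from typing import Iterable, Tuple


def top_contact_precision(
    scored_pairs: Iterable[Tuple[int, int, float]],
    true_contacts: Iterable[Tuple[int, int]],
    n_top: int,
    min_separation: int = 5,  # assumed cutoff, not a value from the abstract
) -> float:
    """Fraction of the n_top highest-scoring predicted pairs that are true contacts."""
    # Store every pair as (smaller index, larger index) so order does not matter.
    truth = {(min(i, j), max(i, j)) for i, j in true_contacts}

    # Drop pairs that are too close in sequence, then rank by score.
    candidates = [
        (score, (min(i, j), max(i, j)))
        for i, j, score in scored_pairs
        if abs(i - j) >= min_separation
    ]
    candidates.sort(key=lambda item: item[0], reverse=True)

    selected = [pair for _, pair in candidates[:n_top]]
    if not selected:
        return 0.0
    hits = sum(1 for pair in selected if pair in truth)
    return hits / len(selected)
```

In practice `n_top` is often tied to the protein length, but the abstract does not state which cutoff was used, so it is left as a parameter here.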
Subjects
Computational Biology/methods , Protein Structure, Secondary , Sequence Alignment/methods , Sequence Analysis, Protein/methods , Software
ABSTRACT
MOTIVATION: Recently it has been shown that the quality of protein contact prediction from evolutionary information can be improved significantly if direct and indirect information is separated. Given sufficiently large protein families, the contact predictions contain sufficient information to predict the structure of many protein families. However, contact prediction methods have improved since the first studies. Here, we ask how much the final models are improved if improved contact predictions are used. RESULTS: In a small benchmark of 15 proteins, we show that the TM-scores of top-ranked models are improved by 33% on average using PconsFold compared with the original version of EVfold. In a larger benchmark, we find that the quality is improved by 15-30% when using PconsC in comparison with earlier contact prediction methods. Further, using Rosetta instead of CNS does not significantly improve global model accuracy, but the chemistry of models generated with Rosetta is improved. AVAILABILITY: PconsFold is a fully automated pipeline for ab initio protein structure prediction based on evolutionary information. PconsFold is based on PconsC contact prediction and uses the Rosetta folding protocol. Due to its modularity, the contact prediction tool can be easily exchanged. The source code of PconsFold is available on GitHub at https://www.github.com/ElofssonLab/pcons-fold under the MIT license. PconsC is available from http://c.pcons.net/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
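The abstract above compares models by TM-score. For orientation, the sketch below evaluates the standard TM-score sum for one fixed superposition, given the distances between aligned residue pairs. It is not the PconsFold evaluation code: the full TM-score also searches for the superposition that maximizes this sum, which is omitted here, and the function names and the lower bound of 0.5 on d0 are assumptions.

```python
from typing import Sequence


def tm_d0(target_length: int) -> float:
    """Length-dependent distance scale d0 = 1.24 * (L - 15)^(1/3) - 1.8."""
    if target_length <= 15:
        return 0.5  # assumed floor for very short chains
    return max(0.5, 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8)


def tm_score_fixed_superposition(
    distances: Sequence[float], target_length: int
) -> float:
    """Sum 1 / (1 + (d_i / d0)^2) over aligned pairs, normalized by target length.

    `distances` holds one distance (in Å) per aligned residue pair after the
    model has already been superimposed on the target.
    """
    d0 = tm_d0(target_length)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length
```

A perfect model with all residues aligned at zero distance scores 1.0, and residues that are not aligned contribute nothing to the sum.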
Subjects
Models, Molecular , Protein Conformation , Protein Folding , Software , Algorithms , Amino Acids/chemistry , Proteins/chemistry , Sequence Analysis, Protein
ABSTRACT
Given sufficiently large protein families, and using a global statistical inference approach, it is possible to obtain sufficient accuracy in protein residue contact predictions to predict the structure of many proteins. However, these approaches do not consider the fact that the contacts in a protein are neither randomly nor independently distributed, but actually follow precise rules governed by the structure of the protein and thus are interdependent. Here, we present PconsC2, a novel method that uses a deep learning approach to identify protein-like contact patterns to improve contact predictions. A substantial enhancement can be seen for all contacts independently of the number of aligned sequences, residue separation or secondary structure type, but is largest for β-sheet-containing proteins. In addition to being superior to earlier methods based on statistical inferences, in comparison to state-of-the-art methods using machine learning, PconsC2 is superior for families with more than 100 effective sequence homologs. The improved contact prediction enables improved structure prediction.
Subjects
Computational Biology/methods , Pattern Recognition, Automated/methods , Proteins/chemistry , Sequence Analysis, Protein/methods , Artificial Intelligence , Databases, Protein , Protein Conformation , Protein Structure, Secondary
ABSTRACT
At present, about half of the protein domain families lack a structural representative. However, in the last decade, predicting contact maps and using these to model the tertiary structure for these protein families have become an alternative approach to gain structural insight. To date, reliable models for several hundred protein families have been created using this approach. To increase the use of this approach, we present PconsFam, which is an intuitive and interactive database for predicted contact maps and tertiary structure models of the entire Pfam database. By modeling all possible families, both with and without a representative structure, using the PconsFold2 pipeline, and running a quality assessment estimator on the models, we estimate how reliable the predicted contact maps and structures are for each family.