Results 1 - 20 of 25
1.
Protein Sci ; 33(9): e5140, 2024 Sep.
Article in English | MEDLINE | ID: mdl-39145441

ABSTRACT

Proteins, fundamental to cellular activities, reveal their function and evolution through their structure and sequence. CATH functional families (FunFams) are coherent clusters of protein domain sequences in which the function is conserved across their members. The increasing volume and complexity of protein data enabled by large-scale repositories like MGnify or AlphaFold Database requires more powerful approaches that can scale to the size of these new resources. In this work, we introduce MARC and FRAN, two algorithms developed to build upon and address limitations of GeMMA/FunFHMMER, our original methods developed to classify proteins with related functions using a hierarchical approach. We also present CATH-eMMA, which uses embeddings or Foldseek distances to form relationship trees from distance matrices, reducing computational demands and handling various data types effectively. CATH-eMMA offers a highly robust and much faster tool for clustering protein functions on a large scale, providing a new tool for future studies in protein function and evolution.


Subjects
Algorithms , Databases, Protein , Proteins , Proteins/chemistry , Proteins/metabolism , Cluster Analysis , Computational Biology/methods , Protein Domains
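The abstract above mentions forming relationship trees from distance matrices. As a rough illustration of that general idea only, the Python sketch below builds a tree by average-linkage agglomerative clustering. The linkage rule, the function name `build_tree` and the toy distances are assumptions made for this example; the abstract does not say which tree-building procedure CATH-eMMA uses.

```python
def build_tree(names, dist):
    """Average-linkage agglomerative clustering over a symmetric distance matrix.

    Returns a nested tuple of names describing the merge order.
    Assumes `names` is non-empty and `dist[i][j]` is the distance between
    items i and j. Illustrative only; not the CATH-eMMA implementation.
    """
    # cluster id -> (subtree, indices of the original items it contains)
    clusters = {i: (names[i], [i]) for i in range(len(names))}
    next_id = len(names)
    while len(clusters) > 1:
        ids = sorted(clusters)
        best = None
        for pos, a in enumerate(ids):
            for b in ids[pos + 1:]:
                members_a, members_b = clusters[a][1], clusters[b][1]
                # Average distance over all cross-cluster pairs.
                d = sum(dist[i][j] for i in members_a for j in members_b)
                d /= len(members_a) * len(members_b)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        tree_a, members_a = clusters.pop(a)
        tree_b, members_b = clusters.pop(b)
        clusters[next_id] = ((tree_a, tree_b), members_a + members_b)
        next_id += 1
    return next(iter(clusters.values()))[0]


# Toy example: A and B are close to each other, C is distant from both.
toy_names = ["A", "B", "C"]
toy_dist = [
    [0.0, 0.1, 0.9],
    [0.1, 0.0, 0.8],
    [0.9, 0.8, 0.0],
]
print(build_tree(toy_names, toy_dist))  # ('C', ('A', 'B'))
```

The pairwise search is quadratic per merge, which is fine for a toy matrix but would not scale to the data volumes the abstract describes.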
2.
Sci Rep ; 14(1): 14208, 2024 06 20.
Article in English | MEDLINE | ID: mdl-38902252

ABSTRACT

The COVID-19 disease is an ongoing global health concern. Although vaccination provides some protection, people are still susceptible to re-infection. Ostensibly, certain populations or clinical groups may be more vulnerable. Factors causing these differences are unclear and whilst socioeconomic and cultural differences are likely to be important, human genetic factors could influence susceptibility. Experimental studies indicate SARS-CoV-2 uses innate immune suppression as a strategy to speed up entry into, and replication in, the host cell. Therefore, it is necessary to understand the impact of variants in immunity-associated human proteins on susceptibility to COVID-19. In this work, we analysed missense coding variants in several SARS-CoV-2 proteins and their human protein interactors that could enhance binding affinity to SARS-CoV-2. We curated a dataset of 19 SARS-CoV-2:human protein 3D complexes, from experimentally determined structures in the Protein Data Bank and models built using AlphaFold2-multimer, and analysed the impact of missense variants occurring in the protein-protein interface region. We analysed 468 missense variants from human proteins and 212 variants from SARS-CoV-2 proteins and computationally predicted their impacts on binding affinities for the human-viral protein complexes. We predicted a total of 26 affinity-enhancing variants from 13 human proteins implicated in increased binding affinity to SARS-CoV-2. These include key immunity-associated genes (TOMM70, ISG15, IFIH1, IFIT2, RPS3, PALS1, NUP98, AXL, ARF6, TRIMM, TRIM25) as well as important spike receptors (KREMEN1, AXL and ACE2). We report both common (e.g., Y13N in IFIH1) and rare variants in these proteins and discuss their likely structural and functional impact, using information on known and predicted functional sites. Potential mechanisms associated with immune suppression implicated by these variants are discussed. Occurrence of certain predicted affinity-enhancing variants should be monitored as they could lead to increased susceptibility and reduced immune response to SARS-CoV-2 infection in individuals/populations carrying them. Our analyses aid in understanding the potential impact of genetic variation in immunity-associated proteins on COVID-19 susceptibility and help guide drug-repurposing strategies.


Subjects
COVID-19 , Mutation, Missense , SARS-CoV-2 , Humans , SARS-CoV-2/genetics , SARS-CoV-2/immunology , COVID-19/genetics , COVID-19/virology , COVID-19/immunology , Drug Repositioning , Viral Proteins/genetics , Viral Proteins/metabolism , Protein Binding , Genetic Predisposition to Disease , Disease Susceptibility , COVID-19 Drug Treatment
3.
Bioinformatics ; 40(5), 2024 May 02.
Article in English | MEDLINE | ID: mdl-38718225

ABSTRACT

MOTIVATION: Protein domains are fundamental units of protein structure and play a pivotal role in understanding folding, function, evolution, and design. The advent of accurate structure prediction techniques has resulted in an influx of new structural data, making the partitioning of these structures into domains essential for inferring evolutionary relationships and functional classification. RESULTS: This article presents Chainsaw, a supervised learning approach to domain parsing that achieves accuracy that surpasses current state-of-the-art methods. Chainsaw uses a fully convolutional neural network which is trained to predict the probability that each pair of residues is in the same domain. Domain predictions are then derived from these pairwise predictions using an algorithm that searches for the most likely assignment of residues to domains given the set of pairwise co-membership probabilities. Chainsaw matches CATH domain annotations in 78% of protein domains versus 72% for the next closest method. When predicting on AlphaFold models, expert human evaluators were twice as likely to prefer Chainsaw's predictions versus the next best method. AVAILABILITY AND IMPLEMENTATION: github.com/JudeWells/Chainsaw.


Subjects
Algorithms , Neural Networks, Computer , Protein Domains , Proteins , Proteins/chemistry , Databases, Protein , Computational Biology/methods , Software , Humans
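The Chainsaw abstract above describes deriving domain assignments from pairwise co-membership probabilities. The Python sketch below illustrates one simple way such a step could work: residues are added one at a time to whichever domain (existing or new) gives the highest log-likelihood so far. This greedy scheme, the name `assign_domains` and the toy probability matrix are assumptions for illustration; they are not taken from the Chainsaw code base, whose search algorithm may differ.

```python
import math


def log_likelihood(assign, probs, eps=1e-9):
    """Log-likelihood of a residue-to-domain assignment.

    probs[i][j] is the predicted probability that residues i and j share a
    domain. Only the first len(assign) residues are scored.
    """
    total = 0.0
    for i in range(len(assign)):
        for j in range(i + 1, len(assign)):
            p = min(max(probs[i][j], eps), 1.0 - eps)  # avoid log(0)
            same = assign[i] == assign[j]
            total += math.log(p if same else 1.0 - p)
    return total


def assign_domains(probs):
    """Greedy sketch: place each residue in the best existing or new domain."""
    assign = []
    for _ in range(len(probs)):
        n_domains = max(assign, default=-1) + 1
        best_label, best_score = None, -math.inf
        # Try every existing domain label plus one new label.
        for label in range(n_domains + 1):
            score = log_likelihood(assign + [label], probs)
            if score > best_score:
                best_label, best_score = label, score
        assign.append(best_label)
    return assign


# Toy matrix: residues 0-1 likely share a domain, as do residues 2-3.
toy_probs = [
    [1.0, 0.9, 0.1, 0.1],
    [0.9, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.9],
    [0.1, 0.1, 0.9, 1.0],
]
print(assign_domains(toy_probs))  # [0, 0, 1, 1]
```

A greedy pass like this can get stuck in a poor assignment on noisy inputs; it is shown only to make the "most likely assignment given pairwise probabilities" idea concrete.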
4.
J Mol Biol ; : 168551, 2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38548261

ABSTRACT

CATH (https://www.cathdb.info) classifies domain structures from experimental protein structures in the PDB and predicted structures in the AlphaFold Database (AFDB). To cope with the scale of the predicted data, a new Nextflow workflow (CATH-AlphaFlow) has been developed to classify high-quality domains into CATH superfamilies and identify novel fold groups and superfamilies. CATH-AlphaFlow uses a novel state-of-the-art structure-based domain boundary prediction method (Chainsaw) for identifying domains in multi-domain proteins. We applied CATH-AlphaFlow to process PDB structures not classified in CATH and AFDB structures from 21 model organisms, expanding CATH by over 100%. Domains not classified in existing CATH superfamilies or fold groups were used to seed novel folds, giving 253 new folds from PDB structures (September 2023 release) and 96 from AFDB structures of proteomes of 21 model organisms. Where possible, functional annotations were obtained using (i) predictions from publicly available methods and (ii) annotations from structural relatives in AFDB/UniProt50. We also predicted functional sites and highly conserved residues. Some folds are associated with important functions such as photosynthetic acclimation (in flowering plants), iron permease activity (in fungi) and post-natal spermatogenesis (in mice). CATH-AlphaFlow will allow us to identify many more CATH relatives in the AFDB, further characterising the protein structure landscape.

5.
Mol Cell ; 83(22): 3950-3952, 2023 Nov 16.
Article in English | MEDLINE | ID: mdl-37977115

ABSTRACT

Two recent studies exploited ultra-fast structural aligners and deep-learning approaches to cluster the protein structure space in the AlphaFold Database. Barrio-Hernandez et al.1 and Durairaj et al.2 uncovered fascinating new protein functions and structural features previously unknown.


Subjects
Cluster Analysis , Databases, Factual
6.
Elife ; 12, 2023 10 03.
Article in English | MEDLINE | ID: mdl-37787768

ABSTRACT

Many proteins remain poorly characterized even in well-studied organisms, presenting a bottleneck for research. We applied phenomics and machine-learning approaches with Schizosaccharomyces pombe for broad cues on protein functions. We assayed colony-growth phenotypes to measure the fitness of deletion mutants for 3509 non-essential genes in 131 conditions with different nutrients, drugs, and stresses. These analyses exposed phenotypes for 3492 mutants, including 124 mutants of 'priority unstudied' proteins conserved in humans, providing varied functional clues. For example, over 900 proteins were newly implicated in the resistance to oxidative stress. Phenotype-correlation networks suggested roles for poorly characterized proteins through 'guilt by association' with known proteins. For complementary functional insights, we predicted Gene Ontology (GO) terms using machine learning methods exploiting protein-network and protein-homology data (NET-FF). We obtained 56,594 high-scoring GO predictions, of which 22,060 also featured high information content. Our phenotype-correlation data and NET-FF predictions showed a strong concordance with existing PomBase GO annotations and protein networks, with integrated analyses revealing 1675 novel GO predictions for 783 genes, including 47 predictions for 23 priority unstudied proteins. Experimental validation identified new proteins involved in cellular aging, showing that these predictions and phenomics data provide a rich resource to uncover new protein functions.


Subjects
Schizosaccharomyces pombe Proteins , Schizosaccharomyces , Humans , Phenomics , Schizosaccharomyces pombe Proteins/genetics , Phenotype , Schizosaccharomyces/genetics , Machine Learning
7.
Curr Opin Struct Biol ; 79: 102543, 2023 04.
Article in English | MEDLINE | ID: mdl-36807079

ABSTRACT

The function of proteins can often be inferred from their three-dimensional structures. Experimental structural biologists spent decades studying these structures, but the accelerated pace of protein sequencing continuously increases the gap between sequences and structures. The early 2020s saw the advent of a new generation of deep learning-based protein structure prediction tools that offer the potential to predict structures based on any number of protein sequences. In this review, we give an overview of the impact of this new generation of structure prediction tools, with examples of impacted fields in the life sciences. We discuss the novel opportunities and new scientific and technical challenges these tools present to the broader scientific community. Finally, we highlight some potential directions for the future of computational protein structure prediction.


Subjects
Deep Learning , Computational Biology/methods , Proteins/chemistry , Amino Acid Sequence
8.
Biomolecules ; 13(2), 2023 02 02.
Article in English | MEDLINE | ID: mdl-36830646

ABSTRACT

Protein kinases are important targets for treating human disorders, and they are the second most targeted families after G-protein coupled receptors. Several resources provide classification of kinases into evolutionary families (based on sequence homology); however, very few systematically classify functional families (FunFams) comprising evolutionary relatives that share similar functional properties. We have developed the FunFam-MARC (Multidomain ARchitecture-based Clustering) protocol, which uses multi-domain architectures of protein kinases and specificity-determining residues for functional family classification. FunFam-MARC predicts 2210 kinase functional families (KinFams), which have increased functional coherence, in terms of EC annotations, compared to the widely used KinBase classification. Our protocol provides a comprehensive classification for kinase sequences from >10,000 organisms. We associate human KinFams with diseases and drugs and identify 28 druggable human KinFams, i.e., enriched in clinically approved drugs. Since relatives in the same druggable KinFam tend to be structurally conserved, including the drug-binding site, these KinFams may be valuable for shortlisting therapeutic targets. Information on the human KinFams and associated 3D structures from AlphaFold2 are provided via our CATH FTP website and Zenodo. This gives the domain structure representative of each KinFam together with information on any drug compounds available. For 32% of the KinFams, we provide information on highly conserved residue sites that may be associated with specificity.


Subjects
Protein Kinases , Proteins , Humans , Protein Kinases/metabolism , Proteins/chemistry , Databases, Protein , Sequence Homology, Amino Acid
9.
Commun Biol ; 6(1): 160, 2023 02 08.
Article in English | MEDLINE | ID: mdl-36755055

ABSTRACT

Deep-learning (DL) methods like DeepMind's AlphaFold2 (AF2) have led to substantial improvements in protein structure prediction. We analyse confident AF2 models from 21 model organisms using a new classification protocol (CATH-Assign) which exploits novel DL methods for structural comparison and classification. Of ~370,000 confident models, 92% can be assigned to 3253 superfamilies in our CATH domain superfamily classification. The remaining models cluster into 2367 putative novel superfamilies. Detailed manual analysis of 618 of these, having at least one human relative, reveals extremely remote homologies and further unusual features. Only 25 novel superfamilies could be confirmed. Although most models map to existing superfamilies, AF2 domains expand CATH by 67% and increase the number of unique 'global' folds by 36%, and will provide valuable insights on structure-function relationships. CATH-Assign will harness the huge expansion in structural data provided by DeepMind to rationalise evolutionary changes driving functional divergence.


Subjects
Furylfuramide , Proteins , Humans , Databases, Protein , Proteins/chemistry
10.
Bioinformatics ; 39(1), 2023 01 01.
Article in English | MEDLINE | ID: mdl-36648327

ABSTRACT

MOTIVATION: CATH is a protein domain classification resource that exploits an automated workflow of structure and sequence comparison alongside expert manual curation to construct a hierarchical classification of evolutionary and structural relationships. The aim of this study was to develop algorithms for detecting remote homologues missed by state-of-the-art hidden Markov model (HMM)-based approaches. The method developed (CATHe) combines a neural network with sequence representations obtained from protein language models. It was assessed using a dataset of remote homologues having less than 20% sequence identity to any domain in the training set. RESULTS: The CATHe models trained on the 1773 largest and the 50 largest CATH superfamilies had an accuracy of 85.6 ± 0.4% and 98.2 ± 0.3%, respectively. As a further test of the power of CATHe to detect more remote homologues missed by HMMs derived from CATH domains, we used a dataset consisting of protein domains that had annotations in Pfam, but not in CATH. By using highly reliable CATHe predictions (expected error rate <0.5%), we were able to provide CATH annotations for 4.62 million Pfam domains. For a subset of these domains from Homo sapiens, we structurally validated 90.86% of the predictions by comparing their corresponding AlphaFold2 structures with structures from the CATH superfamilies to which they were assigned. AVAILABILITY AND IMPLEMENTATION: The code for the developed models is available at https://github.com/vam-sin/CATHe, and the datasets developed in this study can be accessed at https://zenodo.org/record/6327572. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Algorithms , Proteins , Humans , Sequence Homology, Amino Acid , Proteins/chemistry , Databases, Protein
11.
Trends Biochem Sci ; 48(4): 345-359, 2023 04.
Article in English | MEDLINE | ID: mdl-36504138

ABSTRACT

Breakthrough methods in machine learning (ML), protein structure prediction, and novel ultrafast structural aligners are revolutionizing structural biology. Obtaining accurate models of proteins and annotating their functions on a large scale is no longer limited by time and resources. The most recent method to be top ranked by the Critical Assessment of Structure Prediction (CASP) assessment, AlphaFold 2 (AF2), is capable of building structural models with an accuracy comparable to that of experimental structures. Annotations of 3D models are keeping pace with the deposition of the structures due to advancements in protein language models (pLMs) and structural aligners that help validate these transferred annotations. In this review we describe how recent developments in ML for protein science are making large-scale structural bioinformatics available to the general scientific community.


Subjects
Machine Learning , Proteins , Proteins/chemistry , Computational Biology/methods , Protein Conformation
12.
NAR Genom Bioinform ; 4(2): lqac043, 2022 Jun.
Article in English | MEDLINE | ID: mdl-35702380

ABSTRACT

Experimental structures are leveraged through multiple sequence alignments, or more generally through homology-based inference (HBI), facilitating the transfer of information from a protein with known annotation to a query without any annotation. A recent alternative expands the concept of HBI from sequence-distance lookup to embedding-based annotation transfer (EAT). These embeddings are derived from protein Language Models (pLMs). Here, we introduce the use of single protein representations from pLMs for contrastive learning. This learning procedure creates a new set of embeddings that optimizes constraints captured by hierarchical classifications of protein 3D structures defined by the CATH resource. The approach, dubbed ProtTucker, has a better ability to recognize distant homologous relationships than more traditional techniques such as threading or fold recognition. Thus, these embeddings have allowed sequence comparison to step into the 'midnight zone' of protein similarity, i.e. the region in which distantly related sequences have a seemingly random pairwise sequence similarity. The novelty of this work is in the particular combination of tools and sampling techniques that achieved performance comparable to or better than existing state-of-the-art sequence comparison methods. Additionally, since this method does not need to generate alignments, it is also orders of magnitude faster. The code is available at https://github.com/Rostlab/EAT.
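Embedding-based annotation transfer, as described in the abstract above, amounts to a nearest-neighbour lookup in embedding space: the query inherits the label of the closest annotated protein. The Python sketch below shows that lookup under stated assumptions: Euclidean distance, a hypothetical function `transfer_annotation`, an optional distance cutoff, and made-up three-dimensional vectors with placeholder labels. Real pLM embeddings have far more dimensions, and none of these names come from the Rostlab/EAT repository.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def transfer_annotation(query, lookup, max_distance=None):
    """Return (label, distance) for the nearest annotated embedding.

    lookup is a list of (label, embedding) pairs. If max_distance is given
    and the nearest hit is farther away, the label is None (no transfer).
    """
    best_label, best_dist = None, math.inf
    for label, embedding in lookup:
        d = euclidean(query, embedding)
        if d < best_dist:
            best_label, best_dist = label, d
    if max_distance is not None and best_dist > max_distance:
        return None, best_dist
    return best_label, best_dist


# Toy lookup set with placeholder labels and made-up 3-d "embeddings".
toy_lookup = [
    ("sf_A", [0.0, 0.0, 1.0]),
    ("sf_B", [1.0, 0.0, 0.0]),
]
label, dist = transfer_annotation([0.9, 0.1, 0.0], toy_lookup)
print(label, round(dist, 3))  # sf_B 0.141
```

Because no alignment is computed, each query costs only one distance evaluation per lookup entry, which is in line with the speed advantage the abstract reports.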

13.
Brief Bioinform ; 23(4), 2022 07 18.
Article in English | MEDLINE | ID: mdl-35641150

ABSTRACT

Mutations in human proteins lead to diseases. The structure of these proteins can help understand the mechanism of such diseases and develop therapeutics against them. With improved deep learning techniques, such as RoseTTAFold and AlphaFold, we can predict the structure of proteins even in the absence of structural homologs. We modeled and extracted the domains from 553 disease-associated human proteins without known protein structures or close homologs in the Protein Data Bank. We noticed that the model quality was higher, and the root mean square deviation (RMSD) between AlphaFold and RoseTTAFold models lower, for domains that could be assigned to CATH families as compared to those which could only be assigned to Pfam families of unknown structure or could not be assigned to either. We predicted ligand-binding sites, protein-protein interfaces and conserved residues in these predicted structures. We then explored whether the disease-associated missense mutations were in the proximity of these predicted functional sites, whether they destabilized the protein structure based on ddG calculations or whether they were predicted to be pathogenic. We could explain 80% of these disease-associated mutations based on proximity to functional sites, structural destabilization or pathogenicity. When compared to polymorphisms, a larger percentage of disease-associated missense mutations were buried, closer to predicted functional sites, predicted as destabilizing and pathogenic. Usage of models from the two state-of-the-art techniques provides better confidence in our predictions, and we explain 93 additional mutations based on RoseTTAFold models which could not be explained based solely on AlphaFold models.


Subjects
Mutation, Missense , Proteins , Databases, Protein , Humans , Models, Molecular , Mutation , Proteins/chemistry , Proteins/genetics
14.
STAR Protoc ; 3(1): 101029, 2022 03 18.
Article in English | MEDLINE | ID: mdl-35059650

ABSTRACT

Lak megaphages are prevalent across diverse gut microbiomes and may potentially impact animal and human health through lysis of Prevotella. Given their large genome size (up to 660 kbp), Lak megaphages are difficult to culture, and their identification relies on molecular techniques. Here, we present optimized protocols for identifying Lak phages in various microbiome samples, including procedures for DNA extraction, followed by detection and quantification of genes encoding Lak structural proteins using diagnostic endpoint and SYBR green-based quantitative PCR, respectively. For complete details on the use and execution of this protocol, please refer to Crisci et al. (2021).


Subjects
Bacteriophages , Gastrointestinal Microbiome , Microbiota , Animals , Bacteriophages/genetics , Microbiota/genetics , Prevotella/genetics , Real-Time Polymerase Chain Reaction/methods
15.
Mol Syst Biol ; 17(9): e10079, 2021 09.
Article in English | MEDLINE | ID: mdl-34519429

ABSTRACT

We modeled 3D structures of all SARS-CoV-2 proteins, generating 2,060 models that span 69% of the viral proteome and provide details not available elsewhere. We found that ~6% of the proteome mimicked human proteins, while ~7% was implicated in hijacking mechanisms that reverse post-translational modifications, block host translation, and disable host defenses; a further ~29% self-assembled into heteromeric states that provided insight into how the viral replication and translation complex forms. To make these 3D models more accessible, we devised a structural coverage map, a novel visualization method to show what is (and is not) known about the 3D structure of the viral proteome. We integrated the coverage map into an accompanying online resource (https://aquaria.ws/covid) that can be used to find and explore models corresponding to the 79 structural states identified in this work. The resulting Aquaria-COVID resource helps scientists use emerging structural data to understand the mechanisms underlying coronavirus infection and draws attention to the 31% of the viral proteome that remains structurally unknown or dark.


Subjects
Angiotensin-Converting Enzyme 2/metabolism , Host-Pathogen Interactions/genetics , Protein Processing, Post-Translational , SARS-CoV-2/metabolism , Spike Glycoprotein, Coronavirus/metabolism , Amino Acid Transport Systems, Neutral/chemistry , Amino Acid Transport Systems, Neutral/genetics , Amino Acid Transport Systems, Neutral/metabolism , Angiotensin-Converting Enzyme 2/chemistry , Angiotensin-Converting Enzyme 2/genetics , Binding Sites , COVID-19/genetics , COVID-19/metabolism , COVID-19/virology , Computational Biology/methods , Coronavirus Envelope Proteins/chemistry , Coronavirus Envelope Proteins/genetics , Coronavirus Envelope Proteins/metabolism , Coronavirus Nucleocapsid Proteins/chemistry , Coronavirus Nucleocapsid Proteins/genetics , Coronavirus Nucleocapsid Proteins/metabolism , Humans , Mitochondrial Membrane Transport Proteins/chemistry , Mitochondrial Membrane Transport Proteins/genetics , Mitochondrial Membrane Transport Proteins/metabolism , Mitochondrial Precursor Protein Import Complex Proteins , Models, Molecular , Molecular Mimicry , Neuropilin-1/chemistry , Neuropilin-1/genetics , Neuropilin-1/metabolism , Phosphoproteins/chemistry , Phosphoproteins/genetics , Phosphoproteins/metabolism , Protein Binding , Protein Conformation, alpha-Helical , Protein Conformation, beta-Strand , Protein Interaction Domains and Motifs , Protein Interaction Mapping/methods , Protein Multimerization , SARS-CoV-2/chemistry , SARS-CoV-2/genetics , Spike Glycoprotein, Coronavirus/chemistry , Spike Glycoprotein, Coronavirus/genetics , Viral Matrix Proteins/chemistry , Viral Matrix Proteins/genetics , Viral Matrix Proteins/metabolism , Viroporin Proteins/chemistry , Viroporin Proteins/genetics , Viroporin Proteins/metabolism , Virus Replication
16.
iScience ; 24(8): 102875, 2021 Aug 20.
Article in English | MEDLINE | ID: mdl-34386733

ABSTRACT

Lak phages with alternatively coded ∼540 kbp genomes were recently reported to replicate in Prevotella in microbiomes of humans that consume a non-Western diet, baboons, and pigs. Here, we explore Lak phage diversity and broader distribution using diagnostic polymerase chain reaction and genome-resolved metagenomics. Lak phages were detected in 13 animal types, including reptiles, and are particularly prevalent in pigs. Tracking Lak through the pig gastrointestinal tract revealed significant enrichment in the hindgut compared to the foregut. We reconstructed 34 new Lak genomes, including six curated complete genomes, all of which are alternatively coded. An anomalously large (∼660 kbp) complete genome reconstructed for the most deeply branched Lak from a horse microbiome is also alternatively coded. From the Lak genomes, we identified proteins associated with specific animal species; notably, most have no functional predictions. The presence of closely related Lak phages in diverse animals indicates facile distribution coupled to host-specific adaptation.

17.
Front Mol Biosci ; 8: 668184, 2021.
Article in English | MEDLINE | ID: mdl-34041266

ABSTRACT

This article is dedicated to the memory of Cyrus Chothia, who was a leading light in the world of protein structure evolution. His elegant analyses of protein families and their mechanisms of structural and functional evolution provided important evolutionary and biological insights and firmly established the value of structural perspectives. He was a mentor and supervisor to many other leading scientists who continued his quest to characterise structure and function space. He was also a generous and supportive colleague to those applying different approaches. In this article we review some of his accomplishments and the history of protein structure classifications, particularly SCOP and CATH. We also highlight some of the evolutionary insights these two classifications have brought. Finally, we discuss how the expansion and integration of protein sequence data into these structural families helps reveal the dark matter of function space and can inform the emergence of novel functions in Metazoa. Since we cover 25 years of structural classification, it has not been feasible to review all structure based evolutionary studies and hence we focus mainly on those undertaken by the SCOP and CATH groups and their collaborators.

18.
Bioinformatics ; 37(20): 3449-3455, 2021 Oct 25.
Article in English | MEDLINE | ID: mdl-33978744

ABSTRACT

MOTIVATION: Classifying proteins into functional families can improve our understanding of protein function and can allow transferring annotations within one family. For this, functional families need to be 'pure', i.e., contain only proteins with identical function. Functional Families (FunFams) cluster proteins within CATH superfamilies into such groups of proteins sharing function. 11% of all FunFams (22 830 of 203 639) contain EC annotations and of those, 7% (1526 of 22 830) have inconsistent functional annotations. RESULTS: We propose an approach to further cluster FunFams into functionally more consistent sub-families by encoding their sequences through embeddings. These embeddings originate from language models transferring knowledge gained from predicting missing amino acids in a sequence (ProtBERT) and have been further optimized to distinguish between proteins belonging to the same or a different CATH superfamily (PB-Tucker). Using distances between embeddings and DBSCAN to cluster FunFams and identify outliers doubled the number of pure clusters per FunFam compared to random clustering. Our approach was not limited to FunFams but also succeeded on families created using sequence similarity alone. Complementing EC annotations, we observed similar results for binding annotations. Thus, we expect an increased purity also for other aspects of function. Our results can help generate FunFams; the resulting clusters with improved functional consistency allow more reliable inference of annotations. We expect this approach to succeed equally for any other grouping of proteins by their phenotypes. AVAILABILITY AND IMPLEMENTATION: Code and embeddings are available via GitHub: https://github.com/Rostlab/FunFamsClustering. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
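The abstract above clusters embeddings with DBSCAN and treats unclustered points as outliers. To make that step concrete, here is a minimal pure-Python DBSCAN over Euclidean distances. It is a textbook-style sketch written for this listing, not code from the FunFamsClustering repository; the parameter values and the two-dimensional toy points are assumptions, and real embeddings would be high-dimensional.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dbscan(points, eps, min_samples):
    """Return one cluster label per point; -1 marks noise (outliers).

    A point is a core point if at least min_samples points (itself
    included) lie within distance eps of it.
    """
    n = len(points)
    neighbours = [
        [j for j in range(n) if euclidean(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_samples:
            labels[i] = -1  # provisional noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_samples:
                queue.extend(neighbours[j])  # expand only through core points
    return labels


# Toy data: two tight groups and one isolated point.
toy_points = [
    [0.0, 0.0], [0.0, 0.1], [0.1, 0.0],
    [5.0, 5.0], [5.0, 5.1], [5.1, 5.0],
    [10.0, 0.0],
]
print(dbscan(toy_points, eps=0.3, min_samples=3))  # [0, 0, 0, 1, 1, 1, -1]
```

Points labelled -1 correspond to the outliers the abstract refers to; in practice a library implementation would be used instead of this quadratic-time version.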

19.
Nucleic Acids Res ; 49(D1): D266-D273, 2021 01 08.
Article in English | MEDLINE | ID: mdl-33237325

ABSTRACT

CATH (https://www.cathdb.info) identifies domains in protein structures from wwPDB and classifies these into evolutionary superfamilies, thereby providing structural and functional annotations. There are two levels: CATH-B, a daily snapshot of the latest domain structures and superfamily assignments, and CATH+, with additional derived data, such as predicted sequence domains, and functionally coherent sequence subsets (Functional Families or FunFams). The latest CATH+ release, version 4.3, significantly increases coverage of structural and sequence data, with an addition of 65,351 fully classified domain structures (+15%), providing 500,238 structural domains, and 151 million predicted sequence domains (+59%) assigned to 5481 superfamilies. The FunFam generation pipeline has been re-engineered to cope with the increased influx of data. Three times more sequences are captured in FunFams, with a concomitant increase in functional purity, information content and structural coverage. FunFam expansion increases the structural annotations provided for experimental GO terms (+59%). We also present CATH-FunVar web-pages displaying variations in protein sequences and their proximity to known or predicted functional sites. We present two case studies: (1) putative cancer drivers and (2) SARS-CoV-2 proteins. Finally, we have improved links to and from CATH including SCOP, InterPro, Aquaria and 2DProt.


Subjects
Computational Biology/statistics & numerical data , Databases, Protein/statistics & numerical data , Protein Domains , Proteins/chemistry , Amino Acid Sequence , COVID-19/epidemiology , COVID-19/prevention & control , COVID-19/virology , Computational Biology/methods , Epidemics , Humans , Internet , Molecular Sequence Annotation , Proteins/genetics , Proteins/metabolism , SARS-CoV-2/genetics , SARS-CoV-2/metabolism , SARS-CoV-2/physiology , Sequence Analysis, Protein/methods , Sequence Homology, Amino Acid , Viral Proteins/chemistry , Viral Proteins/genetics , Viral Proteins/metabolism
20.
J Cell Sci ; 133(16), 2020 08 17.
Article in English | MEDLINE | ID: mdl-32665322

ABSTRACT

The yeast Hansenula polymorpha contains four members of the Pex23 family of peroxins, which characteristically contain a DysF domain. Here we show that all four H. polymorpha Pex23 family proteins localize to the endoplasmic reticulum (ER). Pex24 and Pex32, but not Pex23 and Pex29, predominantly accumulate at peroxisome-ER contacts. Upon deletion of PEX24 or PEX32 - and to a much lesser extent, of PEX23 or PEX29 - peroxisome-ER contacts are lost, concomitant with defects in peroxisomal matrix protein import, membrane growth, and organelle proliferation, positioning and segregation. These defects are suppressed by the introduction of an artificial peroxisome-ER tether, indicating that Pex24 and Pex32 contribute to tethering of peroxisomes to the ER. Accumulation of Pex32 at these contact sites is lost in cells lacking the peroxisomal membrane protein Pex11, in conjunction with disruption of the contacts. This indicates that Pex11 contributes to Pex32-dependent peroxisome-ER contact formation. The absence of Pex32 has no major effect on pre-peroxisomal vesicles that occur in pex3 atg1 deletion cells.


Subjects
Peroxisomes , Saccharomyces cerevisiae Proteins , Endoplasmic Reticulum/genetics , Membrane Proteins/genetics , Organelle Biogenesis , Peroxins/genetics , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae Proteins/genetics , Saccharomycetales