Results 1 - 7 of 7
1.
Protein Sci ; 33(9): e5140, 2024 Sep.
Article in English | MEDLINE | ID: mdl-39145441

ABSTRACT

Proteins, fundamental to cellular activities, reveal their function and evolution through their structure and sequence. CATH functional families (FunFams) are coherent clusters of protein domain sequences in which the function is conserved across their members. The increasing volume and complexity of protein data enabled by large-scale repositories like MGnify or AlphaFold Database requires more powerful approaches that can scale to the size of these new resources. In this work, we introduce MARC and FRAN, two algorithms developed to build upon and address limitations of GeMMA/FunFHMMER, our original methods developed to classify proteins with related functions using a hierarchical approach. We also present CATH-eMMA, which uses embeddings or Foldseek distances to form relationship trees from distance matrices, reducing computational demands and handling various data types effectively. CATH-eMMA offers a highly robust and much faster tool for clustering protein functions on a large scale, providing a new tool for future studies in protein function and evolution.


Subject(s)
Algorithms , Databases, Protein , Proteins , Proteins/chemistry , Proteins/metabolism , Cluster Analysis , Computational Biology/methods , Protein Domains
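The abstract says CATH-eMMA forms relationship trees from distance matrices (derived from embeddings or Foldseek distances) but does not state the tree-building rule. The sketch below assumes average-linkage agglomerative clustering (UPGMA-style merging) and a hypothetical distance cutoff at which merging stops; it only illustrates the general idea of turning a distance matrix into a merge tree and clusters, and is not CATH-eMMA's implementation. The sequence names and distances are made up.

```python
from itertools import combinations


def average_linkage(dist, labels, cutoff):
    """Agglomerative clustering with average linkage (UPGMA-style merging).

    dist   -- symmetric matrix (list of lists) of pairwise distances
    labels -- one name per row of dist
    cutoff -- hypothetical threshold: stop once the closest clusters are farther apart
    Returns (clusters, merges): final clusters as sorted label lists, and the
    merge history as (members_a, members_b, distance) tuples, i.e. the tree.
    """
    clusters = {i: [i] for i in range(len(labels))}
    merges = []
    next_id = len(labels)

    def cluster_distance(a, b):
        # Average of all pairwise member distances between the two clusters.
        pairs = [(i, j) for i in clusters[a] for j in clusters[b]]
        return sum(dist[i][j] for i, j in pairs) / len(pairs)

    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda p: cluster_distance(*p))
        d = cluster_distance(a, b)
        if d > cutoff:
            break
        merges.append((sorted(labels[i] for i in clusters[a]),
                       sorted(labels[i] for i in clusters[b]), d))
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        next_id += 1

    final = sorted(sorted(labels[i] for i in members) for members in clusters.values())
    return final, merges


# Hypothetical distances: two tight pairs that are far from each other.
labels = ["seqA", "seqB", "seqC", "seqD"]
dist = [
    [0.0, 0.1, 0.8, 0.9],
    [0.1, 0.0, 0.7, 0.8],
    [0.8, 0.7, 0.0, 0.2],
    [0.9, 0.8, 0.2, 0.0],
]
final, merges = average_linkage(dist, labels, cutoff=0.5)
print(final)  # [['seqA', 'seqB'], ['seqC', 'seqD']]
```

Recomputing cluster distances from the original matrix keeps the example short; a production tool working at MGnify or AlphaFold Database scale would need a far more efficient update scheme.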
2.
Elife ; 12, 2023 10 03.
Article in English | MEDLINE | ID: mdl-37787768

ABSTRACT

Many proteins remain poorly characterized even in well-studied organisms, presenting a bottleneck for research. We applied phenomics and machine-learning approaches with Schizosaccharomyces pombe for broad cues on protein functions. We assayed colony-growth phenotypes to measure the fitness of deletion mutants for 3509 non-essential genes in 131 conditions with different nutrients, drugs, and stresses. These analyses exposed phenotypes for 3492 mutants, including 124 mutants of 'priority unstudied' proteins conserved in humans, providing varied functional clues. For example, over 900 proteins were newly implicated in the resistance to oxidative stress. Phenotype-correlation networks suggested roles for poorly characterized proteins through 'guilt by association' with known proteins. For complementary functional insights, we predicted Gene Ontology (GO) terms using machine learning methods exploiting protein-network and protein-homology data (NET-FF). We obtained 56,594 high-scoring GO predictions, of which 22,060 also featured high information content. Our phenotype-correlation data and NET-FF predictions showed a strong concordance with existing PomBase GO annotations and protein networks, with integrated analyses revealing 1675 novel GO predictions for 783 genes, including 47 predictions for 23 priority unstudied proteins. Experimental validation identified new proteins involved in cellular aging, showing that these predictions and phenomics data provide a rich resource to uncover new protein functions.


Subject(s)
Schizosaccharomyces pombe Proteins , Schizosaccharomyces , Humans , Phenomics , Schizosaccharomyces pombe Proteins/genetics , Phenotype , Schizosaccharomyces/genetics , Machine Learning
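The "guilt by association" step in this abstract links poorly characterized proteins to known ones through correlated phenotype profiles. The sketch below assumes Pearson correlation between per-condition fitness profiles and an illustrative threshold `min_r`; the abstract does not specify the correlation measure or cutoff, and the gene names, scores and terms are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length fitness profiles.
    Profiles with zero variance are not handled in this sketch."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def guilt_by_association(query, profiles, annotations, min_r=0.7):
    """Propose terms for `query` from annotated genes with correlated profiles.

    profiles    -- gene -> fitness scores, one per growth condition
    annotations -- gene -> set of known functional terms
    min_r       -- illustrative correlation threshold (an assumption)
    Returns {term: strongest supporting correlation}.
    """
    suggestions = {}
    for gene, profile in profiles.items():
        if gene == query or gene not in annotations:
            continue
        r = pearson(profiles[query], profile)
        if r >= min_r:
            for term in annotations[gene]:
                suggestions[term] = max(r, suggestions.get(term, r))
    return suggestions


# Hypothetical toy data: four conditions, one uncharacterized gene (geneX).
profiles = {
    "geneX": [1.0, 0.2, -0.5, 0.9],
    "geneA": [0.9, 0.1, -0.6, 1.0],
    "geneB": [-0.8, 0.3, 0.7, -0.9],
}
annotations = {
    "geneA": {"oxidative stress response"},
    "geneB": {"cell wall organization"},
}
result = guilt_by_association("geneX", profiles, annotations)
print(result)
```

Here geneX's profile tracks geneA closely and anti-correlates with geneB, so only geneA's term is proposed.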
3.
Bioinformatics ; 37(8): 1099-1106, 2021 05 23.
Article in English | MEDLINE | ID: mdl-33135053

ABSTRACT

MOTIVATION: Identification of functional sites in proteins is essential for functional characterization, variant interpretation and drug design. Several methods are available for predicting either a generic functional site, or specific types of functional site. Here, we present FunSite, a machine learning predictor that identifies catalytic, ligand-binding and protein-protein interaction functional sites using features derived from protein sequence and structure, and evolutionary data from CATH functional families (FunFams). RESULTS: FunSite's prediction performance was rigorously benchmarked using cross-validation and a holdout dataset. FunSite outperformed other publicly available functional site prediction methods. We show that conserved residues in FunFams are enriched in functional sites. We found FunSite's performance depends greatly on the quality of functional site annotations and the information content of FunFams in the training data. Finally, we analyze which structural and evolutionary features are most predictive for functional sites. AVAILABILITY AND IMPLEMENTATION: https://github.com/UCL/cath-funsite-predictor. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Machine Learning , Proteins , Amino Acid Sequence , Biological Evolution , Humans , Proteins/genetics
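The FunSite abstract reports that conserved residues in FunFams are enriched in functional sites. One common way to score per-position conservation in a multiple sequence alignment is Shannon entropy; the sketch below uses it purely as an illustration, and the abstract does not say which conservation measure FunSite uses. The toy alignment and the `max_entropy` cutoff are hypothetical.

```python
from collections import Counter
from math import log2


def column_entropy(column):
    """Shannon entropy (bits) of one alignment column, ignoring gaps ('-').
    Lower entropy means a more conserved position; all-gap columns return None."""
    residues = [r for r in column if r != "-"]
    if not residues:
        return None
    n = len(residues)
    counts = Counter(residues)
    # sum of p * log2(1/p), which equals -sum(p * log2(p))
    return sum((c / n) * log2(n / c) for c in counts.values())


def conserved_positions(alignment, max_entropy=0.5):
    """Indices of alignment columns whose entropy is at most max_entropy."""
    positions = []
    for i, column in enumerate(zip(*alignment)):
        h = column_entropy(column)
        if h is not None and h <= max_entropy:
            positions.append(i)
    return positions


# Hypothetical toy alignment of four family members (equal-length rows).
alignment = ["MKCLV", "MKCIV", "MRCLA", "MKC-V"]
print(conserved_positions(alignment))  # [0, 2]
```

In a real predictor a score like this would be one feature among many (structural, sequence and FunFam-derived), not a classifier on its own.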
4.
Nucleic Acids Res ; 49(D1): D266-D273, 2021 01 08.
Article in English | MEDLINE | ID: mdl-33237325

ABSTRACT

CATH (https://www.cathdb.info) identifies domains in protein structures from wwPDB and classifies these into evolutionary superfamilies, thereby providing structural and functional annotations. There are two levels: CATH-B, a daily snapshot of the latest domain structures and superfamily assignments, and CATH+, with additional derived data, such as predicted sequence domains, and functionally coherent sequence subsets (Functional Families or FunFams). The latest CATH+ release, version 4.3, significantly increases coverage of structural and sequence data, adding 65,351 fully classified domain structures (+15%) for a total of 500,238 structural domains, and 151 million predicted sequence domains (+59%) assigned to 5481 superfamilies. The FunFam generation pipeline has been re-engineered to cope with the increased influx of data. Three times more sequences are captured in FunFams, with a concomitant increase in functional purity, information content and structural coverage. FunFam expansion increases the structural annotations provided for experimental GO terms (+59%). We also present CATH-FunVar web pages displaying variations in protein sequences and their proximity to known or predicted functional sites. We present two case studies: (1) putative cancer drivers and (2) SARS-CoV-2 proteins. Finally, we have improved links to and from CATH, including SCOP, InterPro, Aquaria and 2DProt.


Subject(s)
Computational Biology/statistics & numerical data , Databases, Protein/statistics & numerical data , Protein Domains , Proteins/chemistry , Amino Acid Sequence , COVID-19/epidemiology , COVID-19/prevention & control , COVID-19/virology , Computational Biology/methods , Epidemics , Humans , Internet , Molecular Sequence Annotation , Proteins/genetics , Proteins/metabolism , SARS-CoV-2/genetics , SARS-CoV-2/metabolism , SARS-CoV-2/physiology , Sequence Analysis, Protein/methods , Sequence Homology, Amino Acid , Viral Proteins/chemistry , Viral Proteins/genetics , Viral Proteins/metabolism
5.
Sci Rep ; 10(1): 18517, 2020 10 28.
Article in English | MEDLINE | ID: mdl-33116184

ABSTRACT

Alzheimer's disease (AD), the most prevalent form of dementia, is a progressive and devastating neurodegenerative condition for which there are no effective treatments. Understanding the molecular pathology of AD during disease progression may identify new ways to reduce neuronal damage. Here, we present a longitudinal study tracking dynamic proteomic alterations in the brains of an inducible Drosophila melanogaster model of AD expressing the Arctic mutant Aβ42 gene. We identified 3093 proteins from flies that were induced to express Aβ42 and age-matched healthy controls using label-free quantitative ion-mobility data-independent analysis mass spectrometry. Of these, 228 proteins were significantly altered by Aβ42 accumulation and were enriched for AD-associated processes. Network analyses further revealed that these proteins have distinct hub and bottleneck properties in the brain protein interaction network, suggesting that several may have significant effects on brain function. Our unbiased analysis provides useful insights into the key processes governing the progression of amyloid toxicity and forms a basis for further functional analyses in model organisms and translation to mammalian systems.


Subject(s)
Amyloid beta-Peptides/metabolism , Brain/metabolism , Peptide Fragments/metabolism , Protein Interaction Maps/physiology , Alzheimer Disease/metabolism , Alzheimer Disease/physiopathology , Amyloid beta-Peptides/physiology , Animals , Disease Models, Animal , Disease Progression , Drosophila Proteins/metabolism , Drosophila melanogaster/metabolism , Longitudinal Studies , Neurons/metabolism , Peptide Fragments/physiology , Proteomics/methods
6.
Nucleic Acids Res ; 47(D1): D280-D284, 2019 01 08.
Article in English | MEDLINE | ID: mdl-30398663

ABSTRACT

This article provides an update of the latest data and developments within the CATH protein structure classification database (http://www.cathdb.info). The resource provides two levels of release: CATH-B, a daily snapshot of the latest structural domain boundaries and superfamily assignments, and CATH+, which adds layers of derived data, such as predicted sequence domains, functional annotations and functional clustering (known as Functional Families or FunFams). The most recent CATH+ release (version 4.2) provides a huge update in the coverage of structural data. This release increases the number of fully classified domains by over 40% (from 308 999 to 434 857 structural domains), corresponding to an almost two-fold increase in sequence data (from 53 million to over 95 million predicted domains) organised into 6119 superfamilies. The coverage of high-resolution protein PDB chains that contain at least one assigned CATH domain is now 90.2% (increased from 82.3% in the previous release). A number of highly requested features have also been implemented in our web pages, allowing users to view an alignment between their query sequence and a representative FunFam structure, and providing tools that make it easier to view the full structural context (multi-domain architecture) of domains and chains.


Subject(s)
Databases, Protein , Genome , Amino Acid Sequence , Animals , Conserved Sequence , Gene Ontology , Humans , Models, Molecular , Molecular Sequence Annotation , Multigene Family/genetics , Protein Conformation , Protein Domains/genetics , Sequence Alignment , Sequence Homology, Amino Acid , Structure-Activity Relationship
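The growth figures quoted in this abstract can be checked directly from the counts it gives. The short calculation below uses only numbers from the abstract; `percent_increase` is a helper introduced for illustration.

```python
def percent_increase(old, new):
    """Relative growth from old to new, expressed as a percentage."""
    return (new - old) / old * 100


# Fully classified structural domains, previous release -> v4.2 (from the abstract).
domains_growth = percent_increase(308_999, 434_857)
# Predicted sequence domains, 53 million -> 95 million (from the abstract).
sequence_fold = 95e6 / 53e6

print(round(domains_growth, 1))  # 40.7
print(round(sequence_fold, 2))   # 1.79
```

Both results are consistent with the abstract's wording: an increase of "over 40%" in structural domains and an "almost two-fold" increase in sequence domains.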
7.
J Immunol ; 191(11): 5398-409, 2013 Dec 01.
Article in English | MEDLINE | ID: mdl-24146041

ABSTRACT

EBV elicits primary CD8(+) T cell responses that, by T cell cloning from infectious mononucleosis (IM) patients, appear skewed toward immediate early (IE) and some early (E) lytic cycle proteins, with late (L) proteins rarely targeted. However, L Ag-specific responses have been detected regularly in polyclonal T cell cultures from long-term virus carriers. To resolve this apparent difference between responses to primary and persistent infection, 13 long-term carriers were screened in ex vivo IFN-γ ELISPOT assays using peptides spanning the two IE, six representative E, and seven representative L proteins. This revealed memory CD8 responses to 44 new lytic cycle epitopes that straddle all three protein classes but, in terms of both frequency and size, maintain the IE > E > L hierarchy of immunodominance. Having identified the HLA restriction of 10 (including 7 L) new epitopes using memory CD8(+) T cell clones, we looked in HLA-matched IM patients and found such reactivities but typically at low levels, explaining why they had gone undetected in the original IM clonal screens. Wherever tested, all CD8(+) T cell clones against these novel lytic cycle epitopes recognized lytically infected cells naturally expressing their target Ag. Surprisingly, however, clones against the most frequently recognized L Ag, the BNRF1 tegument protein, also recognized latently infected, growth-transformed cells. We infer that BNRF1 is also a latent Ag that could be targeted in T cell therapy of EBV-driven B-lymphoproliferative disease.


Subject(s)
CD8-Positive T-Lymphocytes/immunology , Herpesvirus 4, Human/immunology , Infectious Mononucleosis/immunology , Amino Acid Sequence , CD8-Positive T-Lymphocytes/virology , Cells, Cultured , Enzyme-Linked Immunospot Assay , HLA Antigens/metabolism , Humans , Immunodominant Epitopes/immunology , Immunodominant Epitopes/metabolism , Interferon-gamma/metabolism , Molecular Sequence Data , Peptide Fragments/immunology , Peptide Fragments/metabolism , Protein Binding , Viral Envelope Proteins/immunology , Viral Envelope Proteins/metabolism , Virus Latency/immunology