Results 1 - 20 of 95
1.
Cell ; 140(5): 744-52, 2010 Mar 05.
Article in English | MEDLINE | ID: mdl-20211142

ABSTRACT

Combinatorial interactions among transcription factors are critical to directing tissue-specific gene expression. To build a global atlas of these combinations, we have screened for physical interactions among the majority of human and mouse DNA-binding transcription factors (TFs). The complete networks contain 762 human and 877 mouse interactions. Analysis of the networks reveals that highly connected TFs are broadly expressed across tissues, and that roughly half of the measured interactions are conserved between mouse and human. The data highlight the importance of TF combinations for determining cell fate, and they lead to the identification of a SMAD3/FLI1 complex expressed during development of immunity. The availability of large TF combinatorial networks in both human and mouse will provide many opportunities to study gene regulation, tissue differentiation, and mammalian evolution.


Subject(s)
Gene Expression Regulation , Gene Regulatory Networks , Transcription Factors/metabolism , Animals , Cell Differentiation , Evolution, Molecular , Humans , Mice , Monocytes/cytology , Organ Specificity , Smad3 Protein/metabolism , Trans-Activators/metabolism
2.
Nucleic Acids Res ; 51(D1): D418-D427, 2023 01 06.
Article in English | MEDLINE | ID: mdl-36350672

ABSTRACT

The InterPro database (https://www.ebi.ac.uk/interpro/) provides an integrative classification of protein sequences into families, and identifies functionally important domains and conserved sites. Here, we report recent developments with InterPro (version 90.0) and its associated software, including updates to data content and to the website. These developments extend and enrich the information provided by InterPro, and provide more user-friendly access to the data. Additionally, we have worked on adding Pfam website features to the InterPro website, as the Pfam website will be retired in late 2022. We also show that InterPro's sequence coverage has kept pace with the growth of UniProtKB. Moreover, we report the development of a card game as a method of engaging the non-scientific community. Finally, we discuss the benefits and challenges brought by the use of artificial intelligence for protein structure prediction.


Subject(s)
Databases, Protein , Humans , Amino Acid Sequence , Artificial Intelligence , Internet , Proteins/chemistry , Software
3.
Mol Cell ; 63(4): 579-592, 2016 08 18.
Article in English | MEDLINE | ID: mdl-27540857

ABSTRACT

Gene fusions are common cancer-causing mutations, but the molecular principles by which fusion protein products affect interaction networks and cause disease are not well understood. Here, we perform an integrative analysis of the structural, interactomic, and regulatory properties of thousands of putative fusion proteins. We demonstrate that genes that form fusions (i.e., parent genes) tend to be highly connected hub genes, whose protein products are enriched in structured and disordered interaction-mediating features. Fusion often results in the loss of these parental features and the depletion of regulatory sites such as post-translational modifications. Fusion products disproportionately connect proteins that did not previously interact in the protein interaction network. In this manner, fusion products can escape cellular regulation and constitutively rewire protein interaction networks. We suggest that the deregulation of central, interaction-prone proteins may represent a widespread mechanism by which fusion proteins alter the topology of cellular signaling pathways and promote cancer.


Subject(s)
Gene Fusion , Neoplasm Proteins/genetics , Neoplasm Proteins/metabolism , Neoplasms/genetics , Neoplasms/metabolism , Protein Interaction Maps , Computational Biology , Databases, Protein , Humans , Protein Interaction Mapping , Protein Processing, Post-Translational , Signal Transduction , Transcription Factors/genetics , Transcription Factors/metabolism , Ubiquitination
4.
Nature ; 543(7644): 199-204, 2017 03 09.
Article in English | MEDLINE | ID: mdl-28241135

ABSTRACT

Long non-coding RNAs (lncRNAs) are largely heterogeneous and functionally uncharacterized. Here, using FANTOM5 cap analysis of gene expression (CAGE) data, we integrate multiple transcript collections to generate a comprehensive atlas of 27,919 human lncRNA genes with high-confidence 5' ends and expression profiles across 1,829 samples from the major human primary cell types and tissues. Genomic and epigenomic classification of these lncRNAs reveals that most intergenic lncRNAs originate from enhancers rather than from promoters. Incorporating genetic and expression data, we show that lncRNAs overlapping trait-associated single nucleotide polymorphisms are specifically expressed in cell types relevant to the traits, implicating these lncRNAs in multiple diseases. We further demonstrate that lncRNAs overlapping expression quantitative trait loci (eQTL)-associated single nucleotide polymorphisms of messenger RNAs are co-expressed with the corresponding messenger RNAs, suggesting their potential roles in transcriptional regulation. Combining these findings with conservation data, we identify 19,175 potentially functional lncRNAs in the human genome.


Subject(s)
Databases, Genetic , RNA, Long Noncoding/chemistry , RNA, Long Noncoding/genetics , Transcriptome/genetics , Cells, Cultured , Conserved Sequence/genetics , Datasets as Topic , Enhancer Elements, Genetic/genetics , Epigenesis, Genetic , Gene Expression Profiling , Gene Expression Regulation , Genome, Human/genetics , Genome-Wide Association Study , Genomics , Humans , Internet , Molecular Sequence Annotation , Organ Specificity/genetics , Polymorphism, Single Nucleotide , Promoter Regions, Genetic/genetics , Quantitative Trait Loci/genetics , RNA Stability , RNA, Messenger/genetics
5.
Nucleic Acids Res ; 49(D1): D344-D354, 2021 01 08.
Article in English | MEDLINE | ID: mdl-33156333

ABSTRACT

The InterPro database (https://www.ebi.ac.uk/interpro/) provides an integrative classification of protein sequences into families, and identifies functionally important domains and conserved sites. InterProScan is the underlying software that allows protein and nucleic acid sequences to be searched against InterPro's signatures. Signatures are predictive models which describe protein families, domains or sites, and are provided by multiple databases. InterPro combines signatures representing equivalent families, domains or sites, and provides additional information such as descriptions, literature references and Gene Ontology (GO) terms, to produce a comprehensive resource for protein classification. Founded in 1999, InterPro has become one of the most widely used resources for protein family annotation. Here, we report the status of InterPro (version 81.0) in its 20th year of operation, and its associated software, including updates to database content, the release of a new website and REST API, and performance improvements in InterProScan.


Subject(s)
Databases, Protein , Proteins/chemistry , Amino Acid Sequence , COVID-19/metabolism , Internet , Molecular Sequence Annotation , Protein Domains , Protein Interaction Maps , SARS-CoV-2/metabolism , Sequence Alignment
6.
Nucleic Acids Res ; 48(D1): D376-D382, 2020 01 08.
Article in English | MEDLINE | ID: mdl-31724711

ABSTRACT

The Structural Classification of Proteins (SCOP) database is a classification of protein domains organised according to their evolutionary and structural relationships. We report a major effort to increase the coverage of structural data, aiming to provide classification of almost all domain superfamilies with representatives in the PDB. We have also improved the database schema, provided a new API and modernised the web interface. This is by far the most significant update in coverage since SCOP 1.75 and builds on the advances in schema from the SCOP 2 prototype. The database is accessible from http://scop.mrc-lmb.cam.ac.uk.


Subject(s)
Databases, Protein , Protein Domains , Proteins/chemistry , Evolution, Molecular , Internet , Proteins/metabolism , Software , User-Computer Interface
7.
Nucleic Acids Res ; 48(D1): D314-D319, 2020 01 08.
Article in English | MEDLINE | ID: mdl-31733063

ABSTRACT

Genome3D (https://www.genome3d.eu) is a freely available resource that provides consensus structural annotations for representative protein sequences taken from a selection of model organisms. Since the last NAR update in 2015, the method of data submission has been overhauled, with annotations now being 'pushed' to the database via an API. As a result, contributing groups are now able to manage their own structural annotations, making the resource more flexible and maintainable. The new submission protocol brings a number of additional benefits including: providing instant validation of data and avoiding the requirement to synchronise releases between resources. It also makes it possible to implement the submission of these structural annotations as an automated part of existing internal workflows. In turn, these improvements facilitate Genome3D being opened up to new prediction algorithms and groups. For the latest release of Genome3D (v2.1), the underlying dataset of sequences used as prediction targets has been updated using the latest reference proteomes available in UniProtKB. A number of new reference proteomes have also been added of particular interest to the wider scientific community: cow, pig, wheat and Mycobacterium tuberculosis. These additions, along with improvements to the underlying predictions from contributing resources, have ensured that the number of annotations in Genome3D has nearly doubled since the last NAR update article. The new API has also been used to facilitate the dissemination of Genome3D data into InterPro, thereby widening the visibility of both the annotation data and annotation algorithms.


Subject(s)
Proteins/chemistry , Databases, Protein , Proteins/classification , Proteins/genetics , User-Computer Interface
8.
Nucleic Acids Res ; 47(10): 4970-4973, 2019 06 04.
Article in English | MEDLINE | ID: mdl-30997511

ABSTRACT

The alignment between the boundaries of protein domains and the boundaries of exons could provide evidence for the evolution of proteins via domain shuffling, but literature in the field has so far struggled to conclusively show this. Here, on larger data sets than previously possible, we do finally show that this phenomenon is indisputably found widely across the eukaryotic tree. In contrast, the alignment between exons and the boundaries of intrinsically disordered regions of proteins is not a general property of eukaryotes. Most interesting of all is the discovery that domain-exon alignment is much more common in recently evolved protein sequences than older ones.


Subject(s)
Eukaryotic Cells/metabolism , Exons/genetics , Introns/genetics , Proteins/genetics , Animals , Evolution, Molecular , Genome/genetics , Humans
9.
Nucleic Acids Res ; 47(D1): D490-D494, 2019 01 08.
Article in English | MEDLINE | ID: mdl-30445555

ABSTRACT

Here, we present a major update to the SUPERFAMILY database and the webserver. We describe the addition of a new SUPERFAMILY 2.0 profile HMM library containing a total of 27 623 HMMs. The database now includes Superfamily domain annotations for millions of protein sequences taken from the Universal Protein Resource Knowledgebase (UniProtKB) and the National Center for Biotechnology Information (NCBI). This addition constitutes about 51 and 45 million distinct protein sequences obtained from UniProtKB and NCBI respectively. Currently, the database contains annotations for 63 244 and 102 151 complete genomes taken from UniProtKB and NCBI respectively. The current sequence collection and genome update is the biggest so far in the history of SUPERFAMILY updates. In order to deal with the massive wealth of information, here we introduce a new SUPERFAMILY 2.0 webserver (http://supfam.org). Currently, the webserver mainly focuses on the search, retrieval and display of Superfamily annotation for the entire sequence and genome collection in the database.


Subject(s)
Databases, Protein , Protein Domains , Proteome/chemistry , Genome , Internet , Markov Chains , Protein Domains/genetics , Sequence Analysis, Protein
10.
Nucleic Acids Res ; 47(D1): D351-D360, 2019 01 08.
Article in English | MEDLINE | ID: mdl-30398656

ABSTRACT

The InterPro database (http://www.ebi.ac.uk/interpro/) classifies protein sequences into families and predicts the presence of functionally important domains and sites. Here, we report recent developments with InterPro (version 70.0) and its associated software, including an 18% growth in the size of the database in terms of new InterPro entries, updates to content, the inclusion of an additional entry type, refined modelling of discontinuous domains, and the development of a new programmatic interface and website. These developments extend and enrich the information provided by InterPro, and provide greater flexibility in terms of data access. We also show that InterPro's sequence coverage has kept pace with the growth of UniProtKB, and discuss how our evaluation of residue coverage may help guide future curation activities.


Subject(s)
Databases, Protein , Molecular Sequence Annotation , Animals , Databases, Genetic , Gene Ontology , Humans , Internet , Multigene Family , Protein Domains/genetics , Sequence Homology, Amino Acid , Software , User-Computer Interface
11.
Hum Mutat ; 40(9): 1373-1391, 2019 09.
Article in English | MEDLINE | ID: mdl-31322791

ABSTRACT

Whole-genome sequencing (WGS) holds great potential as a diagnostic test. However, the majority of patients currently undergoing WGS lack a molecular diagnosis, largely due to the vast number of undiscovered disease genes and our inability to assess the pathogenicity of most genomic variants. The CAGI SickKids challenges attempted to address this knowledge gap by assessing state-of-the-art methods for clinical phenotype prediction from genomes. CAGI4 and CAGI5 participants were provided with WGS data and clinical descriptions of 25 and 24 undiagnosed patients from the SickKids Genome Clinic Project, respectively. Predictors were asked to identify primary and secondary causal variants. In addition, for CAGI5, groups had to match each genome to one of three disorder categories (neurologic, ophthalmologic, and connective), and separately to each patient. The performance of matching genomes to categories was no better than random but two groups performed significantly better than chance in matching genomes to patients. Two of the ten variants proposed by two groups in CAGI4 were deemed to be diagnostic, and several proposed pathogenic variants in CAGI5 are good candidates for phenotype expansion. We discuss implications for improving in silico assessment of genomic variants and identifying new disease genes.


Subject(s)
Computational Biology/methods , Genetic Variation , Undiagnosed Diseases/diagnosis , Adolescent , Child , Child, Preschool , Computer Simulation , Databases, Genetic , Female , Genetic Predisposition to Disease , Humans , Male , Phenotype , Undiagnosed Diseases/genetics , Whole Genome Sequencing
12.
Nucleic Acids Res ; 45(D1): D737-D743, 2017 01 04.
Article in English | MEDLINE | ID: mdl-27794045

ABSTRACT

Upon the first publication of the fifth iteration of the Functional Annotation of Mammalian Genomes collaborative project, FANTOM5, we gathered a series of primary data and database systems into the FANTOM web resource (http://fantom.gsc.riken.jp) to help researchers explore transcriptional regulation and cellular states. In the course of the collaboration, primary data and analysis results have been expanded, and functionalities of the database systems enhanced. We believe that our data and web systems are invaluable resources, and we think the scientific community will benefit from this recent update to deepen their understanding of mammalian cellular organization. We introduce the contents of FANTOM5 here, report recent updates in the web resource and provide future perspectives.


Subject(s)
Databases, Genetic , Gene Expression Profiling/methods , Genomics/methods , Mammals/genetics , Software , Web Browser , Animals , Computational Biology , Humans , Search Engine
13.
Nucleic Acids Res ; 45(D1): D190-D199, 2017 01 04.
Article in English | MEDLINE | ID: mdl-27899635

ABSTRACT

InterPro (http://www.ebi.ac.uk/interpro/) is a freely available database used to classify protein sequences into families and to predict the presence of important domains and sites. InterProScan is the underlying software that allows both protein and nucleic acid sequences to be searched against InterPro's predictive models, which are provided by its member databases. Here, we report recent developments with InterPro and its associated software, including the addition of two new databases (SFLD and CDD), and the functionality to include residue-level annotation and prediction of intrinsic disorder. These developments enrich the annotations provided by InterPro, increase the overall number of residues annotated and allow more specific functional inferences.


Subject(s)
Computational Biology/methods , Databases, Protein , Protein Interaction Domains and Motifs , Software , Humans , Molecular Sequence Annotation , Phylogeny
14.
Plant Physiol ; 173(2): 1371-1390, 2017 02.
Article in English | MEDLINE | ID: mdl-27909045

ABSTRACT

Of the three classes of enzymes involved in ubiquitination, ubiquitin-conjugating enzymes (E2) have been often incorrectly considered to play merely an auxiliary role in the process, and few E2 enzymes have been investigated in plants. To reveal the role of E2 in plant innate immunity, we identified and cloned 40 tomato genes encoding ubiquitin E2 proteins. Thioester assays indicated that the majority of the genes encode enzymatically active E2. Phylogenetic analysis classified the 40 tomato E2 enzymes into 13 groups, of which members of group III were found to interact and act specifically with AvrPtoB, a Pseudomonas syringae pv tomato effector that uses its ubiquitin ligase (E3) activity to suppress host immunity. Knocking down the expression of group III E2 genes in Nicotiana benthamiana diminished the AvrPtoB-promoted degradation of the Fen kinase and the AvrPtoB suppression of host immunity-associated programmed cell death. Importantly, silencing group III E2 genes also resulted in reduced pattern-triggered immunity (PTI). By contrast, programmed cell death induced by several effector-triggered immunity elicitors was not affected on group III-silenced plants. Functional characterization suggested redundancy among group III members for their role in the suppression of plant immunity by AvrPtoB and in PTI and identified UBIQUITIN-CONJUGATING11 (UBC11), UBC28, UBC29, UBC39, and UBC40 as playing a more significant role in PTI than other group III members. Our work builds a foundation for the further characterization of E2s in plant immunity and reveals that AvrPtoB has evolved a strategy for suppressing host immunity that is difficult for the plant to thwart.


Subject(s)
Plant Immunity/physiology , Plant Proteins/immunology , Solanum lycopersicum/genetics , Ubiquitin-Conjugating Enzymes/immunology , Bacterial Proteins/genetics , Bacterial Proteins/metabolism , Cell Death , Gene Silencing , Genome, Plant , Host-Pathogen Interactions/immunology , Solanum lycopersicum/cytology , Solanum lycopersicum/immunology , Solanum lycopersicum/microbiology , Phylogeny , Plant Proteins/genetics , Plant Proteins/metabolism , Plants, Genetically Modified , Protein Serine-Threonine Kinases/genetics , Protein Serine-Threonine Kinases/metabolism , Pseudomonas syringae/pathogenicity , Nicotiana/genetics , Nicotiana/metabolism , Ubiquitin-Conjugating Enzymes/genetics , Ubiquitin-Conjugating Enzymes/metabolism , Ubiquitination
15.
Proc Natl Acad Sci U S A ; 112(38): 11893-8, 2015 Sep 22.
Article in English | MEDLINE | ID: mdl-26324906

ABSTRACT

The most diverse marine ecosystems, coral reefs, depend upon a functional symbiosis between a cnidarian animal host (the coral) and intracellular photosynthetic dinoflagellate algae. The molecular and cellular mechanisms underlying this endosymbiosis are not well understood, in part because of the difficulties of experimental work with corals. The small sea anemone Aiptasia provides a tractable laboratory model for investigating these mechanisms. Here we report on the assembly and analysis of the Aiptasia genome, which will provide a foundation for future studies and has revealed several features that may be key to understanding the evolution and function of the endosymbiosis. These features include genomic rearrangements and taxonomically restricted genes that may be functionally related to the symbiosis, aspects of host dependence on alga-derived nutrients, a novel and expanded cnidarian-specific family of putative pattern-recognition receptors that might be involved in the animal-algal interactions, and extensive lineage-specific horizontal gene transfer. Extensive integration of genes of prokaryotic origin, including genes for antimicrobial peptides, presumably reflects an intimate association of the animal-algal pair also with its prokaryotic microbiome.


Subject(s)
Anthozoa/physiology , Genome/genetics , Sea Anemones/genetics , Symbiosis/genetics , Animals , Chromosomes/genetics , Evolution, Molecular , Gene Expression Profiling , Gene Transfer, Horizontal/genetics , Genome Size , Microbial Interactions/genetics , Models, Biological , Molecular Sequence Annotation , Phylogeny , Repetitive Sequences, Nucleic Acid/genetics , Synteny/genetics
16.
Hum Mutat ; 38(9): 1042-1050, 2017 09.
Article in English | MEDLINE | ID: mdl-28440912

ABSTRACT

Correct phenotypic interpretation of variants of unknown significance for cancer-associated genes is a diagnostic challenge as genetic screenings gain in popularity in the next-generation sequencing era. The Critical Assessment of Genome Interpretation (CAGI) experiment aims to test and define the state of the art of genotype-phenotype interpretation. Here, we present the assessment of the CAGI p16INK4a challenge. Participants were asked to predict the effect on cellular proliferation of 10 variants for the p16INK4a tumor suppressor, a cyclin-dependent kinase inhibitor encoded by the CDKN2A gene. Twenty-two pathogenicity predictors were assessed with a variety of accuracy measures for reliability in a medical context. Different assessment measures were combined in an overall ranking to provide more robust results. The R scripts used for assessment are publicly available from a GitHub repository for future use in similar assessment exercises. Despite a limited test-set size, our findings show a variety of results, with some methods performing significantly better. Methods combining different strategies frequently outperform simpler approaches. The best predictor, Yang&Zhou lab, uses a machine learning method combining an empirical energy function measuring protein stability with an evolutionary conservation term. The p16INK4a challenge highlights how subtle structural effects can neutralize otherwise deleterious variants.


Subject(s)
Computational Biology/methods , Cyclin-Dependent Kinase Inhibitor p18/genetics , Genetic Variation , Cell Line, Tumor , Cell Proliferation , Computer Simulation , Cyclin-Dependent Kinase Inhibitor p16 , Cyclin-Dependent Kinase Inhibitor p18/chemistry , Databases, Genetic , Genetic Predisposition to Disease , Humans , Machine Learning , Protein Stability
17.
Hum Mutat ; 38(9): 1266-1276, 2017 09.
Article in English | MEDLINE | ID: mdl-28544481

ABSTRACT

The advent of next-generation sequencing has dramatically decreased the cost for whole-genome sequencing and increased the viability for its application in research and clinical care. The Personal Genome Project (PGP) provides unrestricted access to genomes of individuals and their associated phenotypes. This resource enabled the Critical Assessment of Genome Interpretation (CAGI) to create a community challenge to assess the bioinformatics community's ability to predict traits from whole genomes. In the CAGI PGP challenge, researchers were asked to predict whether an individual had a particular trait or profile based on their whole genome. Several approaches were used to assess submissions, including ROC AUC (area under receiver operating characteristic curve), probability rankings, the number of correct predictions, and statistical significance simulations. Overall, we found that prediction of individual traits is difficult, relying on a strong knowledge of trait frequency within the general population, whereas matching genomes to trait profiles relies heavily upon a small number of common traits including ancestry, blood type, and eye color. When a rare genetic disorder is present, profiles can be matched when one or more pathogenic variants are identified. Prediction accuracy has improved substantially over the last 6 years due to improved methodology and a better understanding of features.


Subject(s)
High-Throughput Nucleotide Sequencing/methods , Whole Genome Sequencing/methods , Area Under Curve , Genetic Predisposition to Disease , Human Genome Project , Humans , Phenotype , Quantitative Trait Loci
18.
Nucleic Acids Res ; 43(10): 4814-22, 2015 May 26.
Article in English | MEDLINE | ID: mdl-25934802

ABSTRACT

We have discovered that positions of splice junctions in genes are constrained by the tolerance for disorder-promoting amino acids in the translated protein region. It is known that efficient splicing requires nucleotide bias at the splice junction; the preferred usage produces a distribution of amino acids that is disorder-promoting. We observe that efficiency of splicing, as seen in the amino-acid distribution, is not compromised to accommodate globular structure. Thus we infer that it is the positions of splice junctions in the gene that must be under constraint by the local protein environment. Examining exonic splicing enhancers found near the splice junction in the gene reveals that these short DNA motifs are more prevalent in exons that encode disordered protein regions than in exons encoding structured regions. Thus we also conclude that local protein features constrain efficient splicing more in structure than in disorder.


Subject(s)
Intrinsically Disordered Proteins/genetics , RNA Splice Sites , Amino Acids/analysis , Animals , Eukaryota/genetics , Exons , Nucleotide Motifs , Nucleotides/analysis
19.
Nucleic Acids Res ; 43(Database issue): D227-33, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25414345

ABSTRACT

We present updates to the SUPERFAMILY 1.75 (http://supfam.org) online resource and protein sequence collection. The hidden Markov model library that provides sequence homology to SCOP structural domains remains unchanged at version 1.75. In the last 4 years SUPERFAMILY has more than doubled its holding of curated complete proteomes over all cellular life, from 1400 proteomes reported previously in 2010 up to 3258 at present. Outside of the main sequence collection, SUPERFAMILY continues to provide domain annotation for sequences provided by other resources such as: UniProt, Ensembl, PDB, much of JGI Phytozome and selected subcollections of NCBI RefSeq. Despite this growth in data volume, SUPERFAMILY now provides users with an expanded and daily updated phylogenetic tree of life (sTOL). This tree is built with genomic-scale domain annotation data as before, but constantly updated when new species are introduced to the sequence library. Our Gene Ontology and other functional and phenotypic annotations previously reported have stood up to critical assessment by the function prediction community. We have now introduced these data in an integrated manner online at the level of an individual sequence, and--in the case of whole genomes--with enrichment analysis against a taxonomically defined background.


Subject(s)
Databases, Protein , Protein Structure, Tertiary , Gene Ontology , Molecular Sequence Annotation , Phylogeny , Proteins/classification , Proteins/genetics , Proteome/chemistry , Sequence Analysis, Protein
20.
Nucleic Acids Res ; 43(Database issue): D382-6, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25348407

ABSTRACT

Genome3D (http://www.genome3d.eu) is a collaborative resource that provides predicted domain annotations and structural models for key sequences. Since introducing Genome3D in a previous NAR paper, we have substantially extended and improved the resource. We have annotated representatives from Pfam families to improve coverage of diverse sequences and added a fast sequence search to the website to allow users to find Genome3D-annotated sequences similar to their own. We have improved and extended the Genome3D data, enlarging the source data set from three model organisms to 10, and adding VIVACE, a resource new to Genome3D. We have analysed and updated Genome3D's SCOP/CATH mapping. Finally, we have improved the superposition tools, which now give users a more powerful interface for investigating similarities and differences between structural models.


Subject(s)
Databases, Protein , Molecular Sequence Annotation , Protein Structure, Tertiary , Algorithms , Genomics , Internet , Models, Molecular , Protein Structure, Tertiary/genetics , Sequence Analysis, Protein