1.
Bioinformatics ; 34(2): 323-329, 2018 Jan 15.
Article in English | MEDLINE | ID: mdl-28968857

ABSTRACT

The Quest for Orthologs (QfO) is an open collaboration framework for experts in comparative phylogenomics and related research areas who have an interest in highly accurate orthology predictions and their applications. We here report highlights and discussion points from the QfO meeting 2015 held in Barcelona. Achievements in recent years have established a basis to support developments for improved orthology prediction and to explore new approaches. Central to the QfO effort is proper benchmarking of methods and services, as well as design of standardized datasets and standardized formats to allow sharing and comparison of results. Simultaneously, analysis pipelines have been improved, evaluated and adapted to handle large datasets. All this would not have occurred without the long-term collaboration of Consortium members. Meeting regularly to review and coordinate complementary activities from a broad spectrum of innovative researchers clearly benefits the community. Highlights of the meeting include addressing sources of and legitimacy of disagreements between orthology calls, the context dependency of orthology definitions, special challenges encountered when analyzing very anciently rooted orthologies, orthology in the light of whole-genome duplications, and the concept of orthologous versus paralogous relationships at different levels, including domain-level orthology. Furthermore, particular needs for different applications (e.g. plant genomics, ancient gene families and others) and the infrastructure for making orthology inferences available (e.g. interfaces with model organism databases) were discussed, with several ongoing efforts that are expected to be reported on during the upcoming 2017 QfO meeting.

2.
Nat Methods ; 13(5): 425-30, 2016 05.
Article in English | MEDLINE | ID: mdl-27043882

ABSTRACT

Achieving high accuracy in orthology inference is essential for many comparative, evolutionary and functional genomic analyses, yet the true evolutionary history of genes is generally unknown and orthologs are used for very different applications across phyla, requiring different precision-recall trade-offs. As a result, it is difficult to assess the performance of orthology inference methods. Here, we present a community effort to establish standards and an automated web-based service to facilitate orthology benchmarking. Using this service, we characterize 15 well-established inference methods and resources on a battery of 20 different benchmarks. Standardized benchmarking provides a way for users to identify the most effective methods for the problem at hand, sets a minimum requirement for new tools and resources, and guides the development of more accurate orthology inference methods.


Subject(s)
Computational Biology/standards , Genomics/standards , Phylogeny , Proteomics/standards , Archaea/classification , Archaea/genetics , Bacteria/classification , Bacteria/genetics , Computational Biology/methods , Databases, Genetic , Eukaryota/classification , Eukaryota/genetics , Gene Ontology , Genomics/methods , Models, Genetic , Proteomics/methods , Sequence Analysis, Protein , Sequence Homology , Species Specificity
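
The precision-recall trade-off mentioned in the abstract above can be illustrated with a minimal sketch. It assumes that predictions and a trusted reference are both available as unordered pairs of gene identifiers; the function names and the toy identifiers are hypothetical, and this is not the scoring procedure of the benchmarking service itself.

```python
from typing import FrozenSet, Iterable, Set, Tuple

Pair = FrozenSet[str]


def as_pairs(pairs: Iterable[Tuple[str, str]]) -> Set[Pair]:
    """Normalize (a, b) tuples to unordered pairs so that (a, b) equals (b, a)."""
    return {frozenset(p) for p in pairs}


def precision_recall(
    predicted: Iterable[Tuple[str, str]],
    reference: Iterable[Tuple[str, str]],
) -> Tuple[float, float]:
    """Score predicted ortholog pairs against a reference set (illustrative only)."""
    pred = as_pairs(predicted)
    ref = as_pairs(reference)
    true_positives = len(pred & ref)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(ref) if ref else 0.0
    return precision, recall


# Hypothetical toy data: four reference ortholog pairs.
reference = [
    ("HUMAN_A", "MOUSE_A"),
    ("HUMAN_B", "MOUSE_B"),
    ("HUMAN_C", "MOUSE_C"),
    ("HUMAN_D", "MOUSE_D"),
]
# A strict method reports few pairs, all of them correct.
strict = [("HUMAN_A", "MOUSE_A"), ("MOUSE_B", "HUMAN_B")]
# A permissive method recovers more true pairs but also reports wrong ones.
permissive = [
    ("HUMAN_A", "MOUSE_A"),
    ("HUMAN_B", "MOUSE_B"),
    ("HUMAN_C", "MOUSE_C"),
    ("HUMAN_A", "MOUSE_B"),
    ("HUMAN_C", "MOUSE_D"),
]

print(precision_recall(strict, reference))      # (1.0, 0.5)
print(precision_recall(permissive, reference))  # (0.6, 0.75)
```

In this toy case the strict method has higher precision and the permissive one higher recall, which is the kind of trade-off that makes a single ranking of methods difficult.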
3.
Genome Biol Evol ; 7(7): 1988-99, 2015 Jul 01.
Article in English | MEDLINE | ID: mdl-26133389

ABSTRACT

Quest for Orthologs (QfO) is a community effort with the goal to improve and benchmark orthology predictions. As quality assessment assumes prior knowledge on species phylogenies, we investigated the congruency between existing species trees by comparing the relationships of 147 QfO reference organisms from six Tree of Life (ToL)/species tree projects: The National Center for Biotechnology Information (NCBI) taxonomy, Opentree of Life, the sequenced species/species ToL, the 16S ribosomal RNA (rRNA) database, and trees published by Ciccarelli et al. (Ciccarelli FD, et al. 2006. Toward automatic reconstruction of a highly resolved tree of life. Science 311:1283-1287) and by Huerta-Cepas et al. (Huerta-Cepas J, Marcet-Houben M, Gabaldon T. 2014. A nested phylogenetic reconstruction approach provides scalable resolution in the eukaryotic Tree Of Life. PeerJ PrePrints 2:223). Our study reveals that each species tree suggests a different phylogeny: 87 of the 146 (60%) possible splits of a dichotomous and rooted tree are congruent, while all other splits are incongruent in at least one of the species trees. Topological differences are observed not only at deep speciation events, but also within younger clades, such as Hominidae, Rodentia, Laurasiatheria, or rosids. The evolutionary relationships of 27 archaea and bacteria are highly inconsistent. By assessing 458,108 gene trees from 65 genomes, we show that consistent species topologies are more often supported by gene phylogenies than contradicting ones. The largest concordant species tree includes 77 of the QfO reference organisms at the most. Results are summarized in the form of a consensus ToL (http://swisstree.vital-it.ch/species_tree) that can serve different benchmarking purposes.


Subject(s)
Phylogeny , Archaea/classification , Archaea/genetics , Bacteria/classification , Bacteria/genetics , Eukaryota/classification , Eukaryota/genetics , Genes
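
The idea of counting congruent splits across rooted species trees, as reported in the abstract above, can be sketched as follows. The sketch assumes trees are given as nested tuples of leaf names and treats a split as the leaf set below an internal node; the representation, function names and example trees are assumptions for illustration, not the procedure used in the study. A rooted, fully dichotomous tree on n leaves has n - 1 internal nodes, which is consistent with the 146 splits quoted for 147 organisms.

```python
from typing import FrozenSet, Iterable, Set, Tuple, Union

Tree = Union[str, Tuple["Tree", ...]]
Clade = FrozenSet[str]


def clades(tree: Tree) -> Set[Clade]:
    """Return the leaf set under every internal node of a rooted tree."""
    found: Set[Clade] = set()

    def visit(node: Tree) -> Clade:
        if isinstance(node, str):  # a leaf contributes only its own name
            return frozenset([node])
        leaves = frozenset().union(*(visit(child) for child in node))
        found.add(leaves)
        return leaves

    visit(tree)
    return found


def congruent_clades(trees: Iterable[Tree]) -> Set[Clade]:
    """Clades present in every tree (requires at least one tree)."""
    return set.intersection(*(clades(t) for t in trees))


# Hypothetical example: two trees that disagree on the closest relative of human.
tree_a: Tree = ((("human", "chimp"), "gorilla"), ("mouse", "rat"))
tree_b: Tree = ((("human", "gorilla"), "chimp"), ("mouse", "rat"))

shared = congruent_clades([tree_a, tree_b])
print(len(shared), "of", len(clades(tree_a)), "clades are congruent")
```

Comparing leaf sets only works when all trees cover the same organisms; trees with differing taxon sets would first need to be pruned to their common leaves.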
4.
Bioinformatics ; 30(21): 2993-8, 2014 Nov 01.
Article in English | MEDLINE | ID: mdl-25064571

ABSTRACT

Given the rapid increase of species with a sequenced genome, the need to identify orthologous genes between them has emerged as a central bioinformatics task. Many different methods exist for orthology detection, which makes it difficult to decide which one to choose for a particular application. Here, we review the latest developments and issues in the orthology field, and summarize the most recent results reported at the third 'Quest for Orthologs' meeting. We focus on community efforts such as the adoption of reference proteomes, standard file formats and benchmarking. Progress in these areas is good, and they are already beneficial to both orthology consumers and providers. However, a major current issue is that the massive increase in complete proteomes poses computational challenges to many of the ortholog database providers, as most orthology inference algorithms scale at least quadratically with the number of proteomes. The Quest for Orthologs consortium is an open community with a number of working groups that join efforts to enhance various aspects of orthology analysis, such as defining standard formats and datasets, documenting community resources and benchmarking. AVAILABILITY AND IMPLEMENTATION: All such materials are available at http://questfororthologs.org.


Subject(s)
Genomics/methods , Sequence Homology , Algorithms , Protein Structure, Tertiary , Proteome , Sequence Analysis, DNA , Sequence Analysis, Protein
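
The quadratic scaling noted in the abstract above can be made concrete with a small calculation. The sketch assumes that the cost is driven by comparing every proteome with every other one, which is one common source of quadratic growth; the abstract does not say which step dominates for any particular method, and the function name is hypothetical.

```python
def pairwise_comparisons(n_proteomes: int) -> int:
    """Number of unordered proteome pairs in an all-against-all comparison."""
    if n_proteomes < 0:
        raise ValueError("number of proteomes must be non-negative")
    return n_proteomes * (n_proteomes - 1) // 2


# A tenfold increase in proteomes gives roughly a hundredfold increase in pairs.
for n in (10, 100, 1000):
    print(n, pairwise_comparisons(n))
# 10 45
# 100 4950
# 1000 499500
```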
5.
Plant Physiol ; 165(4): 1709-1722, 2014 Aug.
Article in English | MEDLINE | ID: mdl-24920445

ABSTRACT

CASPARIAN STRIP MEMBRANE DOMAIN PROTEINS (CASPs) are four-membrane-span proteins that mediate the deposition of Casparian strips in the endodermis by recruiting the lignin polymerization machinery. CASPs show high stability in their membrane domain, which presents all the hallmarks of a membrane scaffold. Here, we characterized the large family of CASP-like (CASPL) proteins. CASPLs were found in all major divisions of land plants as well as in green algae; homologs outside of the plant kingdom were identified as members of the MARVEL protein family. When ectopically expressed in the endodermis, most CASPLs were able to integrate the CASP membrane domain, which suggests that CASPLs share with CASPs the propensity to form transmembrane scaffolds. Extracellular loops are not necessary for generating the scaffold, since CASP1 was still able to localize correctly when either one of the extracellular loops was deleted. The CASP first extracellular loop was found conserved in euphyllophytes but absent in plants lacking Casparian strips, an observation that may contribute to the study of Casparian strip and root evolution. In Arabidopsis (Arabidopsis thaliana), CASPL showed specific expression in a variety of cell types, such as trichomes, abscission zone cells, peripheral root cap cells, and xylem pole pericycle cells.

6.
PLoS One ; 8(3): e58126, 2013.
Article in English | MEDLINE | ID: mdl-23505460

ABSTRACT

A heme-containing transmembrane ferric reductase domain (FRD) is found in bacterial and eukaryotic protein families, including ferric reductases (FRE) and NADPH oxidases (NOX). The aim of this study was to understand the phylogeny of the FRD superfamily. Bacteria contain FRD proteins consisting only of the ferric reductase domain, such as YedZ and short bFRE proteins. Full length FRE and NOX enzymes are mostly found in eukaryotic cells and all possess a dehydrogenase domain, allowing them to catalyze electron transfer from cytosolic NADPH to extracellular metal ions (FRE) or oxygen (NOX). Metazoa possess YedZ-related STEAP proteins, possibly derived from bacteria through horizontal gene transfer. Phylogenetic analyses suggest that FRE enzymes appeared early in evolution, followed by a transition towards EF-hand containing NOX enzymes (NOX5- and DUOX-like). An ancestral gene of the NOX(1-4) family probably lost the EF-hands and new regulatory mechanisms of increasing complexity evolved in this clade. Two signature motifs were identified: NOX enzymes are distinguished from FRE enzymes through a four amino acid motif spanning from transmembrane domain 3 (TM3) to TM4, and YedZ/STEAP proteins are identified by the replacement of the first canonical heme-spanning histidine by a highly conserved arginine. The FRD superfamily most likely originated in bacteria.


Subject(s)
Biological Evolution , FMN Reductase/chemistry , FMN Reductase/metabolism , Protein Interaction Domains and Motifs , Amino Acid Motifs , Cluster Analysis , Conserved Sequence , FMN Reductase/classification , FMN Reductase/genetics , Heme/chemistry , Heme/metabolism , Models, Biological , Multigene Family , NADH, NADPH Oxidoreductases/chemistry , NADH, NADPH Oxidoreductases/metabolism , Phylogeny , Position-Specific Scoring Matrices , Reactive Oxygen Species/metabolism
7.
Brief Bioinform ; 12(5): 423-35, 2011 Sep.
Article in English | MEDLINE | ID: mdl-21737420

ABSTRACT

Phylogenomic databases provide orthology predictions for species with fully sequenced genomes. Although the goal seems well-defined, the content of these databases differs greatly. Seven ortholog databases (Ensembl Compara, eggNOG, HOGENOM, InParanoid, OMA, OrthoDB, Panther) were compared on the basis of reference trees. For three well-conserved protein families, we observed a generally high specificity of orthology assignments for these databases. We show that differences in the completeness of predicted gene relationships and in the phylogenetic information are, for the great majority, not due to the methods used, but to differences in the underlying database concepts. According to our metrics, none of the databases provides a fully correct and comprehensive protein classification. Our results provide a framework for meaningful and systematic comparisons of phylogenomic databases. In the future, a sustainable set of 'Gold standard' phylogenetic trees could provide a robust method for phylogenomic databases to assess their current quality status, measure changes following new database releases and diagnose improvements subsequent to an upgrade of the analysis procedure.


Subject(s)
Databases, Genetic , Genomics/methods , Phylogeny , Algorithms , Evolution, Molecular , Pilot Projects
8.
Nucleic Acids Res ; 34(11): 3309-16, 2006.
Article in English | MEDLINE | ID: mdl-16835308

ABSTRACT

Correct orthology assignment is a critical prerequisite of numerous comparative genomics procedures, such as function prediction, construction of phylogenetic species trees and genome rearrangement analysis. We present an algorithm for the detection of non-orthologs that arise by mistake in current orthology classification methods based on genome-specific best hits, such as the COGs database. The algorithm works with pairwise distance estimates, rather than computationally expensive and error-prone tree-building methods. The accuracy of the algorithm is evaluated through verification of the distribution of predicted cases, case-by-case phylogenetic analysis and comparisons with predictions from other projects using independent methods. Our results show that a very significant fraction of the COG groups include non-orthologs: using conservative parameters, the algorithm detects non-orthology in a third of all COG groups. Consequently, sequence analysis sensitive to correct orthology assignments will greatly benefit from these findings.


Subject(s)
Algorithms , Databases, Protein , Genomics/methods , Evolution, Molecular , Phylogeny , Proteins/classification , Proteins/genetics , Sequence Alignment , Sequence Analysis, Protein
9.
Nucleic Acids Res ; 34(Database issue): D187-91, 2006 Jan 01.
Article in English | MEDLINE | ID: mdl-16381842

ABSTRACT

The Universal Protein Resource (UniProt) provides a central resource on protein sequences and functional annotation with three database components, each addressing a key need in protein bioinformatics. The UniProt Knowledgebase (UniProtKB), comprising the manually annotated UniProtKB/Swiss-Prot section and the automatically annotated UniProtKB/TrEMBL section, is the preeminent storehouse of protein annotation. The extensive cross-references, functional and feature annotations and literature-based evidence attribution enable scientists to analyse proteins and query across databases. The UniProt Reference Clusters (UniRef) speed similarity searches via sequence space compression by merging sequences that are 100% (UniRef100), 90% (UniRef90) or 50% (UniRef50) identical. Finally, the UniProt Archive (UniParc) stores all publicly available protein sequences, containing the history of sequence data with links to the source databases. UniProt databases continue to grow in size and in availability of information. Recent and upcoming changes to database contents, formats, controlled vocabularies and services are described. New download availability includes all major releases of UniProtKB, sequence collections by taxonomic division and complete proteomes. A bibliography mapping service has been added, and an ID mapping service will be available soon. UniProt databases can be accessed online at http://www.uniprot.org or downloaded at ftp://ftp.uniprot.org/pub/databases/.


Subject(s)
Databases, Protein , Internet , Proteins/chemistry , Proteins/classification , Proteins/physiology , Proteome/chemistry , Sequence Analysis, Protein , Systems Integration , User-Computer Interface
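
The notion of compressing sequence space by merging sequences at 100%, 90% or 50% identity, described in the abstract above, can be illustrated with a toy greedy clustering. Everything here is an assumption made for illustration: the position-by-position identity measure (no alignment), the longest-first greedy strategy, the function names and the example sequences. It is not how the UniRef clusters are actually computed.

```python
from typing import List, Sequence


def identity(a: str, b: str) -> float:
    """Toy identity: matching positions divided by the longer length (no alignment)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    matches = sum(x == y for x, y in zip(a, b))
    return matches / longest


def greedy_cluster(seqs: Sequence[str], threshold: float) -> List[List[str]]:
    """Assign each sequence to the first cluster whose representative is similar enough.

    The first member of each cluster acts as its representative.
    """
    clusters: List[List[str]] = []
    for seq in sorted(seqs, key=len, reverse=True):  # longest sequences first
        for cluster in clusters:
            if identity(cluster[0], seq) >= threshold:
                cluster.append(seq)
                break
        else:
            clusters.append([seq])  # no representative matched: start a new cluster
    return clusters


# Hypothetical sequences, all of length 10.
seqs = [
    "MKTAYIAKQR",
    "MKTAYIAKQR",  # exact duplicate of the first
    "MKTAYIAKQK",  # 9 of 10 positions match the first
    "MKTAYLLLLL",  # 5 of 10 positions match the first
    "GGGGGGGGGG",  # unrelated
]

for threshold in (1.0, 0.9, 0.5):
    print(threshold, len(greedy_cluster(seqs, threshold)))
# 1.0 4
# 0.9 3
# 0.5 2
```

Lower thresholds yield fewer, larger clusters, which is why a search against the representatives of a 50% set touches far fewer sequences than a search against the full collection.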
10.
C R Biol ; 328(10-11): 882-99, 2005.
Article in English | MEDLINE | ID: mdl-16286078

ABSTRACT

We all know that the dogma 'one gene, one protein' is obsolete. A functional protein and, likewise, a protein's ultimate function depend not only on the underlying genetic information but also on the ongoing conditions of the cellular system. Frequently the transcript, like the polypeptide, is processed in multiple ways, but only one or a few out of a multitude of possible variants are produced at a time. An overview on processes that can lead to sequence variety and structural diversity in eukaryotes is given. The UniProtKB/Swiss-Prot protein knowledgebase provides a wealth of information regarding protein variety, function and associated disorders. Examples for such annotation are shown and further ones are available at http://www.expasy.org/sprot/tutorial/examples_CRB.


Subject(s)
Knowledge Bases , Proteins/chemistry , Amino Acid Sequence , Molecular Sequence Data , Protein Folding
11.
Nucleic Acids Res ; 33(Database issue): D154-9, 2005 Jan 01.
Article in English | MEDLINE | ID: mdl-15608167

ABSTRACT

The Universal Protein Resource (UniProt) provides the scientific community with a single, centralized, authoritative resource for protein sequences and functional information. Formed by uniting the Swiss-Prot, TrEMBL and PIR protein database activities, the UniProt consortium produces three layers of protein sequence databases: the UniProt Archive (UniParc), the UniProt Knowledgebase (UniProt) and the UniProt Reference (UniRef) databases. The UniProt Knowledgebase is a comprehensive, fully classified, richly and accurately annotated protein sequence knowledgebase with extensive cross-references. This centrepiece consists of two sections: UniProt/Swiss-Prot, with fully, manually curated entries; and UniProt/TrEMBL, enriched with automated classification and annotation. During 2004, tens of thousands of Knowledgebase records got manually annotated or updated; we introduced a new comment line topic: TOXIC DOSE to store information on the acute toxicity of a toxin; the UniProt keyword list got augmented by additional keywords; we improved the documentation of the keywords and are continuously overhauling and standardizing the annotation of post-translational modifications. Furthermore, we introduced a new documentation file of the strains and their synonyms. Many new database cross-references were introduced and we started to make use of Digital Object Identifiers. We also achieved in collaboration with the Macromolecular Structure Database group at EBI an improved integration with structural databases by residue level mapping of sequences from the Protein Data Bank entries onto corresponding UniProt entries. For convenient sequence searches we provide the UniRef non-redundant sequence databases. The comprehensive UniParc database stores the complete body of publicly available protein sequence data. The UniProt databases can be accessed online (http://www.uniprot.org) or downloaded in several formats (ftp://ftp.uniprot.org/pub). New releases are published every two weeks.


Subject(s)
Databases, Protein , Proteins/chemistry , Amino Acid Sequence , Proteins/physiology , Systems Integration , User-Computer Interface
12.
Proteomics ; 4(6): 1537-50, 2004 Jun.
Article in English | MEDLINE | ID: mdl-15174124

ABSTRACT

High-throughput proteomic studies produce a wealth of new information regarding post-translational modifications (PTMs). The Swiss-Prot knowledge base is faced with the challenge of including this information in a consistent and structured way, in order to facilitate easy retrieval and promote understanding by biologist expert users as well as computer programs. We are therefore standardizing the annotation of PTM features represented in Swiss-Prot. Indeed, a controlled vocabulary has been associated with every described PTM. In this paper, we present the major update of the feature annotation, and, by showing a few examples, explain how the annotation is implemented and what it means. Mod-Prot, a future companion database of Swiss-Prot, devoted to the biological aspects of PTMs (i.e., general description of the process, identity of the modification enzyme(s), taxonomic range, mass modification) is briefly described. Finally we encourage once again the scientific community (i.e., both individual researchers and database maintainers) to interact with us, so that we can continuously enhance the quality and swiftness of our services.


Subject(s)
Databases, Protein , Protein Processing, Post-Translational , Computational Biology , Databases, Protein/standards , Forecasting , Information Systems , Sequence Analysis, Protein , Systems Integration
13.
Brief Bioinform ; 5(1): 39-55, 2004 Mar.
Article in English | MEDLINE | ID: mdl-15153305

ABSTRACT

We describe some of the aspects of Swiss-Prot that make it unique, explain what are the developments we believe to be necessary for the database to continue to play its role as a focal point of protein knowledge, and provide advice pertinent to the development of high-quality knowledge resources on one aspect or the other of the life sciences.


Subject(s)
Databases, Protein , Software Design , Amino Acid Sequence , Animals , Databases, Protein/history , History, 20th Century , History, 21st Century , Humans , Information Storage and Retrieval , Internet , Proteins/classification , Proteins/genetics , User-Computer Interface
14.
Nucleic Acids Res ; 32(Database issue): D115-9, 2004 Jan 01.
Article in English | MEDLINE | ID: mdl-14681372

ABSTRACT

To provide the scientific community with a single, centralized, authoritative resource for protein sequences and functional information, the Swiss-Prot, TrEMBL and PIR protein database activities have united to form the Universal Protein Knowledgebase (UniProt) consortium. Our mission is to provide a comprehensive, fully classified, richly and accurately annotated protein sequence knowledgebase, with extensive cross-references and query interfaces. The central database will have two sections, corresponding to the familiar Swiss-Prot (fully manually curated entries) and TrEMBL (enriched with automated classification, annotation and extensive cross-references). For convenient sequence searches, UniProt also provides several non-redundant sequence databases. The UniProt NREF (UniRef) databases provide representative subsets of the knowledgebase suitable for efficient searching. The comprehensive UniProt Archive (UniParc) is updated daily from many public source databases. The UniProt databases can be accessed online (http://www.uniprot.org) or downloaded in several formats (ftp://ftp.uniprot.org/pub). The scientific community is encouraged to submit data for inclusion in UniProt.


Subject(s)
Computational Biology , Databases, Protein , Proteins/chemistry , Proteins/metabolism , Animals , Humans , Internet , Protein Conformation , Proteins/classification , Proteome , Proteomics , Terminology as Topic
15.
Nucleic Acids Res ; 31(1): 365-70, 2003 Jan 01.
Article in English | MEDLINE | ID: mdl-12520024

ABSTRACT

The SWISS-PROT protein knowledgebase (http://www.expasy.org/sprot/ and http://www.ebi.ac.uk/swissprot/) connects amino acid sequences with the current knowledge in the Life Sciences. Each protein entry provides an interdisciplinary overview of relevant information by bringing together experimental results, computed features and sometimes even contradictory conclusions. Detailed expertise that goes beyond the scope of SWISS-PROT is made available via direct links to specialised databases. SWISS-PROT provides annotated entries for all species, but concentrates on the annotation of entries from human (the HPI project) and other model organisms to ensure the presence of high quality annotation for representative members of all protein families. Part of the annotation can be transferred to other family members, as is already done for microbes by the High-quality Automated and Manual Annotation of microbial Proteomes (HAMAP) project. Protein families and groups of proteins are regularly reviewed to keep up with current scientific findings. Complementarily, TrEMBL strives to comprise all protein sequences that are not yet represented in SWISS-PROT, by incorporating a perpetually increasing level of mostly automated annotation. Researchers are welcome to contribute their knowledge to the scientific community by submitting relevant findings to SWISS-PROT at swiss-prot@expasy.org.


Subject(s)
Databases, Protein , Proteins/chemistry , Animals , Archaeal Proteins/chemistry , Bacterial Proteins/chemistry , Humans , Information Storage and Retrieval , Models, Animal , Plant Proteins/chemistry , Proteins/classification , Proteome/chemistry , Proteomics , Systems Integration , Terminology as Topic