1.
Database (Oxford) ; 2017, 2017 Jan 01.
Article in English | MEDLINE | ID: mdl-29220476

ABSTRACT

UniProt Knowledgebase (UniProtKB) is a publicly available database with access to a vast amount of protein sequence and functional information. To widen the scope of the publications associated with a protein entry, UniProt has introduced the computationally mapped additional bibliography section, which includes literature collected from external sources. In this article, we describe a text mining system, eGenPub, which selects articles that are 'about' specific proteins and allows automatic identification of additional bibliography for given UniProt protein entries. Focusing initially on plant proteins, eGenPub uses a gene normalization tool called pGenN and a trained support vector machine model, which achieves a precision of 95.3%, to predict whether an article, based on its abstract, should be linked to a given UniProt entry. We have run eGenPub over the full PubMed corpus for eight common plant species. Altogether, 9025 articles were identified as relevant bibliography for 4752 UniProt entries, among which 5252 are additional papers not in the existing publication section. This newly computationally mapped additional bibliography from eGenPub is being integrated into the UniProt production pipeline and can be accessed via the UniProtKB protein entry publication view.
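The linking decision described above combines gene normalization with a classifier. The Python sketch below illustrates only that two-stage shape and is not eGenPub's implementation: it assumes a toy dictionary lookup in place of pGenN, hypothetical accession labels (`ACC_PRX33`, `ACC_PRX34`), invented features, and hand-picked weights for a linear decision function of the kind a linear SVM learns (score = w·x + b). It also shows how a precision figure such as 95.3% is computed.

```python
import re

# Hypothetical lexicon standing in for pGenN: lower-cased gene mentions
# mapped to made-up accession labels (not real UniProt accessions).
LEXICON = {
    "atprx33": "ACC_PRX33",
    "atprx34": "ACC_PRX34",
}

# Hypothetical weights and bias; a trained linear SVM would learn these.
WEIGHTS = [0.5, 2.0, -0.3]
BIAS = -1.0


def normalize(abstract: str) -> dict[str, int]:
    """Count lexicon hits per accession in an abstract."""
    counts: dict[str, int] = {}
    for token in re.findall(r"[a-z0-9]+", abstract.lower()):
        accession = LEXICON.get(token)
        if accession is not None:
            counts[accession] = counts.get(accession, 0) + 1
    return counts


def features(accession: str, counts: dict[str, int]) -> list[float]:
    """Mention count, share of all mentions, number of distinct proteins."""
    total = sum(counts.values())
    n = counts.get(accession, 0)
    return [float(n), n / total if total else 0.0, float(len(counts))]


def is_about(accession: str, abstract: str) -> bool:
    """Link the abstract to the entry when the linear score is positive."""
    x = features(accession, normalize(abstract))
    score = sum(w * xi for w, xi in zip(WEIGHTS, x)) + BIAS
    return score > 0


def precision(true_positives: int, false_positives: int) -> float:
    """Fraction of predicted links that are correct: TP / (TP + FP)."""
    predicted = true_positives + false_positives
    return true_positives / predicted if predicted else 0.0
```

For an abstract that mentions AtPrx33 four times and AtPrx34 once, the sketch links it to `ACC_PRX33` but not to `ACC_PRX34`, mirroring the idea of selecting articles that are mainly "about" one protein.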


Subject(s)
Data Mining , Databases, Bibliographic , Databases, Protein , Plant Proteins/genetics , Plant Proteins/metabolism , Plants , Plants/genetics , Plants/metabolism
2.
Article in English | MEDLINE | ID: mdl-26896845

ABSTRACT

Advances in high-throughput technologies allow researchers to routinely perform whole genome and proteome analysis. For this purpose, they need high-quality resources providing comprehensive gene and protein sets for their organisms of interest. Using the example of the human proteome, we will describe the content of a complete proteome in the UniProt Knowledgebase (UniProtKB). We will show how manual expert curation of UniProtKB/Swiss-Prot is complemented by expert-driven automatic annotation to build a comprehensive, high-quality and traceable resource. We will also illustrate how the complexity of the human proteome is captured and structured in UniProtKB. Database URL: www.uniprot.org.


Subject(s)
Databases, Protein , Proteome/genetics , Proteomics/methods , Automation , Genome , Humans , Knowledge Bases , Phenotype , Protein Processing, Post-Translational , Proteins/chemistry , RNA Editing , Software
3.
Methods Mol Biol ; 1374: 23-54, 2016.
Article in English | MEDLINE | ID: mdl-26519399

ABSTRACT

The Universal Protein Resource (UniProt, http://www.uniprot.org) consortium is an initiative of the SIB Swiss Institute of Bioinformatics (SIB), the European Bioinformatics Institute (EBI) and the Protein Information Resource (PIR) to provide the scientific community with a central resource for protein sequences and functional information. The UniProt consortium maintains the UniProt KnowledgeBase (UniProtKB), updated every 4 weeks, and several supplementary databases including the UniProt Reference Clusters (UniRef) and the UniProt Archive (UniParc). The Swiss-Prot section of the UniProt KnowledgeBase (UniProtKB/Swiss-Prot) contains publicly available protein sequences, manually annotated by experts, from a broad spectrum of organisms. Plant protein entries are produced within the Plant Proteome Annotation Program (PPAP), with an emphasis on characterized proteins of Arabidopsis thaliana and Oryza sativa. High-level annotations provided by UniProtKB/Swiss-Prot are widely used to predict the annotation of newly available proteins through automatic pipelines. The purpose of this chapter is to present a guided tour of a UniProtKB/Swiss-Prot entry. We will also present some of the tools and databases that are linked to each entry.


Subject(s)
Computational Biology/methods , Databases, Protein , Animals , Humans , Web Browser
4.
Nucleic Acids Res ; 42(Database issue): D358-63, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24234451

ABSTRACT

IntAct (freely available at http://www.ebi.ac.uk/intact) is an open-source, open data molecular interaction database populated by data either curated from the literature or from direct data depositions. IntAct has developed a sophisticated web-based curation tool, capable of supporting both IMEx- and MIMIx-level curation. This tool is now utilized by multiple additional curation teams, all of whom annotate data directly into the IntAct database. Members of the IntAct team supply appropriate levels of training, perform quality control on entries and take responsibility for long-term data maintenance. Recently, the MINT and IntAct databases decided to merge their separate efforts to make optimal use of limited developer resources and maximize the curation output. All data manually curated by the MINT curators have been moved into the IntAct database at EMBL-EBI and are merged with the existing IntAct dataset. Both IntAct and MINT are active contributors to the IMEx consortium (http://www.imexconsortium.org).


Subject(s)
Databases, Protein , Protein Interaction Mapping , Internet , Software
5.
Nucleic Acids Res ; 40(Database issue): D565-70, 2012 Jan.
Article in English | MEDLINE | ID: mdl-22123736

ABSTRACT

The GO annotation dataset provided by the UniProt Consortium (GOA: http://www.ebi.ac.uk/GOA) is a comprehensive set of evidence-based associations between terms from the Gene Ontology resource and UniProtKB proteins. Currently supplying over 100 million annotations to 11 million proteins in more than 360,000 taxa, this resource has grown 2-fold over the last 2 years and has benefited from a wealth of checks to improve annotation correctness and consistency, as well as now supplying a greater information content enabled by GO Consortium annotation format developments. Detailed, manual GO annotations obtained from the curation of peer-reviewed papers are directly contributed by all UniProt curators and supplemented with manual and electronic annotations from 36 model organism and domain-focused scientific resources. The inclusion of high-quality, automatic annotation predictions ensures the UniProt GO annotation dataset supplies functional information to a wide range of proteins, including those from poorly characterized, non-model organism species. UniProt GO annotations are freely available in a range of formats accessible by both file downloads and web-based views. In addition, the introduction of a new, normalized file format in 2010 has made for easier handling of the complete UniProt-GOA data set.
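GO annotation files are tab-separated, one annotation per line. The Python sketch below reads a single line, assuming the 17-column GAF 2.x layout in which header and comment lines start with `!`; the sample record is invented for illustration. The evidence code `IEA` (Inferred from Electronic Annotation) is what distinguishes electronic annotations from manual ones.

```python
from typing import NamedTuple, Optional

GAF_COLUMNS = 17  # GAF 2.x


class GafRecord(NamedTuple):
    db: str
    db_object_id: str
    symbol: str
    qualifier: str
    go_id: str
    references: list[str]
    evidence_code: str
    aspect: str  # P (process), F (function) or C (component)
    taxon: str
    assigned_by: str


def parse_gaf_line(line: str) -> Optional[GafRecord]:
    """Parse one GAF 2.x line; return None for header/comment lines."""
    if line.startswith("!"):
        return None
    cols = line.rstrip("\n").split("\t")
    if len(cols) != GAF_COLUMNS:
        raise ValueError(f"expected {GAF_COLUMNS} columns, got {len(cols)}")
    return GafRecord(
        db=cols[0],
        db_object_id=cols[1],
        symbol=cols[2],
        qualifier=cols[3],
        go_id=cols[4],
        references=cols[5].split("|"),  # pipe-separated references
        evidence_code=cols[6],
        aspect=cols[8],
        taxon=cols[12],
        assigned_by=cols[14],
    )


def is_electronic(record: GafRecord) -> bool:
    """IEA = Inferred from Electronic Annotation."""
    return record.evidence_code == "IEA"


# Invented example record; identifiers are illustrative only.
SAMPLE = "\t".join([
    "UniProtKB", "EXAMPLE_ID", "EXAMPLE1", "enables", "GO:0004601",
    "GO_REF:0000002|PMID:0000000", "IEA", "InterPro:IPR000000", "F",
    "Hypothetical protein", "", "protein", "taxon:3702", "20120101",
    "InterPro", "", "",
])
```

Splitting on an explicit `"\t"` keeps trailing empty columns, so the column count check stays reliable for records whose last fields are blank.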


Subject(s)
Databases, Protein , Molecular Sequence Annotation , Vocabulary, Controlled , Molecular Sequence Annotation/standards
6.
J Proteomics ; 72(3): 567-73, 2009 Apr 13.
Article in English | MEDLINE | ID: mdl-19084081

ABSTRACT

The UniProt knowledgebase, UniProtKB, is the main product of the UniProt consortium. It consists of two sections: UniProtKB/Swiss-Prot, the manually curated section, and UniProtKB/TrEMBL, the computer translation of the EMBL/GenBank/DDBJ nucleotide sequence database. Taken together, these two sections cover all the proteins characterized or inferred from all publicly available nucleotide sequences. The Plant Proteome Annotation Program (PPAP) of UniProtKB/Swiss-Prot focuses on the manual annotation of plant-specific proteins and protein families. Our major effort is currently directed towards the two model plants Arabidopsis thaliana and Oryza sativa. In UniProtKB/Swiss-Prot, redundancy is minimized by merging all data from different sources in a single entry. The proposed protein sequence is frequently modified after comparison with ESTs, full-length transcripts or homologous proteins from other species. The information present in manually curated entries allows the reconstruction of all described isoforms. The annotation also includes proteomics data such as post-translational modifications (PTMs) and mass spectrometry (MS) protein identification results. UniProtKB and the other products of the UniProt consortium are accessible online at www.uniprot.org.
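Reconstructing an isoform from a curated entry amounts to applying the annotated sequence changes to the canonical sequence. The Python sketch below assumes a simplified model, loosely based on the alternative-sequence annotations in UniProtKB entries: each change is a 1-based inclusive range on the canonical sequence plus either a replacement segment or `None` for a missing segment. The function name and example data are hypothetical.

```python
from typing import List, Optional, Tuple

# (start, end, replacement): 1-based inclusive positions on the canonical
# sequence; replacement None means the segment is missing in the isoform.
Edit = Tuple[int, int, Optional[str]]


def build_isoform(canonical: str, edits: List[Edit]) -> str:
    """Derive an isoform sequence from the canonical one."""
    ordered = sorted(edits, key=lambda e: (e[0], e[1]))
    # Reject overlapping edits: their combined effect would be ambiguous.
    for (s1, e1, _), (s2, e2, _) in zip(ordered, ordered[1:]):
        if s2 <= e1:
            raise ValueError(f"overlapping edits {s1}-{e1} and {s2}-{e2}")
    seq = canonical
    # Apply from the C-terminal end backwards so that positions of edits
    # still to be applied refer to unchanged parts of the sequence.
    for start, end, replacement in reversed(ordered):
        if not 1 <= start <= end <= len(canonical):
            raise ValueError(f"edit {start}-{end} outside sequence")
        seq = seq[: start - 1] + (replacement or "") + seq[end:]
    return seq


# Toy canonical sequence (invented).
print(build_isoform("MKTAYIAKQR", [(2, 3, None)]))  # MAYIAKQR
print(build_isoform("MKTAYIAKQR", [(9, 10, "GG"), (1, 1, "ML")]))  # MLKTAYIAKGG
```

Applying edits in descending order of position avoids recomputing offsets after each insertion or deletion changes the sequence length.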


Subject(s)
Abstracting and Indexing , Databases, Protein , Knowledge Bases , Plant Proteins/analysis , Plant Proteins/classification , Proteome/analysis , Proteome/classification , Arabidopsis/chemistry , Internet , Mass Spectrometry , Oryza/chemistry , Plant Proteins/chemistry , Proteome/chemistry
7.
Methods Mol Biol ; 406: 89-112, 2007.
Article in English | MEDLINE | ID: mdl-18287689

ABSTRACT

The Swiss Institute of Bioinformatics (SIB), the European Bioinformatics Institute (EBI), and the Protein Information Resource (PIR) form the Universal Protein Resource (UniProt) consortium. Its main goal is to provide the scientific community with a central resource for protein sequences and functional information. The UniProt consortium maintains the UniProt KnowledgeBase (UniProtKB) and several supplementary databases including the UniProt Reference Clusters (UniRef) and the UniProt Archive (UniParc). (1) UniProtKB is a comprehensive protein sequence knowledgebase that consists of two sections: UniProtKB/Swiss-Prot, which contains manually annotated entries, and UniProtKB/TrEMBL, which contains computer-annotated entries. UniProtKB/Swiss-Prot entries contain information curated by biologists and provide users with cross-links to about 100 external databases and with access to additional information or tools. (2) The UniRef databases (UniRef100, UniRef90, and UniRef50) define clusters of protein sequences that share 100, 90, or 50% identity. (3) The UniParc database stores and maps all publicly available protein sequence data, including obsolete data excluded from UniProtKB. The UniProt databases can be accessed online (http://www.uniprot.org/) or downloaded in several formats (ftp://ftp.uniprot.org/pub). New releases are published every 2 weeks. The purpose of this chapter is to present a guided tour of a UniProtKB/Swiss-Prot entry, paying particular attention to the specificities of plant protein annotation. We will also present some of the tools and databases that are linked to each entry.
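The UniRef levels differ in the identity threshold used to group sequences. The Python sketch below illustrates that idea with a simple greedy scheme, assuming pre-aligned, equal-length toy sequences and identity defined as the fraction of matching positions. The production UniRef procedure is more involved and is not reproduced here; the function names and data are hypothetical.

```python
def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two aligned sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(seqs: dict[str, str], threshold: float) -> list[list[str]]:
    """Assign each sequence to the first representative it matches at
    >= threshold identity; otherwise it founds a new cluster."""
    clusters: list[tuple[str, list[str]]] = []
    for seq_id in sorted(seqs):  # deterministic processing order
        for rep_id, members in clusters:
            if identity(seqs[rep_id], seqs[seq_id]) >= threshold:
                members.append(seq_id)
                break
        else:
            clusters.append((seq_id, [seq_id]))
    return [members for _, members in clusters]


# Toy analogues of UniRef100, UniRef90 and UniRef50.
SEQS = {"a": "AAAAAAAAAA", "b": "AAAAAAAAAC", "c": "CCCCCAAAAA"}
for level in (1.0, 0.9, 0.5):
    print(level, greedy_cluster(SEQS, level))
# 1.0 [['a'], ['b'], ['c']]
# 0.9 [['a', 'b'], ['c']]
# 0.5 [['a', 'b', 'c']]
```

Lowering the threshold merges clusters, which is why UniRef50 contains fewer, larger clusters than UniRef90 or UniRef100.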


Subject(s)
Databases, Protein , Proteins/genetics , Amino Acid Sequence , Information Storage and Retrieval , Molecular Sequence Data , Proteins/classification , Sequence Alignment/methods , Sequence Homology, Amino Acid , User-Computer Interface
8.
Planta ; 223(5): 965-74, 2006 Apr.
Article in English | MEDLINE | ID: mdl-16284776

ABSTRACT

Two class III peroxidases from Arabidopsis, AtPrx33 and AtPrx34, have been studied in this paper. Their encoding genes are mainly expressed in roots; AtPrx33 transcripts were also found in leaves and stems. Light activates the expression of both genes in seedlings. Transformed seedlings producing AtPrx33-GFP or AtPrx34-GFP fusion proteins under the control of the CaMV 35S promoter exhibit fluorescence in the cell walls of roots, showing that the two peroxidases are localized in the apoplast, which is in line with their affinity for the Ca(2+)-pectate structure. The role they can play in the cell wall was investigated using (1) insertion mutants that have suppressed or reduced expression of the AtPrx33 or AtPrx34 genes, respectively, (2) a double mutant with no AtPrx33 and a reduced level of AtPrx34 transcripts, and (3) a mutant overexpressing AtPrx34 under the control of the CaMV 35S promoter. The major phenotypic consequence of these genetic manipulations was a change in seedling root length. Seedlings lacking AtPrx33 transcripts have shorter roots than the wild-type controls, and roots are even shorter in the double mutant. Seedlings overexpressing AtPrx34 exhibit significantly longer roots. These modifications of root length are accompanied by corresponding changes in cell length. The results suggest that AtPrx33 and AtPrx34, two highly homologous Arabidopsis peroxidases, are involved in the reactions that promote cell elongation and that this occurs most likely within cell walls.


Subject(s)
Arabidopsis/enzymology , Cell Wall/enzymology , Peroxidases/physiology , Plant Roots/growth & development , Arabidopsis/genetics , Arabidopsis/growth & development , Cell Enlargement , Cell Wall/physiology , Gene Expression , Green Fluorescent Proteins , Mutagenesis, Insertional , RNA Interference , Seedlings/growth & development
9.
Nucleic Acids Res ; 33(Database issue): D641-6, 2005 Jan 01.
Article in English | MEDLINE | ID: mdl-15608279

ABSTRACT

Genomic projects heavily depend on genome annotations and are limited by the current deficiencies in the published predictions of gene structure and function. It follows that improved annotation will allow better data mining of genomes, and more secure planning and design of experiments. The purpose of the GeneFarm project is to obtain homogeneous, reliable, documented and traceable annotations for Arabidopsis nuclear genes and gene products, and to enter them into an added-value database. This re-annotation project is being performed exhaustively on every member of each gene family. Performing a family-wide annotation makes the task easier and more efficient than a gene-by-gene approach, since many features obtained for one gene can be extrapolated to some or all of the other genes of a family. A complete annotation procedure based on the most efficient prediction tools available is being used by 16 partner laboratories, each contributing annotated families from its field of expertise. A database, named GeneFarm, and an associated user-friendly interface to query the annotations have been developed. More than 3000 genes distributed over 300 families have been annotated and are available at http://genoplante-info.infobiogen.fr/Genefarm/. Furthermore, a collaboration with the Swiss Institute of Bioinformatics is underway to integrate the GeneFarm data into the protein knowledgebase Swiss-Prot.


Subject(s)
Arabidopsis Proteins/genetics , Arabidopsis/genetics , Databases, Genetic , Genes, Plant , Arabidopsis Proteins/chemistry , Arabidopsis Proteins/physiology , Philosophy , Systems Integration , User-Computer Interface
10.
Plant Physiol Biochem ; 42(12): 1013-21, 2004 Dec.
Article in English | MEDLINE | ID: mdl-15707838

ABSTRACT

The Swiss-Prot protein knowledgebase provides manually annotated entries for all species, but concentrates on the annotation of entries from model organisms to ensure the presence of high quality annotation of representative members of all protein families. A specific Plant Protein Annotation Program (PPAP) was started to cope with the increasing amount of data produced by the complete sequencing of plant genomes. Its main goal is the annotation of proteins from the model plant organism Arabidopsis thaliana. In addition to bibliographic references, experimental results, computed features and sometimes even contradictory conclusions, direct links to specialized databases connect amino acid sequences with the current knowledge in plant sciences. As protein families and groups of plant-specific proteins are regularly reviewed to keep up with current scientific findings, we hope that the wealth of information of Arabidopsis origin accumulated in our knowledgebase, and the numerous software tools provided on the Expert Protein Analysis System (ExPASy) web site might help to identify and reveal the function of proteins originating from other plants. Recently, a single, centralized, authoritative resource for protein sequences and functional information, UniProt, was created by joining the information contained in Swiss-Prot, Translation of the EMBL nucleotide sequence (TrEMBL), and the Protein Information Resource-Protein Sequence Database (PIR-PSD). A rising problem is that an increasing number of nucleotide sequences are not being submitted to the public databases, and thus the proteins inferred from such sequences will have difficulties finding their way to the Swiss-Prot or TrEMBL databases.


Subject(s)
Arabidopsis Proteins/genetics , Arabidopsis/genetics , Databases, Protein , Online Systems , Software , Computational Biology/methods , Databases, Protein/trends , Genome, Plant , Online Systems/trends
11.
Gene ; 288(1-2): 129-38, 2002 Apr 17.
Article in English | MEDLINE | ID: mdl-12034502

ABSTRACT

Higher plants possess a large set of the classical guaiacol peroxidases (class III peroxidases, EC 1.11.1.7). These enzymes have been implicated in a wide array of physiological processes such as H2O2 detoxification, auxin catabolism, lignin biosynthesis and stress responses (wounding, pathogen attack, etc.). During the last 10 years, molecular cloning has allowed the isolation and characterization of several genes encoding peroxidases in plants. The completion of large-scale Arabidopsis genome sequencing, combined with complementary DNA (cDNA) expressed sequence tag (EST) projects, provided the opportunity to draw up the first comprehensive list of peroxidases in a plant. By screening the available databases, we have identified 73 peroxidase genes throughout the Arabidopsis genome. The evolution of the peroxidase multigene family has been investigated by analyzing the gene structure (intron/exon) in correlation with the phylogenetic relationships between the isoperoxidases. An evolutionary pattern of extensive gene duplications can be inferred and is discussed. Using a cDNA array procedure, the expression pattern of 23 peroxidases was established in the different organs of the plant. All the tested peroxidases were expressed at various levels in roots, while several were also detected in stems, leaves and flowers. The specific functions of these genes remain to be determined.


Subject(s)
Arabidopsis/genetics , Peroxidases/genetics , Arabidopsis/enzymology , Chromosome Mapping , Gene Expression Regulation, Enzymologic , Gene Expression Regulation, Plant , Genetic Variation , Genome, Plant , Multigene Family/genetics , Oligonucleotide Array Sequence Analysis , Phylogeny