1.
Bioinformatics ; 36(17): 4643-4648, 2020 11 01.
Article in English | MEDLINE | ID: mdl-32399560

ABSTRACT

MOTIVATION: The number of protein records in the UniProt Knowledgebase (UniProtKB: https://www.uniprot.org) continues to grow rapidly as a result of genome sequencing and the prediction of protein-coding genes. Providing functional annotation for these proteins presents a significant and continuing challenge. RESULTS: In response to this challenge, UniProt has developed a method of annotation, known as UniRule, based on expertly curated rules, which integrates related systems (RuleBase, HAMAP, PIRSR, PIRNR) developed by the members of the UniProt consortium. UniRule uses protein family signatures from InterPro, combined with taxonomic and other constraints, to select sets of reviewed proteins which have common functional properties supported by experimental evidence. This annotation is propagated to unreviewed records in UniProtKB that meet the same selection criteria, most of which do not have (and are never likely to have) experimentally verified functional annotation. Release 2020_01 of UniProtKB contains 6496 UniRule rules which provide annotation for 53 million proteins, accounting for 30% of the 178 million records in UniProtKB. UniRule provides scalable enrichment of annotation in UniProtKB. AVAILABILITY AND IMPLEMENTATION: UniRule rules are integrated into UniProtKB and can be viewed at https://www.uniprot.org/unirule/. UniRule rules and the code required to run the rules are publicly available for researchers who wish to annotate their own sequences. The implementation used to run the rules is known as UniFIRE and is available at https://gitlab.ebi.ac.uk/uniprot-public/unifire.


Subject(s)
Knowledge Bases , Proteins , Chromosome Mapping , Protein Databases , Molecular Sequence Annotation , Proteins/genetics
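
The selection-and-propagation idea in the UniRule abstract can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the `Rule` and `Protein` classes, the field names, and the placeholder identifiers are hypothetical and do not reflect the real UniRule or UniFIRE formats. The last line only re-checks the coverage figure quoted in the abstract (53 million of 178 million records).

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Rule:
    """Hypothetical stand-in for a curated rule (not the real UniRule format)."""
    rule_id: str
    signature: str   # family signature the protein must match
    taxon: str       # taxon that must appear in the protein's lineage
    annotation: str  # annotation to propagate when both conditions hold


@dataclass
class Protein:
    """Hypothetical unreviewed record with its signature hits and lineage."""
    accession: str
    signatures: set
    lineage: tuple
    annotations: list = field(default_factory=list)


def apply_rules(protein, rules):
    """Append the annotation of every rule whose conditions the protein meets."""
    for rule in rules:
        if rule.signature in protein.signatures and rule.taxon in protein.lineage:
            protein.annotations.append((rule.rule_id, rule.annotation))
    return protein


# Placeholder identifiers, invented for this sketch.
rules = [Rule("RULE_EXAMPLE", "SIG_EXAMPLE", "Bacteria", "example function")]
hit = apply_rules(Protein("P_HIT", {"SIG_EXAMPLE"}, ("Bacteria", "Proteobacteria")), rules)
miss = apply_rules(Protein("P_MISS", {"SIG_EXAMPLE"}, ("Eukaryota", "Metazoa")), rules)

# Coverage quoted in the abstract: 53 million annotated of 178 million records.
coverage = 53 / 178
```

Both conditions must hold before anything is propagated, which is why the second record, with the same signature but a different lineage, stays unannotated in this sketch.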
3.
Nucleic Acids Res ; 43(Database issue): D1064-70, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25348399

ABSTRACT

HAMAP (High-quality Automated and Manual Annotation of Proteins, available at http://hamap.expasy.org/) is a system for the automatic classification and annotation of protein sequences. HAMAP provides annotation of the same quality and detail as UniProtKB/Swiss-Prot, using manually curated profiles for protein sequence family classification and expert curated rules for functional annotation of family members. HAMAP data and tools are made available through our website and as part of the UniRule pipeline of UniProt, providing annotation for millions of unreviewed sequences of UniProtKB/TrEMBL. Here we report on the growth of HAMAP and updates to the HAMAP system since our last report in the NAR Database Issue of 2013. We continue to augment HAMAP with new family profiles and annotation rules as new protein families are characterized and annotated in UniProtKB/Swiss-Prot; the latest version of HAMAP (as of 3 September 2014) contains 1983 family classification profiles and 1998 annotation rules (up from 1780 and 1720). We demonstrate how the complex logic of HAMAP rules allows for precise annotation of individual functional variants within large homologous protein families. We also describe improvements to our web-based tool HAMAP-Scan which simplify the classification and annotation of sequences, and the incorporation of an improved sequence-profile search algorithm.


Subject(s)
Protein Databases , Molecular Sequence Annotation , Amino Acid Sequence Homology , Humans , Internet , Proteins/classification
4.
Nucleic Acids Res ; 41(Database issue): D584-9, 2013 Jan.
Article in English | MEDLINE | ID: mdl-23193261

ABSTRACT

HAMAP (High-quality Automated and Manual Annotation of Proteins, available at http://hamap.expasy.org/) is a system for the classification and annotation of protein sequences. It consists of a collection of manually curated family profiles for protein classification, and associated annotation rules that specify annotations that apply to family members. HAMAP was originally developed to support the manual curation of UniProtKB/Swiss-Prot records describing microbial proteins. Here we describe new developments in HAMAP, including the extension of HAMAP to eukaryotic proteins, the use of HAMAP in the automated annotation of UniProtKB/TrEMBL, providing high-quality annotation for millions of protein sequences, and the future integration of HAMAP into a unified system for UniProtKB annotation, UniRule. HAMAP is continuously updated by expert curators with new family profiles and annotation rules as new protein families are characterized. The collection of HAMAP family classification profiles and annotation rules can be browsed and viewed on the HAMAP website, which also provides an interface to scan user sequences against HAMAP profiles.


Subject(s)
Protein Databases , Molecular Sequence Annotation , Proteins/classification , Eukaryota/genetics , Internet
5.
Database (Oxford) ; 2022, 2022 04 12.
Article in English | MEDLINE | ID: mdl-35411389

ABSTRACT

SwissBioPics (www.swissbiopics.org) is a freely available resource of interactive, high-resolution cell images designed for the visualization of subcellular location data. SwissBioPics provides images describing cell types from all kingdoms of life, from the specialized muscle, neuronal and epithelial cells of animals, to the rods, cocci, clubs and spirals of prokaryotes. All cell images in SwissBioPics are drawn in Scalable Vector Graphics (SVG), with each subcellular location tagged with a unique identifier from the controlled vocabulary of subcellular locations and organelles of UniProt (https://www.uniprot.org/locations/). Users can search and explore SwissBioPics cell images through our website, which provides a platform for users to learn more about how cells are organized. A web component allows developers to embed SwissBioPics images in their own websites, using the associated JavaScript and a styling template, and to highlight subcellular locations and organelles by simply providing the web component with the appropriate identifier(s) from the UniProt-controlled vocabulary or the 'Cellular Component' branch of the Gene Ontology (www.geneontology.org), as well as an organism identifier from the National Center for Biotechnology Information taxonomy (https://www.ncbi.nlm.nih.gov/taxonomy). The UniProt website now uses SwissBioPics to visualize the subcellular locations and organelles where proteins function. SwissBioPics is freely available for anyone to use under a Creative Commons Attribution 4.0 International (CC BY 4.0) license. DATABASE URL: www.swissbiopics.org.


Subject(s)
Proteins , Controlled Vocabulary , Animals
6.
Nucleic Acids Res ; 37(Database issue): D471-8, 2009 Jan.
Article in English | MEDLINE | ID: mdl-18849571

ABSTRACT

The growth in the number of completely sequenced microbial genomes (bacterial and archaeal) has generated a need for a procedure that provides UniProtKB/Swiss-Prot-quality annotation to as many protein sequences as possible. We have devised a semi-automated system, HAMAP (High-quality Automated and Manual Annotation of microbial Proteomes), that uses manually built annotation templates for protein families to propagate annotation to all members of manually defined protein families, using very strict criteria. The HAMAP system is composed of two databases, the proteome database and the family database, and of an automatic annotation pipeline. The proteome database comprises biological and sequence information for each completely sequenced microbial proteome, and it offers several tools for CDS searches, BLAST options and retrieval of specific sets of proteins. The family database currently comprises more than 1500 manually curated protein families and their annotation templates that are used to annotate proteins that belong to one of the HAMAP families. On the HAMAP website, individual sequences as well as whole genomes can be scanned against all HAMAP families. The system provides warnings for the absence of conserved amino acid residues, unusual sequence length, etc. Thanks to the implementation of HAMAP, more than 200,000 microbial proteins have been fully annotated in UniProtKB/Swiss-Prot (HAMAP website: http://www.expasy.org/sprot/hamap).


Subject(s)
Archaeal Proteins/chemistry , Bacterial Proteins/chemistry , Protein Databases , Proteomics , Archaeal Proteins/classification , Archaeal Proteins/genetics , Bacterial Proteins/classification , Bacterial Proteins/genetics , Genomics , Proteome/chemistry , Sequence Alignment , Protein Sequence Analysis , Software
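
The abstract above mentions warnings for missing conserved residues and unusual sequence length. A minimal Python sketch of such checks follows; the function name, the argument layout, and the warning wording are all hypothetical and are not taken from the HAMAP pipeline, which the abstract does not describe at this level of detail.

```python
def check_member(sequence, conserved, length_range):
    """Return warning strings for a candidate family member.

    sequence:     amino acid sequence as a string
    conserved:    dict mapping 1-based position -> expected residue
    length_range: (min_len, max_len) regarded as usual for the family
    All three inputs are illustrative assumptions, not HAMAP's data model.
    """
    warnings = []
    min_len, max_len = length_range
    if not min_len <= len(sequence) <= max_len:
        warnings.append(
            f"unusual length {len(sequence)} (expected {min_len}-{max_len})"
        )
    for pos, expected in sorted(conserved.items()):
        # Positions beyond the end of the sequence count as absent.
        found = sequence[pos - 1] if pos <= len(sequence) else None
        if found != expected:
            warnings.append(f"conserved {expected}{pos} absent (found {found})")
    return warnings


ok = check_member("MKCHA", {3: "C", 4: "H"}, (4, 6))
flagged = check_member("MKAH", {3: "C"}, (5, 10))
```

Returning a list of warnings rather than raising an error mirrors the idea that problematic sequences are flagged for a curator instead of being rejected outright.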
7.
Gigascience ; 9(2), 2020 02 01.
Article in English | MEDLINE | ID: mdl-32034905

ABSTRACT

BACKGROUND: Genome and proteome annotation pipelines are generally custom built and not easily reusable by other groups. This leads to duplication of effort, increased costs, and suboptimal annotation quality. One way to address these issues is to encourage the adoption of annotation standards and technological solutions that enable the sharing of biological knowledge and tools for genome and proteome annotation. RESULTS: Here we demonstrate one approach to generate portable genome and proteome annotation pipelines that users can run without recourse to custom software. This proof of concept uses our own rule-based annotation pipeline HAMAP, which provides functional annotation for protein sequences to the same depth and quality as UniProtKB/Swiss-Prot, and the World Wide Web Consortium (W3C) standards Resource Description Framework (RDF) and SPARQL (a recursive acronym for the SPARQL Protocol and RDF Query Language). We translate complex HAMAP rules into the W3C standard SPARQL 1.1 syntax, and then apply them to protein sequences in RDF format using freely available SPARQL engines. This approach supports the generation of annotation that is identical to that generated by our own in-house pipeline, using standard, off-the-shelf solutions, and is applicable to any genome or proteome annotation pipeline. CONCLUSIONS: HAMAP SPARQL rules are freely available for download from the HAMAP FTP site, ftp://ftp.expasy.org/databases/hamap/sparql/, under the CC-BY-ND 4.0 license. The annotations generated by the rules are under the CC-BY 4.0 license. A tutorial and supplementary code to use HAMAP as SPARQL are available on GitHub at https://github.com/sib-swiss/HAMAP-SPARQL, and general documentation about HAMAP can be found on the HAMAP website at https://hamap.expasy.org.


Subject(s)
Genomics/methods , Molecular Sequence Annotation/methods , DNA Sequence Analysis/methods , Protein Sequence Analysis/methods , Software/standards , Animals , Genomics/standards , Humans , Molecular Sequence Annotation/standards , DNA Sequence Analysis/standards , Protein Sequence Analysis/standards
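
The abstract above describes expressing annotation rules as queries over protein data held as RDF triples. The sketch below is a pure-Python analogue of that idea, matching graph patterns against a set of triples and emitting new triples for each match. It is not SPARQL, it does not use a SPARQL engine, and the `ex:` identifiers, predicate names, and function names are invented for illustration; the actual HAMAP SPARQL rules are those distributed at the URLs given in the abstract.

```python
def match(triples, pattern):
    """Yield variable bindings for one (s, p, o) pattern; '?x' marks a variable."""
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable used twice in a pattern must bind the same value.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield binding


def apply_rule(triples, where, construct):
    """Join all WHERE-like patterns, then emit one new triple per solution."""
    solutions = [{}]
    for pattern in where:
        next_solutions = []
        for sol in solutions:
            # Substitute variables already bound by earlier patterns.
            bound = tuple(sol.get(term, term) for term in pattern)
            for binding in match(triples, bound):
                next_solutions.append({**sol, **binding})
        solutions = next_solutions
    return {tuple(sol.get(term, term) for term in construct) for sol in solutions}


# Hypothetical data: two proteins match the same family, one is bacterial.
triples = {
    ("ex:P1", "ex:matches", "ex:FAM_A"),
    ("ex:P1", "ex:inTaxon", "ex:Bacteria"),
    ("ex:P2", "ex:matches", "ex:FAM_A"),
    ("ex:P2", "ex:inTaxon", "ex:Eukaryota"),
}
where = [("?p", "ex:matches", "ex:FAM_A"), ("?p", "ex:inTaxon", "ex:Bacteria")]
construct = ("?p", "ex:annotation", "ex:FunctionA")
new_triples = apply_rule(triples, where, construct)
```

Because the rule is just data (patterns in, triples out), the same rule can in principle be run by any engine that understands the pattern language, which is the portability argument the abstract makes for standard query languages.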
8.
J Cell Biol ; 157(6): 953-62, 2002 Jun 10.
Article in English | MEDLINE | ID: mdl-12045185

ABSTRACT

Genetic analysis has revealed that the three nucleus-encoded factors Tbc1, Tbc2, and Tbc3 are involved in the translation of the chloroplast psbC mRNA of the eukaryotic green alga Chlamydomonas reinhardtii. In this study we report the isolation and phenotypic characterization of two new tbc2 mutant alleles and their use for cloning and characterizing the Tbc2 gene by genomic complementation. TBC2 encodes a protein of 1,115 residues containing nine copies of a novel degenerate 38-40 amino acid repeat with a quasiconserved PPPEW motif near its COOH-terminal end. The middle part of the Tbc2 protein displays partial amino acid sequence identity with Crp1, a protein from Zea mays that is implicated in the processing and translation of the chloroplast petA and petD RNAs. The Tbc2 protein is enriched in chloroplast stromal subfractions and is associated with a 400-kD protein complex that appears to play a role in the translation of specifically the psbC mRNA.


Subject(s)
Chlamydomonas reinhardtii/genetics , Nuclear Proteins/chemistry , Nuclear Proteins/metabolism , Photosynthetic Reaction Center Complex Proteins/genetics , Plant Proteins/metabolism , Protein Biosynthesis , Protozoan Proteins/chemistry , Messenger RNA/metabolism , Alleles , Amino Acid Motifs , Amino Acid Sequence , Animals , CCAAT-Enhancer-Binding Proteins/chemistry , Cell Nucleus/metabolism , Chlamydomonas reinhardtii/metabolism , Chloroplasts/metabolism , Molecular Cloning , Conserved Sequence , Molecular Sequence Data , Mutation , Nuclear Proteins/genetics , Nuclear Proteins/physiology , Plant Proteins/genetics , Protozoan Proteins/physiology , Amino Acid Sequence Homology
9.
PLoS One ; 12(2): e0171746, 2017.
Article in English | MEDLINE | ID: mdl-28207819

ABSTRACT

Viruses are genetically diverse, infect a wide range of tissues and host cells and follow unique processes for replicating themselves. All these processes were investigated and indexed in the ViralZone knowledge base. To facilitate standardizing data, a simple ontology of viral life-cycle terms was developed to provide a common vocabulary for annotating data sets. New terminology was developed to address unique viral replication cycle processes, and existing terminology was modified and adapted. The virus life-cycle is classically described by schematic pictures. Using this ontology, it can be represented by a combination of successive terms: "entry", "latency", "transcription", "replication" and "exit". Each of these parts is broken down into discrete steps. For example, Zika virus "entry" is broken down into successive steps: "Attachment", "Apoptotic mimicry", "Viral endocytosis/macropinocytosis", "Fusion with host endosomal membrane", "Viral factory". To demonstrate the utility of a standard ontology for virus biology, this work was completed by annotating virus data in the ViralZone, UniProtKB and Gene Ontology databases.


Subject(s)
Eukaryotic Cells/virology , Terminology as Topic , Virus Diseases/virology , Virus Physiological Phenomena , Genetic Databases , Virus Replication , Viruses/genetics , Viruses/pathogenicity
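
The stage-and-step structure described in the abstract above maps naturally onto an ordered data structure. The Python sketch below uses only the stage names and Zika virus entry steps quoted in the abstract; the variable and function names are hypothetical, and the ordering check is an illustrative assumption about how such a vocabulary might be used, not a feature of ViralZone.

```python
# Stage order as listed in the abstract.
LIFE_CYCLE = ("entry", "latency", "transcription", "replication", "exit")

# Steps the abstract gives for Zika virus "entry".
ZIKA_STEPS = {
    "entry": (
        "Attachment",
        "Apoptotic mimicry",
        "Viral endocytosis/macropinocytosis",
        "Fusion with host endosomal membrane",
        "Viral factory",
    ),
}


def follows_cycle(stages, cycle=LIFE_CYCLE):
    """True if `stages` uses only known terms and never moves backwards."""
    positions = []
    for stage in stages:
        if stage not in cycle:
            return False
        positions.append(cycle.index(stage))
    return positions == sorted(positions)


def outline(cycle, steps_by_stage):
    """Render stages and their steps as a numbered text outline."""
    lines = []
    for number, stage in enumerate(cycle, start=1):
        lines.append(f"{number}. {stage}")
        lines.extend(f"   - {step}" for step in steps_by_stage.get(stage, ()))
    return lines


zika_outline = outline(LIFE_CYCLE, ZIKA_STEPS)
```

Calling `print("\n".join(zika_outline))` gives a textual counterpart to the schematic pictures the abstract mentions.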
10.
Viruses ; 9(6), 2017 05 23.
Article in English | MEDLINE | ID: mdl-28545254

ABSTRACT

Bacterial viruses, also called bacteriophages, display great genetic diversity and utilize unique processes for infecting and reproducing within a host cell. All these processes were investigated and indexed in the ViralZone knowledge base. To facilitate standardizing data, a simple ontology of viral life-cycle terms was developed to provide a common vocabulary for annotating data sets. New terminology was developed to address unique viral replication cycle processes, and existing terminology was modified and adapted. Classically, the viral life-cycle is described by schematic pictures. Using this ontology, it can be represented by a combination of successive events: entry, latency, transcription/replication, host-virus interactions and virus release. Each of these parts is broken down into discrete steps. For example, enterobacteria phage lambda entry is broken down into: viral attachment to host adhesion receptor, viral attachment to host entry receptor, viral genome ejection and viral genome circularization. To demonstrate the utility of a standard ontology for virus biology, this work was completed by annotating virus data in the ViralZone, UniProtKB and Gene Ontology databases.


Subject(s)
Bacteriophages/genetics , Bacteriophages/physiology , Biological Ontologies , Bacteriophages/classification , Bacteriophages/growth & development , Factual Databases , Host-Pathogen Interactions , Terminology as Topic
11.
Genetics ; 163(3): 895-904, 2003 Mar.
Article in English | MEDLINE | ID: mdl-12663530

ABSTRACT

Translation of the chloroplast psbC mRNA in the unicellular eukaryotic alga Chlamydomonas reinhardtii is controlled by interactions between its 547-base 5' untranslated region and the products of the nuclear loci TBC1, TBC2, and possibly TBC3. In this study, a series of site-directed mutations in this region was generated and the ability of these constructs to drive expression of a reporter gene was assayed in chloroplast transformants that are wild type or mutant at these nuclear loci. Two regions located in the middle of the 5' leader and near the initiation codon are important for translation. Other deletions still allow for partial expression of the reporter gene in the wild-type background. Regions with target sites for TBC1 and TBC2 were identified by estimating the residual translation activity in the respective mutant backgrounds. TBC1 targets include mostly the central part of the leader and the translation initiation region whereas the only detected TBC2 targets are in the 3' part. The 5'-most 93 nt of the leader are required for wild-type levels of transcription and/or mRNA stabilization. The results indicate that TBC1 and TBC2 function independently and further support the possibility that TBC1 acts together with TBC3.


Subject(s)
5' Untranslated Regions/genetics , Cell Nucleus/genetics , Chlamydomonas reinhardtii/genetics , Chloroplasts/genetics , Protein Biosynthesis/genetics , Messenger RNA/genetics , Animals , Base Sequence , Molecular Sequence Data , Sequence Alignment , Nucleic Acid Sequence Homology
12.
Comput Biol Chem ; 27(1): 49-58, 2003 Feb.
Article in English | MEDLINE | ID: mdl-12798039

ABSTRACT

Large-scale sequencing of prokaryotic genomes demands the automation of certain annotation tasks currently manually performed in the production of the SWISS-PROT protein knowledgebase. The HAMAP project, or 'High-quality Automated and Manual Annotation of microbial Proteomes', aims to integrate manual and automatic annotation methods in order to enhance the speed of the curation process while preserving the quality of the database annotation. Automatic annotation is only applied to entries that belong to manually defined orthologous families and to entries with no identifiable similarities (ORFans). Many checks are enforced in order to prevent the propagation of wrong annotation and to spot problematic cases, which are channelled to manual curation. The results of this annotation are integrated in SWISS-PROT, and a website is provided at http://www.expasy.org/sprot/hamap/.


Subject(s)
Bacterial Proteins/classification , Bacterial Proteins/physiology , Database Management Systems/trends , Protein Databases/classification , Protein Databases/standards , Proteome/classification , Proteome/physiology , Amino Acid Sequence , Database Management Systems/standards , Bacterial Genome , Molecular Sequence Data