Search | VHL Regional Portal

UniRule: a unified rule resource for automatic annotation in the UniProt Knowledgebase.

MacDougall, Alistair; Volynkin, Vladimir; Saidi, Rabie; Poggioli, Diego; Zellner, Hermann; Hatton-Ellis, Emma; Joshi, Vishal; O'Donovan, Claire; Orchard, Sandra; Auchincloss, Andrea H; Baratin, Delphine; Bolleman, Jerven; Coudert, Elisabeth; de Castro, Edouard; Hulo, Chantal; Masson, Patrick; Pedruzzi, Ivo; Rivoire, Catherine; Arighi, Cecilia; Wang, Qinghua; Chen, Chuming; Huang, Hongzhan; Garavelli, John; Vinayaka, C R; Yeh, Lai-Su; Natale, Darren A; Laiho, Kati; Martin, Maria-Jesus; Renaux, Alexandre; Pichler, Klemens.

Bioinformatics ; 36(22-23): 5562, 2021 04 01.

Article in English | MEDLINE | ID: mdl-33821964

UniRule: a unified rule resource for automatic annotation in the UniProt Knowledgebase.

Bioinformatics ; 36(17): 4643-4648, 2020 11 01.

Article in English | MEDLINE | ID: mdl-32399560

ABSTRACT

MOTIVATION: The number of protein records in the UniProt Knowledgebase (UniProtKB: https://www.uniprot.org) continues to grow rapidly as a result of genome sequencing and the prediction of protein-coding genes. Providing functional annotation for these proteins presents a significant and continuing challenge. RESULTS: In response to this challenge, UniProt has developed a method of annotation, known as UniRule, based on expertly curated rules, which integrates related systems (RuleBase, HAMAP, PIRSR, PIRNR) developed by the members of the UniProt consortium. UniRule uses protein family signatures from InterPro, combined with taxonomic and other constraints, to select sets of reviewed proteins which have common functional properties supported by experimental evidence. This annotation is propagated to unreviewed records in UniProtKB that meet the same selection criteria, most of which do not have (and are never likely to have) experimentally verified functional annotation. Release 2020_01 of UniProtKB contains 6496 UniRule rules which provide annotation for 53 million proteins, accounting for 30% of the 178 million records in UniProtKB. UniRule provides scalable enrichment of annotation in UniProtKB. AVAILABILITY AND IMPLEMENTATION: UniRule rules are integrated into UniProtKB and can be viewed at https://www.uniprot.org/unirule/. UniRule rules and the code required to run the rules are publicly available for researchers who wish to annotate their own sequences. The implementation used to run the rules is known as UniFIRE and is available at https://gitlab.ebi.ac.uk/uniprot-public/unifire.
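The rule-based propagation described in this abstract can be illustrated with a minimal sketch. It is not the UniRule rule format or the UniFIRE implementation; the class names, fields, and identifiers (`Rule`, `Protein`, `SIG_EXAMPLE_1`, `EXAMPLE_0001`, and so on) are hypothetical and chosen only to show the idea: a rule pairs a set of required signatures and a taxonomic constraint with annotations, and those annotations are copied to every unreviewed record that meets the same conditions.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule: conditions plus the annotations to propagate."""
    rule_id: str
    required_signatures: frozenset  # placeholder signature identifiers
    required_taxon: str             # taxon that must appear in the lineage
    annotations: tuple              # free-text annotations (illustrative)


@dataclass
class Protein:
    """Hypothetical protein record."""
    accession: str
    signatures: set
    lineage: list
    reviewed: bool = False
    annotations: list = field(default_factory=list)


def rule_matches(rule, protein):
    """True if the protein has all required signatures and the required taxon."""
    return (rule.required_signatures <= protein.signatures
            and rule.required_taxon in protein.lineage)


def propagate(rules, proteins):
    """Copy rule annotations to matching unreviewed proteins.

    Returns the number of proteins that matched at least one rule.
    """
    matched = 0
    for protein in proteins:
        if protein.reviewed:
            continue  # reviewed records are the evidence source, not the target
        hit = False
        for rule in rules:
            if rule_matches(rule, protein):
                hit = True
                for annotation in rule.annotations:
                    if annotation not in protein.annotations:
                        protein.annotations.append(annotation)
        matched += hit
    return matched


# Illustrative data only; none of these identifiers are real.
example_rule = Rule(
    rule_id="RULE_EXAMPLE",
    required_signatures=frozenset({"SIG_EXAMPLE_1"}),
    required_taxon="Bacteria",
    annotations=("Function: example catalytic activity",),
)
example_proteins = [
    Protein("EXAMPLE_0001", {"SIG_EXAMPLE_1", "SIG_EXAMPLE_2"}, ["Bacteria"]),
    Protein("EXAMPLE_0002", {"SIG_EXAMPLE_1"}, ["Eukaryota"]),
]
example_matched = propagate([example_rule], example_proteins)
```

In this toy data only the first record satisfies both the signature and the taxonomic condition, so only it receives the annotation.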

Subjects

Knowledge Bases , Proteins , Chromosome Mapping , Protein Databases , Molecular Sequence Annotation , Proteins/genetics

UniProt-DAAC: domain architecture alignment and classification, a new method for automatic functional annotation in UniProtKB.

Dogan, Tunca; MacDougall, Alistair; Saidi, Rabie; Poggioli, Diego; Bateman, Alex; O'Donovan, Claire; Martin, Maria J.

Bioinformatics ; 32(15): 2264-71, 2016 08 01.

Article in English | MEDLINE | ID: mdl-27153729

ABSTRACT

MOTIVATION: Similarity-based methods have been widely used to infer the properties of genes and gene products containing little or no experimental annotation. New approaches that overcome the limitations of methods that rely solely upon sequence similarity are attracting increased attention. One of these novel approaches is to use the organization of the structural domains in proteins. RESULTS: We propose a method for the automatic annotation of protein sequences in the UniProt Knowledgebase (UniProtKB) by comparing their domain architectures, classifying proteins based on the similarities and propagating functional annotation. The performance of this method was measured through a cross-validation analysis using the Gene Ontology (GO) annotation of a sub-set of UniProtKB/Swiss-Prot. The results demonstrate the effectiveness of this approach in detecting functional similarity, with an average F-score of 0.85. Applying the method to nearly 55.3 million uncharacterized proteins in UniProtKB/TrEMBL resulted in 44 818 178 GO term predictions for 12 172 114 proteins. 22% of these predictions were for 2 812 016 previously non-annotated protein entries, indicating the significance of the value added by this approach. AVAILABILITY AND IMPLEMENTATION: The results of the method are available at: ftp://ftp.ebi.ac.uk/pub/contrib/martin/DAAC/ CONTACT: tdogan@ebi.ac.uk SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
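The general idea of comparing domain architectures and scoring predictions with an F-score can be sketched as follows. This is not the DAAC algorithm: the similarity measure here is a stand-in (the matching ratio from Python's standard `difflib.SequenceMatcher` applied to ordered lists of domain identifiers), and the function names, threshold, and identifiers are assumptions made for illustration only.

```python
from difflib import SequenceMatcher


def architecture_similarity(arch_a, arch_b):
    """Similarity in [0, 1] between two ordered lists of domain identifiers.

    Stand-in measure: 2 * matched elements / total elements.
    """
    return SequenceMatcher(None, arch_a, arch_b).ratio()


def predict_terms(query_arch, reference, threshold=0.7):
    """Return the term set of the most similar reference architecture.

    `reference` maps a tuple of domain identifiers to a set of terms.
    An empty set is returned if no architecture reaches the threshold.
    """
    best_score, best_terms = 0.0, set()
    for arch, terms in reference.items():
        score = architecture_similarity(list(query_arch), list(arch))
        if score >= threshold and score > best_score:
            best_score, best_terms = score, set(terms)
    return best_terms


def f_score(predicted, actual):
    """Harmonic mean of precision and recall for two sets of terms."""
    true_positives = len(predicted & actual)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(actual)
    return 2 * precision * recall / (precision + recall)


# Illustrative data only; the identifiers are placeholders, not real entries.
example_reference = {
    ("DOM_A", "DOM_B", "DOM_C"): {"TERM_1", "TERM_2"},
    ("DOM_X", "DOM_Y"): {"TERM_9"},
}
example_prediction = predict_terms(("DOM_A", "DOM_B"), example_reference)
example_f = f_score(example_prediction, {"TERM_1", "TERM_3"})
```

In a cross-validation setting, `f_score` would be computed for each held-out protein against its known terms and then averaged; the abstract reports such an average but does not specify the details reproduced here.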

Subjects

Protein Databases , Knowledge Bases , Molecular Sequence Annotation , Amino Acid Sequence , Proteins

The UniProt-GO Annotation database in 2011.

Dimmer, Emily C; Huntley, Rachael P; Alam-Faruque, Yasmin; Sawford, Tony; O'Donovan, Claire; Martin, Maria J; Bely, Benoit; Browne, Paul; Mun Chan, Wei; Eberhardt, Ruth; Gardner, Michael; Laiho, Kati; Legge, Duncan; Magrane, Michele; Pichler, Klemens; Poggioli, Diego; Sehra, Harminder; Auchincloss, Andrea; Axelsen, Kristian; Blatter, Marie-Claude; Boutet, Emmanuel; Braconi-Quintaje, Silvia; Breuza, Lionel; Bridge, Alan; Coudert, Elizabeth; Estreicher, Anne; Famiglietti, Livia; Ferro-Rojas, Serenella; Feuermann, Marc; Gos, Arnaud; Gruaz-Gumowski, Nadine; Hinz, Ursula; Hulo, Chantal; James, Janet; Jimenez, Silvia; Jungo, Florence; Keller, Guillaume; Lemercier, Phillippe; Lieberherr, Damien; Masson, Patrick; Moinat, Madelaine; Pedruzzi, Ivo; Poux, Sylvain; Rivoire, Catherine; Roechert, Bernd; Schneider, Michael; Stutz, Andre; Sundaram, Shyamala; Tognolli, Michael; Bougueleret, Lydie.

Nucleic Acids Res ; 40(Database issue): D565-70, 2012 Jan.

Article in English | MEDLINE | ID: mdl-22123736

ABSTRACT

The GO annotation dataset provided by the UniProt Consortium (GOA: http://www.ebi.ac.uk/GOA) is a comprehensive set of evidence-based associations between terms from the Gene Ontology resource and UniProtKB proteins. Currently supplying over 100 million annotations to 11 million proteins in more than 360,000 taxa, this resource has increased 2-fold over the last 2 years. It has benefited from a wealth of checks to improve annotation correctness and consistency, and now supplies greater information content enabled by GO Consortium annotation format developments. Detailed, manual GO annotations obtained from the curation of peer-reviewed papers are directly contributed by all UniProt curators and supplemented with manual and electronic annotations from 36 model organism and domain-focused scientific resources. The inclusion of high-quality, automatic annotation predictions ensures the UniProt GO annotation dataset supplies functional information to a wide range of proteins, including those from poorly characterized, non-model organism species. UniProt GO annotations are freely available in a range of formats accessible by both file downloads and web-based views. In addition, the introduction of a new, normalized file format in 2010 has made for easier handling of the complete UniProt-GOA data set.

Subjects

Protein Databases , Molecular Sequence Annotation , Controlled Vocabulary , Molecular Sequence Annotation/standards
