UniRef clusters: a comprehensive and scalable alternative for improving sequence similarity searches.

Suzek, Baris E; Wang, Yuqi; Huang, Hongzhan; McGarvey, Peter B; Wu, Cathy H

Affiliation

Suzek BE; Protein Information Resource, Georgetown University Medical Center, Washington, DC 20007, USA, Department of Computer Engineering, Mugla Sitki Koçman University, Mugla 48000, Turkey, Center for Bioinformatics and Computational Biology and Protein Information Resource, University of Delaware, Newark,
Wang Y; Protein Information Resource, Georgetown University Medical Center, Washington, DC 20007, USA, Department of Computer Engineering, Mugla Sitki Koçman University, Mugla 48000, Turkey, Center for Bioinformatics and Computational Biology and Protein Information Resource, University of Delaware, Newark,
Huang H; Protein Information Resource, Georgetown University Medical Center, Washington, DC 20007, USA, Department of Computer Engineering, Mugla Sitki Koçman University, Mugla 48000, Turkey, Center for Bioinformatics and Computational Biology and Protein Information Resource, University of Delaware, Newark,
McGarvey PB; Protein Information Resource, Georgetown University Medical Center, Washington, DC 20007, USA, Department of Computer Engineering, Mugla Sitki Koçman University, Mugla 48000, Turkey, Center for Bioinformatics and Computational Biology and Protein Information Resource, University of Delaware, Newark,
Wu CH; Protein Information Resource, Georgetown University Medical Center, Washington, DC 20007, USA, Department of Computer Engineering, Mugla Sitki Koçman University, Mugla 48000, Turkey, Center for Bioinformatics and Computational Biology and Protein Information Resource, University of Delaware, Newark,

Bioinformatics; 31(6): 926-32, 2015 Mar 15.

Article in English | MEDLINE | ID: mdl-25398609

ABSTRACT

MOTIVATION: UniRef databases provide full-scale clustering of UniProtKB sequences and are utilized for a broad range of applications, particularly similarity-based functional annotation. Non-redundancy and intra-cluster homogeneity in UniRef were recently improved by adding a sequence length overlap threshold. Our hypothesis is that these improvements would enhance the speed and sensitivity of similarity searches and improve the consistency of annotation within clusters. RESULTS: Intra-cluster molecular function consistency was examined by analysis of Gene Ontology terms. Results show that UniRef clusters bring together proteins of identical molecular function in more than 97% of the clusters, implying that clusters are useful for annotation and can also be used to detect annotation inconsistencies. To examine coverage in similarity results, BLASTP searches against UniRef50 followed by expansion of the hit lists with cluster members demonstrated advantages compared with searches against UniProtKB sequences; the searches are concise (~7 times shorter hit list before expansion), faster (~6 times) and more sensitive in detection of remote similarities (>96% recall at e-value <0.0001). Our results support the use of UniRef clusters as a comprehensive and scalable alternative to native sequence databases for similarity searches and reinforce their reliability for use in functional annotation.
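
The search strategy described in the abstract (search against cluster representatives, then expand the hit list with cluster members) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `expand_hits`, the cluster-to-members mapping, and the toy identifiers are all hypothetical, and the BLASTP step itself is assumed to have been run already.

```python
from typing import Dict, Iterable, List


def expand_hits(representative_hits: Iterable[str],
                cluster_members: Dict[str, List[str]]) -> List[str]:
    """Expand a hit list of cluster IDs into the member sequences of each cluster.

    Hit order is preserved; a member that appears more than once is kept only
    at its first position. A hit with no entry in the mapping is kept as-is.
    """
    expanded: List[str] = []
    seen = set()
    for cluster_id in representative_hits:
        for member in cluster_members.get(cluster_id, [cluster_id]):
            if member not in seen:
                seen.add(member)
                expanded.append(member)
    return expanded


# Hypothetical toy data: two clusters and a ranked hit list from a prior search.
clusters = {
    "UniRef50_A": ["P1", "P2", "P3"],
    "UniRef50_B": ["P4"],
}
hits = ["UniRef50_B", "UniRef50_A"]

expanded = expand_hits(hits, clusters)
print(expanded)  # ['P4', 'P1', 'P2', 'P3']
```

The short hit list (two cluster IDs here) is what makes the initial search concise; the expansion step then restores coverage of the individual member sequences.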

Subjects

Computational Biology; Databases, Protein; Dioxygenases/metabolism; Membrane Proteins/metabolism; Proteins/metabolism; Sequence Analysis, Protein; Software; AlkB Homolog 5, RNA Demethylase; Cluster Analysis; Dioxygenases/chemistry; Dioxygenases/genetics; Gene Ontology; Humans; Information Storage and Retrieval; Membrane Proteins/chemistry; Membrane Proteins/genetics; Molecular Sequence Annotation; Proteins/chemistry; Proteins/genetics

Full text: 1 Collections: 01-international Database: MEDLINE Limit: Humans Language: En Publication year: 2015 Document type: Article