Improving the consistency of domain annotation within the Conserved Domain Database.
Derbyshire, Myra K; Gonzales, Noreen R; Lu, Shennan; He, Jane; Marchler, Gabriele H; Wang, Zhouxi; Marchler-Bauer, Aron.
Affiliation
  • Derbyshire MK; National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bldg. 38 A, Room 5S508, 8600 Rockville Pike, Bethesda, MD 20894, USA.
  • Gonzales NR; National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bldg. 38 A, Room 5S508, 8600 Rockville Pike, Bethesda, MD 20894, USA.
  • Lu S; National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bldg. 38 A, Room 5S508, 8600 Rockville Pike, Bethesda, MD 20894, USA.
  • He J; National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bldg. 38 A, Room 5S508, 8600 Rockville Pike, Bethesda, MD 20894, USA.
  • Marchler GH; National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bldg. 38 A, Room 5S508, 8600 Rockville Pike, Bethesda, MD 20894, USA.
  • Wang Z; National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bldg. 38 A, Room 5S508, 8600 Rockville Pike, Bethesda, MD 20894, USA.
  • Marchler-Bauer A; National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bldg. 38 A, Room 5S508, 8600 Rockville Pike, Bethesda, MD 20894, USA. bauer@ncbi.nlm.nih.gov
Article in En | MEDLINE | ID: mdl-25767294
ABSTRACT
When annotating protein sequences with the footprints of evolutionarily conserved domains, conservative score or E-value thresholds need to be applied to RPS-BLAST hits to avoid many false positives. We observe that manual inspection and classification of hits gathered at a higher threshold can add a significant amount of valuable domain annotation. We report an automated algorithm that 'rescues' valuable borderline-scoring domain hits that are well supported by domain architecture (DA, the sequential order of conserved domains in a protein query), including tandem repeats of domain hits reported at a more conservative threshold. This algorithm is now available as a selectable option on the public conserved domain search (CD-Search) pages. We also report on the possibility of 'suppressing' domain hits close to the threshold based on a lack of well-supported DA, and of implementing this conservatively as an option in live conserved domain searches and for pre-computed results. Improving domain annotation consistency will in turn reduce the fraction of NR sequences with incomplete DAs.
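
The 'rescue' idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Hit` record, the `rescue_hits` function, the set of known architectures, and both E-value thresholds are hypothetical names and placeholder values chosen for illustration. The sketch assumes a borderline hit is kept when it does not overlap an accepted hit and either repeats a domain already found at the conservative threshold (a tandem repeat) or completes a domain architecture from a supplied reference set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """Hypothetical record for one domain hit on a protein query."""
    domain: str
    start: int
    end: int
    evalue: float


def architecture(hits):
    """Return domain names in their sequential order along the query."""
    return tuple(h.domain for h in sorted(hits, key=lambda h: h.start))


def overlaps(a, b):
    """True if two hits share at least one residue position."""
    return a.start <= b.end and b.start <= a.end


def rescue_hits(hits, known_architectures, strict=1e-2, relaxed=1.0):
    """Keep confident hits plus borderline hits supported by architecture.

    `strict` and `relaxed` are illustrative thresholds (assumptions),
    and `known_architectures` is an assumed set of domain-name tuples.
    """
    confident = [h for h in hits if h.evalue <= strict]
    borderline = [h for h in hits if strict < h.evalue <= relaxed]
    confident_domains = {h.domain for h in confident}

    rescued = []
    # Consider the best-scoring borderline hits first.
    for hit in sorted(borderline, key=lambda h: h.evalue):
        accepted = confident + rescued
        if any(overlaps(hit, other) for other in accepted):
            continue
        is_tandem_repeat = hit.domain in confident_domains
        is_supported = architecture(accepted + [hit]) in known_architectures
        if is_tandem_repeat or is_supported:
            rescued.append(hit)

    return sorted(confident + rescued, key=lambda h: h.start)
```

For example, with one confident hit to a domain `A` and a reference set containing the architecture `("A", "B")`, a borderline hit to `B` downstream of `A` would be kept, while a borderline hit to an unrelated domain `C` would still be dropped.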

Full text: 1 Database: MEDLINE Main subject: Algorithms / Sequence Analysis, Protein / Databases, Protein / Molecular Sequence Annotation Language: En Journal: Database (Oxford) Publication year: 2015 Document type: Article Affiliation country: United States