RefSeq database growth influences the accuracy of k-mer-based lowest common ancestor species identification.
Nasko, Daniel J; Koren, Sergey; Phillippy, Adam M; Treangen, Todd J.
Affiliation
  • Nasko DJ; Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD, USA.
  • Koren S; Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, Bethesda, MD, USA.
  • Phillippy AM; Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, Bethesda, MD, USA.
  • Treangen TJ; Department of Computer Science, Rice University, Houston, TX, USA. treangen@rice.edu.
Genome Biol; 19(1): 165, 2018 Oct 30.
Article in En | MEDLINE | ID: mdl-30373669
ABSTRACT
In order to determine the role of the database in taxonomic sequence classification, we examine the influence of the database over time on k-mer-based lowest common ancestor taxonomic classification. We present three major findings: the number of new species added to the NCBI RefSeq database greatly outpaces the number of new genera; as a result, more reads are classified with newer database versions, but fewer are classified at the species level; and Bayesian-based re-estimation mitigates this effect but struggles with novel genomes. These results suggest a need for new classification approaches specially adapted for large databases.
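
The second finding can be illustrated with a toy sketch. This is not the authors' pipeline or any real classifier's API: the taxonomy, the sequences, the function names, and the simple majority vote over k-mer hits are all assumptions made for illustration. Each k-mer is mapped to the lowest common ancestor (LCA) of every genome that contains it, so adding a second species from the same genus can push a read's assignment from species level up to genus level.

```python
from collections import Counter

# Hypothetical toy taxonomy (child -> parent); names are illustrative only.
PARENT = {
    "species_A": "genus_X",
    "species_B": "genus_X",
    "genus_X": "root",
    "root": None,
}

def lineage(taxon):
    """Return the path from a taxon up to the root, starting with the taxon."""
    path = []
    while taxon is not None:
        path.append(taxon)
        taxon = PARENT[taxon]
    return path

def lca(a, b):
    """Lowest common ancestor of two taxa in the toy taxonomy."""
    ancestors = set(lineage(a))
    for taxon in lineage(b):
        if taxon in ancestors:
            return taxon
    raise ValueError("taxa do not share a root")

def kmers(seq, k):
    """All overlapping substrings of length k."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def build_index(genomes, k):
    """Map each k-mer to the LCA of all genomes that contain it."""
    index = {}
    for taxon, seq in genomes.items():
        for kmer in kmers(seq, k):
            index[kmer] = lca(index[kmer], taxon) if kmer in index else taxon
    return index

def classify(read, index, k):
    """Simplified assignment: the most frequent taxon among the read's k-mer hits."""
    hits = Counter(index[kmer] for kmer in kmers(read, k) if kmer in index)
    return hits.most_common(1)[0][0] if hits else None

K = 4
read = "ACGTACG"
old_db = {"species_A": "ACGTACGGTCA"}             # earlier database version
new_db = {**old_db, "species_B": "ACGTACGTTTG"}   # a congeneric species is added

print(classify(read, build_index(old_db, K), K))  # species_A
print(classify(read, build_index(new_db, K), K))  # genus_X
```

In this sketch every k-mer of the read is shared by both species once the second genome is added, so the read can only be placed at the genus level. Real classifiers score hits along taxonomy paths rather than by a plain majority vote.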

Full text: 1 Database: MEDLINE Main subject: Algorithms / Sequence Analysis, DNA / Databases, Genetic Type of study: Diagnostic_studies / Prognostic_studies Language: En Year of publication: 2018 Document type: Article