RefSeq database growth influences the accuracy of k-mer-based lowest common ancestor species identification.
Genome Biol; 19(1): 165, 2018 Oct 30.
Article in English | MEDLINE | ID: mdl-30373669
ABSTRACT
To determine the role of the database in taxonomic sequence classification, we examine how changes in the database over time influence k-mer-based lowest common ancestor taxonomic classification. We present three major findings: (1) the number of new species added to the NCBI RefSeq database greatly outpaces the number of new genera; (2) as a result, more reads are classified with newer database versions, but fewer are classified at the species level; and (3) Bayesian-based re-estimation mitigates this effect but struggles with novel genomes. These results suggest a need for new classification approaches specially adapted for large databases.
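The second finding can be illustrated with a toy sketch. It is not the classifier evaluated in the article: the taxonomy, the sequences, and the function names (`kmers`, `lca`, `build_index`, `classify`) are hypothetical, and the read is assigned by a simple majority vote over k-mer hits. The sketch only shows why adding a second species to a genus can move a read from a species-level call to a genus-level call.

```python
from collections import Counter

def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def lca(a, b, parent):
    """Lowest common ancestor of taxa a and b in a parent-pointer tree."""
    ancestors = set()
    while a is not None:
        ancestors.add(a)
        a = parent.get(a)
    while b not in ancestors:
        b = parent[b]
    return b

def build_index(genomes, parent, k):
    """Map each k-mer to the LCA of every taxon whose genome contains it."""
    index = {}
    for taxon, seq in genomes.items():
        for km in kmers(seq, k):
            index[km] = lca(index[km], taxon, parent) if km in index else taxon
    return index

def classify(read, index, k):
    """Assign the read to the taxon hit by the most k-mers (simplified vote)."""
    hits = Counter(index[km] for km in kmers(read, k) if km in index)
    return hits.most_common(1)[0][0] if hits else None

# Hypothetical taxonomy: one genus containing two species.
parent = {"root": None, "genusA": "root", "sp1": "genusA", "sp2": "genusA"}

# Older database: only sp1. Newer database: sp2 added, nearly identical to sp1.
old_db = {"sp1": "ACGTACGGTT"}
new_db = {"sp1": "ACGTACGGTT", "sp2": "ACGTACGGAA"}

K = 4
read = "ACGTACGG"
old_call = classify(read, build_index(old_db, parent, K), K)
new_call = classify(read, build_index(new_db, parent, K), K)
print(old_call, new_call)
```

In this toy case every k-mer of the read is unique to `sp1` in the older database, so the read is called at the species level. Once `sp2` is added, the same k-mers are shared by both species and map to their common ancestor, so the read is still classified but only at the genus level.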
Full text: 1
Database: MEDLINE
Main subject: Algorithms / Sequence Analysis, DNA / Databases, Genetic
Study type: Diagnostic_studies / Prognostic_studies
Language: En
Publication year: 2018
Document type: Article