A novel data structure to support ultra-fast taxonomic classification of metagenomic sequences with k-mer signatures.
Liu, Xinan; Yu, Ye; Liu, Jinpeng; Elliott, Corrine F; Qian, Chen; Liu, Jinze.
Affiliations
  • Liu X; Department of Computer Science, University of Kentucky, Lexington, KY, USA.
  • Yu Y; Department of Computer Science, University of Kentucky, Lexington, KY, USA.
  • Liu J; Biostatistics and Bioinformatics Shared Resource Facility, Markey Cancer Center, University of Kentucky, Lexington, KY, USA.
  • Elliott CF; Department of Computer Science, University of Kentucky, Lexington, KY, USA.
  • Qian C; Department of Computer Engineering, UC Santa Cruz, Santa Cruz, CA, USA.
  • Liu J; Department of Computer Science, University of Kentucky, Lexington, KY, USA.
Bioinformatics; 34(1): 171-178, 2018 Jan 01.
Article in English | MEDLINE | ID: mdl-29036588
ABSTRACT
Motivation:

Metagenomic read classification is a critical step in the identification and quantification of microbial species sampled by high-throughput sequencing. Although many algorithms have been developed to date, they suffer from significant memory and/or computational costs. Given the growing popularity of metagenomic data in both basic science and clinical applications, as well as the increasing volume of data being generated, efficient and accurate algorithms are in high demand.

Results:

We introduce MetaOthello, a probabilistic hashing classifier for metagenomic sequencing reads. The algorithm employs a novel data structure, called l-Othello, to support efficient querying of a taxon using its k-mer signatures. MetaOthello is an order of magnitude faster than the current state-of-the-art algorithms Kraken and Clark, and requires only one-third of the RAM. In comparison to Kaiju, a metagenomic classification tool that uses protein sequences instead of genomic sequences, MetaOthello is three times faster and exhibits 20-30% higher classification sensitivity. We report comparative analyses of both scalability and accuracy using a number of simulated and empirical datasets.

Availability and implementation:

MetaOthello is a stand-alone program implemented in C++. The current version (1.0) is accessible via https://doi.org/10.5281/zenodo.808941.

Contact:

liuj@cs.uky.edu

Supplementary information:

Supplementary data are available at Bioinformatics online.
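The abstract names l-Okthello but does not describe its internals. As a rough illustration only, the toy Python sketch here assumes the general Othello-hashing idea: each key is stored as an edge between one slot of array A and one slot of array B, the arrays are filled by walking the (acyclic) key graph, and a lookup returns the l-bit value A[ha(key)] XOR B[hb(key)]. It is not MetaOthello's C++ implementation; the names `ToyOthello`, `kmers` and `classify`, the array sizing, the toy reference sequences and the majority-vote rule are all hypothetical.

```python
from collections import Counter, defaultdict, deque


class ToyOthello:
    """Toy XOR-based key -> l-bit value table in the spirit of Othello hashing.

    Assumption: illustrative only, not MetaOthello's l-Othello code.
    A lookup returns A[ha(key)] ^ B[hb(key)]; keys never stored still
    return some (arbitrary) l-bit value, because membership is not checked.
    """

    def __init__(self, mapping, l_bits, max_tries=100):
        self.mask = (1 << l_bits) - 1
        # Hypothetical sizing: 2 slots per key on each side makes an acyclic
        # key graph likely; a cyclic one triggers a retry with new hashes.
        self.ma = self.mb = 2 * max(1, len(mapping))
        for seed in range(max_tries):
            self.seed = seed  # new seed = new pair of hash functions
            if self._build(mapping):
                return
        raise RuntimeError("no acyclic hash graph found; enlarge the arrays")

    def _ha(self, key):
        return hash((self.seed, "a", key)) % self.ma

    def _hb(self, key):
        return hash((self.seed, "b", key)) % self.mb

    def _build(self, mapping):
        # Each key is an edge between slot ha(key) in A and slot hb(key) in B.
        adj = defaultdict(list)
        for eid, (key, value) in enumerate(mapping.items()):
            u, v = ("a", self._ha(key)), ("b", self._hb(key))
            adj[u].append((v, value & self.mask, eid))
            adj[v].append((u, value & self.mask, eid))
        bits = {}
        for root in adj:
            if root in bits:
                continue
            bits[root] = 0  # each tree's root can take any value
            queue = deque([(root, -1)])
            while queue:
                node, parent_edge = queue.popleft()
                for nbr, value, eid in adj[node]:
                    if eid == parent_edge:
                        continue
                    if nbr in bits:
                        return False  # cycle: retry with another seed
                    # Choose nbr's bits so that node_bits ^ nbr_bits == value.
                    bits[nbr] = bits[node] ^ value
                    queue.append((nbr, eid))
        self.A, self.B = [0] * self.ma, [0] * self.mb
        for (side, idx), b in bits.items():
            (self.A if side == "a" else self.B)[idx] = b
        return True

    def query(self, key):
        return self.A[self._ha(key)] ^ self.B[self._hb(key)]


def kmers(seq, k):
    """All overlapping k-mers of seq (no reverse complements, for brevity)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def classify(read, table, k, taxa):
    """Hypothetical rule: majority vote over k-mer lookups that hit a known taxon."""
    votes = Counter(t for t in map(table.query, kmers(read, k)) if t in taxa)
    return votes.most_common(1)[0][0] if votes else None


# Toy references (made up for illustration); the two share no 4-mers.
references = {"ACGTACGT": 1, "TTGACCAA": 2}
K = 4
signatures = {km: taxon for seq, taxon in references.items() for km in kmers(seq, K)}
table = ToyOthello(signatures, l_bits=2)
print(classify("ACGTACG", table, K, taxa={1, 2}))  # prints 1
```

Because a lookup never checks membership, a k-mer absent from the table comes back with an arbitrary l-bit value; this is one way such a structure behaves probabilistically, and voting across many k-mers per read dampens those spurious hits. A real classifier would also have to handle k-mers shared by several taxa, which this toy ignores.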
Subject(s)

Full text: 1 | Database: MEDLINE | Main subject: Software / Sequence Analysis, DNA / Metagenomics / High-Throughput Nucleotide Sequencing / Genome, Microbial / Microbiota | Limit: Humans | Language: English | Journal: Bioinformatics | Journal subject: Medical Informatics | Year: 2018 | Document type: Article | Country of affiliation: United States
