MSC: a metagenomic sequence classification algorithm.
Saha, Subrata; Johnson, Jethro; Pal, Soumitra; Weinstock, George M; Rajasekaran, Sanguthevar.
Affiliation
  • Saha S; Healthcare and Life Sciences Division, IBM Thomas J. Watson Research Center, Yorktown Heights, NY, USA.
  • Johnson J; The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA.
  • Pal S; National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD, USA.
  • Weinstock GM; The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA.
  • Rajasekaran S; Computer Science and Engineering Department, University of Connecticut, Storrs, CT, USA.
Bioinformatics; 35(17): 2932-2940, 2019-09-01.
Article in En | MEDLINE | ID: mdl-30649204
ABSTRACT
MOTIVATION:

Metagenomics is the study of genetic material sampled directly from natural habitats. It has the potential to reveal previously hidden diversity of microscopic life, largely owing to highly parallel, low-cost next-generation sequencing technology. Conventional approaches align metagenomic reads to known reference genomes to identify the microbes in a sample. Because such a collection of reference genomes is very large, this approach often requires high-end machines with large memory, which are often unavailable to researchers. Alternative approaches are alignment-free: they predict the presence of a microbe from the unique k-mers present in the microbial genomes. However, such approaches suffer from high false-positive rates because the value of k must be traded off against the available computational resources. In this article, we propose a highly efficient metagenomic sequence classification (MSC) algorithm that is a hybrid of both approaches. Instead of aligning reads to full genomes, MSC aligns reads to a set of carefully chosen, shorter and highly discriminating model sequences built from the unique k-mers of each reference sequence.
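The unique k-mer idea mentioned above can be illustrated with a small sketch. This is not the MSC implementation: MSC builds model sequences from the unique k-mers and aligns reads to them, whereas the sketch below only shows the simpler alignment-free step of finding k-mers that occur in exactly one reference and counting how many of them a read contains. The function names (`kmers`, `unique_kmers`, `classify_read`), the toy sequences and the choice k = 4 are all assumptions made for illustration.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def unique_kmers(references, k):
    """Map each reference name to the k-mers found in no other reference."""
    owners = defaultdict(set)
    for name, seq in references.items():
        for kmer in kmers(seq, k):
            owners[kmer].add(name)
    unique = {name: set() for name in references}
    for kmer, names in owners.items():
        if len(names) == 1:
            unique[next(iter(names))].add(kmer)
    return unique


def classify_read(read, unique, k):
    """Return the reference whose unique k-mers the read hits most, or None."""
    read_kmers = kmers(read, k)
    hits = {name: len(read_kmers & ks) for name, ks in unique.items()}
    best = max(hits, key=hits.get)
    return best if hits[best] > 0 else None


# Toy, made-up sequences for illustration only (not real genomes).
references = {
    "genome_a": "ACGTACGTGGA",
    "genome_b": "TTGACCGTACG",
}
unique = unique_kmers(references, k=4)
print(classify_read("CGTGGA", unique, k=4))  # read made only of genome_a k-mers
print(classify_read("CGTACG", unique, k=4))  # read made only of shared k-mers
```

With these toy inputs, a read built from k-mers found only in one reference is assigned to it, while a read consisting solely of k-mers shared by both references is left unclassified. A small k makes more k-mers shared across genomes, which is one way to see the trade-off between k and false positives that the abstract describes.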

RESULTS:

Microbiome researchers are generally interested in two objectives of a taxonomic classifier: (i) to detect prevalence, i.e. the taxa present in a sample, and (ii) to estimate their relative abundances. MSC is primarily designed to detect prevalence, and experimental results show that MSC is more effective and efficient than other state-of-the-art algorithms in terms of accuracy, memory and runtime. In addition, MSC outputs an approximate estimate of the abundances.

AVAILABILITY AND IMPLEMENTATION:

The implementations are freely available for non-commercial purposes. They can be downloaded from https://drive.google.com/open?id=1XirkAamkQ3ltWvI1W1igYQFusp9DHtVl.
Subjects

Full text: 1 Database: MEDLINE Main subject: Sequence Analysis, DNA / Metagenome / Metagenomics Type of study: Prognostic_studies / Risk_factors_studies Language: En Year of publication: 2019 Document type: Article
