KCMBT: a k-mer Counter based on Multiple Burst Trees.

Mamun, Abdullah-Al; Pal, Soumitra; Rajasekaran, Sanguthevar

Affiliation

Mamun AA; Department of Computer Science and Engineering, University of Connecticut, Storrs, CT 06269, USA.
Pal S; Department of Computer Science and Engineering, University of Connecticut, Storrs, CT 06269, USA.
Rajasekaran S; Department of Computer Science and Engineering, University of Connecticut, Storrs, CT 06269, USA.

Bioinformatics; 32(18): 2783-90, 2016 Sep 15.

Article in English | MEDLINE | ID: mdl-27283950

ABSTRACT

MOTIVATION:

A massive number of bioinformatics applications require counting k-length substrings in long, biologically important strings. A k-mer counter reports the frequency of each k-length substring in genome sequences. Genome assembly, repeat detection, multiple sequence alignment, error detection and many other related applications use a k-mer counter as a building block. To be useful in such applications, k-mer counting on large data sets requires very fast and efficient algorithms.
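To make the problem statement concrete, the following is a minimal sketch of what a k-mer counter computes. It is a naive hash-table baseline written for illustration only, not the KCMBT algorithm; the function name `count_kmers` and the choice to skip windows containing characters other than A, C, G and T are assumptions of this sketch, not details taken from the paper.

```python
from collections import Counter


def count_kmers(sequence: str, k: int) -> Counter:
    """Return the frequency of every k-length substring (k-mer) of `sequence`.

    Hypothetical helper for illustration: a sliding window over the input,
    with counts kept in a hash table rather than in burst tries.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    seq = sequence.upper()
    valid = set("ACGT")
    counts = Counter()
    # Slide a window of width k over the sequence, one base at a time.
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        # Assumption: windows containing ambiguous bases (e.g. N) are skipped.
        if set(kmer) <= valid:
            counts[kmer] += 1
    return counts


if __name__ == "__main__":
    for kmer, freq in sorted(count_kmers("ACGTACGT", 3).items()):
        print(kmer, freq)
```

A sequence of length n yields at most n - k + 1 k-mers, so this baseline is linear in the input size, but its memory use and hashing cost grow quickly on genome-scale data, which is the setting the abstract targets.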

RESULTS:

We propose a novel trie-based algorithm for the k-mer counting problem. We compare our algorithm, k-mer Counter based on Multiple Burst Trees (KCMBT), with all available well-known algorithms. Our experimental results show that KCMBT is around 30% faster than the previous best-performing algorithm, KMC2, on a human genome dataset. As another example, our algorithm is around six times faster than Jellyfish2. Overall, KCMBT is 20-30% faster than KMC2 on five benchmark data sets when both algorithms are run using multiple threads.

AVAILABILITY AND IMPLEMENTATION:

KCMBT is freely available on GitHub (https://github.com/abdullah009/kcmbt_mt).

CONTACT:

rajasek@engr.uconn.edu

SUPPLEMENTARY INFORMATION:

Supplementary data are available at Bioinformatics online.

Subjects

Algorithms; Sequence Alignment; Sequence Analysis, DNA; Base Sequence; Computational Biology/methods; Genome; Humans; Software

Full text: 1 Database: MEDLINE Main subject: Algorithms / Sequence Alignment / Sequence Analysis, DNA Limits: Humans Language: English Year of publication: 2016 Document type: Article
