Pesquisa | BVS Integralidade em Saúde

Optimal seed solver: optimizing seed selection in read mapping.

Xin, Hongyi; Nahar, Sunny; Zhu, Richard; Emmons, John; Pekhimenko, Gennady; Kingsford, Carl; Alkan, Can; Mutlu, Onur.

Bioinformatics ; 32(11): 1632-42, 2016 06 01.

Artigo em Inglês | MEDLINE | ID: mdl-26568624

RESUMO

MOTIVATION: Optimizing seed selection is an important problem in read mapping. The number of non-overlapping seeds a mapper selects determines the sensitivity of the mapper while the total frequency of all selected seeds determines the speed of the mapper. Modern seed-and-extend mappers usually select seeds with either an equal and fixed-length scheme or with an inflexible placement scheme, both of which limit the ability of the mapper in selecting less frequent seeds to speed up the mapping process. Therefore, it is crucial to develop a new algorithm that can adjust both the individual seed length and the seed placement, as well as derive less frequent seeds. RESULTS: We present the Optimal Seed Solver (OSS), a dynamic programming algorithm that discovers the least frequently-occurring set of x seeds in an L-base-pair read in [Formula: see text] operations on average and in [Formula: see text] operations in the worst case, while generating a maximum of [Formula: see text] seed frequency database lookups. We compare OSS against four state-of-the-art seed selection schemes and observe that OSS provides a 3-fold reduction in average seed frequency over the best previous seed selection optimizations. AVAILABILITY AND IMPLEMENTATION: We provide an implementation of the Optimal Seed Solver in C++ at: https://github.com/CMU-SAFARI/Optimal-Seed-Solver CONTACT: hxin@cmu.edu, calkan@cs.bilkent.edu.tr or onur@cmu.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Assuntos

Algoritmos

Shifted Hamming distance: a fast and accurate SIMD-friendly filter to accelerate alignment verification in read mapping.

Xin, Hongyi; Greth, John; Emmons, John; Pekhimenko, Gennady; Kingsford, Carl; Alkan, Can; Mutlu, Onur.

Bioinformatics ; 31(10): 1553-60, 2015 May 15.

Artigo em Inglês | MEDLINE | ID: mdl-25577434

RESUMO

MOTIVATION: Calculating the edit-distance (i.e. minimum number of insertions, deletions and substitutions) between short DNA sequences is the primary task performed by seed-and-extend based mappers, which compare billions of sequences. In practice, only sequence pairs with a small edit-distance provide useful scientific data. However, the majority of sequence pairs analyzed by seed-and-extend based mappers differ by significantly more errors than what is typically allowed. Such error-abundant sequence pairs needlessly waste resources and severely hinder the performance of read mappers. Therefore, it is crucial to develop a fast and accurate filter that can rapidly and efficiently detect error-abundant string pairs and remove them from consideration before more computationally expensive methods are used. RESULTS: We present a simple and efficient algorithm, Shifted Hamming Distance (SHD), which accelerates the alignment verification procedure in read mapping, by quickly filtering out error-abundant sequence pairs using bit-parallel and SIMD-parallel operations. SHD only filters string pairs that contain more errors than a user-defined threshold, making it fully comprehensive. It also maintains high accuracy with moderate error threshold (up to 5% of the string length) while achieving a 3-fold speedup over the best previous algorithm (Gene Myers's bit-vector algorithm). SHD is compatible with all mappers that perform sequence alignment for verification.

Assuntos

Biologia Computacional/métodos , Alinhamento de Sequência/métodos , Análise de Sequência de DNA/métodos , Software , Algoritmos , Sequência de Bases , Humanos , Dados de Sequência Molecular , Homologia de Sequência do Ácido Nucleico

RESUMO

Assuntos

RESUMO

Assuntos

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

Detalhe da pesquisa