Search | BVS Integralidade em Saúde

Scrooge: a fast and memory-frugal genomic sequence aligner for CPUs, GPUs, and ASICs.

Lindegger, Joël; Senol Cali, Damla; Alser, Mohammed; Gómez-Luna, Juan; Ghiasi, Nika Mansouri; Mutlu, Onur.

Bioinformatics; 39(5), 2023 May 4.

Article in English | MEDLINE | ID: mdl-36961334

ABSTRACT

MOTIVATION: Pairwise sequence alignment is a very time-consuming step in common bioinformatics pipelines. Speeding up this step requires heuristics, efficient implementations, and/or hardware acceleration. A promising candidate for all of the above is the recently proposed GenASM algorithm. We identify and address three inefficiencies in the GenASM algorithm: it has a high amount of data movement, a large memory footprint, and does some unnecessary work.

RESULTS: We propose Scrooge, a fast and memory-frugal genomic sequence aligner. Scrooge includes three novel algorithmic improvements which reduce the data movement, memory footprint, and the number of operations in the GenASM algorithm. We provide efficient open-source implementations of the Scrooge algorithm for CPUs and GPUs, which demonstrate the significant benefits of our algorithmic improvements. For long reads, the CPU version of Scrooge achieves a 20.1×, 1.7×, and 2.1× speedup over KSW2, Edlib, and a CPU implementation of GenASM, respectively. The GPU version of Scrooge achieves a 4.0×, 80.4×, 6.8×, 12.6×, and 5.9× speedup over the CPU version of Scrooge, KSW2, Edlib, Darwin-GPU, and a GPU implementation of GenASM, respectively. We estimate an ASIC implementation of Scrooge to use 3.6× less chip area and 2.1× less power than a GenASM ASIC while maintaining the same throughput. Further, we systematically analyze the throughput and accuracy behavior of GenASM and Scrooge under various configurations. As the best configuration of Scrooge depends on the computing platform, we make several observations that can help guide future implementations of Scrooge.

AVAILABILITY AND IMPLEMENTATION: https://github.com/CMU-SAFARI/Scrooge.
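
For readers unfamiliar with why pairwise alignment is costly, the following is a minimal sketch of the textbook edit-distance dynamic program. It is not the GenASM or Scrooge algorithm and is not taken from the Scrooge repository; the function name `edit_distance` is a hypothetical choice for illustration. It fills one cell per pair of characters, so its running time grows with the product of the two sequence lengths, which is the kind of cost that faster aligners aim to reduce.

```python
def edit_distance(a: str, b: str) -> int:
    """Unit-cost edit distance between a and b (illustrative baseline only)."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        # Turning a[:i] into the empty string takes i deletions.
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(edit_distance("ACGTTGCA", "ACGTAGCA"))
```

Only two rows of the table are kept, so memory is linear in the length of `b`, but the number of cell updates remains quadratic.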

Subjects

Algorithms, Computers, Genome, Genomics, Computational Biology

BLEND: a fast, memory-efficient and accurate mechanism to find fuzzy seed matches in genome analysis.

Firtina, Can; Park, Jisung; Alser, Mohammed; Kim, Jeremie S; Cali, Damla Senol; Shahroodi, Taha; Ghiasi, Nika Mansouri; Singh, Gagandeep; Kanellopoulos, Konstantinos; Alkan, Can; Mutlu, Onur.

NAR Genom Bioinform; 5(1): lqad004, 2023 Mar.

Article in English | MEDLINE | ID: mdl-36685727

ABSTRACT

Generating the hash values of short subsequences, called seeds, enables quickly identifying similarities between genomic sequences by matching seeds with a single lookup of their hash values. However, these hash values can be used only for finding exact-matching seeds as the conventional hashing methods assign distinct hash values for different seeds, including highly similar seeds. Finding only exact-matching seeds causes either (i) increasing the use of the costly sequence alignment or (ii) limited sensitivity. We introduce BLEND, the first efficient and accurate mechanism that can identify both exact-matching and highly similar seeds with a single lookup of their hash values, called fuzzy seed matches. BLEND (i) utilizes a technique called SimHash, that can generate the same hash value for similar sets, and (ii) provides the proper mechanisms for using seeds as sets with the SimHash technique to find fuzzy seed matches efficiently. We show the benefits of BLEND when used in read overlapping and read mapping. For read overlapping, BLEND is faster by 2.4×-83.9× (on average 19.3×), has a lower memory footprint by 0.9×-14.1× (on average 3.8×), and finds higher quality overlaps leading to accurate de novo assemblies than the state-of-the-art tool, minimap2. For read mapping, BLEND is faster by 0.8×-4.1× (on average 1.7×) than minimap2. Source code is available at https://github.com/CMU-SAFARI/BLEND.
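
The SimHash idea mentioned in the abstract can be illustrated with a minimal generic sketch. This is not BLEND's implementation: the helper names (`item_hash`, `simhash`, `kmers`, `hamming`), the 32-bit width, the choice of BLAKE2b as the per-item hash, and the use of k-mers as the set items are all assumptions made for illustration. Each item votes +1 or -1 on every bit position, and the sign of each tally decides the final bit, so sets that share most items tend to produce hash values that differ in few bits and can coincide.

```python
import hashlib


def item_hash(item: str, bits: int = 32) -> int:
    """Hash one set item to a `bits`-wide integer (hash choice is an assumption)."""
    digest = hashlib.blake2b(item.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big") & ((1 << bits) - 1)


def simhash(items, bits: int = 32) -> int:
    """Generic SimHash: per-bit majority vote over the hashes of all items."""
    counts = [0] * bits
    for item in items:
        h = item_hash(item, bits)
        for i in range(bits):
            counts[i] += 1 if (h >> i) & 1 else -1
    value = 0
    for i, c in enumerate(counts):
        if c > 0:  # a majority of items had this bit set
            value |= 1 << i
    return value


def kmers(seq: str, k: int) -> set:
    """All length-k substrings of seq, used here as the set items (an assumption)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def hamming(x: int, y: int) -> int:
    """Number of bit positions in which two hash values differ."""
    return bin(x ^ y).count("1")


if __name__ == "__main__":
    a = simhash(kmers("ACGTACGTTGCAACGT", 4))
    b = simhash(kmers("ACGTACGTTGCAACGA", 4))  # one base changed
    print(hamming(a, b))
```

A smaller Hamming distance suggests more similar sets; whether two similar seeds receive exactly the same value depends on the items and the hash width.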
