Search | VHL Regional Portal

QuASeR: Quantum Accelerated de novo DNA sequence reconstruction.

Sarkar, Aritra; Al-Ars, Zaid; Bertels, Koen.

PLoS One ; 16(4): e0249850, 2021.

Article in English | MEDLINE | ID: mdl-33844699

ABSTRACT

In this article, we present QuASeR, a reference-free DNA sequence reconstruction implementation via de novo assembly on both gate-based and quantum annealing platforms. This is the first time this important bioinformatics application has been modeled using quantum computation. Each of the four steps of the implementation (TSP, QUBO, Hamiltonians and QAOA) is explained with a proof-of-concept example, targeting both the genomics research community and quantum application developers in a self-contained manner. We detail the implementation and the results of executing the algorithm, from a set of DNA reads to a reconstructed sequence, on a gate-based quantum simulator and on the D-Wave quantum annealing simulator and hardware. We also highlight the limitations of current classical simulation and available quantum hardware systems. The implementation is open-source and can be found at https://github.com/QE-Lab/QuASeR.

Subjects

Sequence Analysis, DNA/methods; Software; Animals; Contig Mapping/methods; Humans
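
The TSP and QUBO steps named in the QuASeR abstract above can be illustrated with a small classical sketch. It is not taken from the QuASeR repository: the function names, the penalty value, the open-path (rather than closed-tour) objective, and the brute-force solver standing in for an annealer or QAOA are all assumptions made for illustration. Binary variable x[i, p] is 1 when read i occupies position p in the layout; penalties enforce a valid ordering and the objective rewards suffix-prefix overlap between neighbouring reads.

```python
from itertools import product

def overlap(a, b):
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def build_qubo(reads, penalty):
    """QUBO over x[i, p] = 1 when read i sits at position p of the layout."""
    n = len(reads)
    Q = {}

    def var(i, p):
        return i * n + p

    def add(u, v, weight):
        key = (min(u, v), max(u, v))
        Q[key] = Q.get(key, 0.0) + weight

    def exactly_one(group):
        # penalty * (sum(x) - 1)^2 for binary x, with the constant term dropped
        for pos, u in enumerate(group):
            add(u, u, -penalty)
            for v in group[pos + 1:]:
                add(u, v, 2 * penalty)

    for i in range(n):
        exactly_one([var(i, p) for p in range(n)])  # each read is used once
    for p in range(n):
        exactly_one([var(i, p) for i in range(n)])  # each position is filled once

    # Objective: reward overlap between reads placed at consecutive positions.
    for i, j in product(range(n), repeat=2):
        if i != j:
            for p in range(n - 1):
                add(var(i, p), var(j, p + 1), -overlap(reads[i], reads[j]))
    return Q

def solve_brute_force(Q, num_vars):
    """Exhaustive minimiser, a stand-in for a quantum solver; toy sizes only."""
    def energy(x):
        return sum(w * x[u] * x[v] for (u, v), w in Q.items())
    return min(product((0, 1), repeat=num_vars), key=energy)

def assemble(reads, penalty=10):
    """Order the reads by minimising the QUBO, then merge them on overlaps."""
    n = len(reads)
    x = solve_brute_force(build_qubo(reads, penalty), n * n)
    order = [next(i for i in range(n) if x[i * n + p]) for p in range(n)]
    contig = reads[order[0]]
    for prev, cur in zip(order, order[1:]):
        contig += reads[cur][overlap(reads[prev], reads[cur]):]
    return contig
```

Calling `assemble(["GGCTA", "CTAAC", "ATGGC"])` should recover the ordering with the largest total overlap. The number of variables grows as the square of the number of reads, which is one reason only very small instances fit on current simulators and hardware.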

GPU acceleration of Darwin read overlapper for de novo assembly of long DNA reads.

Ahmed, Nauman; Qiu, Tong Dong; Bertels, Koen; Al-Ars, Zaid.

BMC Bioinformatics ; 21(Suppl 13): 388, 2020 Sep 17.

Article in English | MEDLINE | ID: mdl-32938392

ABSTRACT

BACKGROUND: In Overlap-Layout-Consensus (OLC) based de novo assembly, every read must be compared with every other read to find overlaps. This makes the process slow and limits the practicality of de novo assembly methods at large scale in the field. Darwin is a fast and accurate read overlapper that can be used for de novo assembly of state-of-the-art third-generation long DNA reads. Darwin is designed to be hardware-friendly and can be accelerated on specialized computer system hardware to achieve higher performance. RESULTS: This work accelerates Darwin on GPUs. Using real PacBio data, our GPU implementation on a Tesla K40 shows a speedup of 109x vs 8 CPU threads of an Intel Xeon machine and 24x vs 64 threads of an IBM Power8 machine. The GPU implementation supports both linear and affine gap scoring models. The results show that the GPU implementation can achieve the same high speedup for different scoring schemes. CONCLUSIONS: The GPU implementation proposed in this work shows a significant improvement in performance compared to the CPU version, making it usable as a practical read overlapper in a DNA assembly pipeline. Furthermore, our GPU acceleration can also be used for fast Smith-Waterman alignment between long DNA reads. GPU hardware has become commonly available in the field today, making the proposed acceleration accessible to a larger public. The implementation is available at https://github.com/Tongdongq/darwin-gpu .

Subjects

Algorithms; DNA/genetics; High-Throughput Nucleotide Sequencing/methods; Sequence Analysis, DNA/methods; Humans
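
The difference between the linear and affine gap scoring models mentioned in the Darwin abstract above can be shown with a plain CPU sketch of Smith-Waterman scoring. This is the textbook Gotoh recurrence, not Darwin's GPU kernel; the function name and the default scores are assumptions chosen for illustration. A gap of length L costs `gap_open + L * gap_extend`, so setting `gap_open=0` gives the linear model.

```python
def smith_waterman_score(a, b, match=2, mismatch=-3, gap_open=4, gap_extend=1):
    """Best local alignment score of `a` and `b` with affine gap penalties."""
    neg_inf = float("-inf")
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]        # best score ending at (i, j)
    E = [[neg_inf] * cols for _ in range(rows)]  # alignment ends in a gap in `a`
    F = [[neg_inf] * cols for _ in range(rows)]  # alignment ends in a gap in `b`
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            # Either extend an existing gap or open a new one from H.
            E[i][j] = max(E[i][j - 1] - gap_extend,
                          H[i][j - 1] - gap_open - gap_extend)
            F[i][j] = max(F[i - 1][j] - gap_extend,
                          H[i - 1][j] - gap_open - gap_extend)
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores never drop below zero.
            H[i][j] = max(0, diag, E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best
```

For example, `smith_waterman_score("ACGTACGT", "ACGTTTACGT")` scores the two-base gap under the affine model, while passing `gap_open=0, gap_extend=2` scores the same pair under a linear model. Each cell depends only on its left, upper, and upper-left neighbours, so cells on the same anti-diagonal are independent of one another.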

Correction to: GASAL2: a GPU accelerated sequence alignment library for high-throughput NGS data.

Ahmed, Nauman; Lévy, Jonathan; Ren, Shanshan; Mushtaq, Hamid; Bertels, Koen; Al-Ars, Zaid.

BMC Bioinformatics ; 20(1): 597, 2019 Nov 19.

Article in English | MEDLINE | ID: mdl-31744474

ABSTRACT

Following publication of the original article [1], the author requested changes to Figures 4, 7, 8, 9, 12 and 14 to align them with the text. The corrected figures are supplied below.

GASAL2: a GPU accelerated sequence alignment library for high-throughput NGS data.

Ahmed, Nauman; Lévy, Jonathan; Ren, Shanshan; Mushtaq, Hamid; Bertels, Koen; Al-Ars, Zaid.

BMC Bioinformatics ; 20(1): 520, 2019 Oct 25.

Article in English | MEDLINE | ID: mdl-31653208

ABSTRACT

BACKGROUND: Due to the computational complexity of sequence alignment algorithms, various accelerated solutions have been proposed to speed up this analysis. NVBIO is the only available GPU library that accelerates sequence alignment of high-throughput NGS data, but has limited performance. In this article we present GASAL2, a GPU library for aligning DNA and RNA sequences that outperforms existing CPU and GPU libraries. RESULTS: The GASAL2 library provides specialized, accelerated kernels for local, global and all types of semi-global alignment. Pairwise sequence alignment can be performed with and without traceback. GASAL2 outperforms the fastest CPU-optimized SIMD implementations such as SeqAn and Parasail, as well as NVIDIA's own GPU-based library known as NVBIO. GASAL2 is unique in performing sequence packing on the GPU, which is up to 750x faster than NVBIO. Overall, on a GeForce GTX 1080 Ti GPU, GASAL2 is up to 21x faster than Parasail on a dual-socket hyper-threaded Intel Xeon system with 28 cores and up to 13x faster than NVBIO, with a query length of up to 300 bases and 100 bases, respectively. GASAL2 alignment functions are asynchronous/non-blocking and allow full overlap of CPU and GPU execution. The paper shows how to use GASAL2 to accelerate BWA-MEM, speeding up the local alignment by 20x, which gives an overall application speedup of 1.3x vs. CPU with up to 12 threads. CONCLUSIONS: The library provides high-performance APIs for local, global and semi-global alignment that can be easily integrated into various bioinformatics tools.

Subjects

Gene Library; High-Throughput Nucleotide Sequencing; Sequence Alignment; Software; Algorithms; Computational Biology; DNA/genetics; RNA/genetics; Sequence Analysis, DNA; Sequence Analysis, RNA
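
Sequence packing, which the GASAL2 abstract above highlights, means storing several bases in one machine word so that a kernel can fetch many bases per memory access. The sketch below shows the idea on the CPU. The 4-bit code, the code table, and the choice of eight bases per 32-bit word are assumptions for illustration and are not presented as GASAL2's actual memory layout or API.

```python
# Hypothetical 4-bit code per base; "N" marks an unknown base.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3, "N": 4}
BASES = "ACGTN"
BASES_PER_WORD = 8  # 8 bases x 4 bits = one 32-bit word

def pack(seq):
    """Pack a DNA string into a list of 32-bit integers, 4 bits per base."""
    words = []
    for start in range(0, len(seq), BASES_PER_WORD):
        word = 0
        for k, base in enumerate(seq[start:start + BASES_PER_WORD]):
            word |= CODE[base] << (4 * k)  # base k occupies bits 4k..4k+3
        words.append(word)
    return words

def unpack(words, length):
    """Recover the first `length` bases from the packed words."""
    out = []
    for n in range(length):
        word = words[n // BASES_PER_WORD]
        out.append(BASES[(word >> (4 * (n % BASES_PER_WORD))) & 0xF])
    return "".join(out)
```

The original length has to be kept alongside the packed words, because a partly filled last word is padded with zero bits that would otherwise decode as extra "A" bases.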

GPU accelerated sequence alignment with traceback for GATK HaplotypeCaller.

Ren, Shanshan; Ahmed, Nauman; Bertels, Koen; Al-Ars, Zaid.

BMC Genomics ; 20(Suppl 2): 184, 2019 Apr 04.

Article in English | MEDLINE | ID: mdl-30967111

ABSTRACT

BACKGROUND: Pairwise sequence alignment is widely used in many biological tools and applications. Existing GPU-accelerated implementations mainly focus on calculating the optimal alignment score and omit identifying the optimal alignment itself. In GATK HaplotypeCaller (HC), the semi-global pairwise sequence alignment with traceback has so far been difficult to accelerate effectively on GPUs. RESULTS: We first analyze the characteristics of the semi-global alignment with traceback in GATK HC and then propose a new algorithm that allows the optimal alignment to be retrieved efficiently on GPUs. For the first stage, we choose an intra-task parallelization model to calculate the position of the optimal alignment score and the backtracking matrix. Moreover, in the first stage, our GPU implementation also records the lengths of consecutive matches/mismatches, in addition to the lengths of consecutive insertions and deletions recorded by the CPU-based implementation. This helps to traverse the backtracking matrix efficiently and obtain the optimal alignment in the second stage. CONCLUSIONS: Experimental results show that our alignment kernel with traceback is up to 80x and 14.14x faster than its CPU counterpart with synthetic datasets and real datasets, respectively. When integrated into GATK HC (alongside a GPU-accelerated pair-HMMs forward kernel), the overall implementation is 2.3x faster than the baseline GATK HC implementation, and 1.34x faster than the GATK HC implementation with the integrated GPU-based pair-HMMs forward algorithm. Although the methods proposed in this paper aim to improve the performance of GATK HC, they can also be used in other pairwise alignments and applications.

Subjects

Algorithms; Computer Graphics; Genetic Variation; Genome, Human; Haplotypes; Sequence Alignment/methods; High-Throughput Nucleotide Sequencing; Humans; Sequence Analysis, DNA; Software
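
The two stages described in the abstract above (fill a score and backtracking matrix, then walk back to recover the alignment) can be sketched on the CPU as follows. This is a generic semi-global alignment with a linear gap penalty, not the GATK HC algorithm or the paper's GPU kernel; the function name, the scores, and the tie-breaking order are assumptions. The query must align end to end, while unaligned reference bases before and after it are free. The run-length encoding at the end mirrors the idea of recording lengths of consecutive operations, here done after traceback rather than during the first stage.

```python
from itertools import groupby

def semiglobal_align(query, ref, match=1, mismatch=-1, gap=-2):
    """Align all of `query` inside `ref`; returns (score, ref_start, cigar)."""
    n, m = len(query), len(ref)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    back = [[""] * (m + 1) for _ in range(n + 1)]
    # Row 0 stays at zero: skipping leading reference bases is free.
    for i in range(1, n + 1):
        score[i][0] = i * gap
        back[i][0] = "I"

    # Stage 1: fill the score matrix and the backtracking matrix.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = query[i - 1] == ref[j - 1]
            diag = score[i - 1][j - 1] + (match if same else mismatch)
            ins = score[i - 1][j] + gap    # query base absent from the reference
            dele = score[i][j - 1] + gap   # reference base absent from the query
            best = max(diag, ins, dele)
            score[i][j] = best
            back[i][j] = "M" if best == diag else ("I" if best == ins else "D")

    # Trailing reference bases are free: take the best cell of the last row.
    end = max(range(m + 1), key=lambda j: score[n][j])

    # Stage 2: walk the backtracking matrix until the query is consumed.
    ops = []
    i, j = n, end
    while i > 0:
        op = back[i][j]
        ops.append(op)
        if op == "M":
            i, j = i - 1, j - 1
        elif op == "I":
            i -= 1
        else:
            j -= 1
    ops.reverse()
    cigar = "".join(f"{len(list(run))}{op}" for op, run in groupby(ops))
    return score[n][end], j, cigar
```

Here "M" covers both matches and mismatches, as in a CIGAR string. `semiglobal_align("ACGT", "TTACGTTT")` places the query at reference offset 2 with no gaps.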

Hardware acceleration of BWA-MEM genomic short read mapping for longer read lengths.

Houtgast, Ernst Joachim; Sima, Vlad-Mihai; Bertels, Koen; Al-Ars, Zaid.

Comput Biol Chem ; 75: 54-64, 2018 Aug.

Article in English | MEDLINE | ID: mdl-29747076

ABSTRACT

We present our work on hardware-accelerated genomics pipelines, using either FPGAs or GPUs to accelerate execution of BWA-MEM, a widely used algorithm for genomic short read mapping. The mapping stage can take up to 40% of overall processing time for genomics pipelines. Our implementation offloads the Seed Extension function, one of the main BWA-MEM computational functions, onto an accelerator. Sequencers typically output reads with a length of 150 base pairs. However, read length is expected to increase in the near future. Here, we investigate the influence of read length on BWA-MEM performance using data sets with read lengths of up to 400 base pairs, and introduce methods to mitigate the impact of longer reads. For the industry-standard 150 base pair read length, our implementation achieves up to a two-fold increase in overall application-level performance for systems with at most twenty-two logical CPU cores. Longer read length requires commensurately bigger data structures, which directly impacts accelerator efficiency. The two-fold performance increase is sustained for read lengths of at most 250 base pairs. To improve performance, we classify the sources of inefficiency in the underlying systolic array architecture. By eliminating idle regions as much as possible, efficiency is improved by up to 95%. Moreover, adaptive load balancing distributes work between host and accelerator to ensure that using an accelerator always results in a performance improvement, which in GPU-constrained scenarios provides up to 45% more performance.

Subjects

Algorithms; Chromosome Mapping; Genomics; Computer Graphics; Computers

Efficient Acceleration of the Pair-HMMs Forward Algorithm for GATK HaplotypeCaller on Graphics Processing Units.

Ren, Shanshan; Bertels, Koen; Al-Ars, Zaid.

Evol Bioinform Online ; 14: 1176934318760543, 2018.

Article in English | MEDLINE | ID: mdl-29568218

ABSTRACT

GATK HaplotypeCaller (HC) is a popular variant caller that is widely used to identify variants in complex genomes. However, its high variant detection accuracy comes at the cost of long execution time. In GATK HC, the pair-HMMs forward algorithm accounts for a large percentage of the total execution time. This article proposes accelerating the pair-HMMs forward algorithm on graphics processing units (GPUs) to improve the performance of GATK HC, and presents several GPU-based implementations of the algorithm. It also analyzes the performance bottlenecks of the implementations on an NVIDIA Tesla K40 card with various data sets. Based on these results and the characteristics of GATK HC, we are able to identify the GPU-based implementations with the highest performance for the various analyzed data sets. Experimental results show that the GPU-based implementations of the pair-HMMs forward algorithm achieve a speedup of up to 5.47× over existing GPU-based implementations.
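
For readers unfamiliar with the pair-HMMs forward algorithm named in the abstract above, the following CPU sketch shows the shape of the computation: three dynamic-programming matrices (match, insertion, deletion) filled cell by cell, with the read likelihood summed from the last row. It is a simplified illustration and not GATK's implementation. In particular, the constant error and gap probabilities are assumptions (GATK derives per-base values from quality scores), as are the function name and the uniform start over haplotype positions.

```python
def pair_hmm_forward(read, hap, base_error=0.01, gap_open=0.001, gap_extend=0.1):
    """Likelihood of `read` given haplotype `hap`, summed over all alignments."""
    n, m = len(read), len(hap)
    t_mm = 1.0 - 2.0 * gap_open  # match -> match
    t_gm = 1.0 - gap_extend      # insertion or deletion -> match
    M = [[0.0] * (m + 1) for _ in range(n + 1)]  # read base aligned to hap base
    I = [[0.0] * (m + 1) for _ in range(n + 1)]  # read base inserted
    D = [[0.0] * (m + 1) for _ in range(n + 1)]  # hap base deleted
    # Assumed initialisation: the read may start at any haplotype position.
    for j in range(m + 1):
        D[0][j] = 1.0 / m
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = read[i - 1] == hap[j - 1]
            emit = 1.0 - base_error if same else base_error / 3.0
            M[i][j] = emit * (t_mm * M[i - 1][j - 1]
                              + t_gm * (I[i - 1][j - 1] + D[i - 1][j - 1]))
            I[i][j] = gap_open * M[i - 1][j] + gap_extend * I[i - 1][j]
            D[i][j] = gap_open * M[i][j - 1] + gap_extend * D[i][j - 1]
    # The read may end at any haplotype position, in a match or an insertion.
    return sum(M[n][j] + I[n][j] for j in range(1, m + 1))
```

Unlike alignment scoring, each cell sums over its predecessors instead of taking a maximum, so the result is a probability and production code typically works in a scaled or logarithmic domain to avoid underflow on long reads. The work grows with the product of read and haplotype length for every read-haplotype pair.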
