Search | BVS Economia da Saúde

High-throughput sequence alignment using Graphics Processing Units.

Schatz, Michael C; Trapnell, Cole; Delcher, Arthur L; Varshney, Amitabh.

BMC Bioinformatics ; 8: 474, 2007 Dec 10.

Article in English | MEDLINE | ID: mdl-18070356

ABSTRACT

BACKGROUND: The recent availability of new, less expensive high-throughput DNA sequencing technologies has yielded a dramatic increase in the volume of sequence data that must be analyzed. These data are being generated for several purposes, including genotyping, genome resequencing, metagenomics, and de novo genome assembly projects. Sequence alignment programs such as MUMmer have proven essential for analysis of these data, but researchers will need ever faster, high-throughput alignment tools running on inexpensive hardware to keep up with new sequence technologies.

RESULTS: This paper describes MUMmerGPU, an open-source high-throughput parallel pairwise local sequence alignment program that runs on commodity Graphics Processing Units (GPUs) in common workstations. MUMmerGPU uses the new Compute Unified Device Architecture (CUDA) from nVidia to align multiple query sequences against a single reference sequence stored as a suffix tree. By processing the queries in parallel on the highly parallel graphics card, MUMmerGPU achieves more than a 10-fold speedup over a serial CPU version of the sequence alignment kernel, and outperforms the exact alignment component of MUMmer on a high-end CPU by 3.5-fold in total application time when aligning reads from recent sequencing projects using Solexa/Illumina, 454, and Sanger sequencing technologies.

CONCLUSION: MUMmerGPU is a low-cost, ultra-fast sequence alignment program designed to handle the increasing volume of data produced by new, high-throughput sequencing technologies. MUMmerGPU demonstrates that even memory-intensive applications can run significantly faster on the relatively low-cost GPU than on the CPU.
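
The strategy named in the abstract above (one reference held in a suffix structure, many queries matched against it independently) can be illustrated with a small Python sketch. This is not MUMmerGPU's code: a naive suffix trie stands in for the suffix tree, a thread pool stands in for GPU threads, and the names `build_suffix_trie`, `longest_match`, `align_query`, `align_all` and the `min_len` parameter are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def build_suffix_trie(reference):
    """Naive O(n^2) suffix trie (an assumption; a real tool would use a
    compact suffix tree). Each node is a dict of child characters; the
    extra key "#" stores the start of the first suffix that created it."""
    root = {}
    for start in range(len(reference)):
        node = root
        for ch in reference[start:]:
            node = node.setdefault(ch, {"#": start})
    return root


def longest_match(trie, query, qpos):
    """Walk the trie from the root along query[qpos:] as far as possible."""
    node, length = trie, 0
    while qpos + length < len(query) and query[qpos + length] in node:
        node = node[query[qpos + length]]
        length += 1
    ref_start = node["#"] if length else None
    return ref_start, length


def align_query(trie, query, min_len):
    """Report (query_pos, ref_pos, length) for every query position whose
    longest exact match against the reference is at least min_len."""
    hits = []
    for qpos in range(len(query)):
        ref_start, length = longest_match(trie, query, qpos)
        if length >= min_len:
            hits.append((qpos, ref_start, length))
    return hits


def align_all(reference, queries, min_len=4):
    """Queries only read the shared trie, so each one can be processed
    independently; the thread pool illustrates that independence only."""
    trie = build_suffix_trie(reference)
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda q: align_query(trie, q, min_len), queries))


if __name__ == "__main__":
    print(align_all("ACGTACGTTAGC", ["CGTT", "TTTT", "GTAC"]))
```

Because no query modifies the shared structure, the per-query work maps naturally onto one worker per query, which is the property the abstract attributes to the GPU version. The sketch makes no attempt to reproduce the reported speedups.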

Subjects

Computer Graphics/instrumentation , Database Management Systems , Sequence Alignment/economics , Sequence Alignment/instrumentation , Animals , Bacillus anthracis/genetics , Base Sequence , Caenorhabditis/genetics , Computer Graphics/economics , Computers/economics , Contig Mapping/economics , Contig Mapping/instrumentation , DNA/ultrastructure , Databases, Genetic , Genomic Library , Listeria monocytogenes/genetics , Sequence Alignment/methods , Sequence Analysis, DNA/economics , Sequence Analysis, DNA/instrumentation , Sequence Analysis, DNA/methods , Streptococcus suis/genetics , Time Factors , Work Simplification

Efficient decoding algorithms for generalized hidden Markov model gene finders.

Majoros, William H; Pertea, Mihaela; Delcher, Arthur L; Salzberg, Steven L.

BMC Bioinformatics ; 6: 16, 2005 Jan 24.

Article in English | MEDLINE | ID: mdl-15667658

ABSTRACT

BACKGROUND: The Generalized Hidden Markov Model (GHMM) has proven a useful framework for the task of computational gene prediction in eukaryotic genomes, due to its flexibility and probabilistic underpinnings. As the focus of the gene finding community shifts toward the use of homology information to improve prediction accuracy, extensions to the basic GHMM are being explored as possible ways to integrate this homology information into the prediction process. Particularly prominent among these extensions are techniques that call for the simultaneous prediction of genes in two or more genomes, thereby significantly increasing the computational cost of prediction and highlighting the importance of speed and memory efficiency in the implementation of the underlying GHMM algorithms. Unfortunately, the task of implementing an efficient GHMM-based gene finder is already a nontrivial one, and it can be expected that this task will only grow more onerous as our models increase in complexity.

RESULTS: As a first step toward addressing the implementation challenges of these next-generation systems, we describe in detail two software architectures for GHMM-based gene finders, one comprising the common array-based approach, and the other a highly optimized algorithm which requires significantly less memory while achieving virtually identical speed. We then show how both of these architectures can be accelerated by a factor of two by optimizing their content sensors. We finish with a brief illustration of the impact these optimizations have had on the feasibility of our new homology-based gene finder, TWAIN.

CONCLUSIONS: In describing a number of optimizations for GHMM-based gene finders and making available two complete open-source software systems embodying these methods, it is our hope that others will be better equipped to explore promising extensions to the GHMM framework, thereby improving the state of the art in gene prediction techniques.
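
For readers unfamiliar with GHMM decoding, the following Python sketch shows a textbook array-based Viterbi recurrence for a generalized (explicit-duration) HMM. It is an illustration under stated assumptions, not the architecture or the optimizations from the paper: the two toy states `N` and `E`, the uniform duration model, the emission probabilities, `MAX_DUR`, and the function name `ghmm_viterbi` are all invented for this example.

```python
import math

NEG_INF = float("-inf")


def ghmm_viterbi(seq, states, log_init, log_trans, log_dur, log_emit, max_dur):
    """Return the best parse of seq as a list of (state, start, end) segments.

    best[end][s] is the best log-score of any parse of seq[:end] whose
    final segment is emitted by state s and ends exactly at `end`.
    """
    n = len(seq)
    best = [dict.fromkeys(states, NEG_INF) for _ in range(n + 1)]
    back = [dict.fromkeys(states) for _ in range(n + 1)]
    for end in range(1, n + 1):
        for s in states:
            content = 0.0  # running content score of seq[start:end] under s
            for d in range(1, min(max_dur, end) + 1):
                start = end - d
                content += log_emit[s][seq[start]]  # extend segment leftward
                seg = content + log_dur(s, d)
                if start == 0:
                    cands = [(log_init[s] + seg, None)]
                else:
                    cands = [(best[start][p] + log_trans[p][s] + seg, p)
                             for p in states]
                score, prev = max(cands, key=lambda c: c[0])
                if score > best[end][s]:
                    best[end][s] = score
                    back[end][s] = (start, prev)
    if n == 0:
        return []
    state = max(states, key=lambda s: best[n][s])
    if best[n][state] == NEG_INF:
        raise ValueError("no valid parse under the given model")
    end, parse = n, []
    while end > 0:  # follow back-pointers from the end of the sequence
        start, prev = back[end][state]
        parse.append((state, start, end))
        end, state = start, prev
    return parse[::-1]


# Toy model (all values are assumptions): N prefers A/T, E prefers G/C.
STATES = ["N", "E"]
MAX_DUR = 6
LOG_INIT = {s: math.log(0.5) for s in STATES}
# Self-transitions are forbidden because segment length is modeled explicitly.
LOG_TRANS = {"N": {"N": NEG_INF, "E": 0.0}, "E": {"N": 0.0, "E": NEG_INF}}
LOG_EMIT = {
    "N": {b: math.log(p) for b, p in zip("ACGT", (0.4, 0.1, 0.1, 0.4))},
    "E": {b: math.log(p) for b, p in zip("ACGT", (0.1, 0.4, 0.4, 0.1))},
}


def uniform_log_dur(state, d):
    """Uniform duration distribution over 1..MAX_DUR."""
    return -math.log(MAX_DUR)


if __name__ == "__main__":
    print(ghmm_viterbi("AAAAGCGCGCAAAA", STATES, LOG_INIT, LOG_TRANS,
                       uniform_log_dur, LOG_EMIT, MAX_DUR))
```

The table `best` holds one entry per position and state, which is the kind of memory cost that grows with sequence length and model size; the abstract's second architecture is described as needing significantly less memory, but its details are not given here and are not reproduced in this sketch.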

Subjects

Computational Biology/methods , Gene Expression Regulation , Algorithms , Bayes Theorem , Computer Simulation , DNA/chemistry , Databases, Genetic , Gene Expression Profiling , Genome , Genome, Human , Humans , Likelihood Functions , Markov Chains , Models, Biological , Models, Genetic , Models, Statistical , Probability , Programming Languages , Sequence Alignment , Sequence Analysis, DNA , Software
