Search | Virtual Health Library (Biblioteca Virtual em Saúde)

ARYANA: Aligning Reads by Yet Another Approach.

Gholami, Milad; Arbabi, Aryan; Sharifi-Zarchi, Ali; Chitsaz, Hamidreza; Sadeghi, Mehdi.

BMC Bioinformatics ; 15 Suppl 9: S12, 2014.

Article in English | MEDLINE | ID: mdl-25252881

ABSTRACT

MOTIVATION: Although many algorithms and software tools exist for aligning sequencing reads, fast gapped sequence search is far from solved. Strong interest in fast alignment is best reflected in the $10^6 prize of the InnoCentive competition on aligning a collection of reads to a given database of reference genomes. In addition, de novo assembly of long next-generation sequencing reads requires fast overlap-layout-consensus algorithms, which depend on fast and accurate alignment. CONTRIBUTION: We introduce ARYANA, a fast gapped read aligner built on the BWA indexing infrastructure with a completely new alignment engine that makes it significantly faster than three other aligners (Bowtie2, BWA and SeqAlto) with comparable generality and accuracy. Instead of time-consuming backtracking procedures for handling mismatches, ARYANA uses the seed-and-extend algorithmic framework, and its efficiency is significantly improved by integrating novel algorithmic techniques including dynamic seed selection, bidirectional seed extension, reset-free hash tables, and gap-filling dynamic programming. As read length increases, ARYANA's advantage in speed and alignment rate becomes more evident, which matches the trend toward longer reads as sequencing technologies evolve. The algorithmic platform of ARYANA makes it easy to develop mission-specific aligners for other applications using the ARYANA engine. AVAILABILITY: ARYANA with complete source code can be obtained from http://github.com/aryana-aligner.
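To make the seed-and-extend idea concrete, here is a minimal Python sketch. It is not ARYANA's engine: ARYANA builds on BWA's index, whereas this sketch assumes a plain k-mer hash table, fixed non-overlapping seeds (no dynamic seed selection), voting on candidate diagonals, and an ungapped Hamming-distance extension with no gap-filling dynamic programming. The names `build_kmer_index`, `seed_and_extend` and `max_mismatches` are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Optional


def build_kmer_index(reference: str, k: int) -> dict[str, list[int]]:
    """Map every k-mer of the reference to the positions where it starts."""
    index: dict[str, list[int]] = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def seed_and_extend(read: str, reference: str, index: dict[str, list[int]],
                    k: int, max_mismatches: int = 2) -> Optional[tuple[int, int]]:
    """Return (reference_start, mismatches) of the best ungapped hit, or None."""
    # Seeding: each non-overlapping k-mer of the read that hits the index
    # votes for the reference start (diagonal) that the hit implies.
    votes: Counter = Counter()
    for offset in range(0, len(read) - k + 1, k):
        for ref_pos in index.get(read[offset:offset + k], []):
            start = ref_pos - offset
            if 0 <= start <= len(reference) - len(read):
                votes[start] += 1

    # Extension: verify candidates, most-voted first, by comparing the
    # whole read against the reference window (mismatches only, no gaps).
    best: Optional[tuple[int, int]] = None
    for start, _ in votes.most_common():
        window = reference[start:start + len(read)]
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches <= max_mismatches and (best is None or mismatches < best[1]):
            best = (start, mismatches)
    return best
```

For example, with the reference `AAAACCCCGGGGTTTTACGTACGT` and k = 4, the read `CCCCGGAGTTTT` is placed at offset 4 with one mismatch, because two of its three seeds vote for that diagonal. A practical aligner would also search the reverse complement and close indels between seeds, which is the role the abstract assigns to gap-filling dynamic programming.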

Subjects

Sequence Alignment/methods; Sequence Analysis, DNA/methods; Software; Algorithms; Genome, Human; High-Throughput Nucleotide Sequencing/economics; High-Throughput Nucleotide Sequencing/methods; Humans; Sequence Alignment/economics; Sequence Analysis, DNA/economics

De novo assembly of human genomes with massively parallel short read sequencing.

Li, Ruiqiang; Zhu, Hongmei; Ruan, Jue; Qian, Wubin; Fang, Xiaodong; Shi, Zhongbin; Li, Yingrui; Li, Shengting; Shan, Gao; Kristiansen, Karsten; Li, Songgang; Yang, Huanming; Wang, Jian; Wang, Jun.

Genome Res ; 20(2): 265-72, 2010 Feb.

Article in English | MEDLINE | ID: mdl-20019144

ABSTRACT

Next-generation massively parallel DNA sequencing technologies provide ultrahigh throughput at a substantially lower unit data cost; however, the reads are very short, which makes de novo assembly extremely challenging. Here, we describe a novel method for de novo assembly of large genomes from short read sequences. We successfully assembled both the Asian and African human genome sequences, achieving N50 contig sizes of 7.4 and 5.9 kilobases (kb) and N50 scaffold sizes of 446.3 and 61.9 kb, respectively. The development of this de novo short read assembly method creates new opportunities for building reference sequences and carrying out accurate analyses of unexplored genomes in a cost-effective way.
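The N50 figures quoted above follow the usual definition: N50 is the largest length L such that contigs (or scaffolds) of length at least L together cover at least half of the total assembly length. The short sketch below computes that statistic; the function name `n50` is ours, and the code is not part of the assembler described in the paper.

```python
def n50(lengths: list[int]) -> int:
    """Largest length L such that pieces of length >= L cover >= 50% of the total."""
    if not lengths:
        raise ValueError("n50 needs at least one contig or scaffold length")
    total = sum(lengths)
    running = 0
    # Walk from the longest piece down; the first length at which the
    # cumulative sum reaches half of the total is the N50.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # unreachable for non-negative lengths; keeps the return type total
```

For the lengths 2 through 10 (total 54), the three longest pieces 10 + 9 + 8 reach 27, exactly half, so the N50 is 8. The same function applied to scaffold lengths gives the scaffold N50.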

Subjects

Genome, Human; Human Genome Project; Sequence Alignment/methods; Sequence Analysis, DNA/methods; Asian People/genetics; Black People/genetics; Humans; Oligonucleotide Array Sequence Analysis/economics; Oligonucleotide Array Sequence Analysis/methods; Sequence Alignment/economics; Sequence Analysis, DNA/economics

High-throughput sequence alignment using Graphics Processing Units.

Schatz, Michael C; Trapnell, Cole; Delcher, Arthur L; Varshney, Amitabh.

BMC Bioinformatics ; 8: 474, 2007 Dec 10.

Article in English | MEDLINE | ID: mdl-18070356

ABSTRACT

BACKGROUND: The recent availability of new, less expensive high-throughput DNA sequencing technologies has yielded a dramatic increase in the volume of sequence data that must be analyzed. These data are being generated for several purposes, including genotyping, genome resequencing, metagenomics, and de novo genome assembly projects. Sequence alignment programs such as MUMmer have proven essential for analysis of these data, but researchers will need ever faster, high-throughput alignment tools running on inexpensive hardware to keep up with new sequencing technologies. RESULTS: This paper describes MUMmerGPU, an open-source high-throughput parallel pairwise local sequence alignment program that runs on commodity Graphics Processing Units (GPUs) in common workstations. MUMmerGPU uses the new Compute Unified Device Architecture (CUDA) from NVIDIA to align multiple query sequences against a single reference sequence stored as a suffix tree. By processing the queries in parallel on the highly parallel graphics card, MUMmerGPU achieves more than a 10-fold speedup over a serial CPU version of the sequence alignment kernel, and outperforms the exact alignment component of MUMmer on a high-end CPU by 3.5-fold in total application time when aligning reads from recent sequencing projects using Solexa/Illumina, 454, and Sanger sequencing technologies. CONCLUSION: MUMmerGPU is a low-cost, ultra-fast sequence alignment program designed to handle the increasing volume of data produced by new, high-throughput sequencing technologies. MUMmerGPU demonstrates that even memory-intensive applications can run significantly faster on the relatively low-cost GPU than on the CPU.

Subjects

Computer Graphics/instrumentation; Database Management Systems; Sequence Alignment/economics; Sequence Alignment/instrumentation; Animals; Bacillus anthracis/genetics; Base Sequence; Caenorhabditis/genetics; Computer Graphics/economics; Computers/economics; Contig Mapping/economics; Contig Mapping/instrumentation; DNA/ultrastructure; Databases, Genetic; Genomic Library; Listeria monocytogenes/genetics; Sequence Alignment/methods; Sequence Analysis, DNA/economics; Sequence Analysis, DNA/instrumentation; Sequence Analysis, DNA/methods; Streptococcus suis/genetics; Time Factors; Work Simplification

elPrep: High-Performance Preparation of Sequence Alignment/Map Files for Variant Calling.

Herzeel, Charlotte; Costanza, Pascal; Decap, Dries; Fostier, Jan; Reumers, Joke.

PLoS One ; 10(7): e0132868, 2015.

Article in English | MEDLINE | ID: mdl-26182406

ABSTRACT

elPrep is a high-performance tool for preparing sequence alignment/map files for variant calling in sequencing pipelines. It can be used as a replacement for SAMtools and Picard for preparation steps such as filtering, sorting, marking duplicates and reordering contigs, while producing identical results. What sets elPrep apart is its software architecture, which executes a preparation pipeline in a single pass through the data, no matter how many preparation steps the pipeline uses. elPrep is designed as a multithreaded application that runs entirely in memory, avoids repeated file I/O, and merges the computation of several preparation steps to significantly reduce execution time. For example, for a preparation pipeline of five steps on a whole-exome BAM file (NA12878), we reduce the execution time from about 1 hour 40 minutes with a combination of SAMtools and Picard to about 15 minutes with elPrep, using the same server resources (here 48 threads and 23 GB of RAM). For the same pipeline on whole-genome data (NA12878), elPrep reduces the runtime from 24 hours to less than 5 hours. Because a typical clinical study may contain sequencing data for hundreds of patients, elPrep can save several hundred hours of computing time and thus substantially reduce analysis time and cost.
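The single-pass idea can be illustrated with a small Python sketch: instead of running each preparation step as a separate program that rereads and rewrites the whole file, per-record steps are composed and applied while each record is visited once, followed by one sort. This is not elPrep's code or API; the `Read` record type, the step functions and `run_single_pass` are hypothetical, and real SAM/BAM parsing is omitted.

```python
from dataclasses import dataclass, replace
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class Read:
    """Hypothetical, heavily reduced stand-in for a SAM alignment record."""
    name: str
    contig: str
    pos: int
    mapq: int
    mapped: bool


# A step returns a (possibly modified) record, or None to drop the record.
Step = Callable[[Read], Optional[Read]]


def drop_unmapped(read: Read) -> Optional[Read]:
    return read if read.mapped else None


def min_mapq(threshold: int) -> Step:
    def step(read: Read) -> Optional[Read]:
        return read if read.mapq >= threshold else None
    return step


def rename_contigs(mapping: dict[str, str]) -> Step:
    def step(read: Read) -> Optional[Read]:
        return replace(read, contig=mapping.get(read.contig, read.contig))
    return step


def run_single_pass(reads: Iterable[Read], steps: list[Step]) -> list[Read]:
    """Apply every per-record step during one traversal, then sort once."""
    kept: list[Read] = []
    for read in reads:
        current: Optional[Read] = read
        for step in steps:
            current = step(current)
            if current is None:  # a filter rejected the record
                break
        if current is not None:
            kept.append(current)
    # Simplified coordinate sort (contigs compared as plain strings).
    kept.sort(key=lambda r: (r.contig, r.pos))
    return kept
```

Each record passes through all steps before the next one is read, so adding a step adds work per record but no extra pass over the file. Steps that need to see many records at once, such as duplicate marking, require extra bookkeeping that this sketch leaves out.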

Subjects

Algorithms; Exome; Genome, Human; Sequence Alignment/economics; Software; Benchmarking; Contig Mapping; High-Throughput Nucleotide Sequencing; Humans; Polymorphism, Single Nucleotide; Sequence Alignment/methods; Sequence Alignment/statistics & numerical data

Improved gapped alignment in BLAST.

Cameron, Michael; Williams, Hugh E; Cannane, Adam.

IEEE/ACM Trans Comput Biol Bioinform ; 1(3): 116-29, 2004.

Article in English | MEDLINE | ID: mdl-17048387

ABSTRACT

Homology search is a key tool for understanding the role, structure, and biochemical function of genomic sequences. The most popular technique for rapid homology search is BLAST, which has been in widespread use within universities, research centers, and commercial enterprises since the early 1990s. In this paper, we propose a new step in the BLAST algorithm that reduces the computational cost of searching with negligible effect on accuracy. This new step, semigapped alignment, is a compromise between the efficiency of ungapped alignment and the accuracy of gapped alignment, allowing BLAST to filter sequences accurately at lower computational cost. In addition, we propose a heuristic, restricted insertion alignment, that avoids unlikely evolutionary paths in order to reduce gapped alignment cost with negligible effect on accuracy. Together with an optimization of the local alignment recursion, our two techniques more than double the speed of the gapped alignment stages in BLAST. We conclude that our techniques are an important improvement to the BLAST algorithm. Source code for the alignment algorithms is available for download at http://www.bsg.rmit.edu.au/iga/.

Subjects

Algorithms; Computational Biology/methods; Sequence Alignment/methods; Amino Acid Sequence; Databases, Genetic; Internet; Molecular Sequence Data; Reproducibility of Results; Sequence Alignment/economics; Sequence Homology, Amino Acid

A faster algorithm for simultaneous alignment and folding of RNA.

Ziv-Ukelson, Michal; Gat-Viks, Irit; Wexler, Ydo; Shamir, Ron.

J Comput Biol ; 17(8): 1051-65, 2010 Aug.

Article in English | MEDLINE | ID: mdl-20649420

ABSTRACT

The current pairwise RNA (secondary) structural alignment algorithms are based on Sankoff's dynamic programming algorithm from 1985. Sankoff's algorithm requires O(N^6) time and O(N^4) space, where N denotes the length of the compared sequences, and thus its applicability is very limited. The current literature offers many heuristics for speeding up Sankoff's alignment process, some making restrictive assumptions on the length or the shape of the RNA substructures. We show how to speed up Sankoff's algorithm in practice via non-heuristic methods, without compromising optimality. Our analysis shows that the expected time complexity of the new algorithm is O(N^4 σ(N)), where σ(N) converges to O(N), assuming a standard polymer folding model which was supported by experimental analysis. Hence, our algorithm speeds up Sankoff's algorithm by a linear factor on average. In simulations, our algorithm speeds up computation by a factor of 3-12 for sequences of length 25-250. Code and data sets are available upon request.
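The "linear factor" claim follows directly from the stated bounds. The derivation below only restates the abstract's quantities (N is the sequence length, σ(N) the factor from the polymer folding model):

```latex
\begin{align*}
T_{\text{Sankoff}}(N) &= O\bigl(N^{6}\bigr), \\
\mathbb{E}\bigl[T_{\text{new}}(N)\bigr] &= O\bigl(N^{4}\,\sigma(N)\bigr),
  \qquad \sigma(N) = O(N), \\
\Rightarrow\quad \mathbb{E}\bigl[T_{\text{new}}(N)\bigr] &= O\bigl(N^{5}\bigr)
  = O\!\left(\frac{N^{6}}{N}\right).
\end{align*}
```

The expected running time thus loses one factor of N relative to Sankoff's bound. Because O-notation hides constant factors, this bound does not by itself predict the measured 3 to 12-fold speedups for lengths 25 to 250; those figures are empirical.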

Subjects

Algorithms; RNA/chemistry; Sequence Alignment/methods; Animals; Base Sequence; Caenorhabditis elegans/genetics; DNA/chemistry; Nucleic Acid Conformation; Sequence Alignment/economics

Multiple sequence alignment using simulated annealing.

Kim, J; Pramanik, S; Chung, M J.

Comput Appl Biosci ; 10(4): 419-26, 1994 Jul.

Article in English | MEDLINE | ID: mdl-7804875

ABSTRACT

Multiple sequence alignment is a useful technique for studying molecular evolution and analyzing structure-sequence relationships. Dynamic programming has been widely used to find an optimal multiple sequence alignment. However, dynamic programming does not allow certain types of gap costs, and its high computational complexity limits the number of sequences that can be aligned. The focus of this paper is to use simulated annealing as the basis for developing an efficient multiple sequence alignment algorithm. An algorithm called Multiple Sequence Alignment using Simulated Annealing (MSASA) has been developed. The computational complexity of MSASA is significantly reduced by replacing the high-temperature phase of the annealing process with a fast heuristic algorithm. This heuristic helps minimize the solution set of the low-temperature phase of the annealing process. Compared with the dynamic programming approach, MSASA can (i) use natural gap costs, which can generate better solutions, (ii) align more sequences, and (iii) take less computation time.
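A generic simulated-annealing alignment loop can be sketched in Python as follows. This is not MSASA. A simple right-padded starting alignment stands in for MSASA's fast heuristic, and only a cooling phase is run. The sum-of-pairs scoring scheme with a linear gap cost (instead of the natural gap costs the abstract mentions), the move set (shift one gap by one column), the temperature schedule and all function names are hypothetical choices.

```python
import math
import random

MATCH, MISMATCH, GAP = 1, -1, -2  # hypothetical scores; gap-gap pairs score 0


def sum_of_pairs(rows: list[str]) -> int:
    """Sum-of-pairs score of an alignment given as equal-length gapped rows."""
    score = 0
    for column in zip(*rows):
        for i in range(len(column)):
            for j in range(i + 1, len(column)):
                a, b = column[i], column[j]
                if a == "-" and b == "-":
                    continue
                if a == "-" or b == "-":
                    score += GAP
                else:
                    score += MATCH if a == b else MISMATCH
    return score


def neighbor(rows: list[str], rng: random.Random) -> list[str]:
    """Move one gap in one row by one column; residue order is preserved."""
    rows = list(rows)
    k = rng.randrange(len(rows))
    row = list(rows[k])
    gaps = [i for i, c in enumerate(row) if c == "-"]
    if not gaps:
        return rows
    i = rng.choice(gaps)
    j = i + rng.choice((-1, 1))
    if 0 <= j < len(row):
        row[i], row[j] = row[j], row[i]
        rows[k] = "".join(row)
    return rows


def anneal(seqs: list[str], slack: int = 2, t_start: float = 2.0,
           t_end: float = 0.01, cooling: float = 0.995, seed: int = 0):
    """Return (best_rows, best_score) found by geometric-cooling annealing."""
    rng = random.Random(seed)
    width = max(map(len, seqs)) + slack
    current = [s.ljust(width, "-") for s in seqs]  # naive starting alignment
    current_score = sum_of_pairs(current)
    best, best_score = current, current_score
    t = t_start
    while t > t_end:
        candidate = neighbor(current, rng)
        delta = sum_of_pairs(candidate) - current_score
        # Metropolis rule: always accept improvements, sometimes accept losses.
        if delta >= 0 or rng.random() < math.exp(delta / t):
            current, current_score = candidate, current_score + delta
            if current_score > best_score:
                best, best_score = current, current_score
        t *= cooling
    return best, best_score
```

Because moves only slide gaps, every row still spells its original sequence once the gaps are removed, and the returned score is never worse than that of the starting alignment. A fast heuristic start, as in MSASA, would let the loop skip the high-temperature exploration this sketch spends its early iterations on.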

Subjects

Algorithms; Sequence Alignment/methods; Amino Acid Sequence; Animals; Costs and Cost Analysis; Evaluation Studies as Topic; Humans; Molecular Sequence Data; Proteins/genetics; Sequence Alignment/economics; Sequence Alignment/statistics & numerical data; Sequence Homology, Amino Acid; Software; Time Factors
