Search | Portal Regional da BVS

FastEtch: A Fast Sketch-Based Assembler for Genomes.

Ghosh, Priyanka; Kalyanaraman, Ananth.

IEEE/ACM Trans Comput Biol Bioinform ; 16(4): 1091-1106, 2019.

Article in English | MEDLINE | ID: mdl-28910776

ABSTRACT

De novo genome assembly is the process of reconstructing an unknown genome from a large collection of short (or long) reads sequenced from that genome. A single run of a Next-Generation Sequencing (NGS) technology can produce billions of short reads, making genome assembly computationally demanding in both memory and time. One of the major computational steps in modern short-read assemblers is the construction and use of a string data structure called the de Bruijn graph. In fact, a majority of short-read assemblers build the complete de Bruijn graph for the set of input reads, and subsequently traverse it and prune low-quality edges in order to generate genomic "contigs", the output of assembly. These steps of graph construction and traversal account for well over 90 percent of the runtime and memory. In this paper, we present a fast algorithm, FastEtch, that uses sketching to build an approximate version of the de Bruijn graph for the purpose of generating an assembly. The algorithm uses the Count-Min sketch, a probabilistic data structure for streaming data sets. The result is an approximate de Bruijn graph that stores information pertaining only to a selected subset of nodes that are most likely to contribute to the contig generation step. In addition, edges are not stored; instead, the fraction of edges that contributes to contig generation is detected on the fly. This approximate approach is intended to significantly improve performance (both execution time and memory footprint) while possibly compromising on the quality of the output assembly. We present two main versions of the assembler: one that generates an assembly in which each contig represents a contiguous genomic region from one strand of the DNA, and another that generates an assembly in which the contigs can straddle either of the two strands of the DNA. For further scalability, we have implemented a multi-threaded parallel code. Experimental results for our algorithm on the E. coli, Yeast, C. elegans, and Human (Chr2 and Chr2+3) genomes show that our method yields one of the best time-memory-quality trade-offs when compared against many state-of-the-art genome assemblers.
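The role of the Count-Min sketch described in the abstract can be illustrated with a minimal Python sketch. This is not the FastEtch implementation: the class `CountMinSketch`, the helper `frequent_kmers`, the table dimensions, and the frequency threshold are all hypothetical choices made for illustration. It shows only the general idea of streaming k-mers through a fixed-size counter table and keeping the k-mers whose estimated count reaches a threshold.

```python
import hashlib


class CountMinSketch:
    """Fixed-size approximate counter; estimates never undercount."""

    def __init__(self, width=2048, depth=4):
        # width and depth are illustrative values, not taken from the paper
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # One salted hash per row, so rows behave like independent hash functions
        digest = hashlib.blake2b(
            item.encode(), digest_size=8, salt=row.to_bytes(8, "little")
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, item):
        for r in range(self.depth):
            self.rows[r][self._index(item, r)] += 1

    def estimate(self, item):
        # The minimum over rows is the least collision-inflated counter
        return min(self.rows[r][self._index(item, r)] for r in range(self.depth))


def kmers(read, k):
    """Yield every length-k substring of a read."""
    return (read[i:i + k] for i in range(len(read) - k + 1))


def frequent_kmers(reads, k, threshold, sketch=None):
    """Stream reads once; keep k-mers whose estimated count reaches threshold."""
    if sketch is None:
        sketch = CountMinSketch()
    kept = set()
    for read in reads:
        for kmer in kmers(read, k):
            sketch.add(kmer)
            if sketch.estimate(kmer) >= threshold:
                kept.add(kmer)
    return kept
```

Because hash collisions can only inflate a counter, every k-mer whose true count reaches the threshold is kept, while a few rarer k-mers may slip in as false positives. That one-sided error is what makes the resulting node set an approximation of the full de Bruijn graph.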

Subjects

Computational Biology/instrumentation; Contig Mapping/instrumentation; Genome; Software; Algorithms; Animals; Caenorhabditis elegans/genetics; Computational Biology/methods; Contig Mapping/methods; Escherichia coli/genetics; Genomics; High-Throughput Nucleotide Sequencing; Humans; Probability; Yeasts/genetics

High-throughput sequence alignment using Graphics Processing Units.

Schatz, Michael C; Trapnell, Cole; Delcher, Arthur L; Varshney, Amitabh.

BMC Bioinformatics ; 8: 474, 2007 Dec 10.

Article in English | MEDLINE | ID: mdl-18070356

ABSTRACT

BACKGROUND: The recent availability of new, less expensive high-throughput DNA sequencing technologies has yielded a dramatic increase in the volume of sequence data that must be analyzed. These data are being generated for several purposes, including genotyping, genome resequencing, metagenomics, and de novo genome assembly projects. Sequence alignment programs such as MUMmer have proven essential for analysis of these data, but researchers will need ever faster, high-throughput alignment tools running on inexpensive hardware to keep up with new sequence technologies. RESULTS: This paper describes MUMmerGPU, an open-source high-throughput parallel pairwise local sequence alignment program that runs on commodity Graphics Processing Units (GPUs) in common workstations. MUMmerGPU uses the new Compute Unified Device Architecture (CUDA) from nVidia to align multiple query sequences against a single reference sequence stored as a suffix tree. By processing the queries in parallel on the highly parallel graphics card, MUMmerGPU achieves more than a 10-fold speedup over a serial CPU version of the sequence alignment kernel, and outperforms the exact alignment component of MUMmer on a high end CPU by 3.5-fold in total application time when aligning reads from recent sequencing projects using Solexa/Illumina, 454, and Sanger sequencing technologies. CONCLUSION: MUMmerGPU is a low cost, ultra-fast sequence alignment program designed to handle the increasing volume of data produced by new, high-throughput sequencing technologies. MUMmerGPU demonstrates that even memory-intensive applications can run significantly faster on the relatively low-cost GPU than on the CPU.

Subjects

Computer Graphics/instrumentation; Database Management Systems; Sequence Alignment/economics; Sequence Alignment/instrumentation; Animals; Bacillus anthracis/genetics; Base Sequence; Caenorhabditis/genetics; Computer Graphics/economics; Computers/economics; Contig Mapping/economics; Contig Mapping/instrumentation; DNA/ultrastructure; Databases, Genetic; Genomic Library; Listeria monocytogenes/genetics; Sequence Alignment/methods; Sequence Analysis, DNA/economics; Sequence Analysis, DNA/instrumentation; Sequence Analysis, DNA/methods; Streptococcus suis/genetics; Time Factors; Work Simplification

AutoFACT: an automatic functional annotation and classification tool.

Koski, Liisa B; Gray, Michael W; Lang, B Franz; Burger, Gertraud.

BMC Bioinformatics ; 6: 151, 2005 Jun 16.

Article in English | MEDLINE | ID: mdl-15960857

ABSTRACT

BACKGROUND: Assignment of function to new molecular sequence data is an essential step in genomics projects. The usual process involves similarity searches of a given sequence against one or more databases, an arduous process for large datasets. RESULTS: We present AutoFACT, a fully automated and customizable annotation tool that assigns biologically informative functions to a sequence. Key features of this tool are that it (1) analyzes nucleotide and protein sequence data; (2) determines the most informative functional description by combining multiple BLAST reports from several user-selected databases; (3) assigns putative metabolic pathways, functional classes, enzyme classes, GeneOntology terms and locus names; and (4) generates output in HTML, text and GFF formats for the user's convenience. We have compared AutoFACT to four well-established annotation pipelines. The error rate of functional annotation is estimated to be only between 1-2%. Comparison of AutoFACT to the traditional top-BLAST-hit annotation method shows that our procedure increases the number of functionally informative annotations by approximately 50%. CONCLUSION: AutoFACT will serve as a useful annotation tool for smaller sequencing groups lacking dedicated bioinformatics staff. It is implemented in PERL and runs on LINUX/UNIX platforms. AutoFACT is available at http://megasun.bch.umontreal.ca/Software/AutoFACT.htm.

Subjects

Expressed Sequence Tags; Information Management/methods; Sequence Analysis, DNA/methods; Software; Acanthamoeba castellanii/classification; Acanthamoeba castellanii/genetics; Animals; Computational Biology/methods; Contig Mapping/instrumentation; Contig Mapping/methods; DNA, Complementary/analysis; Databases, Genetic; Humans; Internet; Phylogeny; Plasmodium falciparum/classification; Plasmodium falciparum/genetics; Rickettsia prowazekii/classification; Rickettsia prowazekii/genetics; Saccharomyces cerevisiae/classification; Saccharomyces cerevisiae/genetics; Sequence Alignment/methods; Sequence Analysis, DNA/instrumentation; Software Validation

Selection of oligonucleotide probes for protein coding sequences.

Wang, Xiaowei; Seed, Brian.

Bioinformatics ; 19(7): 796-802, 2003 May 01.

Article in English | MEDLINE | ID: mdl-12724288

ABSTRACT

MOTIVATION: Large arrays of oligonucleotide probes have become popular tools for analyzing RNA expression. However, to date most oligo collections contain poorly validated sequences or are biased toward untranslated regions (UTRs). Here we present a strategy for picking oligos for microarrays that focuses on a design universe consisting exclusively of protein coding regions. We describe the constraints in oligo design that are imposed by this strategy, as well as a software tool that allows the strategy to be applied broadly. RESULT: In this work we sequentially apply a variety of simple filters to candidate sequences for oligo probes. The primary filter is a rejection of probes that contain contiguous identity with any other sequence in the sample universe that exceeds a pre-established threshold length. We find that rejection of oligos that contain 15 bases of perfect match with other sequences in the design universe is a feasible strategy for oligo selection for probe arrays designed to interrogate mammalian RNA populations. Filters to remove sequences with low complexity and predicted poor probe accessibility narrow the candidate probe space only slightly. Rejection based on global sequence alignment is performed as a secondary, rather than primary, test, leading to an algorithm that is computationally efficient. Splice isoforms pose unique challenges, and we find that isoform prevalence will for the most part have to be determined by analysis of the patterns of hybridization of partially redundant oligonucleotides. AVAILABILITY: The oligo design program OligoPicker and its source code are freely available at our website.
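The primary filter described in this abstract, rejecting a probe that shares a long perfect contiguous match with any other sequence, can be sketched in Python as follows. This is not the OligoPicker code: the function names `kmer_set` and `passes_identity_filter` are hypothetical, and the sketch assumes same-strand matching only (no reverse complements). It relies on the observation that two sequences share a perfect run of at least 15 bases exactly when they share at least one 15-mer.

```python
def kmer_set(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def passes_identity_filter(probe, other_sequences, max_match=15):
    """Return False if the probe shares a perfect contiguous match of
    max_match or more bases with any sequence in other_sequences.

    other_sequences is assumed to exclude the probe's own source sequence.
    """
    probe_kmers = kmer_set(probe, max_match)
    for seq in other_sequences:
        # Any shared max_match-mer implies a shared run of at least that length
        if probe_kmers & kmer_set(seq, max_match):
            return False
    return True
```

For a large design universe it would likely be cheaper to index the k-mers of all sequences once rather than rebuild a set per comparison; the per-call sets here are kept only for brevity.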

Subjects

DNA Probes; Gene Expression Profiling/methods; Oligonucleotide Array Sequence Analysis/methods; Sequence Alignment/methods; Sequence Analysis, Protein/methods; Algorithms; Animals; Cluster Analysis; Contig Mapping/instrumentation; Contig Mapping/methods; Genome, Human; Humans; Mice; Oligonucleotide Array Sequence Analysis/instrumentation; Reproducibility of Results; Sensitivity and Specificity; Sequence Analysis, DNA

MerMade: an oligodeoxyribonucleotide synthesizer for high throughput oligonucleotide production in dual 96-well plates.

Rayner, S; Brignac, S; Bumeister, R; Belosludtsev, Y; Ward, T; Grant, O; O'Brien, K; Evans, G A; Garner, H R.

Genome Res ; 8(7): 741-7, 1998 Jul.

Article in English | MEDLINE | ID: mdl-9685322

ABSTRACT

We have designed and constructed a machine that synthesizes two standard 96-well plates of oligonucleotides in a single run using standard phosphoramidite chemistry. The machine is capable of making a combination of standard, degenerate, or modified oligos in a single plate. The run time is typically 17 hr for two plates of 20-mers and a reaction scale of 40 nM. The reaction vessel is a standard polypropylene 96-well plate with a hole drilled in the bottom of each well. The two plates are placed in separate vacuum chucks and mounted on an xy table. Each well in turn is positioned under the appropriate reagent injection line and the reagent is injected by switching a dedicated valve. All aspects of machine operation are controlled by a Macintosh computer, which also guides the user through the startup and shutdown procedures, provides a continuous update on the status of the run, and facilitates a number of service procedures that need to be carried out periodically. Over 25,000 oligos have been synthesized for use in dye terminator sequencing reactions, polymerase chain reactions (PCRs), hybridization, and RT-PCR. Oligos up to 100 bases in length have been made with a coupling efficiency in excess of 99%. These machines, working in conjunction with our oligo prediction code, are particularly well suited to application in automated high throughput genomic sequencing.

Subjects

Chromosome Mapping/instrumentation; Oligodeoxyribonucleotides/chemical synthesis; Chromosome Mapping/methods; Contig Mapping/instrumentation; Contig Mapping/methods; Cosmids; Polymerase Chain Reaction/instrumentation; Polymerase Chain Reaction/methods; Sequence Analysis, DNA/instrumentation; Sequence Analysis, DNA/methods; Software

A distributed environment for physical map construction.

Grigoriev, A; Levin, A; Lehrach, H.

Bioinformatics ; 14(3): 252-8, 1998.

Article in English | MEDLINE | ID: mdl-9614268

ABSTRACT

MOTIVATION: With the main focus of the Human Genome Project shifting to sequencing, bioinformatics support for constructing large-scale genomic maps of other organisms is still required. We attempt to provide for this with our work, aimed at the delivery of robust and user-friendly contig-building software on the WWW. RESULTS: We present a prototype distributed analytical environment for molecular biologists working in the area of genomic mapping. It consists of the WWW server for constructing contigs from users' data with a hypertext output connected to Java-based map visualization software. AVAILABILITY: Freely available on http://www.mpimg-berlin-dahlem.mpg.de/~andy/server/. CONTACT: andy@rag3.rz-berlin.mpg.de

Subjects

Chromosome Mapping/instrumentation; Chromosome Mapping/methods; Computer Communication Networks; Computer Systems; Algorithms; Computational Biology/instrumentation; Computational Biology/methods; Computer Communication Networks/instrumentation; Contig Mapping/instrumentation; Contig Mapping/methods; Database Management Systems/instrumentation; Databases, Factual; Internet; Sequence Analysis, DNA/instrumentation; Sequence Analysis, DNA/methods; Software
