Search | BVS Enfermagem Research Portal

LightAssembler: fast and memory-efficient assembly algorithm for high-throughput sequencing reads.

El-Metwally, Sara; Zakaria, Magdi; Hamza, Taher.

Bioinformatics ; 32(21): 3215-3223, 2016 11 01.

Article in English | MEDLINE | ID: mdl-27412092

ABSTRACT

MOTIVATION: The deluge of sequenced data has exceeded Moore's Law, more than doubling every 2 years since next-generation sequencing (NGS) technologies were invented. Accordingly, we will be able to generate more and more data at high speed and fixed cost, but lack the computational resources to store, process and analyze it. With error-prone, high-throughput NGS reads and genomic repeats, the assembly graph contains a massive number of redundant nodes and branching edges. Most assembly pipelines require this large graph to reside in memory to start their workflows, which is intractable for mammalian genomes. Resource-efficient genome assemblers combine the power of advanced computing techniques and innovative data structures to encode the assembly graph efficiently in computer memory.

RESULTS: LightAssembler is a lightweight assembly algorithm designed to be executed on a desktop machine. It uses a pair of cache-oblivious Bloom filters, one holding a uniform sample of [Formula: see text]-spaced sequenced k-mers and the other holding k-mers classified as likely correct using a simple statistical test. LightAssembler contains a light implementation of the graph traversal and simplification modules that achieves assembly accuracy and contiguity comparable to competing tools. Our method reduces memory usage by [Formula: see text] compared to the resource-efficient assemblers, using benchmark datasets from the GAGE and Assemblathon projects. While LightAssembler can be considered a gap-based sequence assembler, different gap sizes result in an almost constant assembly size and genome coverage.

AVAILABILITY AND IMPLEMENTATION: https://github.com/SaraEl-Metwally/LightAssembler

CONTACT: sarah_almetwally4@mans.edu.eg

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
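The idea of storing a spaced sample of k-mers in a Bloom filter can be illustrated with a small sketch. This is not LightAssembler's implementation: the filter below is a plain Bloom filter rather than a cache-oblivious one, the `step` parameter is a hypothetical stand-in for the sampling gap, and the second filter with its statistical test is omitted. All names are assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """Plain Bloom filter over strings (illustrative, not cache-oblivious)."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive num_hashes bit positions by double hashing one SHA-256 digest.
        digest = hashlib.sha256(item.encode("ascii")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # May return a false positive, never a false negative.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def sampled_kmers(read: str, k: int, step: int):
    """Yield the k-mers of `read` that start at positions 0, step, 2*step, ..."""
    for start in range(0, len(read) - k + 1, step):
        yield read[start:start + k]


# Hypothetical usage: sample one read and store the sample in the filter.
sample_filter = BloomFilter(num_bits=1024, num_hashes=3)
for kmer in sampled_kmers("ACGTACGTAC", k=4, step=3):
    sample_filter.add(kmer)
```

A Bloom filter trades a small false-positive rate for a fixed memory footprint, which is why it suits the memory-constrained setting the abstract describes; sampling only a subset of k-mers reduces the number of insertions further.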

Subjects

Algorithms, High-Throughput Nucleotide Sequencing/methods, Animals, Genome, Genomics, Humans, Sequence Analysis, DNA

Next-generation sequence assembly: four stages of data processing and computational challenges.

El-Metwally, Sara; Hamza, Taher; Zakaria, Magdi; Helmy, Mohamed.

PLoS Comput Biol ; 9(12): e1003345, 2013.

Article in English | MEDLINE | ID: mdl-24348224

ABSTRACT

Decoding DNA symbols using next-generation sequencers was a major breakthrough in genomic research. Despite the many advantages of next-generation sequencers, e.g., the high-throughput sequencing rate and relatively low cost of sequencing, the assembly of the reads produced by these sequencers remains a major challenge. In this review, we address the basic framework of next-generation genome sequence assemblers, which comprises four basic stages: preprocessing filtering, a graph construction process, a graph simplification process, and postprocessing filtering. Here we discuss them as a framework of four stages for data analysis and processing and survey a variety of techniques, algorithms, and software tools used during each stage. We also discuss the challenges that face current assemblers in the next-generation environment to determine the current state of the art. We recommend a layered architecture approach for constructing a general assembler that can handle the sequences generated by different sequencing platforms.
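The four stages named in the abstract can be pictured as a chain of functions. The toy pipeline below is only a sketch under strong assumptions: it uses a de Bruijn graph, drops reads containing ambiguous bases as its sole preprocessing filter, merges non-branching paths as its sole simplification, and filters contigs by length as its sole postprocessing. Real assemblers do far more at each stage, and every function name here is hypothetical.

```python
from collections import defaultdict


def preprocess(reads):
    """Stage 1 (toy): keep only non-empty reads made of A, C, G, T."""
    return [r for r in reads if r and set(r) <= set("ACGT")]


def build_graph(reads, k):
    """Stage 2 (toy): de Bruijn graph; nodes are (k-1)-mers, edges are k-mers."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def simplify(graph):
    """Stage 3 (toy): merge non-branching paths into contigs.

    Isolated cycles are ignored in this sketch.
    """
    indegree = defaultdict(int)
    for successors in graph.values():
        for succ in successors:
            indegree[succ] += 1

    def is_simple(node):
        # Exactly one way in and one way out.
        return indegree.get(node, 0) == 1 and len(graph.get(node, ())) == 1

    contigs = []
    for node in sorted(set(graph) | set(indegree)):
        if is_simple(node):
            continue  # interior of a path; reached from its start node
        for succ in sorted(graph.get(node, ())):
            contig = node + succ[-1]
            current = succ
            while is_simple(current):
                current = next(iter(graph[current]))
                contig += current[-1]
            contigs.append(contig)
    return contigs


def postprocess(contigs, min_len):
    """Stage 4 (toy): discard contigs shorter than min_len."""
    return [c for c in contigs if len(c) >= min_len]


def assemble(reads, k, min_len):
    """Run the four toy stages in order."""
    return postprocess(simplify(build_graph(preprocess(reads), k)), min_len)


contigs = assemble(["ACGTT", "CGTTA", "ACNGT"], k=3, min_len=4)
```

Keeping each stage behind its own function mirrors the layered architecture the review recommends: a stage can be swapped, for example to suit a different sequencing platform, without touching the others.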

Subjects

DNA/chemistry, Sequence Analysis, DNA/methods, Algorithms, Base Sequence, Genome, Sequence Alignment, Software
