Search | BVS Bolivia

DNA assembly for nanopore data storage readout.

Lopez, Randolph; Chen, Yuan-Jyue; Dumas Ang, Siena; Yekhanin, Sergey; Makarychev, Konstantin; Racz, Miklos Z; Seelig, Georg; Strauss, Karin; Ceze, Luis.

Nat Commun ; 10(1): 2933, 2019 07 03.

Article in English | MEDLINE | ID: mdl-31270330

ABSTRACT

Synthetic DNA is becoming an attractive substrate for digital data storage due to its density, durability, and relevance in biological research. A major challenge in making DNA data storage a reality is that reading DNA back into data using sequencing by synthesis remains a laborious, slow and expensive process. Here, we demonstrate successful decoding of 1.67 megabytes of information stored in short fragments of synthetic DNA using a portable nanopore sequencing platform. We design and validate an assembly strategy for DNA storage that drastically increases the throughput of nanopore sequencing. Importantly, this assembly strategy is generalizable to any application that requires nanopore sequencing of small DNA amplicons.

Subject(s)

DNA/genetics; Information Storage and Retrieval/methods; DNA/chemical synthesis; Databases, Genetic; Nanopores; Nanotechnology; Sequence Analysis, DNA/instrumentation

Erratum: Random access in large-scale DNA data storage.

Organick, Lee; Ang, Siena Dumas; Chen, Yuan-Jyue; Lopez, Randolph; Yekhanin, Sergey; Makarychev, Konstantin; Racz, Miklos Z; Kamath, Govinda; Gopalan, Parikshit; Nguyen, Bichlien; Takahashi, Christopher N; Newman, Sharon; Parker, Hsing-Yeh; Rashtchian, Cyrus; Stewart, Kendall; Gupta, Gagan; Carlson, Robert; Mulligan, John; Carmean, Douglas; Seelig, Georg; Ceze, Luis; Strauss, Karin.

Nat Biotechnol ; 36(7): 660, 2018 07 06.

Article in English | MEDLINE | ID: mdl-29979658

Random access in large-scale DNA data storage.

Nat Biotechnol ; 36(3): 242-248, 2018 03.

Article in English | MEDLINE | ID: mdl-29457795

ABSTRACT

Synthetic DNA is durable and can encode digital data with high density, making it an attractive medium for data storage. However, recovering stored data on a large scale currently requires all the DNA in a pool to be sequenced, even if only a subset of the information needs to be extracted. Here, we encode and store 35 distinct files (over 200 MB of data), in more than 13 million DNA oligonucleotides, and show that we can recover each file individually and with no errors, using a random access approach. We design and validate a large library of primers that enable individual recovery of all files stored within the DNA. We also develop an algorithm that greatly reduces the sequencing read coverage required for error-free decoding by maximizing information from all sequence reads. These advances demonstrate a viable, large-scale system for DNA data storage and retrieval.
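
The primer-based random access described in this abstract can be pictured with a toy in-silico analogy. The sketch below is not the authors' method: the real system selects strands by PCR amplification, and the strand layout (forward primer, payload, reverse primer), the primer sequences, and the function name `select_file_strands` are all hypothetical, chosen only for illustration.

```python
def select_file_strands(pool, forward_primer, reverse_primer):
    """Return the payloads of strands flanked by the given primer pair.

    Assumed (hypothetical) strand layout: forward primer + payload + reverse primer.
    """
    selected = []
    for strand in pool:
        if strand.startswith(forward_primer) and strand.endswith(reverse_primer):
            # Strip the primers and keep only the payload in between.
            payload = strand[len(forward_primer):len(strand) - len(reverse_primer)]
            selected.append(payload)
    return selected


# Made-up primer pairs, one per stored file.
FILE_A = ("ACGTAC", "TTGACC")
FILE_B = ("GGATCA", "CATGGA")

# A mixed pool holding strands from both files.
pool = [
    FILE_A[0] + "AAAACCCC" + FILE_A[1],
    FILE_B[0] + "GGGGTTTT" + FILE_B[1],
    FILE_A[0] + "CCCCAAAA" + FILE_A[1],
]

# Only the strands tagged with file A's primers are retrieved.
file_a_payloads = select_file_strands(pool, *FILE_A)
```

Exact string matching stands in for primer binding here; a real pipeline would have to tolerate synthesis and sequencing errors in the primer regions.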

Subject(s)

DNA/genetics; Information Storage and Retrieval; Sequence Analysis, DNA/methods; Algorithms; High-Throughput Nucleotide Sequencing

Design of multiple sequence alignment algorithms on parallel, distributed memory supercomputers.

Church, Philip C; Goscinski, Andrzej; Holt, Kathryn; Inouye, Michael; Ghoting, Amol; Makarychev, Konstantin; Reumann, Matthias.

Annu Int Conf IEEE Eng Med Biol Soc ; 2011: 924-7, 2011.

Article in English | MEDLINE | ID: mdl-22254462

ABSTRACT

The challenge of comparing two or more genomes that have undergone recombination and substantial amounts of segmental loss and gain has recently been addressed for small numbers of genomes. However, datasets of hundreds of genomes are now common and their sizes will only increase in the future. Multiple sequence alignment of hundreds of genomes remains an intractable problem due to quadratic increases in compute time and memory footprint. To date, most alignment algorithms are designed for commodity clusters without parallelism. Hence, we propose the design of a multiple sequence alignment algorithm on massively parallel, distributed memory supercomputers to enable research into comparative genomics on large data sets. Following the methodology of the sequential progressiveMauve algorithm, we design data structures including sequences and sorted k-mer lists on the IBM Blue Gene/P supercomputer (BG/P). Preliminary results show that we can reduce the memory footprint so that we can potentially align over 250 bacterial genomes on a single BG/P compute node. We verify our results on a dataset of E. coli, Shigella and S. pneumoniae genomes. Our implementation returns results matching those of the original algorithm but in 1/2 the time and with 1/4 the memory footprint for scaffold building. In this study, we have laid the basis for multiple sequence alignment of large-scale datasets on a massively parallel, distributed memory supercomputer, thus enabling comparison of hundreds instead of a few genome sequences within reasonable time.
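
As a rough illustration of the "sorted k-mer list" data structure named in this abstract, the sketch below builds such a list for a sequence and merge-walks two lists to find shared k-mers. It is a single-process toy under stated assumptions: it uses plain contiguous k-mers, says nothing about how the paper lays the data out on Blue Gene/P, and the function names are hypothetical.

```python
def sorted_kmer_list(sequence, k):
    """Return (k-mer, position) pairs sorted lexicographically by k-mer."""
    kmers = [(sequence[i:i + k], i) for i in range(len(sequence) - k + 1)]
    kmers.sort()
    return kmers


def shared_kmers(list_a, list_b):
    """Merge-walk two sorted k-mer lists and collect k-mers present in both."""
    shared = []
    i = j = 0
    while i < len(list_a) and j < len(list_b):
        kmer_a, kmer_b = list_a[i][0], list_b[j][0]
        if kmer_a == kmer_b:
            # Record each shared k-mer once, even if it repeats in a sequence.
            if not shared or shared[-1] != kmer_a:
                shared.append(kmer_a)
            i += 1
            j += 1
        elif kmer_a < kmer_b:
            i += 1
        else:
            j += 1
    return shared


list_a = sorted_kmer_list("ACGTAC", 3)
list_b = sorted_kmer_list("TTACGT", 3)
common = shared_kmers(list_a, list_b)
```

Because both lists are sorted, the shared k-mers are found in one linear pass over each list rather than by comparing every pair, which is the usual reason for keeping k-mer lists sorted when seeding alignments.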

Subject(s)

Algorithms; Computers, Mainframe; DNA, Bacterial/genetics; Genome, Bacterial/genetics; Sequence Alignment/methods; Sequence Analysis, DNA/methods; Software; Base Sequence; Molecular Sequence Data; Software Design
