ABSTRACT
DNA data storage is a potential alternative to magnetic tape for archival storage, promising substantial gains in information density. Critical to the success of DNA as a storage medium is an understanding of the role of environmental factors in the longevity of the stored information. In this paper, we evaluate the effect of exposure to ionizing particle radiation, a cause of data loss in traditional magnetic media, on the longevity of data in DNA data storage pools. We develop a mass action kinetics model to estimate the rate of damage accumulation in DNA strands due to neutron interactions with both nucleotides and residual water molecules, then use the model to evaluate the effect that several design parameters of a typical DNA data storage scheme have on expected data longevity. Finally, we experimentally validate our model by exposing dried DNA samples to different levels of neutron irradiation and analyzing the resulting error profile. Our results show that particle radiation is not a significant contributor to data loss in DNA data storage pools under typical storage conditions.
Subject(s)
DNA , DNA/radiation effects , Neutrons/adverse effects , DNA Damage/radiation effects , Information Storage and Retrieval/methods , Ionizing Radiation , Kinetics
ABSTRACT
Synthetic DNA is an attractive medium for long-term data storage because of its density, ease of copying, sustainability, and longevity. Recent advances have focused on the development of new encoding algorithms, automation, preservation, and sequencing technologies. Despite progress in these areas, the most challenging hurdle in the deployment of DNA data storage remains the write throughput, which limits data storage capacity. We have developed the first nanoscale DNA storage writer, which we expect to scale DNA write density to 25 × 10^6 sequences per square centimeter, a three-orders-of-magnitude improvement over existing DNA synthesis arrays. We show confinement of DNA synthesis to an area under 1 square micrometer, parallelized over millions of nanoelectrode wells, and then successfully write and decode a message in DNA. DNA synthesis on this scale will enable write throughputs to reach megabytes per second and is a key enabler of a practical DNA data storage system.
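As a rough check on the density figures in this abstract, the stated 25 × 10^6 sequences per square centimeter can be converted into an area budget per synthesis site. The sketch below assumes that sites tile a uniform square grid, which the abstract does not state, and the function names are illustrative only.

```python
# Back-of-envelope check of the write-density figures quoted in the abstract.
# Assumption (not from the source): synthesis sites tile a uniform square grid.
import math

UM2_PER_CM2 = 1e8  # 1 cm = 1e4 um, so 1 cm^2 = 1e8 um^2


def area_per_site_um2(sites_per_cm2: float) -> float:
    """Area available to each synthesis site, in square micrometers."""
    return UM2_PER_CM2 / sites_per_cm2


def square_pitch_um(sites_per_cm2: float) -> float:
    """Center-to-center spacing if sites sit on a square grid."""
    return math.sqrt(area_per_site_um2(sites_per_cm2))


target_density = 25e6                     # sequences per cm^2, from the abstract
implied_baseline = target_density / 1e3   # "three orders of magnitude" lower

print(area_per_site_um2(target_density))  # 4.0 (um^2 per site)
print(square_pitch_um(target_density))    # 2.0 (um pitch)
print(implied_baseline)                   # 25000.0 (sequences per cm^2)
```

Under that assumption each site has about 4 square micrometers available, which is compatible with confining synthesis to an area under 1 square micrometer per well.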
ABSTRACT
DNA has recently emerged as an attractive medium for archival data storage. Recent work has demonstrated proof-of-principle prototype systems; however, very uneven (biased) sequencing coverage has been reported, which indicates inefficiencies in the storage process. Deviations from the average coverage in the sequence copy distribution can cause either wasteful provisioning in sequencing or an excessive number of missing sequences. Here, we use millions of unique sequences from a DNA-based digital data archival system to study the oligonucleotide copy unevenness problem and show that the two paramount sources of bias are the synthesis and amplification (PCR) processes. Based on these findings, we develop a statistical model for each molecular process as well as the overall process. We further use our model to explore the trade-offs between synthesis bias, storage physical density, logical redundancy, and sequencing redundancy, providing insights for engineering efficient, robust DNA data storage systems.
Subject(s)
Information Storage and Retrieval , DNA Sequence Analysis , Bias , Theoretical Models , DNA Sequence Analysis/methods , DNA Sequence Analysis/statistics & numerical data
ABSTRACT
DNA promises to be a high density data storage medium, but physical storage poses a challenge. To store large amounts of data, pools must be physically isolated so they can share the same addressing scheme. We propose the storage of dehydrated DNA spots on glass as an approach for scalable DNA data storage. The dried spots can then be retrieved by a water droplet using a digital microfluidic device. Here we show that this storage schema works with varying spot organization, spotted masses of DNA, and droplet retrieval dwell times. In all cases, the majority of the DNA was retrieved and successfully sequenced. We demonstrate that the spots can be densely arranged on a microfluidic device without significant contamination of the retrieval. We also demonstrate that 1 TB of data could be stored in a single spot of DNA and successfully retrieved using this method.
Subject(s)
DNA/analysis , Information Storage and Retrieval/methods , Lab-On-A-Chip Devices , Microfluidics , Computational Biology/methods , Desiccation , Equipment Design , Glass , Water/chemistry
ABSTRACT
Synthetic DNA has emerged as a novel substrate to encode computer data with the potential to be orders of magnitude denser than contemporary cutting-edge techniques. However, even with the help of automated synthesis and sequencing devices, many intermediate steps still require expert laboratory technicians to execute. We have developed an automated end-to-end DNA data storage device to explore the challenges of automation within the constraints of this unique application. Our device encodes data into a DNA sequence, which is then written to a DNA oligonucleotide using a custom DNA synthesizer, pooled for liquid storage, and read using a nanopore sequencer and a novel, minimal preparation protocol. We demonstrate an automated 5-byte write, store, and read cycle with a modular design enabling expansion as new technology becomes available.
Subject(s)
Computational Biology/trends , DNA/genetics , Information Storage and Retrieval/trends , Automation/methods , Humans , DNA Sequence Analysis
ABSTRACT
Synthetic DNA is durable and can encode digital data with high density, making it an attractive medium for data storage. However, recovering stored data on a large scale currently requires all the DNA in a pool to be sequenced, even if only a subset of the information needs to be extracted. Here, we encode and store 35 distinct files (over 200 MB of data) in more than 13 million DNA oligonucleotides, and show that we can recover each file individually and with no errors, using a random access approach. We design and validate a large library of primers that enable individual recovery of all files stored within the DNA. We also develop an algorithm that greatly reduces the sequencing read coverage required for error-free decoding by maximizing information from all sequence reads. These advances demonstrate a viable, large-scale system for DNA data storage and retrieval.
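The idea of primer-based random access described in this abstract can be illustrated with a small in-silico analogy. This is only a sketch under stated assumptions, not the authors' method: in the laboratory the selection is done by PCR amplification, whereas here it is reduced to exact string matching, the reverse primer site is written in the same orientation as the strand for simplicity, and the primer sequences, file names, and the function `select_file` are all hypothetical.

```python
# In-silico analogy of primer-based random access to one file in a mixed pool.
# Assumptions (not from the source): exact primer matches, no sequencing errors,
# reverse primer site shown in strand orientation. All sequences are made up.

def select_file(pool, fwd_primer, rev_primer):
    """Return the payloads of strands flanked by the given primer pair."""
    payloads = []
    min_len = len(fwd_primer) + len(rev_primer)
    for strand in pool:
        if (len(strand) >= min_len
                and strand.startswith(fwd_primer)
                and strand.endswith(rev_primer)):
            # Strip both primer sites, keeping only the payload region.
            payloads.append(strand[len(fwd_primer):len(strand) - len(rev_primer)])
    return payloads


# Hypothetical primer library: one primer pair per stored file.
PRIMERS = {
    "file_a": ("ACGTAC", "TTGCAA"),
    "file_b": ("GGATCC", "CATGCA"),
}

# A tiny mixed pool containing strands from both files.
pool = [
    "ACGTAC" + "AAAATTTT" + "TTGCAA",  # file_a
    "GGATCC" + "CCCCGGGG" + "CATGCA",  # file_b
    "ACGTAC" + "GATTACA" + "TTGCAA",   # file_a
]

file_a = select_file(pool, *PRIMERS["file_a"])
print(file_a)  # ['AAAATTTT', 'GATTACA']
```

Only the strands carrying the primer pair for `file_a` are returned, which mirrors how a file-specific primer pair lets one file be recovered without processing the rest of the pool.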