ABSTRACT
Synthetic DNA has been proposed as a storage medium for digital information due to its high theoretical storage density and anticipated long storage horizons. However, under all ambient storage conditions, DNA undergoes a slow chemical decay process that results in nicked (broken) DNA strands, and the information stored in these strands is no longer readable. In this work we design an enzymatic repair procedure that can be applied to the DNA pool prior to readout and can partially reverse the damage. Through a chemical understanding of the decay process, an overhang at the 3' end of the damaged site is identified as obstructive to repair via the base excision repair (BER) mechanism. The obstruction can be removed by the enzyme apurinic/apyrimidinic endonuclease I (APE1), thereby enabling repair of hydrolytically damaged DNA via Bst polymerase and Taq ligase. Simulations of damage and repair reveal the benefit of the enzymatic repair step for DNA data storage, especially when data is stored in DNA at high storage densities (i.e., low physical redundancy) and for long durations.
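The interplay of decay, repair and physical redundancy described above can be illustrated with a minimal sketch. It is not the simulation used in the work: it assumes first-order strand breakage at a constant per-nucleotide rate, a fixed fraction of broken strands rescued by repair, and independent copies. All function names and parameter values are hypothetical.

```python
import math

def readable_fraction(k, length, t, repair=0.0):
    """Fraction of strand copies readable after time t.

    Assumptions (hypothetical, for illustration only):
    k       per-nucleotide breakage rate constant (1/time)
    length  strand length in nucleotides
    repair  fraction of broken strands made readable again by repair
    """
    intact = math.exp(-k * length * t)      # first-order decay of whole strands
    return intact + (1.0 - intact) * repair

def dropout_probability(k, length, t, copies, repair=0.0):
    """Probability that all `copies` of one sequence are unreadable,
    assuming the copies decay independently."""
    return (1.0 - readable_fraction(k, length, t, repair)) ** copies

# Example with made-up numbers: low physical redundancy (3 copies per sequence)
without_repair = dropout_probability(1e-4, 150, 100, copies=3)
with_repair = dropout_probability(1e-4, 150, 100, copies=3, repair=0.5)
print(without_repair, with_repair)
```

Under these assumptions the dropout probability falls off as a power of the number of copies, so a repair step changes the outcome most when few copies per sequence are stored and the intact fraction has become small.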
Subject(s)
DNA Repair , DNA-(Apurinic or Apyrimidinic Site) Lyase , DNA-(Apurinic or Apyrimidinic Site) Lyase/genetics , DNA-(Apurinic or Apyrimidinic Site) Lyase/metabolism , DNA/genetics , Information Storage and Retrieval , Deoxyribonuclease I , Ligases
ABSTRACT
Synthetic DNA has recently emerged as a viable alternative for long-term digital data storage. To ensure that information is safely recovered after storage, it is essential to appropriately preserve the physical DNA molecules encoding the data. While preservation of biological DNA has been studied previously, synthetic DNA differs in that it is typically much shorter in length, it has different sequence profiles with fewer, if any, repeats (or homopolymers), and it has different contaminants. In this paper, nine different methods used to preserve data files encoded in synthetic DNA are evaluated by accelerated aging of nearly 29 000 DNA sequences. In addition to a molecular count comparison, the DNA is also sequenced and analyzed after aging. These findings show that errors and erasures are stochastic and show no practical distribution difference between preservation methods. Finally, the physical density of these methods is compared and the trade-offs between stability and density are discussed.
Subject(s)
DNA/chemistry , Base Sequence , DNA/metabolism , Half-Life , High-Throughput Nucleotide Sequencing , Magnetite Nanoparticles/chemistry , Polymerase Chain Reaction , Sequence Analysis, DNA , Temperature , Time Factors , Trehalose/chemistry
ABSTRACT
Because of its longevity and enormous information density, DNA is considered a promising data storage medium. In this work, we provide instructions for archiving digital information in the form of DNA and for subsequently retrieving it from the DNA. In principle, information can be represented in DNA by simply mapping the digital information to DNA and synthesizing it. However, imperfections in synthesis, sequencing, storage and handling of the DNA induce errors within the molecules, making error-free information storage challenging. The procedure discussed here enables error-free storage by protecting the information using error-correcting codes. Specifically, in this protocol, we provide the technical details and precise instructions for translating digital information to DNA sequences, physically handling the biomolecules, storing them and subsequently re-obtaining the information by sequencing the DNA. Along with the protocol, we provide computer code that automatically encodes digital information to DNA sequences and decodes the information back from DNA to a digital file. The required software is provided in a GitHub repository. The protocol relies on commercial DNA synthesis and DNA sequencing via Illumina dye sequencing, and requires 1-2 h of preparation time, 1/2 d for sequencing preparation and 2-4 h for data analysis. This protocol focuses on storage scales of ~100 kB to 15 MB, offering an ideal starting point for small experiments. It can be augmented to enable higher data volumes and random access to the data, and it can accommodate future sequencing and synthesis technologies by changing the parameters of the encoder/decoder to account for the corresponding error rates.
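The "simple mapping" of digital information to DNA mentioned above can be sketched as two bits per nucleotide. This is a hypothetical illustration, not the encoder distributed with the protocol: it applies no error-correcting code, so any synthesis or sequencing error would corrupt the data, and the bit-to-base assignment (00=A, 01=C, 10=G, 11=T) is an arbitrary choice.

```python
# Hypothetical direct mapping, two bits per nucleotide; no error correction.
BASES = "ACGT"  # index 0..3 corresponds to bit pairs 00, 01, 10, 11

def bytes_to_dna(data: bytes) -> str:
    """Map each byte to four nucleotides, most significant bit pair first."""
    out = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            out.append(BASES[(byte >> shift) & 0b11])
    return "".join(out)

def dna_to_bytes(seq: str) -> bytes:
    """Inverse mapping; expects an error-free sequence of A, C, G, T."""
    if len(seq) % 4 != 0:
        raise ValueError("sequence length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for base in seq[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)  # ValueError on other symbols
        out.append(byte)
    return bytes(out)

encoded = bytes_to_dna(b"DNA")
print(encoded, dna_to_bytes(encoded))
```

A single substituted, inserted or deleted base breaks this round trip, which is the reason the protocol protects the information with error-correcting codes.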
Subject(s)
DNA/genetics , Sequence Analysis, DNA/methods , Base Sequence , DNA/chemistry , Models, Molecular , Nucleic Acid Conformation
ABSTRACT
Rapid aging tests (70 °C, 50% RH) of solid-state DNA dried in the presence of various salt formulations showed the strong stabilizing effect of calcium phosphate, calcium chloride and magnesium chloride, even at high DNA loadings (>20 wt%). A DNA-based digital information storage system utilizing the stabilizing effect of MgCl2 was tested by storing a DNA file encoding 115 kB of digital data and successfully reading out the file by sequencing after accelerated aging.
Subject(s)
Calcium Chloride/chemistry , Calcium Phosphates/chemistry , DNA/chemistry , Information Storage and Retrieval , Magnesium Chloride/chemistry , DNA/chemical synthesis , Particle Size , Salts/chemistry , Surface Properties
ABSTRACT
For many manufacturing processes, correct mixing compositions are crucial to guarantee product quality. However, the analysis of mixing ratios based on component balances can be challenging and requires extensive infrastructure. DNA barcodes have been previously proposed as low-cost markers for product authenticity, and we show here that the quantification of such barcodes via quantitative real-time polymerase chain reaction (PCR) enables the determination of mixing ratios in a range of liquid and polymeric products. To enable the distribution of the DNA within the various matrices, the DNA is encapsulated in silica nanoparticles and distributed within the matrix of the raw material. If both raw materials of a two-component mixture contain such barcodes, the composition of the mixture can be determined from the relative concentration of the barcodes via multiplex PCR, irrespective of the sampling volume and for a wide range of initial barcode concentrations (10 ppm to 10 ppb). As an application example, we use the barcodes to determine the mixing ratios of cross-linked and multicomponent polysilicon products.
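How a mixing ratio could follow from relative barcode concentrations is sketched below. This is an illustrative calculation, not the authors' analysis: it assumes each raw material carries its own barcode at a known loading, both barcodes amplify with the same efficiency and are read at the same threshold, and the quantification cycle (Ct) values come from the same sample. The function name and parameters are hypothetical.

```python
def fraction_of_a(ct_a, ct_b, loading_a=1.0, loading_b=1.0, efficiency=2.0):
    """Estimate the fraction of raw material A in a two-component mixture.

    Assumptions (hypothetical, for illustration only):
    ct_a, ct_b            qPCR quantification cycles of barcodes A and B
    loading_a, loading_b  barcode copies per unit amount of each raw material
    efficiency            amplification factor per cycle (2.0 = ideal doubling)
    """
    # Copies of A relative to copies of B: a lower Ct means more template.
    copy_ratio = efficiency ** (ct_b - ct_a)
    # Convert the copy ratio into a ratio of material amounts.
    material_ratio = copy_ratio * loading_b / loading_a
    return material_ratio / (1.0 + material_ratio)

# Equal loadings, barcode A detected one cycle earlier than barcode B
print(fraction_of_a(ct_a=20.0, ct_b=21.0))
```

Because only the difference between the two Ct values enters the calculation, the estimate does not depend on how much of the mixture was sampled, consistent with the volume independence stated above.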