Results 1 - 16 of 16
1.
ArXiv ; 2024 May 23.
Article in English | MEDLINE | ID: mdl-38827449

ABSTRACT

Although deep learning (DL) methods are powerful for solving inverse problems, their reliance on high-quality training data is a major hurdle. This is significant in high-dimensional (dynamic/volumetric) magnetic resonance imaging (MRI), where acquisition of high-resolution fully sampled k-space data is impractical. We introduce a novel mathematical framework, dubbed k-band, that enables training DL models using only partial, limited-resolution k-space data. Specifically, we introduce training with stochastic gradient descent (SGD) over k-space subsets. In each training iteration, rather than using the fully sampled k-space for computing gradients, we use only a small k-space portion. This concept is compatible with different sampling strategies; here we demonstrate the method for k-space "bands", which have limited resolution in one dimension and can hence be acquired rapidly. We prove analytically that our method stochastically approximates the gradients computed in a fully supervised setup when two simple conditions are met: (i) the limited-resolution axis is chosen uniformly at random for every new scan, so that k-space is fully covered across the entire training set, and (ii) the loss function is weighted with a mask, derived here analytically, which facilitates accurate reconstruction of high-resolution details. Numerical experiments with raw MRI data indicate that k-band outperforms two other methods trained on limited-resolution data and performs comparably to state-of-the-art (SoTA) methods trained on high-resolution data. k-band thus obtains SoTA performance, with the advantage of training on limited-resolution data alone. This work hence introduces a practical, easy-to-implement, self-supervised training framework, which combines fast acquisition with self-supervised reconstruction and offers theoretical guarantees.
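
A minimal numeric sketch of the training step described above, assuming an L2 loss, a plain array standing in for a network's prediction, and a uniform-density weighting in place of the analytically derived mask:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 64
kspace = np.fft.fft2(rng.standard_normal((N, N)))   # stand-in fully sampled k-space

def band_mask(n, width, horizontal):
    """k-space band: full resolution along one axis, limited along the other."""
    m = np.zeros((n, n), dtype=bool)
    lo = n // 2 - width // 2
    if horizontal:
        m[lo:lo + width, :] = True
    else:
        m[:, lo:lo + width] = True
    return m

# Condition (i): the limited-resolution axis is drawn uniformly at random
# for every scan, so bands cover all of k-space across the training set.
mask = band_mask(N, width=16, horizontal=rng.random() < 0.5)

model_out = np.zeros_like(kspace)        # stand-in for a network's prediction
weight = mask / mask.mean()              # assumed loss-weighting mask (condition (ii))
band_loss = np.mean(weight * np.abs(model_out - kspace) ** 2)
full_loss = np.mean(np.abs(model_out - kspace) ** 2)
print(f"band loss {band_loss:.0f} vs full loss {full_loss:.0f}")
```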

2.
Nat Commun ; 15(1): 2955, 2024 Apr 05.
Article in English | MEDLINE | ID: mdl-38580696

ABSTRACT

Physical unclonable functions (PUFs) based on unique tokens generated by random manufacturing processes have been proposed as an alternative to mathematical one-way algorithms. However, these tokens are not distributable, which is a disadvantage for decentralized applications. Finding unclonable yet distributable functions would help bridge this gap and expand the applications of object-bound cryptography. Here we show that large random DNA pools with a segmented structure of alternating constant and randomly generated portions are able to calculate distinct outputs from millions of inputs in a specific and reproducible manner, in analogy to physical unclonable functions. Our experimental data with pools comprising up to >10¹⁰ unique sequences and encompassing >750 comparisons of resulting outputs demonstrate that the proposed chemical unclonable function (CUF) system is robust, distributable, and scalable. Based on this proof of concept, CUF-based anti-counterfeiting systems, non-fungible objects and decentralized multi-user authentication are conceivable.
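
A toy in-silico analogue of the input-output behaviour described above; the lookup-table pool, the index derivation and the digest output below are illustrative assumptions, not the chemical mechanism:

```python
import hashlib
import secrets

class ToyCUF:
    """Lookup-table stand-in for a segmented random DNA pool."""

    def __init__(self, n_segments=1024):
        # The random portions are fixed once by the manufacturing run and
        # cannot be re-created, only copied in bulk (hence distributable).
        self.random_segments = [secrets.token_bytes(8) for _ in range(n_segments)]

    def respond(self, challenge: bytes) -> str:
        # The challenge addresses a few segments (standing in for constant
        # regions selecting strands by hybridization)...
        digest = hashlib.sha256(challenge).digest()
        idx = [int.from_bytes(digest[2 * i:2 * i + 2], "big") % len(self.random_segments)
               for i in range(4)]
        # ...and the output is derived from the selected random portions.
        material = b"".join(self.random_segments[i] for i in idx)
        return hashlib.sha256(material).hexdigest()[:16]

cuf = ToyCUF()
print(cuf.respond(b"input-1"))   # specific and reproducible for this pool
print(cuf.respond(b"input-1"))   # same input -> same output
print(cuf.respond(b"input-2"))   # distinct input -> distinct output
```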


Subject(s)
Algorithms, Commerce, DNA, Structure-Activity Relationship
3.
Nat Commun ; 14(1): 6026, 2023 09 27.
Article in English | MEDLINE | ID: mdl-37758710

ABSTRACT

Archiving data in synthetic DNA offers unprecedented storage density and longevity. Handling and storage introduce errors and biases into DNA-based storage systems, necessitating the use of Error Correction Coding (ECC) which comes at the cost of added redundancy. However, insufficient data on these errors and biases, as well as a lack of modeling tools, limit data-driven ECC development and experimental design. In this study, we present a comprehensive characterisation of the error sources and biases present in the most common DNA data storage workflows, including commercial DNA synthesis, PCR, decay by accelerated aging, and sequencing-by-synthesis. Using the data from 40 sequencing experiments, we build a digital twin of the DNA data storage process, capable of simulating state-of-the-art workflows and reproducing their experimental results. We showcase the digital twin's ability to replace experiments and rationalize the design of redundancy in two case studies, highlighting opportunities for tangible cost savings and data-driven ECC development.
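
A minimal sketch of such a channel model; the error rates and the independence assumptions below are illustrative placeholders, not the paper's fitted parameters:

```python
import random

random.seed(1)
BASES = "ACGT"

def channel(seq, p_sub=0.005, p_del=0.001, p_ins=0.001, p_dropout=0.02):
    """Apply dropout, substitutions, insertions and deletions to one strand."""
    if random.random() < p_dropout:          # strand lost (decay, handling)
        return None
    out = []
    for base in seq:
        if random.random() < p_del:          # deletion
            continue
        if random.random() < p_sub:          # substitution
            base = random.choice(BASES.replace(base, ""))
        out.append(base)
        if random.random() < p_ins:          # insertion
            out.append(random.choice(BASES))
    return "".join(out)

reads = [channel("ACGTACGTACGTACGT") for _ in range(5)]
print(reads)
```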


Subject(s)
DNA Replication, DNA, DNA/genetics, Bias, Longevity
4.
ACS Nano ; 16(11): 17552-17571, 2022 11 22.
Article in English | MEDLINE | ID: mdl-36256971

ABSTRACT

With the total amount of worldwide data skyrocketing, the global data storage demand is predicted to grow to 1.75 × 10¹⁴ GB by 2025. Traditional storage methods have difficulty keeping pace, given that current storage media have a maximum density of 10³ GB/mm³. As such, data production will far exceed the capacity of currently available storage methods. The costs of maintaining and transferring data, as well as the limited lifespans and significant data losses associated with current technologies, also demand advanced solutions for information storage. Nature offers a powerful alternative through the storage of the information that defines living organisms in unique orders of four bases (A, T, C, G) located in molecules called deoxyribonucleic acid (DNA). DNA molecules as information carriers have many advantages over traditional storage media. Their high storage density, potentially low maintenance cost, and ease of synthesis and chemical modification make them an ideal alternative for information storage. To this end, rapid progress has been made over the past decade by exploiting user-defined DNA materials to encode information. In this review, we discuss the most recent advances in DNA-based data storage, with a major focus on the challenges that remain in this promising field, including the currently low speed of data writing and reading and the high cost per byte stored. Alternatively, data storage relying on DNA nanostructures (as opposed to DNA sequences), as well as on other combinations of nanomaterials and biomolecules, is proposed, with promising technological and economic advantages. In summarizing the advances that have been made and underlining the challenges that remain, we provide a roadmap for ongoing research in this rapidly growing field, which will enable the development of technological solutions to the global demand for superior storage methodologies.
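
A back-of-the-envelope check of the two figures quoted above, using only the numbers from the abstract:

```python
# Volume of storage medium needed for the projected 2025 data demand,
# if media actually reached their maximum density.
demand_gb = 1.75e14          # projected global data by 2025, in GB
density_gb_per_mm3 = 1e3     # maximum density of current storage media

volume_mm3 = demand_gb / density_gb_per_mm3
volume_m3 = volume_mm3 * 1e-9        # 1 m^3 = 1e9 mm^3
print(f"{volume_m3:.0f} m^3 of medium at maximum density")   # 175 m^3
```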


Subject(s)
DNA, Information Storage and Retrieval, Sequence Analysis, DNA/methods, DNA/chemistry
5.
Commun Biol ; 5(1): 1117, 2022 10 20.
Article in English | MEDLINE | ID: mdl-36266439

ABSTRACT

Synthetic DNA has been proposed as a storage medium for digital information due to its high theoretical storage density and anticipated long storage horizons. However, under all ambient storage conditions, DNA undergoes a slow chemical decay process that results in nicked (broken) DNA strands, and the information stored in these strands is no longer readable. In this work we design an enzymatic repair procedure, which is applicable to the DNA pool prior to readout and can partially reverse the damage. Through a chemical understanding of the decay process, an overhang at the 3' end of the damaged site is identified as obstructive to repair via the base excision-repair (BER) mechanism. The obstruction can be removed by the enzyme apurinic/apyrimidinic endonuclease I (APE1), thereby enabling repair of hydrolytically damaged DNA via Bst polymerase and Taq ligase. Simulations of damage and repair reveal the benefit of the enzymatic repair step for DNA data storage, especially when data is stored at high storage density (i.e., low physical redundancy) and for long durations.
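
A Monte Carlo sketch of why repair helps at low physical redundancy; the nick probability and repair rate below are assumed values, not the paper's chemically derived parameters:

```python
import random

random.seed(2)

def surviving_fraction(n_strands, p_nick, repair_rate):
    """Fraction of strands still readable after decay, with partial repair."""
    ok = 0
    for _ in range(n_strands):
        nicked = random.random() < p_nick
        if not nicked or random.random() < repair_rate:
            ok += 1
    return ok / n_strands

for repair in (0.0, 0.8):     # without vs. with the enzymatic repair step
    frac = surviving_fraction(100_000, p_nick=0.5, repair_rate=repair)
    print(f"repair_rate={repair}: {frac:.2%} readable")
```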


Subject(s)
DNA Repair, DNA-(Apurinic or Apyrimidinic Site) Lyase, DNA-(Apurinic or Apyrimidinic Site) Lyase/genetics, DNA-(Apurinic or Apyrimidinic Site) Lyase/metabolism, DNA/genetics, Information Storage and Retrieval, Deoxyribonuclease I, Ligases
6.
Nat Commun ; 11(1): 5869, 2020 11 18.
Article in English | MEDLINE | ID: mdl-33208744

ABSTRACT

The volume of securely encrypted data transmission required by today's complex networks of people, transactions and interactions increases continuously. To guarantee the security of encryption and decryption schemes for exchanging sensitive information, large volumes of true random numbers are required. Here we present a method that exploits the stochastic nature of chemistry by synthesizing DNA strands composed of random nucleotides. We compare three commercial random DNA syntheses, giving a measure of the robustness and nucleotide distribution of the synthesis, and show that by using DNA for random number generation we can obtain 7 million GB of randomness from one synthesis run, which can be read out using state-of-the-art sequencing technologies at rates of ca. 300 kB/s. Using the von Neumann algorithm for data compression, we remove bias introduced by human or technological sources and assess randomness using NIST's statistical test suite.
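
The von Neumann debiasing step named above, sketched on a biased bit stream; mapping nucleotides to bits is omitted, and the bias value is an assumption for illustration:

```python
import random

def von_neumann(bits):
    """Consume bits in non-overlapping pairs: emit the first bit of '01'
    or '10', discard '00' and '11'. Removes bias from independent bits."""
    return [a for a, b in zip(bits[::2], bits[1::2]) if a != b]

random.seed(3)
biased = [1 if random.random() < 0.7 else 0 for _ in range(10_000)]
unbiased = von_neumann(biased)
print(sum(biased) / len(biased))       # ~0.70 before extraction
print(sum(unbiased) / len(unbiased))   # ~0.50 after extraction
```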


Subject(s)
DNA/chemical synthesis, Algorithms, Base Sequence, DNA/genetics, Humans, Sequence Analysis, DNA
7.
Nat Commun ; 11(1): 5345, 2020 10 22.
Article in English | MEDLINE | ID: mdl-33093494

ABSTRACT

Due to its longevity and enormous information density, DNA is an attractive medium for archival storage. The current bottleneck of DNA data storage systems, in both cost and speed, is synthesis. The key idea pursued in this work for breaking this bottleneck is to move beyond the low-error and expensive synthesis employed almost exclusively in today's systems, towards cheaper, potentially faster, but high-error synthesis technologies. Here, we demonstrate a DNA storage system that relies on massively parallel light-directed synthesis, which is considerably cheaper than conventional solid-phase synthesis. However, this technology has a high sequence error rate when optimized for speed. We demonstrate that even in this high-error regime, reliable storage of information is possible, by developing a pipeline of algorithms for encoding and reconstruction of the information. In our experiments, we store a file containing sheet music by Mozart, and show perfect data recovery from low-synthesis-fidelity DNA.
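
A toy consensus step to illustrate recovery from high-error reads; it assumes substitution errors only and pre-aligned reads of equal length, whereas the paper's pipeline also handles insertions, deletions and read clustering:

```python
from collections import Counter

def majority_consensus(reads):
    """Position-wise majority vote across aligned reads of one strand."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*reads))

reads = ["ACGTACGT", "ACGAACGT", "ACGTACCT", "TCGTACGT", "ACGTACGT"]
print(majority_consensus(reads))   # ACGTACGT
```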


Subject(s)
Synthetic Chemistry Techniques/methods, DNA/chemical synthesis, Information Storage and Retrieval/methods, Algorithms, Base Sequence, DNA/chemistry, DNA/genetics, Gene Library, Light, Monte Carlo Method, Oligonucleotide Array Sequence Analysis/methods, Photochemical Processes, Sequence Analysis, DNA
8.
Angew Chem Int Ed Engl ; 59(22): 8476-8480, 2020 05 25.
Article in English | MEDLINE | ID: mdl-32083389

ABSTRACT

Today, we can read human genomes and store digital data robustly in synthetic DNA. Herein, we report a strategy to intertwine these two technologies to enable the secure storage of valuable information in synthetic DNA, protected with personalized keys. We show that genetic short tandem repeats (STRs) contain sufficient entropy to generate strong encryption keys, and that only one technology, DNA sequencing, is required to simultaneously read the key and the data. Using this approach, we experimentally generated strong 80-bit keys from human DNA, and used such a key to encrypt 17 kB of digital information stored in synthetic DNA. Finally, the decrypted information was recovered perfectly from a single massively parallel sequencing run.
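
A sketch of one plausible key derivation from an STR profile; the loci, repeat counts and hash-based derivation are assumptions for illustration, not the paper's exact scheme:

```python
import hashlib

str_profile = {                 # hypothetical STR genotype: locus -> repeat counts
    "D3S1358": (15, 17),
    "vWA": (14, 16),
    "FGA": (21, 24),
    "D8S1179": (12, 13),
}

# Canonicalize the profile so the same genotype always yields the same key.
canonical = ";".join(f"{locus}:{a},{b}" for locus, (a, b) in sorted(str_profile.items()))
key = hashlib.sha256(canonical.encode()).digest()[:10]    # truncate to 80 bits
print(key.hex())
```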


Subject(s)
Computer Security, DNA/genetics, Genomics, Information Storage and Retrieval/methods, High-Throughput Nucleotide Sequencing, Humans, Microsatellite Repeats/genetics
9.
Nat Protoc ; 15(1): 86-101, 2020 01.
Article in English | MEDLINE | ID: mdl-31784718

ABSTRACT

Because of its longevity and enormous information density, DNA is considered a promising data storage medium. In this work, we provide instructions for archiving digital information in the form of DNA and for subsequently retrieving it from the DNA. In principle, information can be represented in DNA by simply mapping the digital information to DNA and synthesizing it. However, imperfections in synthesis, sequencing, storage and handling of the DNA induce errors within the molecules, making error-free information storage challenging. The procedure discussed here enables error-free storage by protecting the information with error-correcting codes. Specifically, in this protocol, we provide the technical details and precise instructions for translating digital information into DNA sequences, physically handling the biomolecules, storing them and subsequently re-obtaining the information by sequencing the DNA. Along with the protocol, we provide computer code that automatically encodes digital information into DNA sequences and decodes it back from DNA into a digital file. The required software is provided in a GitHub repository. The protocol relies on commercial DNA synthesis and DNA sequencing via Illumina dye sequencing, and requires 1-2 h of preparation time, half a day for sequencing preparation and 2-4 h for data analysis. This protocol focuses on storage scales of ~100 kB to 15 MB, offering an ideal starting point for small experiments. It can be augmented to enable higher data volumes and random access to the data, and it also accommodates future sequencing and synthesis technologies, by changing the parameters of the encoder/decoder to account for the corresponding error rates.
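
The core digital-to-DNA mapping mentioned above, as a minimal two-bits-per-nucleotide sketch; the protocol's actual encoder additionally applies error-correcting codes, indexing and sequence constraints:

```python
# Map each byte to four nucleotides (2 bits per base) and back.
M = {0b00: "A", 0b01: "C", 0b10: "G", 0b11: "T"}
R = {v: k for k, v in M.items()}

def encode(data: bytes) -> str:
    return "".join(M[(byte >> shift) & 0b11] for byte in data for shift in (6, 4, 2, 0))

def decode(dna: str) -> bytes:
    out = bytearray()
    for i in range(0, len(dna), 4):
        byte = 0
        for base in dna[i:i + 4]:
            byte = (byte << 2) | R[base]
        out.append(byte)
    return bytes(out)

seq = encode(b"Hi")
print(seq)                  # CAGACGGC
assert decode(seq) == b"Hi"
```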


Subject(s)
DNA/genetics, Sequence Analysis, DNA/methods, Base Sequence, DNA/chemistry, Molecular Models, Nucleic Acid Conformation
10.
Sci Rep ; 9(1): 9663, 2019 07 04.
Article in English | MEDLINE | ID: mdl-31273225

ABSTRACT

Owing to its longevity and enormous information density, DNA, the molecule encoding biological information, has emerged as a promising archival storage medium. However, due to technological constraints, data can only be written onto many short DNA molecules that are stored in an unordered way, and can only be read by sampling from this DNA pool. Moreover, imperfections in writing (synthesis), reading (sequencing), storage, and handling of the DNA, in particular amplification via PCR, lead to a loss of DNA molecules and induce errors within the molecules. In order to design DNA storage systems, a qualitative and quantitative understanding of the errors and the loss of molecules is crucial. In this paper, we characterize those error probabilities by analyzing data from our own experiments as well as from experiments of two different groups. We find that errors within molecules are mainly due to synthesis and sequencing, while imperfections in handling and storage lead to a significant loss of sequences. The aim of our study is to help guide the design of future DNA data storage systems by providing a quantitative and qualitative understanding of the DNA data storage channel.
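
A minimal estimator of the kind used for such a characterization, assuming reads are already aligned to their reference design sequences and counting substitutions only:

```python
def substitution_rate(reference, reads):
    """Per-base substitution rate across aligned reads of one reference."""
    errors = sum(r != c for read in reads for r, c in zip(read, reference))
    total = sum(min(len(read), len(reference)) for read in reads)
    return errors / total

ref = "ACGTACGTAC"
reads = ["ACGTACGTAC", "ACGTTCGTAC", "ACGTACGAAC"]
print(f"{substitution_rate(ref, reads):.3f}")   # 2 errors / 30 bases
```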


Subject(s)
Algorithms, DNA/analysis, DNA/genetics, Diagnostic Tests, Routine/standards, Information Storage and Retrieval/standards, Sequence Analysis, DNA/methods, Specimen Handling/standards, High-Throughput Nucleotide Sequencing/methods, Humans, Research Design/standards
11.
12.
ACS Nano ; 9(10): 9564-72, 2015 Oct 27.
Article in English | MEDLINE | ID: mdl-26258812

ABSTRACT

The concentrations of nanoparticles in colloidal dispersions are usually measured and given as mass concentrations (e.g., mg/mL); number concentrations can only be obtained by making assumptions about nanoparticle size and morphology. Additionally, traditional nanoparticle concentration measures are not very sensitive and can only register the presence or absence of millions or billions of particles occurring together. Here, we describe a method that not only intrinsically yields number concentrations, but is also sensitive enough to count individual nanoparticles, one by one. To make this possible, the sensitivity of the polymerase chain reaction (PCR) was combined with a binary (0/1, yes/no) measurement arrangement, binomial statistics and DNA-comprising monodisperse silica nanoparticles. With this method, individual tagged particles in the range of 60-250 nm could be detected and counted in drinking water in absolute numbers, using a standard qPCR device within 1.5 h of measurement time. For comparison, the method was validated against single-particle inductively coupled plasma mass spectrometry (sp-ICPMS).
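
The counting statistics behind the binary measurement, sketched with made-up numbers; this uses the standard Poisson inversion for partitioned samples, which may differ in detail from the paper's binomial treatment:

```python
import math

n_partitions = 96
n_negative = 30                      # partitions with no amplification (made up)
p0 = n_negative / n_partitions       # estimated P(no particle in a partition)
mean_per_partition = -math.log(p0)   # Poisson: P(0) = exp(-lambda)
total_particles = mean_per_partition * n_partitions
print(f"~{total_particles:.0f} particles in the sample")
```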


Subject(s)
DNA/analysis, Drinking Water/analysis, Nanoparticles/analysis, Polymerase Chain Reaction/instrumentation, Silicon Dioxide/analysis, Environmental Monitoring/instrumentation, Equipment Design, Nanoparticles/ultrastructure, Particle Size, Phase Transition
13.
Angew Chem Int Ed Engl ; 54(8): 2552-5, 2015 Feb 16.
Article in English | MEDLINE | ID: mdl-25650567

ABSTRACT

Information, such as text printed on paper or images projected onto microfilm, can survive for over 500 years. However, the storage of digital information for time frames exceeding 50 years is challenging. Here we show that digital information can be stored on DNA and recovered without errors for considerably longer time frames. To allow for the perfect recovery of the information, we encapsulate the DNA in an inorganic matrix, and employ error-correcting codes to correct storage-related errors. Specifically, we translated 83 kB of information to 4991 DNA segments, each 158 nucleotides long, which were encapsulated in silica. Accelerated aging experiments were performed to measure DNA decay kinetics, which show that data can be archived on DNA for millennia under a wide range of conditions. The original information could be recovered error free, even after treating the DNA in silica at 70 °C for one week. This is thermally equivalent to storing information on DNA in central Europe for 2000 years.
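
The thermal-equivalence claim follows Arrhenius kinetics; a sketch with an assumed round activation energy (not the paper's fitted value) that lands near the quoted figure:

```python
import math

Ea = 155e3            # activation energy in J/mol (assumed round value)
R = 8.314             # gas constant, J/(mol K)

def rate(t_kelvin):
    """Relative first-order decay rate at a given temperature."""
    return math.exp(-Ea / (R * t_kelvin))

t_hot = 7 / 365.25                               # 1 week, in years
T_hot, T_ambient = 273.15 + 70, 273.15 + 9.4     # 70 degC vs. ~9.4 degC mean
equivalent_years = t_hot * rate(T_hot) / rate(T_ambient)
print(f"~{equivalent_years:.0f} years at ambient temperature")   # ~2000
```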


Subject(s)
DNA/chemistry, Information Storage and Retrieval/methods, Silicon Dioxide/chemistry, Algorithms, Information Storage and Retrieval/standards
14.
PLoS One ; 8(5): e64371, 2013.
Article in English | MEDLINE | ID: mdl-23741321

ABSTRACT

Nested canalizing Boolean functions (NCFs) play an important role in biologically motivated regulatory networks and in signal processing, in particular in describing stack filters. It has been conjectured that NCFs have a stabilizing effect on the network dynamics. It is well known that the average sensitivity plays a central role for the stability of (random) Boolean networks. Here we provide a tight upper bound on the average sensitivity of NCFs as a function of the number of relevant input variables. As conjectured in the literature, this bound is smaller than 4/3. This shows that a large number of functions appearing in biological networks belong to a class with low average sensitivity, which is even close to a tight lower bound.
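
Average sensitivity can be checked by brute force for small functions; a sketch for a three-variable nested canalizing function, consistent with the 4/3 bound:

```python
from itertools import product

def avg_sensitivity(f, n):
    """Expected number of output-changing single-bit flips, uniform inputs."""
    total = 0
    for x in product((0, 1), repeat=n):
        for i in range(n):
            y = list(x)
            y[i] ^= 1                       # flip one input bit
            total += f(*x) != f(*y)
    return total / 2 ** n

ncf = lambda x1, x2, x3: x1 or (x2 and x3)  # nested canalizing example
print(avg_sensitivity(ncf, 3))              # 1.25 < 4/3
```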


Subject(s)
Fourier Analysis, Gene Regulatory Networks, Statistical Models, Computer Simulation
15.
EURASIP J Bioinform Syst Biol ; 2013(1): 6, 2013 May 04.
Article in English | MEDLINE | ID: mdl-23642003

ABSTRACT

Consider a large Boolean network with a feed-forward structure. Given a probability distribution on the inputs, can one find, possibly small, collections of input nodes that determine the states of most other nodes in the network? To answer this question, a notion that quantifies the determinative power of an input over the states of the nodes in the network is needed. We argue that the mutual information (MI) between a given subset of the inputs X = {X₁, ..., Xₙ} of some node i and its associated function fᵢ(X) quantifies the determinative power of this set of inputs over node i. We compare the determinative power of a set of inputs to the sensitivity to perturbations of these inputs, and find that, perhaps surprisingly, an input with large sensitivity to perturbations does not necessarily have large determinative power. However, for unate functions, which play an important role in genetic regulatory networks, we find a direct relation between MI and sensitivity to perturbations. As an application of our results, we analyze the large-scale regulatory network of Escherichia coli. We identify the most determinative nodes and show that a small subset of those significantly reduces the overall uncertainty of the network state. Furthermore, the network is found to be tolerant to perturbations of its inputs.
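
A minimal computation of the mutual-information measure described above, assuming uniform inputs for simplicity (the paper allows general input distributions):

```python
from collections import Counter
from itertools import product
import math

def mutual_information(f, n, subset):
    """I(X_S ; f(X)) for a subset S of inputs, uniform over {0,1}^n."""
    joint = Counter()
    for x in product((0, 1), repeat=n):
        joint[(tuple(x[i] for i in subset), f(*x))] += 1 / 2 ** n
    px, py = Counter(), Counter()
    for (xs, y), p in joint.items():
        px[xs] += p
        py[y] += p
    return sum(p * math.log2(p / (px[xs] * py[y])) for (xs, y), p in joint.items())

f = lambda a, b, c: a ^ (b and c)          # example node function
print(mutual_information(f, 3, (0,)))      # determinative power of input a
print(mutual_information(f, 3, (1, 2)))    # of inputs b and c together
```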

16.
EURASIP J Bioinform Syst Biol ; 2011: 6, 2011 Oct 11.
Article in English | MEDLINE | ID: mdl-21989141

ABSTRACT

Boolean models of regulatory networks are assumed to be tolerant to perturbations. That qualitatively implies that each function can only depend on a few nodes. Biologically motivated constraints further show that functions found in Boolean regulatory networks belong to certain classes of functions, for example the unate functions. It turns out that these classes have specific properties in the Fourier domain. That motivates us to study the problem of detecting controlling nodes in classes of Boolean networks using spectral techniques. We consider networks with unbalanced functions and functions of an average sensitivity less than (2/3)k, where k is the number of controlling variables for a function. Further, we consider the class of 1-low networks, which includes unate networks, linear threshold networks, and networks with nested canalyzing functions. We show that the application of spectral learning algorithms leads to both better time and sample complexity for the detection of controlling nodes compared with algorithms based on exhaustive search. For a particular algorithm, we state analytical upper bounds on the number of samples needed to find the controlling nodes of the Boolean functions. Further, improved algorithms for detecting controlling nodes in large-scale unate networks are given and studied numerically.
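
A sketch of the spectral idea: estimate degree-1 Fourier coefficients from samples and flag inputs with large coefficients as candidate controlling variables; the sample count and the example function are arbitrary choices, not the paper's algorithm in full:

```python
import random

random.seed(4)

def fourier_deg1(f, n, samples=2000):
    """Estimate E[f(x) * x_i] with inputs x_i in {-1, +1}."""
    est = [0.0] * n
    for _ in range(samples):
        x = [random.choice((-1, 1)) for _ in range(n)]
        y = f(x)
        for i in range(n):
            est[i] += y * x[i] / samples
    return est

# f depends only on x0 and x1 (an OR in the +/-1 encoding); x2..x4 are noise.
f = lambda x: 1 if (x[0] == 1 or x[1] == 1) else -1
print([round(c, 2) for c in fourier_deg1(f, 5)])   # large at indices 0 and 1
```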
