Search | BVS Educação Profissional em Saúde

Study of the error correction capability of multiple sequence alignment algorithm (MAFFT) in DNA storage.

Xie, Ranze; Zan, Xiangzhen; Chu, Ling; Su, Yanqing; Xu, Peng; Liu, Wenbin.

BMC Bioinformatics ; 24(1): 111, 2023 Mar 23.

Article in English | MEDLINE | ID: mdl-36959531

ABSTRACT

Synchronization (insertion-deletion) errors remain a major challenge for reliable information retrieval in DNA storage. Unlike traditional error correction codes (ECC), which add redundancy to the stored information, multiple sequence alignment (MSA) addresses this problem by searching for conserved subsequences. In this paper, we conduct a comprehensive simulation study of the error correction capability of a typical MSA algorithm, MAFFT. Our results reveal that its capability exhibits a phase transition at an error rate of around 20%. Below this critical value, increasing the sequencing depth eventually allows it to approach complete recovery. Above it, performance plateaus at a poor level. Given a reasonable sequencing depth (≤ 70), MSA could achieve complete recovery in the low error regime and effectively correct 90% of the errors in the medium error regime. In addition, MSA is robust to imperfect clustering. It could also be combined with other means such as ECC, repeated markers, or other code constraints. Furthermore, by selecting an appropriate sequencing depth, this strategy could achieve an optimal trade-off between cost and reading speed. MSA could be a competitive alternative for future DNA storage.

Subjects

Algorithms , DNA , Sequence Alignment , DNA/genetics , Computer Simulation , DNA Sequence Analysis
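
The recovery step described in the abstract above (aligning noisy reads, then keeping what is conserved) can be illustrated with a column-wise majority vote. This is a minimal sketch, not the authors' pipeline: it assumes the reads have already been aligned by an MSA tool such as MAFFT into equal-length rows that use `-` for gaps, and the function name `consensus` is hypothetical.

```python
from collections import Counter


def consensus(aligned_reads, gap="-"):
    """Column-wise majority vote over equal-length, already-aligned reads.

    Assumption: the rows come from an MSA tool and use `gap` for padding.
    Columns whose most common symbol is the gap are dropped from the output.
    """
    if not aligned_reads:
        return ""
    width = len(aligned_reads[0])
    if any(len(read) != width for read in aligned_reads):
        raise ValueError("aligned reads must all have the same length")

    bases = []
    for column in zip(*aligned_reads):
        symbol, _count = Counter(column).most_common(1)[0]
        if symbol != gap:
            bases.append(symbol)
    return "".join(bases)


if __name__ == "__main__":
    rows = ["ACGT-A", "ACGTTA", "AC-T-A"]
    print(consensus(rows))
```

With more reads per cluster (a higher sequencing depth), each column has more votes, which is one intuitive reason recovery improves with depth in the low error regime.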

A Robust and Efficient DNA Storage Architecture Based on Modulation Encoding and Decoding.

Zan, Xiangzhen; Xie, Ranze; Yao, Xiangyu; Xu, Peng; Liu, Wenbin.

J Chem Inf Model ; 63(12): 3967-3976, 2023 06 26.

Article in English | MEDLINE | ID: mdl-37289182

ABSTRACT

Synthetic DNA has been widely considered an attractive medium for digital data storage. However, the random insertion-deletion-substitution (IDS) errors in the sequenced reads remain a critical challenge to reliable data recovery. Motivated by the modulation technique in the communication field, we propose a new DNA storage architecture to solve this problem. The main idea is that all binary data are modulated into DNA sequences with the same AT/GC patterns, which facilitates the detection of indels in noisy reads. The modulation signal not only satisfies the encoding constraints but also serves as prior information for detecting the potential positions of errors. Experiments on simulated and real data sets demonstrate that modulation encoding provides a simple way to comply with biological constraints for sequence encoding (i.e., balanced GC content and avoiding homopolymers). Furthermore, modulation decoding is highly efficient and extremely robust, and can correct up to ~40% of errors. In addition, it is robust to imperfect clustering reconstruction, which is very common in practice. Although our method has a relatively low logical density of 1.0 bits/nt, its high robustness may provide ample room for developing low-cost synthesis technologies. We believe this new architecture may hasten the arrival of large-scale DNA storage applications.

Subjects

DNA , Information Storage and Retrieval , DNA Sequence Analysis/methods , DNA/genetics , Computer Simulation
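
The modulation idea in the abstract above (every strand follows the same AT/GC pattern, at 1 bit per nucleotide) can be sketched as follows. The abstract does not give the concrete mapping, so the rule used here is an assumption for illustration only: a carrier bit of 0 selects the pair A/T, a carrier bit of 1 selects the pair C/G, and the data bit picks the base within the pair. All function names are hypothetical.

```python
WEAK = "AT"    # assumed mapping: carrier bit 0 -> A (data 0) or T (data 1)
STRONG = "CG"  # assumed mapping: carrier bit 1 -> C (data 0) or G (data 1)


def modulate(bits, carrier):
    """Encode one data bit per nucleotide under a shared AT/GC carrier."""
    if len(bits) != len(carrier):
        raise ValueError("data and carrier must have the same length")
    return "".join(
        (STRONG if c == "1" else WEAK)[int(b)] for b, c in zip(bits, carrier)
    )


def demodulate(strand):
    """Recover the data bits from an error-free strand."""
    return "".join(
        str((WEAK if base in WEAK else STRONG).index(base)) for base in strand
    )


def first_mismatch(strand, carrier):
    """Index of the first base whose AT/GC class disagrees with the carrier.

    Returns -1 if the compared prefix agrees. A mismatch hints at where an
    indel or substitution may have occurred.
    """
    for i, (base, c) in enumerate(zip(strand, carrier)):
        if (base in STRONG) != (c == "1"):
            return i
    return -1


if __name__ == "__main__":
    carrier = "0101"
    strand = modulate("0110", carrier)
    print(strand, demodulate(strand), first_mismatch(strand, carrier))
```

With an alternating carrier such as `0101...`, adjacent bases always come from different classes, so this toy mapping yields 50% GC content on even-length strands and no homopolymer runs, consistent with the constraints the abstract mentions.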

A Novel Image Encryption Scheme for DNA Storage Systems Based on DNA Hybridization and Gene Mutation.

Yao, Xiangyu; Xie, Ranze; Zan, Xiangzhen; Su, Yanqing; Xu, Peng; Liu, Wenbin.

Interdiscip Sci ; 15(3): 419-432, 2023 Sep.

Article in English | MEDLINE | ID: mdl-37016040

ABSTRACT

With the rapid development of DNA (deoxyribonucleic acid) storage technologies, storing digital images in DNA has become feasible. Meanwhile, information security in DNA storage systems is still an open problem. Therefore, in this paper, we propose a DNA storage-oriented image encryption algorithm that uses the information processing mechanisms of molecular biology. The basic idea is to perform pixel replacement by gene hybridization and to implement dual diffusion by pixel diffusion and gene mutation. After encryption, the ciphertext DNA image can be synthesized and stored in a DNA storage system. Experimental results demonstrate that it can resist common attacks and shows strong robustness against sequence loss and base substitution errors in the DNA storage channel. The DNA storage-oriented image encryption algorithm based on gene hybridization and gene mutation proceeds in three steps. First, we scramble the rows and columns of the plaintext image by dynamic Josephus traversing. Second, we replace the pixels by gene hybridization. Finally, we diffuse the image matrix in the binary domain and encode pixels into 8-base strands, which are then further diffused by gene mutation. The ciphertext image can be synthesized according to the mutant gene codes and stored in any DNA storage system.

Subjects

Algorithms , Computer Security , Mutation/genetics , Diffusion , DNA/genetics
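
The first step of the scheme above, scrambling rows and columns by Josephus traversal, can be illustrated with the classic fixed-step Josephus permutation. This is a sketch only: the abstract does not say how the "dynamic" variant chooses its parameters, so a constant `step` stands in as the key, and the function names are hypothetical. Only row scrambling is shown; columns would be handled the same way on the transposed matrix.

```python
def josephus_order(n, step):
    """Order in which indices 0..n-1 are removed when counting off by `step`."""
    if n < 0 or step < 1:
        raise ValueError("n must be >= 0 and step must be >= 1")
    remaining = list(range(n))
    order = []
    idx = 0
    while remaining:
        # Advance step-1 positions around the circle, then remove that index.
        idx = (idx + step - 1) % len(remaining)
        order.append(remaining.pop(idx))
    return order


def scramble_rows(matrix, step):
    """Permute the rows of a matrix by the Josephus order."""
    return [matrix[i] for i in josephus_order(len(matrix), step)]


def unscramble_rows(scrambled, step):
    """Invert scramble_rows given the same step."""
    order = josephus_order(len(scrambled), step)
    restored = [None] * len(scrambled)
    for position, source in enumerate(order):
        restored[source] = scrambled[position]
    return restored


if __name__ == "__main__":
    image = [[r * 10 + c for c in range(3)] for r in range(7)]
    mixed = scramble_rows(image, 3)
    print(josephus_order(7, 3))
    print(unscramble_rows(mixed, 3) == image)
```

Because the permutation depends only on the matrix size and the step, a receiver who knows the step can regenerate the same order and invert the scrambling.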

An image cryptography method by highly error-prone DNA storage channel.

Zan, Xiangzhen; Chu, Ling; Xie, Ranze; Su, Yanqing; Yao, Xiangyu; Xu, Peng; Liu, Wenbin.

Front Bioeng Biotechnol ; 11: 1173763, 2023.

Article in English | MEDLINE | ID: mdl-37152655

ABSTRACT

Introduction: Rapid development in synthetic technologies has boosted DNA as a potential medium for large-scale data storage. Meanwhile, how to implement data security in the DNA storage system is still an unsolved problem. Methods: In this article, we propose an image encryption method based on the modulation-based storage architecture. The key idea is to take advantage of the unpredictable modulation signals to encrypt images in highly error-prone DNA storage channels. Results and Discussion: Numerical results have demonstrated that our image encryption method is feasible and effective with excellent security against various attacks (statistical, differential, noise, and data loss). When compared with other methods such as the hybridization reactions of DNA molecules, the proposed method is more reliable and feasible for large-scale applications.

Limit and screen sequences with high degree of secondary structures in DNA storage by deep learning method.

Lin, Wanmin; Chu, Ling; Su, Yanqing; Xie, Ranze; Yao, Xiangyu; Zan, Xiangzhen; Xu, Peng; Liu, Wenbin.

Comput Biol Med ; 166: 107548, 2023 Oct 02.

Article in English | MEDLINE | ID: mdl-37801922

ABSTRACT

BACKGROUND: In single-stranded DNAs/RNAs, secondary structures are very common, especially in long sequences. It has been recognized that a high degree of secondary structure in DNA sequences can interfere with the correct writing and reading of information in DNA storage. However, how to circumvent this side effect has seldom been studied. METHOD: As the degree of secondary structure of a DNA sequence is closely related to the magnitude of the free energy released in the complicated folding process, we first investigate the free-energy distribution at different encoding lengths based on randomly generated DNA sequences. Then, we construct a bidirectional long short-term memory (BiLSTM)-attention deep learning model to predict the free energy of sequences. RESULTS: Our simulation results indicate that the free energy of DNA sequences at a specific length follows a right-skewed distribution and that the mean increases as the length increases. Given a tolerable free energy threshold of 20 kcal/mol, we could control the ratio of serious secondary structures in the encoding sequences to within 1% of the significant level by selecting a feasible encoding length of 100 nt. Compared with traditional deep learning models, the proposed model achieves better prediction performance in both the mean relative error (MRE) and the coefficient of determination (R²). It achieved MRE = 0.109 and R² = 0.918, respectively, in the simulation experiment. The combination of the BiLSTM and the attention module can handle long-term dependencies and capture the features of base pairing. Further, the prediction has linear time complexity, which makes it suitable for detecting sequences with severe secondary structures in future large-scale applications. Finally, 70 of 94 predicted free energy can be screened out on a real dataset. This demonstrates that the proposed model could screen out some highly suspicious sequences that are prone to produce more errors and low sequencing copies.
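
The two metrics reported in the abstract above, MRE and R², can be computed as in the sketch below. The abstract does not define them, so the standard definitions are assumed: MRE as the mean of |y - ŷ| / |y|, and R² as 1 - SS_res / SS_tot. The function names and the sample values are hypothetical and are not taken from the paper.

```python
def mean_relative_error(y_true, y_pred):
    """Mean of |true - predicted| / |true| (assumes no true value is zero)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Illustrative free-energy values in kcal/mol (made up for this example).
    measured = [-10.0, -20.0, -30.0]
    predicted = [-11.0, -18.0, -30.0]
    print(mean_relative_error(measured, predicted))
    print(r_squared(measured, predicted))
```

A lower MRE and an R² closer to 1 both indicate predictions that track the reference free energies more closely.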
