Explorer: efficient DNA coding by De Bruijn graph toward arbitrary local and global biochemical constraints.
Dou, Chang; Yang, Yijie; Zhu, Fei; Li, BingZhi; Duan, Yuping.
Affiliation
  • Dou C; Center for Applied Mathematics, Tianjin University, No. 92, Weijin Road, Nankai District, Tianjin 300072, China.
  • Yang Y; Center for Applied Mathematics, Tianjin University, No. 92, Weijin Road, Nankai District, Tianjin 300072, China.
  • Zhu F; Center for Applied Mathematics, Tianjin University, No. 92, Weijin Road, Nankai District, Tianjin 300072, China.
  • Li B; Frontiers Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering (Ministry of Education), Tianjin University, No. 92, Weijin Road, Nankai District, Tianjin 300072, China.
  • Duan Y; School of Chemical Engineering and Technology, Tianjin University, No. 92, Weijin Road, Nankai District, Tianjin 300072, China.
Brief Bioinform; 25(5), 2024 Jul 25.
Article in English | MEDLINE | ID: mdl-39073829
ABSTRACT
With the exponential growth of digital data, there is a pressing need for innovative storage media and techniques. DNA molecules, due to their stability, storage capacity, and density, offer a promising solution for information storage. However, DNA storage also faces numerous challenges, such as complex biochemical constraints and encoding efficiency. This paper presents Explorer, a high-efficiency DNA coding algorithm based on the De Bruijn graph, which leverages its capability to characterize local sequences. Explorer enables coding under various biochemical constraints, such as homopolymers, GC content, and undesired motifs. This paper also introduces Codeformer, a fast decoding algorithm based on the transformer architecture, to further enhance decoding efficiency. Numerical experiments indicate that, compared with other advanced algorithms, Explorer not only achieves stable encoding and decoding under various biochemical constraints but also increases the encoding efficiency and bit rate by ≥10%. Additionally, Codeformer demonstrates the ability to efficiently decode large quantities of DNA sequences. Under different parameter settings, its decoding efficiency exceeds that of traditional algorithms by more than two-fold. When Codeformer is combined with Reed-Solomon code, its decoding accuracy exceeds 99%, making it a good choice for high-speed decoding applications. These advancements are expected to contribute to the development of DNA-based data storage systems and the broader exploration of DNA as a novel information storage medium.
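The abstract does not give Explorer's construction, so the following is only a minimal sketch of the general idea it names: using a De Bruijn graph over k-mers to enforce local constraints (homopolymer runs, undesired motifs) while checking a global constraint (GC content) over a whole sequence. All function names, the value of k, the run limit, and the motif "GGC" are assumptions chosen for illustration and are not taken from the paper.

```python
from itertools import product

BASES = "ACGT"

def max_homopolymer(seq):
    """Length of the longest run of identical bases in seq."""
    longest = run = 0
    prev = None
    for base in seq:
        run = run + 1 if base == prev else 1
        prev = base
        longest = max(longest, run)
    return longest

def gc_content(seq):
    """Fraction of G and C bases; a global constraint checked per sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0

def satisfies_local(fragment, max_run, motifs):
    """Local constraints: bounded homopolymer run, no forbidden motif."""
    return max_homopolymer(fragment) <= max_run and not any(
        m in fragment for m in motifs
    )

def build_constrained_graph(k, max_run, motifs):
    """De Bruijn graph restricted to admissible k-mers (illustrative only).

    Nodes are k-mers that pass the local constraints. An edge u -> v exists
    when v = u[1:] + b for some base b, v is itself a node, and the
    (k+1)-mer u + b also passes the local constraints.
    Motifs longer than k + 1 are not detected by this simple sketch.
    """
    nodes = [
        "".join(p)
        for p in product(BASES, repeat=k)
        if satisfies_local("".join(p), max_run, motifs)
    ]
    node_set = set(nodes)
    return {
        u: [
            u[1:] + b
            for b in BASES
            if u[1:] + b in node_set and satisfies_local(u + b, max_run, motifs)
        ]
        for u in nodes
    }

# Hypothetical parameters: 3-mers, runs of at most 2, forbidden motif "GGC".
graph = build_constrained_graph(k=3, max_run=2, motifs=("GGC",))
```

Any walk through such a graph yields a sequence whose windows all respect the local constraints, so an encoder could map input bits to choices among the outgoing edges of the current node; how Explorer actually does this, and how it handles GC content, is described in the full paper rather than here.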

Full text: 1 Collection: 01-international Database: MEDLINE Main subject: Algorithms / DNA Language: En Journal: Brief Bioinform / Brief. bioinform / Briefings in bioinformatics Journal subject: BIOLOGY / MEDICAL INFORMATICS Year: 2024 Document type: Article Affiliation country: China