Cooperative sequence clustering and decoding for DNA storage system with fountain codes.

Jeong, Jaeho; Park, Seong-Joon; Kim, Jae-Won; No, Jong-Seon; Jeon, Ha Hyeon; Lee, Jeong Wook; No, Albert; Kim, Sunghwan; Park, Hosung

Affiliation

Jeong J; Department of Electrical and Computer Engineering, Seoul National University, Institute of New Media and Communications (INMC), Seoul 08826, South Korea.
Park SJ; Department of Electrical and Computer Engineering, Seoul National University, Institute of New Media and Communications (INMC), Seoul 08826, South Korea.
Kim JW; Department of Electronic Engineering, Gyeongsang National University, Engineering Research Institute, Jinju 52828, South Korea.
No JS; Department of Electrical and Computer Engineering, Seoul National University, Institute of New Media and Communications (INMC), Seoul 08826, South Korea.
Jeon HH; Department of Chemical Engineering, POSTECH, Pohang 37673, South Korea.
Lee JW; Department of Chemical Engineering, POSTECH, Pohang 37673, South Korea.
No A; Department of Electronic and Electrical Engineering, Hongik University, Seoul 04066, South Korea.
Kim S; School of Electrical Engineering, University of Ulsan, Ulsan 44610, South Korea.
Park H; Department of Computer Engineering, Chonnam National University, Gwangju 61186, South Korea.

Bioinformatics ; 37(19): 3136-3143, 2021 Oct 11.

Article in En | MEDLINE | ID: mdl-33904574

ABSTRACT

MOTIVATION:

In DNA storage systems, there are tradeoffs between writing and reading costs. Increasing the code rate of error-correcting codes may save writing cost, but it requires more sequence reads for data retrieval. It may be possible to improve the sequencing and decoding processes so that the reading cost induced by this tradeoff is reduced without increasing the writing cost. In previous research, clustering, alignment and decoding were treated as separate stages, but we believe that using the information from all these processes together may improve decoding performance. Actual experiments of DNA synthesis and sequencing should be performed because simulations cannot be relied on to cover all error possibilities in practical circumstances.

RESULTS:

For DNA storage systems using a fountain code and a Reed-Solomon (RS) code, we introduce several techniques to improve the decoding performance. We designed the decoding process around the cooperation of four key components: Hamming-distance-based clustering, discarding of abnormal sequence reads, RS error correction as well as detection, and quality score-based ordering of sequences. We synthesized 513.6 KB of data into DNA oligo pools and sequenced this data successfully with an Illumina MiSeq instrument. Compared to Erlich's research, the proposed decoding method additionally incorporates sequence reads with minor errors, which had previously been discarded, and was thus able to make use of 10.6-11.9% more sequence reads from the same sequencing environment. This resulted in a 6.5-8.9% reduction in the reading cost. Channel characteristics including sequence coverage and read-length distributions are provided as well.

AVAILABILITY AND IMPLEMENTATION:

The raw data files and the source codes of our experiments are available at https://github.com/jhjeong0702/dna-storage.
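As a rough illustration of how three of the components named in the results (discarding abnormal reads, quality score-based ordering and Hamming-distance-based clustering) might cooperate, the following is a minimal sketch and not the authors' implementation. The function names, the exact-length filter, the mean-quality ordering and the distance threshold are all assumptions made for the example; RS error correction and fountain decoding are omitted.

```python
from collections import Counter


def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def mean_quality(quals):
    """Average Phred score of one read (assumed ordering criterion)."""
    return sum(quals) / len(quals)


def cluster_reads(reads, oligo_len, max_dist):
    """Greedy clustering of (sequence, phred_scores) pairs.

    Hypothetical pipeline for illustration only:
    1. discard reads whose length differs from oligo_len,
    2. order the remaining reads by mean quality, best first,
    3. put each read into the first cluster whose representative
       (its first member) is within max_dist, else open a new cluster.
    """
    kept = [r for r in reads if len(r[0]) == oligo_len]
    kept.sort(key=lambda r: mean_quality(r[1]), reverse=True)
    clusters = []
    for seq, _ in kept:
        for members in clusters:
            if hamming(seq, members[0]) <= max_dist:
                members.append(seq)
                break
        else:
            clusters.append([seq])
    return clusters


def consensus(members):
    """Per-position majority vote over the sequences of one cluster."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*members))


# Toy data: three noisy copies of one oligo, one different oligo,
# and one read of abnormal length that gets discarded.
reads = [
    ("ACGTACGT", [30] * 8),
    ("ACGTACGA", [20] * 8),
    ("ACGTACGT", [35] * 8),
    ("TTTTCCCC", [30] * 8),
    ("ACGTAC", [30] * 6),
]
clusters = cluster_reads(reads, oligo_len=8, max_dist=2)
candidates = [consensus(c) for c in clusters]
```

In a full decoder, each consensus candidate would presumably be passed on to RS error correction or detection before being handed to the fountain decoder; that stage is left out here because the abstract does not specify its parameters.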

Full text: 1 | Collections: 01-international | Database: MEDLINE | Language: En | Journal: Bioinformatics | Journal subject: MEDICAL INFORMATICS | Publication year: 2021 | Document type: Article | Country of affiliation: South Korea