Deqformer: high-definition and scalable deep learning probe design method.

Cai, Yantong; Lv, Jia; Li, Rui; Huang, Xiaowen; Wang, Shi; Bao, Zhenmin; Zeng, Qifan

Affiliation

Cai Y; MOE Key Laboratory of Marine Genetics and Breeding & Fang Zongxi Center for Marine Evo-Devo, College of Marine Life Sciences, Ocean University of China, Qingdao 266003, China.
Lv J; MOE Key Laboratory of Marine Genetics and Breeding & Fang Zongxi Center for Marine Evo-Devo, College of Marine Life Sciences, Ocean University of China, Qingdao 266003, China.
Li R; MOE Key Laboratory of Marine Genetics and Breeding & Fang Zongxi Center for Marine Evo-Devo, College of Marine Life Sciences, Ocean University of China, Qingdao 266003, China.
Huang X; MOE Key Laboratory of Marine Genetics and Breeding & Fang Zongxi Center for Marine Evo-Devo, College of Marine Life Sciences, Ocean University of China, Qingdao 266003, China.
Wang S; MOE Key Laboratory of Marine Genetics and Breeding & Fang Zongxi Center for Marine Evo-Devo, College of Marine Life Sciences, Ocean University of China, Qingdao 266003, China.
Bao Z; Laboratory for Marine Biology and Biotechnology, Laoshan Laboratory, Qingdao 266237, China.
Zeng Q; Southern Marine Science and Engineer Guangdong Laboratory, Guangzhou, China.

Brief Bioinform; 25(2), 2024 Jan 22.

Article in English | MEDLINE | ID: mdl-38305453

ABSTRACT

Target enrichment sequencing techniques are gaining widespread use in genomics, prized for their cost-effectiveness and fast turnaround. However, their success depends on the performance of the probes and on the evenness of sequencing depth across probes. To predict probe coverage depth accurately, this study proposes a model called Deqformer. Deqformer takes the oligonucleotide sequence of each probe as input and, drawing inspiration from Watson-Crick base pairing, uses two BERT encoders to capture the underlying information from the forward and reverse probe strands, respectively. The encoded representations are combined by a feed-forward network to predict sequencing depth. Deqformer is evaluated on four datasets: an SNP panel with 38 200 probes, an lncRNA panel with 2000 probes, a synthetic panel with 5899 probes and an HD-Marker panel for the Yesso scallop with 11 000 probes. In 5-fold cross-validation, Deqformer achieves a factor 3 accuracy (F3acc) of 96.24% on the SNP panel and 99.66% on the synthetic panel. When trained on the SNP panel and evaluated on the lncRNA and HD-Marker datasets, it obtains F3acc rates of over 87.33% and 72.56%, respectively. Our analysis indicates that Deqformer effectively captures hybridization patterns, making it robust across a range of scenarios. Deqformer offers a new perspective on probe design pipelines, with the aim of improving the efficiency and effectiveness of probe design tasks.
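The abstract describes two encoders fed by the forward and reverse probe strands, and reports results as F3acc. The sketch below illustrates, under stated assumptions, how those inputs and that metric might be computed; it is not the authors' implementation. Assumptions: the "reverse strand" is taken to be the reverse complement of the probe; the overlapping k-mer tokenization (`kmer_tokens`, with `k=3`) is a hypothetical choice, as the abstract does not specify one; and F3acc is assumed to count a prediction as correct when it lies within a factor of 3 of the observed depth. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

# Watson-Crick pairing table used to derive the complementary strand.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence.

    Assumption: this is the 'reverse strand' fed to the second encoder.
    """
    return seq.translate(_COMPLEMENT)[::-1]


def kmer_tokens(seq: str, k: int = 3) -> List[str]:
    """Split a sequence into overlapping k-mers (hypothetical tokenization)."""
    if k <= 0:
        raise ValueError("k must be positive")
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def strand_inputs(probe: str, k: int = 3) -> Tuple[List[str], List[str]]:
    """Build token lists for the forward and reverse-strand encoders."""
    fwd = probe.upper()
    return kmer_tokens(fwd, k), kmer_tokens(reverse_complement(fwd), k)


def f3_accuracy(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Fraction of probes whose predicted depth is within a factor of 3 of the observed depth.

    Assumed definition: correct when observed / 3 <= predicted <= observed * 3,
    with both values positive.
    """
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need equal-length, non-empty inputs")
    hits = sum(
        1
        for p, o in zip(predicted, observed)
        if p > 0 and o > 0 and o / 3 <= p <= o * 3
    )
    return hits / len(observed)
```

For example, `strand_inputs("ACGTTG")` yields the forward tokens `["ACG", "CGT", "GTT", "TTG"]` and, from the reverse complement `CAACGT`, the tokens `["CAA", "AAC", "ACG", "CGT"]`. A factor-based tolerance such as this treats over- and under-prediction symmetrically on a log scale, which suits depth values that can span orders of magnitude.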

Subject(s)

Deep Learning; RNA, Long Noncoding; DNA Probes/genetics; Nucleic Acid Hybridization; Genomics

Keywords

DNA sequence; probe design; target enrichment genotyping; transformer model

Full text: 1 | Collection: 01-internacional | Database: MEDLINE | Main subject: RNA, Long Noncoding / Deep Learning | Study type: Prognostic_studies | Language: English | Journal: Brief Bioinform | Journal subject: Biology / Medical Informatics | Year: 2024 | Document type: Article | Country of affiliation: China