4 mC site recognition algorithm based on pruned pre-trained DNABert-Pruning model and fused artificial feature encoding.
Xie, Guo-Bo; Yu, Yi; Lin, Zhi-Yi; Chen, Rui-Bin; Xie, Jian-Hui; Liu, Zhen-Guo.
Affiliation
  • Xie GB; Guangdong University of Technology, Guangzhou, 510000, China.
  • Yu Y; Guangdong University of Technology, Guangzhou, 510000, China.
  • Lin ZY; Guangdong University of Technology, Guangzhou, 510000, China. Electronic address: lzy291@gdut.edu.cn.
  • Chen RB; Guangdong University of Technology, Guangzhou, 510000, China.
  • Xie JH; Guangdong University of Technology, Guangzhou, 510000, China.
  • Liu ZG; Department of Thoracic Surgery, The First Affiliated Hospital of Sun Yat-sen University, 58 Zhongshan 2nd Road, Guangzhou, 510080, China. Electronic address: liuzhg2@mail.sysu.edu.cn.
Anal Biochem ; 689: 115492, 2024 Jun.
Article in En | MEDLINE | ID: mdl-38458307
ABSTRACT
DNA 4 mC plays a crucial role in the genetic expression process of organisms. However, existing deep learning algorithms fall short in their ability to represent DNA sequence features. In this paper, we propose a 4 mC site identification algorithm, DNABert-4mC, based on a fusion of the pruned pre-trained DNABert-Pruning model and artificial feature encoding. The algorithm prunes and compresses the DNABert model, resulting in the pruned pre-trained model DNABert-Pruning. This model reduces the number of parameters and removes redundancy from the output features, yielding more precise feature representations while maintaining accuracy. In parallel, the algorithm constructs an artificial feature encoding module to assist the DNABert-Pruning model in feature representation, effectively supplementing the information that is missing from the pre-trained features. The algorithm also introduces the AFF-4mC fusion strategy, which combines artificial feature encoding with the DNABert-Pruning model to improve the feature representation of DNA sequences in multi-semantic spaces and to better capture 4 mC sites and the distribution of nucleotide importance within the sequence. In experiments on six independent test sets, the DNABert-4mC algorithm achieved an average AUC value of 93.81%, outperforming seven other advanced algorithms with improvements of 2.05%, 5.02%, 11.32%, 5.90%, 12.02%, 2.42% and 2.34%, respectively.
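The abstract does not say which artificial (hand-crafted) encodings or which tokenization settings are used, so the following is only a minimal sketch of the two kinds of input such a pipeline typically needs. It assumes one-hot encoding as the hand-crafted feature and overlapping k-mers with stride 1 as the input to a DNABERT-style model. The function names `one_hot_encode` and `kmer_tokens` and the default `k = 3` are hypothetical choices for illustration, not taken from the paper.

```python
from typing import List

NUCLEOTIDES = "ACGT"


def one_hot_encode(seq: str) -> List[List[int]]:
    """Encode each base as a 4-dim indicator vector in A, C, G, T order.

    Assumption: bases outside ACGT (e.g. N) become all-zero vectors.
    """
    return [[1 if base == n else 0 for n in NUCLEOTIDES] for base in seq.upper()]


def kmer_tokens(seq: str, k: int = 3) -> List[str]:
    """Split a sequence into overlapping k-mers with stride 1.

    Assumption: k = 3 is an illustrative default, not the paper's setting.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    seq = seq.upper()
    # A sequence shorter than k yields an empty token list.
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


if __name__ == "__main__":
    example = "ACGTAC"
    print(one_hot_encode(example))
    print(kmer_tokens(example, k=3))
```

In this sketch the one-hot matrix has one row per nucleotide, while the k-mer list has `len(seq) - k + 1` tokens, so any fusion of the two representations would need to reconcile their lengths; how AFF-4mC does this is not described in the abstract.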

Full text: 1 Collection: 01-international Database: MEDLINE Main subject: Algorithms / DNA Language: En Journal: Anal Biochem Year: 2024 Document type: Article Affiliation country: China
