BERT-siRNA: siRNA target prediction based on BERT pre-trained interpretable model.
Xu, Jiayu; Xu, Nan; Xie, Weixin; Zhao, Chengkui; Yu, Lei; Feng, Weixing.
Affiliation
  • Xu J; Institute of Intelligent System and Bioinformatics, College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin 150001, China. Electronic address: xujiayu@hrbeu.edu.cn.
  • Xu N; Institute of Biomedical Engineering and Technology, Shanghai Engineering Research Center of Molecular Therapeutics and New Drug Development, School of Chemistry and Molecular Engineering, East China Normal University, No, 3663 North Zhongshan Road, Shanghai 200065, China; Shanghai Unicar-Therapy Bio
  • Xie W; Institute of Intelligent System and Bioinformatics, College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin 150001, China. Electronic address: xieweixin@hrbeu.edu.cn.
  • Zhao C; Institute of Intelligent System and Bioinformatics, College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin 150001, China; Shanghai Unicar-Therapy Bio-medicine Technology Co., Ltd, No 1525 Minqiang Road, Shanghai 201612, China. Electronic address: zhaochengkui@h
  • Yu L; Institute of Biomedical Engineering and Technology, Shanghai Engineering Research Center of Molecular Therapeutics and New Drug Development, School of Chemistry and Molecular Engineering, East China Normal University, No, 3663 North Zhongshan Road, Shanghai 200065, China; Shanghai Unicar-Therapy Bio
  • Feng W; Institute of Intelligent System and Bioinformatics, College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin 150001, China. Electronic address: fengweixing@hrbeu.edu.cn.
Gene; 910: 148330, 2024 Jun 05.
Article in En | MEDLINE | ID: mdl-38431236
ABSTRACT
Silencing mRNA with siRNA is central to RNA interference (RNAi), so accurate computational methods for siRNA selection are needed. Current machine learning approaches often require large datasets and intricate data preprocessing, which reduces their accuracy. To address this challenge, we propose BERT-siRNA, a BERT-based method for predicting siRNA target gene knockdown efficiency that consists of a pre-trained DNA-BERT module and a multilayer perceptron (MLP) module. It applies transfer learning to work around small sample sizes and to avoid extensive preprocessing: DNA-BERT is pre-trained on extensive genomic data and then fine-tuned on various siRNA datasets to enhance its predictive capability. In tests on an independent public siRNA dataset, our model outperforms all existing siRNA prediction models. Furthermore, the model's consistent predictions of high-efficiency siRNA knockdown for SARS-CoV-2, together with its agreement with experimental results for PDCD1, CD38, and IL6, demonstrate its reliability and stability. In addition, the attention scores for all 19-nt positions in the dataset indicate that the model's attention is focused predominantly on the 5' end of the siRNA. A step-by-step visualization of the hidden layers shows the classification becoming progressively clearer, which explains how the MLP layers extract effective features. Explaining the model through analysis of its attention scores and hidden layers is also a main aim of this work, making the model more interpretable and reliable for biological researchers.
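The abstract does not say how a 19-nt siRNA is presented to the DNA-BERT module. As a minimal sketch, assuming the overlapping k-mer tokenization used by the original DNABERT and assuming the siRNA is first mapped to the DNA alphabet (U to T), the input preparation might look like the following. The function name `sirna_to_kmers`, the choice k = 3, and the example sequence are illustrative assumptions, not details taken from the paper.

```python
from typing import List


def sirna_to_kmers(sirna: str, k: int = 3) -> List[str]:
    """Turn an siRNA sequence into overlapping DNA k-mer tokens.

    Assumption: the model consumes DNA-alphabet k-mers, so U is mapped to T.
    """
    seq = sirna.strip().upper().replace("U", "T")
    invalid = set(seq) - set("ACGT")
    if invalid:
        raise ValueError(f"unexpected characters in sequence: {sorted(invalid)}")
    if len(seq) < k:
        raise ValueError("sequence is shorter than k")
    # Slide a window of width k one nucleotide at a time.
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


# Hypothetical 19-nt siRNA, used only to show the shape of the output.
example = "GUCAUCACACUGAAUACCA"
tokens = sirna_to_kmers(example)
print(len(tokens), tokens[:3])
```

A 19-nt sequence yields 19 - k + 1 tokens, so each nucleotide position is covered by up to k tokens; per-position attention scores such as those discussed in the abstract would have to be aggregated over the tokens that cover each position.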

Full text: 1 Collection: 01-internacional Database: MEDLINE Main subject: DNA Language: En Journal: Gene Year: 2024 Type: Article
