A Contrastive Learning Pre-Training Method for Motif Occupancy Identification.
Lin, Ken; Quan, Xiongwen; Yin, Wenya; Zhang, Han.
Affiliation
  • Lin K; College of Artificial Intelligence, Nankai University, Tianjin 300350, China.
  • Quan X; College of Artificial Intelligence, Nankai University, Tianjin 300350, China.
  • Yin W; College of Artificial Intelligence, Nankai University, Tianjin 300350, China.
  • Zhang H; College of Artificial Intelligence, Nankai University, Tianjin 300350, China.
Int J Mol Sci; 23(9), 2022 Apr 24.
Article in En | MEDLINE | ID: mdl-35563090
ABSTRACT
Motif occupancy identification is a binary classification task that predicts whether DNA motif instances are bound by transcription factors, and several sequence-based methods have been proposed for it. However, because they are trained directly end to end, these methods lack biological interpretability in their sequence representations. In this work, we propose a contrastive learning method to pre-train an interpretable and robust DNA encoding for motif occupancy identification. We construct two alternative models to pre-train the DNA sequence encoder: a self-supervised model and a supervised model. We augment the original sequences for contrastive learning using the edit operations that define edit distance. Specifically, we propose a sequence similarity criterion based on the Needleman-Wunsch algorithm to distinguish positive from negative sample pairs in self-supervised learning. Finally, a DNN classifier is fine-tuned together with the pre-trained encoder to predict motif occupancy. Both proposed contrastive learning models outperform the baseline end-to-end CNN model and the SimCLR method, reaching AUCs of 0.811 and 0.823, respectively. Compared with the baseline method, our models are more robust on small samples. In addition, the self-supervised model is shown to be practicable for transfer learning.
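The two sequence-level ingredients named in the abstract, augmentation by edit operations and a Needleman-Wunsch similarity criterion, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`augment`, `nw_score`, `is_positive_pair`), the scoring values (match +1, mismatch -1, gap -1), the length normalization, and the 0.8 threshold are all assumptions chosen for illustration, since the abstract does not specify them.

```python
import random

ALPHABET = "ACGT"

def augment(seq, n_edits=2, rng=None):
    """Apply n_edits random edit operations (substitution, insertion, deletion).

    Hypothetical helper: the number and mix of edits are illustrative assumptions.
    """
    rng = rng or random.Random()
    s = list(seq)
    for _ in range(n_edits):
        op = rng.choice(("sub", "ins", "del"))
        if op == "ins" or not s:
            # Insert a random base (also the fallback when the sequence is empty).
            s.insert(rng.randrange(len(s) + 1), rng.choice(ALPHABET))
        elif op == "sub":
            pos = rng.randrange(len(s))
            # Replace with a different base so the edit always changes the sequence.
            s[pos] = rng.choice([b for b in ALPHABET if b != s[pos]])
        else:
            s.pop(rng.randrange(len(s)))
    return "".join(s)

def nw_score(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score by Needleman-Wunsch dynamic programming.

    Keeps only two rows of the DP table; scoring values are assumed, not from the paper.
    """
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur.append(max(diag, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]

def is_positive_pair(a, b, threshold=0.8):
    """Label a pair positive when the length-normalized score reaches the threshold.

    Both the normalization and the threshold are illustrative assumptions.
    """
    longest = max(len(a), len(b))
    if longest == 0:
        return True
    return nw_score(a, b) / longest >= threshold
```

In a self-supervised setup of this kind, a call such as `is_positive_pair(seq, augment(seq))` would decide whether an augmented view is still close enough to its source to count as a positive pair; lightly edited views pass, while heavily edited or unrelated sequences fall below the threshold and are treated as negatives.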

Full text: 1 Collections: 01-international Database: MEDLINE Main subject: Algorithms Type of study: Diagnostic_studies / Prognostic_studies Language: En Journal: Int J Mol Sci Year of publication: 2022 Document type: Article Country of affiliation: China
