RBP-TSTL is a two-stage transfer learning framework for genome-scale prediction of RNA-binding proteins.

Peng, Xinxin; Wang, Xiaoyu; Guo, Yuming; Ge, Zongyuan; Li, Fuyi; Gao, Xin; Song, Jiangning

Affiliation

Peng X; Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Victoria 3800, Australia.
Wang X; Monash Data Futures Institute, Monash University, Melbourne, Victoria 3800, Australia.
Guo Y; Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Victoria 3800, Australia.
Ge Z; Monash Data Futures Institute, Monash University, Melbourne, Victoria 3800, Australia.
Li F; Department of Epidemiology and Preventive Medicine, School of Public Health and Preventive Medicine, Monash University, Melbourne, Victoria 3004, Australia.
Gao X; Monash e-Research Centre and Faculty of Engineering, Monash University, Melbourne, Victoria 3800, Australia.
Song J; Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Victoria 3800, Australia.

Brief Bioinform; 23(4), 2022 Jul 18.

Article in English | MEDLINE | ID: mdl-35649392

ABSTRACT

RNA-binding proteins (RBPs) are critical for the post-transcriptional control of RNAs and play vital roles in a myriad of biological processes, such as RNA localization and gene regulation. Therefore, computational methods that are capable of accurately identifying RBPs are highly desirable and have important implications for biomedical and biotechnological applications. Here, we propose a two-stage deep transfer learning-based framework, termed RBP-TSTL, for accurate prediction of RBPs. In the first stage, the knowledge from the self-supervised pre-trained model was extracted as feature embeddings and used to represent the protein sequences, while in the second stage, a customized deep learning model was initialized based on an annotated pre-training RBP dataset before being fine-tuned on each corresponding target species dataset. This two-stage transfer learning framework enables the RBP-TSTL model to be trained effectively and improves its prediction performance. Extensive benchmarking of the RBP-TSTL models trained on the features generated by the self-supervised pre-trained model against other models trained on hand-crafted encoding features demonstrated the effectiveness of the proposed two-stage knowledge transfer strategy based on the self-supervised pre-trained models. Using the best-performing RBP-TSTL models, we further conducted genome-scale RBP predictions for Homo sapiens, Arabidopsis thaliana, Escherichia coli, and Salmonella and established a computational compendium containing all the predicted putative RBP candidates. We anticipate that the proposed RBP-TSTL approach will be explored as a useful tool for the characterization of RNA-binding proteins and exploration of their sequence-structure-function relationships.
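The two-stage scheme described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the embeddings here are synthetic vectors standing in for the output of a self-supervised pre-trained protein language model, the "customized deep learning model" is replaced by a plain logistic regression, and every name (`toy_embeddings`, `train`, `source_dir`, `target_dir`) is hypothetical. The sketch only shows the order of operations: represent sequences as fixed embeddings, pre-train a classifier on a larger annotated dataset, then fine-tune the same weights on a smaller target-species dataset.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def log_loss(weights, X, y):
    # Mean binary cross-entropy; weights[-1] is the bias term.
    total = 0.0
    for x, label in zip(X, y):
        p = sigmoid(sum(w * v for w, v in zip(weights, x)) + weights[-1])
        p = min(max(p, 1e-12), 1 - 1e-12)
        total -= label * math.log(p) + (1 - label) * math.log(1 - p)
    return total / len(X)


def train(weights, X, y, lr, epochs):
    # Full-batch gradient descent that starts from the given weights.
    w = list(weights)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, label in zip(X, y):
            err = sigmoid(sum(wi * v for wi, v in zip(w, x)) + w[-1]) - label
            for j, v in enumerate(x):
                grad[j] += err * v
            grad[-1] += err
        w = [wi - lr * g / n for wi, g in zip(w, grad)]
    return w


def toy_embeddings(n, direction, rng):
    # ASSUMPTION: synthetic stand-in for stage 1. A real pipeline would take
    # these vectors from a pre-trained protein language model instead.
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)  # 1 = RBP, 0 = non-RBP
        sign = 1.0 if label else -1.0
        X.append([rng.gauss(0, 1) + sign * d for d in direction])
        y.append(label)
    return X, y


rng = random.Random(0)
dim = 8
source_dir = [0.5] * dim               # class signal in the pre-training set
target_dir = [0.5] * 4 + [0.2] * 4     # related but shifted signal in the target species

X_src, y_src = toy_embeddings(400, source_dir, rng)  # large annotated set
X_tgt, y_tgt = toy_embeddings(40, target_dir, rng)   # small target-species set

# Stage 2a: initialize the classifier by training on the annotated pre-training set.
pretrained = train([0.0] * (dim + 1), X_src, y_src, lr=0.1, epochs=200)

# Stage 2b: fine-tune the same weights on the target-species set.
finetuned = train(pretrained, X_tgt, y_tgt, lr=0.05, epochs=50)

loss_before = log_loss(pretrained, X_tgt, y_tgt)
loss_after = log_loss(finetuned, X_tgt, y_tgt)
print(f"target loss before fine-tuning: {loss_before:.4f}")
print(f"target loss after fine-tuning:  {loss_after:.4f}")
```

Starting the fine-tuning step from `pretrained` rather than from zeros is the knowledge-transfer step in this sketch. The loss is measured on the fine-tuning data itself, so the comparison only confirms that fine-tuning adapts the weights; a real evaluation would use held-out target-species data.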

Subjects

RNA-Binding Proteins; RNA; Binding Sites/genetics; Genome; Humans; Machine Learning; RNA/chemistry; RNA-Binding Proteins/metabolism; Sequence Analysis, RNA/methods

Keywords

RNA binding proteins; deep learning; knowledge transfer learning; pre-trained language model; sequence analysis

Full text: 1 Collections: 01-international Database: MEDLINE Main subject: RNA / RNA-Binding Proteins Type of study: Prognostic_studies / Risk_factors_studies Limits: Humans Language: En Year of publication: 2022 Document type: Article
