DeeReCT-PolyA: a robust and generic deep learning method for PAS identification.

Xia, Zhihao; Li, Yu; Zhang, Bin; Li, Zhongxiao; Hu, Yuhui; Chen, Wei; Gao, Xin

Affiliation

Xia Z; Department of Computer Science and Engineering (CSE), Washington University in St Louis, St Louis, MO, USA.
Li Y; Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Thuwal, Saudi Arabia.
Zhang B; Department of Biology, Southern University of Science and Technology (SUSTC), Shenzhen, China.
Li Z; Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Thuwal, Saudi Arabia.
Hu Y; Department of Biology, Southern University of Science and Technology (SUSTC), Shenzhen, China.
Chen W; Department of Biology, Southern University of Science and Technology (SUSTC), Shenzhen, China.
Gao X; Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Thuwal, Saudi Arabia.

Bioinformatics; 35(14): 2371-2379, 2019 Jul 15.

Article in English | MEDLINE | ID: mdl-30500881

ABSTRACT

MOTIVATION: Polyadenylation is a critical step in the regulation of gene expression during mRNA maturation. An accurate and robust method for identifying poly(A) signals (PASs) is desirable not only for better annotation of transcript ends but also for gaining deeper insight into the underlying regulatory mechanism. Although many methods have been proposed for PAS recognition, most are specific to particular PAS motifs and to human data, which leads to a high risk of overfitting, low generalization power, and an inability to reveal connections between the underlying mechanisms of different mammals.

RESULTS: In this work, we propose a robust, PAS motif-agnostic, highly interpretable, and transferable deep learning model for accurate PAS recognition that requires no prior knowledge or human-designed features. We show that our single model, trained over all human PAS motifs, not only outperforms state-of-the-art methods trained on specific motifs but also generalizes well to two mouse datasets. Moreover, we further increase prediction accuracy by transferring the deep learning model trained on data from one species to data from a different species. Visualizing the important oligomers and positions in our trained models reveals several novel underlying poly(A) patterns. Finally, we interpret the deep learning models by converting the convolutional filters into sequence logos and quantitatively comparing the sequence logos between the human and mouse datasets.

AVAILABILITY AND IMPLEMENTATION: https://github.com/likesum/DeeReCT-PolyA.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
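The abstract does not spell out how convolutional filters are converted into sequence logos. One common approach, shown here only as a minimal sketch and not necessarily the authors' procedure, is to scan each input sequence with a filter, keep the window with the highest activation, and tally base frequencies over those windows; the resulting position frequency matrix can then be drawn as a logo. All names below (`one_hot`, `best_window`, `filter_to_pfm`, `toy_filter`, `toy_seqs`) and the activation threshold are hypothetical, and the filter is hand-made to favor the canonical AATAAA hexamer rather than taken from a trained model.

```python
BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as rows of 0/1 values ordered A, C, G, T."""
    return [[1.0 if b == base else 0.0 for base in BASES] for b in seq.upper()]

def best_window(seq, filt):
    """Slide the filter over seq and return (max activation, start index)."""
    k = len(filt)
    x = one_hot(seq)
    best_score, best_pos = float("-inf"), -1
    for start in range(len(seq) - k + 1):
        score = sum(filt[i][j] * x[start + i][j]
                    for i in range(k) for j in range(4))
        if score > best_score:
            best_score, best_pos = score, start
    return best_score, best_pos

def filter_to_pfm(seqs, filt, threshold=0.0):
    """Build a position frequency matrix from each sequence's top-scoring window."""
    k = len(filt)
    counts = [[0] * 4 for _ in range(k)]
    for seq in seqs:
        score, pos = best_window(seq, filt)
        if pos < 0 or score <= threshold:
            continue  # sequence too short, or filter not activated (assumed cutoff)
        for i, b in enumerate(seq[pos:pos + k].upper()):
            if b in BASES:
                counts[i][BASES.index(b)] += 1
    pfm = []
    for row in counts:
        total = sum(row)
        pfm.append([c / total if total else 0.25 for c in row])
    return pfm

# Toy filter: +1 for the AATAAA base at each position, -1 otherwise (an assumption).
toy_filter = [[1.0 if b == m else -1.0 for b in BASES] for m in "AATAAA"]
# Toy sequences: two carry AATAAA, one carries the ATTAAA variant.
toy_seqs = ["GGCAATAAAGCT", "TTAATAAACCGG", "CCGATTAAAGGA"]
pfm = filter_to_pfm(toy_seqs, toy_filter)
```

Each row of `pfm` gives the A, C, G, T frequencies at one filter position. In this toy case the second position is split between A and T because one sequence carries the ATTAAA variant, which is the kind of motif variability a logo makes visible.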

Subject(s)

Deep Learning; Animals; Humans; Mice; Poly A; Polyadenylation; Position-Specific Scoring Matrices

Full text: 1 | Databases: MEDLINE | Main subject: Deep Learning | Type of study: Diagnostic_studies / Prognostic_studies | Limits: Animals / Humans | Language: En | Journal: Bioinformatics | Journal subject: MEDICAL INFORMATICS | Year: 2019 | Document type: Article | Affiliation country: United States
