Poly(A)-DG: A deep-learning-based domain generalization method to identify cross-species Poly(A) signal without prior knowledge from target species.
Zheng, Yumin; Wang, Haohan; Zhang, Yang; Gao, Xin; Xing, Eric P; Xu, Min.
Affiliation
  • Zheng Y; School of Electronic Engineering and Computer Science, Queen Mary University of London, London, United Kingdom.
  • Wang H; Language Technologies Institute, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA.
  • Zhang Y; Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA.
  • Gao X; Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia.
  • Xing EP; Machine Learning Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA.
  • Xu M; Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA.
PLoS Comput Biol; 16(11): e1008297, 2020 Nov.
Article in En | MEDLINE | ID: mdl-33151940
ABSTRACT
In eukaryotes, polyadenylation (poly(A)) is an essential process during mRNA maturation. Identifying the cis-determinants of the poly(A) signal (PAS) on the DNA sequence is key to understanding the mechanisms of translation regulation and mRNA metabolism. Although machine learning methods have been widely used to identify PAS computationally, their need for large amounts of annotated data hinders their application to species that lack experimental PAS data. Cross-species PAS identification, which makes it possible to predict PAS in species not seen during training, is therefore a promising direction. In this work, we propose a novel deep learning method named Poly(A)-DG for cross-species PAS identification. Poly(A)-DG consists of a Convolutional Neural Network-Multilayer Perceptron (CNN-MLP) network and a domain generalization technique. It learns PAS patterns from the training species and identifies PAS in target species without re-training. To test our method, we use four species, build cross-species training sets from two of them, and evaluate performance on the remaining ones. Moreover, we test our method under insufficient-data and imbalanced-data conditions and demonstrate that Poly(A)-DG not only outperforms state-of-the-art methods but also maintains relatively high accuracy with a smaller or imbalanced training set.
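The evaluation protocol described in the abstract (train on two of four species, test on the remaining two) can be illustrated with a minimal sketch. The species labels below are placeholders, since the abstract does not name the four species, and enumerating every possible training pair is an assumption made here for illustration rather than a statement of the authors' exact protocol.

```python
from itertools import combinations


def leave_species_out_splits(species, n_train=2):
    """Yield (train_species, test_species) for every choice of n_train training species.

    The held-out species never contribute data to training, which mirrors the
    cross-species setting in which the target species is unseen.
    """
    species = list(species)
    for train in combinations(species, n_train):
        test = tuple(s for s in species if s not in train)
        yield train, test


# Placeholder labels (hypothetical): the abstract does not list the species.
SPECIES = ["species_A", "species_B", "species_C", "species_D"]

splits = list(leave_species_out_splits(SPECIES))
for train, test in splits:
    print("train on", train, "-> evaluate on", test)
```

With four species and two used for training, this produces six train/test configurations, each of which keeps the training and evaluation species disjoint.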

Full text: 1 Collection: 01-internacional Database: MEDLINE Main subject: Poly A / Signal Transduction / Deoxyguanosine / Deep Learning Type of study: Prognostic_studies Limits: Animals / Humans Language: En Journal: PLoS Comput Biol Journal subject: BIOLOGY / MEDICAL INFORMATICS Year: 2020 Document type: Article Affiliation country: United Kingdom
