LPI-EnEDT: an ensemble framework with extra tree and decision tree classifiers for imbalanced lncRNA-protein interaction data classification.
Peng, Lihong; Yuan, Ruya; Shen, Ling; Gao, Pengfei; Zhou, Liqian.
Affiliation
  • Peng L; School of Computer Science, Hunan University of Technology, No.88, Taishan West Road, Tianyuan District, Zhuzhou, China.
  • Yuan R; College of Life Sciences and Chemistry, Hunan University of Technology, No.88, Taishan West Road, Tianyuan District, Zhuzhou, China.
  • Shen L; School of Computer Science, Hunan University of Technology, No.88, Taishan West Road, Tianyuan District, Zhuzhou, China.
  • Gao P; School of Computer Science, Hunan University of Technology, No.88, Taishan West Road, Tianyuan District, Zhuzhou, China.
  • Zhou L; College of Life Sciences and Chemistry, Hunan University of Technology, No.88, Taishan West Road, Tianyuan District, Zhuzhou, China.
BioData Min ; 14(1): 50, 2021 Dec 03.
Article in En | MEDLINE | ID: mdl-34861891
ABSTRACT

BACKGROUND:

Long noncoding RNAs (lncRNAs) are closely linked to various biological processes. Identifying interacting lncRNA-protein pairs helps elucidate the functions and mechanisms of lncRNAs. Wet-lab experiments are costly and time-consuming. Most computational methods fail to account for the imbalanced nature of lncRNA-protein interaction (LPI) data. More importantly, they were evaluated on a single dataset, which introduces prediction bias.
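To make the imbalance concrete: if every unobserved lncRNA-protein pair is treated as a candidate negative, known interactions form only a small fraction of all pairs. The following is a minimal sketch using a hypothetical toy interaction matrix (not data from this study); the function name `imbalance_ratio` and the "unobserved means negative" convention are assumptions for illustration.

```python
import numpy as np


def imbalance_ratio(interactions: np.ndarray) -> float:
    """Return the number of negatives per positive in a binary lncRNA x protein matrix.

    Assumption (illustrative only): every unobserved pair (0) counts as a candidate negative.
    """
    positives = int(interactions.sum())
    negatives = interactions.size - positives
    return negatives / positives


# Hypothetical toy matrix: 4 lncRNAs x 5 proteins with 3 known interactions.
toy = np.zeros((4, 5), dtype=int)
toy[0, 1] = toy[2, 3] = toy[3, 0] = 1
print(imbalance_ratio(toy))  # 17 negatives / 3 positives, about 5.67
```

Even this tiny example is skewed toward negatives, and the skew grows as the number of unobserved pairs rises with more lncRNAs and proteins.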

RESULTS:

In this study, we develop an Ensemble framework (LPI-EnEDT) with Extra tree and Decision Tree classifiers to classify imbalanced LPI data. First, five LPI datasets are assembled. Second, lncRNAs and proteins are characterized separately using Pyfeat and BioTriangle, and their feature vectors are concatenated to represent each lncRNA-protein pair. Finally, an ensemble framework with Extra tree and decision tree classifiers is developed to classify unlabeled lncRNA-protein pairs. Comparative experiments demonstrate that LPI-EnEDT outperforms four classical LPI prediction methods (LPI-BLS, LPI-CatBoost, LPI-SKF, and PLIPCOM) under cross validation on lncRNAs, proteins, and LPIs. The average AUC values on the five datasets are 0.8480, 0.7078, and 0.9066 under the three cross validations, respectively, and the average AUPRs are 0.8175, 0.7265, and 0.8882, respectively. Case analyses suggest that there are underlying associations between HOTTIP and Q9Y6M1 and between NRON and Q15717.
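The pipeline described above (concatenate lncRNA and protein features, classify pairs with an ensemble of Extra tree and decision tree models, report AUC and AUPR) can be sketched with scikit-learn. This is a hedged illustration, not the authors' implementation: the random placeholder features stand in for Pyfeat/BioTriangle descriptors, the synthetic labels are invented, scikit-learn's `ExtraTreesClassifier` is used for the "Extra tree" component, and soft voting with balanced class weights is an assumed combination rule because the abstract does not state one.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier, VotingClassifier
from sklearn.metrics import average_precision_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Hypothetical placeholder features standing in for lncRNA and protein descriptors.
n_pairs = 400
lnc_feats = rng.normal(size=(n_pairs, 20))
prot_feats = rng.normal(size=(n_pairs, 30))
# Each lncRNA-protein pair is represented by concatenating both feature vectors.
X = np.hstack([lnc_feats, prot_feats])

# Synthetic imbalanced labels (top 20% of a noisy score are positives).
score = X[:, 0] + X[:, 25] + rng.normal(scale=0.5, size=n_pairs)
y = (score > np.quantile(score, 0.8)).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Assumption: soft voting over the two tree models, each reweighting the minority class.
model = VotingClassifier(
    estimators=[
        ("extra_trees", ExtraTreesClassifier(
            n_estimators=200, class_weight="balanced", random_state=0)),
        ("decision_tree", DecisionTreeClassifier(
            max_depth=6, class_weight="balanced", random_state=0)),
    ],
    voting="soft",
)
model.fit(X_tr, y_tr)

proba = model.predict_proba(X_te)[:, 1]   # predicted interaction probability per pair
auc = roc_auc_score(y_te, proba)
aupr = average_precision_score(y_te, proba)  # average precision as an AUPR estimate
print(f"AUC={auc:.3f} AUPR={aupr:.3f}")
```

Class weighting is only one way to handle imbalance; resampling the training pairs is a common alternative. AUPR is reported alongside AUC because precision-recall curves are more sensitive to performance on the minority (interacting) class.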

CONCLUSIONS:

By fusing diverse biological features of lncRNAs and proteins and exploiting an ensemble learning model with Extra tree and decision tree classifiers, this work addresses imbalanced LPI data classification as well as interaction inference for a new lncRNA (or protein).
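Inferring interactions for a new lncRNA requires evaluating on lncRNAs never seen during training. One common way to set this up, assumed here as a reading of "cross validation on lncRNAs" rather than taken from the paper, is to group candidate pairs by lncRNA so that no lncRNA appears in both the training and test folds. The sketch below uses scikit-learn's `GroupKFold`; the toy pair list and the helper name `lncrna_folds` are hypothetical.

```python
import numpy as np
from sklearn.model_selection import GroupKFold

# Hypothetical candidate pairs: (lncRNA index, protein index) for every combination.
n_lnc, n_prot = 6, 4
pairs = np.array([(i, j) for i in range(n_lnc) for j in range(n_prot)])


def lncrna_folds(pairs, n_splits=3):
    """Yield (train_idx, test_idx) with no lncRNA shared between train and test.

    Grouping by column 0 simulates new lncRNAs; grouping by column 1 would
    simulate new proteins, and a plain KFold over pairs would be pair-level CV.
    """
    groups = pairs[:, 0]
    yield from GroupKFold(n_splits=n_splits).split(pairs, groups=groups)


for train_idx, test_idx in lncrna_folds(pairs):
    train_lnc = set(int(v) for v in pairs[train_idx, 0])
    test_lnc = set(int(v) for v in pairs[test_idx, 0])
    print(sorted(test_lnc), train_lnc.isdisjoint(test_lnc))
```

Because every test lncRNA is absent from training, scores under this setting reflect cold-start performance, which is typically harder than pair-level cross validation.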

Full text: 1 Collection: 01-internacional Database: MEDLINE Type of study: Health_economic_evaluation / Prognostic_studies Language: En Journal: BioData Min Year: 2021 Document type: Article Affiliation country: China