Improving protein-protein interactions prediction accuracy using XGBoost feature selection and stacked ensemble classifier.
Chen, Cheng; Zhang, Qingmei; Yu, Bin; Yu, Zhaomin; Lawrence, Patrick J; Ma, Qin; Zhang, Yan.
Affiliation
  • Chen C; College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao, 266061, China; Artificial Intelligence and Biomedical Big Data Research Center, Qingdao University of Science and Technology, Qingdao, 266061, China.
  • Zhang Q; College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao, 266061, China; Artificial Intelligence and Biomedical Big Data Research Center, Qingdao University of Science and Technology, Qingdao, 266061, China.
  • Yu B; College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao, 266061, China; Artificial Intelligence and Biomedical Big Data Research Center, Qingdao University of Science and Technology, Qingdao, 266061, China; School of Life Sciences, University of Science and Technolo
  • Yu Z; College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao, 266061, China; Artificial Intelligence and Biomedical Big Data Research Center, Qingdao University of Science and Technology, Qingdao, 266061, China.
  • Lawrence PJ; Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, 43210, USA.
  • Ma Q; Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, 43210, USA.
  • Zhang Y; College of Electromechanical Engineering, Qingdao University of Science and Technology, Qingdao, 266061, China.
Comput Biol Med; 123: 103899, 2020 Aug.
Article in English | MEDLINE | ID: mdl-32768046
Protein-protein interactions (PPIs) are involved in most cellular activities at the proteomic level, making the study of PPIs necessary for understanding any biological process. Machine learning approaches have been explored, leading to more accurate and generalized PPI predictions. In this paper, we propose a predictive framework called StackPPI. First, we use pseudo amino acid composition, Moreau-Broto, Moran and Geary autocorrelation descriptors, amino acid composition position-specific scoring matrix, bi-gram position-specific scoring matrix, and composition, transition and distribution to encode biologically relevant features. Second, we employ XGBoost to reduce feature noise and perform dimensionality reduction through gradient boosting and average gain. Finally, the resulting optimized features are analyzed by StackPPI, a PPI predictor we have developed from a stacked ensemble classifier consisting of random forest, extremely randomized trees and logistic regression algorithms. Five-fold cross-validation shows that StackPPI can successfully predict PPIs with an ACC of 89.27%, MCC of 0.7859 and AUC of 0.9561 on Helicobacter pylori, and with an ACC of 94.64%, MCC of 0.8934 and AUC of 0.9810 on Saccharomyces cerevisiae. We find that StackPPI improves protein interaction prediction accuracy on independent test sets compared to state-of-the-art models. Finally, we highlight StackPPI's ability to infer biologically significant PPI networks. StackPPI's accurate prediction of functional pathways makes it the logical choice for studying the underlying mechanism of PPIs, especially as it applies to drug design. The datasets and source code used to create StackPPI are available here: https://github.com/QUST-AIBBDRC/StackPPI/.
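For readers unfamiliar with the metrics quoted in the abstract, the following is a minimal sketch of how accuracy (ACC) and the Matthews correlation coefficient (MCC) are commonly computed from a binary confusion matrix. The function names and the confusion-matrix counts are hypothetical and chosen only for illustration. They are not taken from the paper or from the StackPPI repository, and the authors' own evaluation code may differ.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient for a binary confusion matrix.

    Ranges from -1 (total disagreement) to +1 (perfect prediction).
    Returns 0.0 when the denominator is zero, a common convention.
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator if denominator else 0.0


# Hypothetical counts for interacting (positive) and non-interacting
# (negative) protein pairs; NOT results from the paper.
tp, tn, fp, fn = 90, 85, 15, 10

acc_value = accuracy(tp, tn, fp, fn)
mcc_value = mcc(tp, tn, fp, fn)
print(f"ACC = {acc_value:.4f}, MCC = {mcc_value:.4f}")
```

Unlike ACC, MCC uses all four cells of the confusion matrix, so it stays informative when the numbers of interacting and non-interacting pairs are unbalanced. This may be one reason the two are often reported together.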

Full text: 1 Collection: 01-international Database: MEDLINE Main subject: Saccharomyces cerevisiae / Proteomics Type of study: Clinical_trials / Prognostic_studies / Risk_factors_studies Language: En Journal: Comput Biol Med Year: 2020 Document type: Article Country of affiliation: China
