Union With Recursive Feature Elimination: A Feature Selection Framework to Improve the Classification Performance of Multicategory Causes of Death in Colorectal Cancer.
Deng, Fei; Zhao, Lin; Yu, Ning; Lin, Yuxiang; Zhang, Lanjing.
Affiliation
  • Deng F; School of Electrical and Electronic Engineering, Shanghai Institute of Technology, Shanghai, China. Electronic address: 2606897447@qq.com.
  • Zhao L; School of Electrical and Electronic Engineering, Shanghai Institute of Technology, Shanghai, China.
  • Yu N; School of Electrical and Electronic Engineering, Shanghai Institute of Technology, Shanghai, China.
  • Lin Y; School of Electrical and Electronic Engineering, Shanghai Institute of Technology, Shanghai, China.
  • Zhang L; Department of Biological Sciences, Rutgers University, Newark, New Jersey; Department of Pathology, Princeton Medical Center, Plainsboro, New Jersey; Rutgers Cancer Institute of New Jersey, New Brunswick, New Jersey; Department of Chemical Biology, Ernest Mario School of Pharmacy, Rutgers University
Lab Invest; 104(3): 100320, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38158124
ABSTRACT
Despite the use of machine learning tools, it remains challenging to model cause-specific deaths in colorectal cancer (CRC) patients and to choose appropriate treatments. Here, we propose a feature selection framework, union with recursive feature elimination (U-RFE), to select union feature sets that are crucial in CRC progression-specific mortality using The Cancer Genome Atlas (TCGA) dataset. Based on the union feature sets, we compared the performance of 5 classification algorithms, namely logistic regression (LR), support vector machines (SVM), random forest (RF), eXtreme gradient boosting (XGBoost), and Stacking, to identify the best model for classifying 4-category deaths. In the first stage of U-RFE, LR, SVM, and RF were used as base estimators to obtain feature subsets of equal size that did not necessarily contain the same features. Union analysis of the subsets was then performed to determine the final union feature set, combining the advantages of the different algorithms. We found that the U-RFE framework could improve the performance of various models. Stacking outperformed LR, SVM, RF, and XGBoost in most scenarios. When the target feature number of the RFE was set to 50 and the union feature set contained 298 deterministic features, the Stacking model achieved F1_weighted, Recall_weighted, Precision_weighted, Accuracy, and Matthews correlation coefficient of 0.851, 0.864, 0.854, 0.864, and 0.717, respectively. The performance on the minority categories was also significantly improved. Therefore, this recursive feature elimination-based approach to feature selection improves the performance of classifying CRC deaths using clinical and omics data, or of classification using other data with high feature redundancy and imbalance.
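
The two-stage idea described in the abstract (run RFE separately with several base estimators, then take the union of the selected subsets) can be illustrated with a minimal, standard-library-only Python sketch. This is not the authors' implementation. The function names (`fisher_score`, `redundancy_aware_score`, `rfe`, `union_rfe`), the toy data, and both scoring functions are hypothetical stand-ins for the LR, SVM, and RF estimators named in the abstract. In practice, one would more likely wrap real estimators with a library routine such as scikit-learn's `RFE`.

```python
import random
from statistics import mean, pvariance
from typing import Callable, Dict, List, Sequence, Set, Tuple

Matrix = List[List[float]]
# A scorer maps (X, y, remaining feature indices) to an importance per feature.
Scorer = Callable[[Matrix, Sequence[int], List[int]], Dict[int, float]]


def fisher_score(X: Matrix, y: Sequence[int], features: List[int]) -> Dict[int, float]:
    """Stand-in importance: between-class variance over within-class variance."""
    classes = sorted(set(y))
    scores = {}
    for j in features:
        groups = [[row[j] for row, label in zip(X, y) if label == c] for c in classes]
        between = pvariance([mean(g) for g in groups])
        within = mean(pvariance(g) for g in groups)
        scores[j] = between / (within + 1e-12)
    return scores


def pearson(a: List[float], b: List[float]) -> float:
    """Pearson correlation; returns 0.0 if either column is constant."""
    ma, mb = mean(a), mean(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    norm = (sum((p - ma) ** 2 for p in a) * sum((q - mb) ** 2 for q in b)) ** 0.5
    return cov / norm if norm else 0.0


def redundancy_aware_score(X: Matrix, y: Sequence[int], features: List[int]) -> Dict[int, float]:
    """Stand-in importance that depends on the remaining features:
    Fisher relevance, discounted by mean |correlation| with the other features."""
    relevance = fisher_score(X, y, features)
    cols = {j: [row[j] for row in X] for j in features}
    scores = {}
    for j in features:
        others = [abs(pearson(cols[j], cols[k])) for k in features if k != j]
        penalty = mean(others) if others else 0.0
        scores[j] = relevance[j] * (1.0 - penalty)
    return scores


def rfe(X: Matrix, y: Sequence[int], scorer: Scorer, n_keep: int, step: int = 1) -> Set[int]:
    """Recursive feature elimination: re-score the remaining features each
    round and drop the `step` lowest-scoring ones until `n_keep` remain."""
    features = list(range(len(X[0])))
    while len(features) > n_keep:
        scores = scorer(X, y, features)
        n_drop = min(step, len(features) - n_keep)
        features.sort(key=lambda j: scores[j])  # ascending: weakest first
        features = sorted(features[n_drop:])
    return set(features)


def union_rfe(X: Matrix, y: Sequence[int], scorers: List[Scorer],
              n_keep: int, step: int = 1) -> Tuple[List[Set[int]], List[int]]:
    """Run RFE once per scorer, then return the subsets and their union."""
    subsets = [rfe(X, y, s, n_keep, step) for s in scorers]
    return subsets, sorted(set().union(*subsets))


# Toy data (an assumption for illustration only): 4 classes, 8 features.
# Feature 1 is a near copy of feature 0; features 3 to 7 are pure noise.
rng = random.Random(0)
y = [i % 4 for i in range(40)]
X = []
for label in y:
    signal = label + rng.gauss(0, 0.3)
    X.append([signal, signal + rng.gauss(0, 0.05), -label + rng.gauss(0, 0.5)]
             + [rng.gauss(0, 1) for _ in range(5)])

subsets, union = union_rfe(X, y, [fisher_score, redundancy_aware_score], n_keep=2)
print("per-scorer subsets:", subsets)
print("union feature set:", union)
```

Each subset has the same size (`n_keep`), but the subsets need not agree, so the union can hold anywhere from `n_keep` features up to `n_keep` times the number of scorers. A downstream classifier would then be trained on the union columns only.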

Full text: 1 Collections: 01-international Database: MEDLINE Main subject: Algorithms / Colorectal Neoplasms Limit: Humans Language: En Journal: Lab Invest Publication year: 2024 Document type: Article