Advancing NSCLC pathological subtype prediction with interpretable machine learning: a comprehensive radiomics-based approach.

Kuang, Bingling; Zhang, Jingxuan; Zhang, Mingqi; Xia, Haoming; Qiang, Guangliang; Zhang, Jiangyu

Affiliation

Kuang B; Department of Pathology, Affiliated Cancer Hospital and Institution of Guangzhou Medical University, Guangzhou, China.
Zhang J; Nanshan College, Guangzhou Medical University, Guangzhou, Guangdong, China.
Zhang M; Nanshan College, Guangzhou Medical University, Guangzhou, Guangdong, China.
Xia H; The Second Clinical School of Guangzhou Medical University, Guangzhou Medical University, Guangzhou, Guangdong, China.
Qiang G; School of Clinical Medicine, Tsinghua University, Beijing, China.
Zhang J; Department of Thoracic Surgery, Peking University Third Hospital, Beijing, China.

Front Med (Lausanne) ; 11: 1413990, 2024.

Article in English | MEDLINE | ID: mdl-38841579

ABSTRACT

Objective:

This research aims to develop and assess the performance of interpretable machine learning models for diagnosing three histological subtypes of non-small cell lung cancer (NSCLC) utilizing CT imaging data.

Methods:

A retrospective cohort of 317 patients diagnosed with NSCLC was included in the study. The patients were randomly divided into two groups at a 7:3 ratio: a training set of 222 patients and a validation set of 95 patients. In total, 1,834 radiomic features were extracted. Features were selected using the Mann-Whitney U test, Spearman's rank correlation, and one-way logistic regression. The Synthetic Minority Over-sampling Technique (SMOTE) was used to address class imbalance. Three distinct models were designed to predict adenocarcinoma (ADC), squamous cell carcinoma (SCC), and large cell carcinoma (LCC). Six classifiers were used for model training: Logistic Regression, Support Vector Machine, Decision Tree, Random Forest, eXtreme Gradient Boosting (XGB), and LightGBM. Model performance was assessed by accuracy and by the area under the receiver operating characteristic (ROC) curve (AUC). The Shapley Additive Explanations (SHAP) approach was applied to interpret the diagnostic process.
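
As an illustration of the data partition described above, the following is a minimal Python sketch, not the authors' code. It shows how a random 7:3 split of 317 cases yields 222 training and 95 validation cases, and how binary labels for one subtype model might be derived. The function names, the fixed seed, and the one-vs-rest labeling scheme are assumptions made for this example; the abstract does not state how the three models were labeled.

```python
import random


def split_70_30(patient_ids, seed=0):
    """Randomly split IDs into training and validation sets at a 7:3 ratio.

    The seed is a hypothetical choice for reproducibility.
    """
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)
    n_train = round(len(ids) * 0.7)  # 317 * 0.7 = 221.9, rounded to 222
    return ids[:n_train], ids[n_train:]


def one_vs_rest_labels(subtypes, target):
    """Binary labels for one subtype model: 1 if the case is `target`, else 0.

    One-vs-rest labeling is an assumption, not confirmed by the abstract.
    """
    return [1 if s == target else 0 for s in subtypes]


train_ids, val_ids = split_70_30(range(317))
print(len(train_ids), len(val_ids))  # 222 95
```

Under this assumption, each of the three models (ADC, SCC, LCC) would be trained on the same training cases with a different binary target.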

Results:

For the ADC, SCC, and LCC groups, 9, 12, and 8 key radiomic features were selected, respectively. In terms of model performance, the XGB model demonstrated superior performance in predicting SCC and LCC, with AUC values of 0.789 and 0.848, respectively. For ADC prediction, the Random Forest model excelled, showcasing an AUC of 0.748.
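
For readers unfamiliar with the metric, the AUC values reported here can be read as the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case. The Python sketch below computes the AUC directly from that definition. It is an illustrative example with made-up labels and scores, not the authors' evaluation code, and the function name `roc_auc` is hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (invented for illustration): two positives, two negatives.
labels = [1, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.2]
print(roc_auc(labels, scores))  # 0.75
```

Three of the four positive/negative pairs are ordered correctly in the toy data, giving 0.75. This pairwise count is quadratic in the number of cases, which is fine for a validation set of 95 patients.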

Conclusion:

The machine learning models built on CT imaging predicted the SCC, LCC, and ADC subtypes of NSCLC with AUCs ranging from 0.748 to 0.848. These interpretable models can support clinical decision-making.

Keywords

CT; histological subtype; interpretable machine learning; non-small cell lung cancer; radiomics

Full text: 1 Collection: 01-international Database: MEDLINE Language: En Journal: Front Med (Lausanne) Year: 2024 Document type: Article Affiliation country: China
