Explainable machine learning model based on clinical factors for predicting the disappearance of indeterminate pulmonary nodules.

Wang, Jingxuan; Sourlos, Nikos; Heuvelmans, Marjolein; Prokop, Mathias; Vliegenthart, Rozemarijn; van Ooijen, Peter

Affiliation

Wang J; Department of Radiology, University of Groningen, University Medical Center of Groningen, Groningen, the Netherlands. Electronic address: j.wang02@umcg.nl.
Sourlos N; Department of Radiology, University of Groningen, University Medical Center of Groningen, Groningen, the Netherlands.
Heuvelmans M; Department of Epidemiology, University of Groningen, University Medical Center of Groningen, Groningen, the Netherlands.
Prokop M; Department of Radiology, University of Groningen, University Medical Center of Groningen, Groningen, the Netherlands.
Vliegenthart R; Department of Radiology, University of Groningen, University Medical Center of Groningen, Groningen, the Netherlands; Data Science in Health (DASH), University of Groningen, University Medical Center of Groningen, Groningen, the Netherlands.
van Ooijen P; Department of Radiation Oncology, University of Groningen, University Medical Center of Groningen, Groningen, the Netherlands; Data Science in Health (DASH), University of Groningen, University Medical Center of Groningen, Groningen, the Netherlands. Electronic address: p.m.a.van.ooijen@umcg.nl.

Comput Biol Med ; 169: 107871, 2024 Feb.

Article in English | MEDLINE | ID: mdl-38154157

ABSTRACT

BACKGROUND:

Indeterminate pulmonary nodules (IPNs) are a frequent finding during lung cancer screening. We aimed to use machine learning (ML) models to predict whether IPNs are resolving or non-resolving, in order to reduce follow-up examinations. We incorporated dedicated techniques to enhance prediction explainability.

METHODS:

In total, 724 IPNs (size 50-500 mm³; 575 participants) from the Dutch-Belgian Randomized Lung Cancer Screening Trial were used. We implemented six ML models using 14 factors to predict nodule disappearance. Random search was applied to determine the optimal hyperparameters on the training set (579 nodules). ML models were trained using 5-fold cross-validation and tested on the test set (145 nodules). Model predictions were evaluated using recall, precision, F1 score, and the area under the receiver operating characteristic curve (AUC). The best-performing model was then analyzed with three feature importance techniques: mean decrease in impurity (MDI), permutation feature importance (PFI), and SHapley Additive exPlanations (SHAP).
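The workflow described above can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the authors' code: the hyperparameter grid, the number of random-search iterations, the random seeds, and the use of scikit-learn are all assumptions, and only a random forest is shown rather than all six models. SHAP values are usually computed with the separate `shap` package and are left out here.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.metrics import f1_score, precision_score, recall_score, roc_auc_score
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Synthetic stand-in for 724 nodules x 14 factors (NOT the trial data).
X, y = make_classification(
    n_samples=724, n_features=14, n_informative=5, random_state=0
)

# An 80/20 split yields 579 training and 145 test samples.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Random search over an assumed (illustrative) hyperparameter space,
# scored by AUC under 5-fold cross-validation on the training set.
param_distributions = {
    "n_estimators": [50, 100, 200],
    "max_depth": [None, 4, 8],
    "min_samples_leaf": [1, 2, 4],
}
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions=param_distributions,
    n_iter=5,
    cv=5,
    scoring="roc_auc",
    random_state=0,
)
search.fit(X_train, y_train)
model = search.best_estimator_  # refit on the full training set

# Evaluate once on the held-out test set.
y_pred = model.predict(X_test)
y_prob = model.predict_proba(X_test)[:, 1]
metrics = {
    "recall": recall_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred),
    "f1": f1_score(y_test, y_pred),
    "auc": roc_auc_score(y_test, y_prob),
}

# Mean decrease in impurity (MDI): built into the fitted forest.
mdi = model.feature_importances_

# Permutation feature importance (PFI): drop in test AUC when a column is shuffled.
pfi = permutation_importance(
    model, X_test, y_test, n_repeats=10, scoring="roc_auc", random_state=0
).importances_mean

# Feature indices ordered from most to least important under each method.
mdi_rank = np.argsort(mdi)[::-1]
pfi_rank = np.argsort(pfi)[::-1]

print(metrics)
print("Top 3 by MDI:", mdi_rank[:3], "Top 3 by PFI:", pfi_rank[:3])
```

Comparing `mdi_rank` with `pfi_rank` is one simple way to see where two importance methods agree and where they diverge.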

RESULTS:

The random forest model outperformed the other ML models with an AUC of 0.865. This model achieved a recall of 0.646, a precision of 0.816, and an F1 score of 0.721. The three feature importance methods ranked the most important factors consistently: MDI, PFI, and SHAP all highlighted volume, maximum diameter, and minimum diameter as the top three factors. However, the rankings of the remaining factors differed across methods.
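As a quick consistency check, the reported F1 score can be recomputed from the reported precision and recall, since F1 is their harmonic mean. The helper name `f1_from` is hypothetical and used only for illustration.

```python
def f1_from(precision: float, recall: float) -> float:
    """Return the F1 score, i.e. the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Values reported for the random forest model in the abstract.
f1 = f1_from(precision=0.816, recall=0.646)
print(round(f1, 3))  # 0.721
```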

CONCLUSION:

ML models effectively predict IPN disappearance using participant demographics and nodule characteristics. Explainable techniques can assist clinicians in developing understandable preliminary assessments.

Subject(s)

Lung Neoplasms; Humans; Early Detection of Cancer; Machine Learning; ROC Curve; Randomized Controlled Trials as Topic

Keywords

Clinical factor; Explainable machine learning; Feature importance; Indeterminate pulmonary nodule; Visualization

Full text: 1 Collection: 01-international Database: MEDLINE Main subject: Lung Neoplasms Limit: Humans Language: English Journal: Comput Biol Med Year: 2024 Document type: Article
