ABSTRACT
BACKGROUND: Patients with advanced non-small cell lung cancer (NSCLC) are a heterogeneous population with short life expectancy. We aimed to develop methods to better identify patients who survive longer than 90 days. METHODS: We evaluated 83 characteristics of 106 treatment-naïve, stage IV NSCLC patients with Eastern Cooperative Oncology Group Performance Status (ECOG-PS) >1. Automated machine learning was used to select a model and optimize its hyperparameters. To reduce dimensionality, 100-fold bootstrapping was performed, yielding a second ("lite") model. Performance was measured by C-statistic and accuracy in an out-of-sample validation cohort. The "lite" model was also validated on a second, independent, prospective cohort (N = 42). Network analysis (NA) was performed to evaluate differences in the centrality and connectivity of features. RESULTS: The selected method was the ExtraTrees Classifier, with a C-statistic of 0.82 (p < 0.01) and an accuracy of 0.81 (p = 0.01). The "lite" model had 16 variables and obtained a C-statistic of 0.84 (p < 0.01) and an accuracy of 0.75 (p = 0.039) in the first cohort, and a C-statistic of 0.706 (p < 0.01) and an accuracy of 0.714 (p < 0.01) in the second cohort. The networks of patients with lower survival were more interconnected. Features related to cachexia, inflammation, and quality of life had statistically different prestige scores in NA. CONCLUSIONS: Machine learning can assist in the prognostic evaluation of advanced NSCLC. The model generated with a reduced number of features showed high accessibility and reasonable metrics. Features related to quality of life, cachexia, and performance status had increased correlation and importance scores, suggesting that they play a role at later disease stages, in line with the biological rationale already described.
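For readers unfamiliar with the two performance metrics reported above, the following is a minimal sketch of how a C-statistic and an accuracy can be computed for a binary outcome such as survival beyond 90 days. It is an illustration only, not the authors' pipeline: the function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made up for this example.

```python
from itertools import product


def c_statistic(labels, scores):
    """Share of (positive, negative) pairs in which the positive case
    received the higher score; tied scores count as half a pair."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case in each outcome class")
    concordant = 0.0
    for p, n in product(pos, neg):
        if p > n:
            concordant += 1.0
        elif p == n:
            concordant += 0.5
    return concordant / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Share of cases classified correctly at a fixed score threshold
    (the 0.5 default is an assumption, not a value from the study)."""
    predicted = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(predicted, labels)) / len(labels)


# Hypothetical toy data: 1 = survived more than 90 days, 0 = did not.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]

print(round(c_statistic(labels, scores), 3))
print(round(accuracy(labels, scores), 3))
```

For a binary outcome, the C-statistic defined this way is equal to the area under the ROC curve. It depends only on how the cases are ranked, whereas accuracy also depends on the chosen threshold, which is one reason the two metrics can diverge.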