ABSTRACT
OBJECTIVE: To develop, using machine learning (ML) techniques, a survival prediction model for patients with coronary artery disease (CAD) that draws on health conditions beyond cardiovascular risk factors, including maximal exercise capacity.
METHODS: Data were analyzed from a retrospective cohort that linked clinical, administrative, and vital status databases from 1995 to 2016. Inclusion criteria were age 18 years or older, a diagnosis of CAD, referral to a cardiac rehabilitation program, and available baseline exercise test results. The primary outcome was death from any cause. Features were selected using supervised and unsupervised ML techniques, and the final prognostic model was built with the survival tree (ST) algorithm.
RESULTS: Of the 13,362 patients in the cohort (age 60±11 years; 2400 [18%] women), 1577 died during a median follow-up of 8 years (interquartile range, 4 to 13 years), and estimated survival was 67% at up to 21 years. Feature selection identified age and peak metabolic equivalents (METs) as the most important features for predicting mortality. Using these 2 features, the ST achieved a C-index of 0.729 for long-term prediction by splitting patients into 8 clusters with different survival probabilities (P<.001). The root node was split at a peak METs threshold of 6.15 (6.15 or less vs more than 6.15), and each resulting subgroup was further split by age or by additional peak METs cut points.
CONCLUSION: Using ML techniques, age and maximal exercise capacity accurately predicted mortality in patients with CAD and outperformed variables commonly used for decision-making in clinical practice. A novel and simple prognostic model was established, and the findings further suggest that maximal exercise capacity is one of the most powerful predictors of mortality in CAD.
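The C-index measures how well a model's predicted risk ranks patients by their observed survival. The sketch below shows how such a value can be computed. It is a minimal version of Harrell's concordance index for right-censored data and assumes that the reported C-index is Harrell's. It also assumes that a higher risk score means higher predicted mortality. The function name `harrell_c_index` and the toy data are hypothetical and are not the study's code or data. Pairs with tied follow-up times are skipped, which is a simplification.

```python
from itertools import combinations
from typing import Sequence


def harrell_c_index(
    times: Sequence[float],
    events: Sequence[int],
    risk_scores: Sequence[float],
) -> float:
    """Harrell's concordance index for right-censored survival data (sketch).

    times: follow-up time for each patient
    events: 1 if death was observed, 0 if the patient was censored
    risk_scores: predicted risk; a higher score means higher predicted mortality
    """
    if not (len(times) == len(events) == len(risk_scores)):
        raise ValueError("inputs must have the same length")
    concordant = 0.0
    comparable = 0
    for a, b in combinations(range(len(times)), 2):
        # Simplification: pairs with tied follow-up times are skipped.
        if times[a] == times[b]:
            continue
        # Let i be the patient with the shorter follow-up time.
        i, j = (a, b) if times[a] < times[b] else (b, a)
        # The pair is comparable only if that shorter follow-up ended in death.
        if not events[i]:
            continue
        comparable += 1
        if risk_scores[i] > risk_scores[j]:
            concordant += 1.0  # the earlier death was assigned the higher risk
        elif risk_scores[i] == risk_scores[j]:
            concordant += 0.5  # tied risk scores count as half-concordant
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data, not taken from the study.
times = [5, 8, 12, 3, 10]
events = [1, 0, 1, 1, 0]
risk = [0.7, 0.4, 0.2, 0.9, 0.8]
print(round(harrell_c_index(times, events, risk), 3))  # → 0.857
```

In this toy example, 7 pairs are comparable and 6 of them are concordant, which gives 6/7. A C-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so the reported 0.729 lies between the two.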