ABSTRACT
BACKGROUND: A preoperative estimation of survival is critical for deciding on the operative management of metastatic bone disease of the extremities. Several tools have been developed for this purpose, but there is room for improvement. Machine learning is an increasingly popular and flexible method of building prediction models from a data set. It raises some skepticism, however, because of the complex structure of these models. QUESTIONS/PURPOSES: The purposes of this study were (1) to develop machine learning algorithms for 90-day and 1-year survival in patients who received surgical treatment for a bone metastasis of the extremity, and (2) to use these algorithms to identify those clinical factors (demographic, treatment related, or surgical) that are most closely associated with survival after surgery in these patients. METHODS: All 1090 patients who underwent surgical treatment for a long-bone metastasis at two institutions between 1999 and 2017 were included in this retrospective study. The median age of the patients in the cohort was 63 years (interquartile range [IQR] 54 to 72 years), 56% of patients (610 of 1090) were female, and the median BMI was 27 kg/m² (IQR 23 to 30 kg/m²). The most affected location was the femur (70%), followed by the humerus (22%). The most common primary tumors were breast (24%) and lung (23%). Intramedullary nailing was the most commonly performed type of surgery (58%), followed by endoprosthetic reconstruction (22%) and plate screw fixation (14%). Missing data were imputed using the missForest method. Features were selected by random forest algorithms, and five different models were developed on the training set (80% of the data): stochastic gradient boosting, random forest, support vector machine, neural network, and penalized logistic regression. These models were chosen for their ability to classify binary outcomes. Model performance was assessed on both the training set and the validation set (20% of the data) by discrimination, calibration, and overall performance. RESULTS: We found no differences among the five models for discrimination, with an area under the curve ranging from 0.86 to 0.87. All models were well calibrated, with intercepts ranging from -0.03 to 0.08 and slopes ranging from 1.03 to 1.12. Brier scores ranged from 0.13 to 0.14. The stochastic gradient boosting model was chosen to be deployed as a freely available web-based application, and explanations on both a global and an individual level were provided. For 90-day survival, the three most important factors associated with poorer survivorship were lower albumin level, higher neutrophil-to-lymphocyte ratio, and rapid growth primary tumor. For 1-year survival, the three most important factors associated with poorer survivorship were lower albumin level, rapid growth primary tumor, and lower hemoglobin level. CONCLUSIONS: Although the final models must be externally validated, the algorithms showed good performance on internal validation. The final models have been incorporated into a freely accessible web application that can be found at https://sorg-apps.shinyapps.io/extremitymetssurvival/. Pending external validation, clinicians may use this tool to predict survival for their individual patients to help in shared treatment decision making. LEVEL OF EVIDENCE: Level III, therapeutic study.
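For readers unfamiliar with the performance measures named in the abstract, the following is a minimal sketch of how discrimination (area under the curve) and overall performance (Brier score) can be computed from predicted probabilities. It is not the authors' code. The function names `auc` and `brier_score` and the toy data are hypothetical, chosen only for illustration, and calibration intercept and slope are omitted because they require fitting a logistic regression.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome (0 or 1)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def auc(y_true, y_prob):
    """Area under the ROC curve in its pairwise (Mann-Whitney) form.

    This is the fraction of positive/negative pairs in which the positive
    case receives the higher predicted probability; ties count as one half.
    """
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for pp in pos:
        for pn in neg:
            if pp > pn:
                wins += 1.0
            elif pp == pn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, NOT taken from the study: 1 = event occurred.
outcomes = [1, 0, 1, 0, 0, 1]
predicted = [0.9, 0.2, 0.6, 0.4, 0.7, 0.8]

print("AUC:", auc(outcomes, predicted))
print("Brier score:", brier_score(outcomes, predicted))
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking. For the Brier score, lower is better and 0 indicates perfect predictions.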
Subject(s)
Bone Neoplasms/surgery, Decision Support Techniques, Machine Learning, Orthopedic Procedures, Aged, Bone Neoplasms/diagnostic imaging, Bone Neoplasms/mortality, Bone Neoplasms/secondary, Boston, Clinical Decision-Making, Female, Humans, Male, Middle Aged, Orthopedic Procedures/adverse effects, Orthopedic Procedures/mortality, Patient Selection, Predictive Value of Tests, Reproducibility of Results, Retrospective Studies, Risk Assessment, Risk Factors, Time Factors, Treatment Outcome
ABSTRACT
STUDY DESIGN: A systematic review and meta-analysis, with a retrospective cohort provided for validation. OBJECTIVE: (1) To use a meta-analysis to determine the pooled discriminatory ability of the Skeletal Oncology Research Group (SORG) classical algorithm (CA) and machine learning algorithms (MLA); and (2) to test the hypothesis that SORG-CA has less variability in performance than SORG-MLA in non-American validation cohorts, as SORG-CA does not incorporate region-specific variables such as body mass index as input. METHODS: After data extraction from the included studies, a logit transformation was applied to the extracted AUCs for further analysis. The discriminatory abilities of both algorithms were directly compared by their logit(AUC) values. Further subgroup analysis by region (America vs non-America) was also conducted by comparing the corresponding logit(AUC) values. RESULTS: The pooled logit(AUC) was .82 (95% confidence interval [CI], .53-1.11) for 90-day SORG-CA, 1.11 (95% CI, .74-1.48) for 1-year SORG-CA, 1.36 (95% CI, 1.09-1.63) for 90-day SORG-MLA, and 1.57 (95% CI, 1.17-1.98) for 1-year SORG-MLA. All the algorithms performed better in the United States than in Taiwan (P < .001). The performance of SORG-CA was less influenced by a non-American cohort than that of SORG-MLA. CONCLUSION: These observations might highlight the importance of incorporating region-specific variables into existing models to make them generalizable to racially or geographically distinct regions.
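The pooled values in this abstract are on the logit scale, so they are not directly comparable to a raw AUC between 0 and 1. As a minimal sketch, assuming the standard definition logit(AUC) = ln(AUC / (1 - AUC)), the pooled point estimates can be mapped back to the AUC scale as follows. The helper names `logit` and `inv_logit` are hypothetical and this is not the authors' code.

```python
import math


def logit(auc):
    """Map an AUC in the open interval (0, 1) onto the unbounded logit scale."""
    if not 0.0 < auc < 1.0:
        raise ValueError("AUC must lie strictly between 0 and 1")
    return math.log(auc / (1.0 - auc))


def inv_logit(x):
    """Map a logit-scale value back to the 0-1 AUC scale."""
    return 1.0 / (1.0 + math.exp(-x))


# Pooled logit(AUC) point estimates as reported in the abstract.
pooled = {
    "90-day SORG-CA": 0.82,
    "1-year SORG-CA": 1.11,
    "90-day SORG-MLA": 1.36,
    "1-year SORG-MLA": 1.57,
}

for name, value in pooled.items():
    print(f"{name}: logit(AUC) = {value:.2f}, AUC = {inv_logit(value):.3f}")
```

Pooling on the logit scale keeps the back-transformed estimates and their confidence limits inside the valid 0 to 1 range, which is a common reason for applying this transformation before meta-analysis.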