ABSTRACT
BACKGROUND AND OBJECTIVES: Body composition measurements using computed tomography (CT) may serve as imaging biomarkers of survival in patients with and without cancer. This study assesses whether body composition measurements obtained on abdominal CTs are independently associated with 90-day and 1-year mortality in patients with long-bone metastases undergoing surgery. METHODS: This single-institution retrospective study included 212 patients who had undergone surgery for long-bone metastases and had a CT of the abdomen within 90 days before surgery. Quantification of cross-sectional areas (CSA) and CT attenuation of abdominal subcutaneous adipose tissue, visceral adipose tissue, and paraspinous and abdominal muscles was performed at L4. Multivariate Cox proportional-hazards analyses were performed. RESULTS: Sarcopenia was independently associated with 90-day mortality (hazard ratio [HR] = 1.87; 95% confidence interval [CI] = 1.11-3.16; p = 0.019) and 1-year mortality (HR = 1.50; 95% CI = 1.02-2.19; p = 0.038) in multivariate analysis while controlling for clinical variables such as primary tumors, comorbidities, and chemotherapy. Abdominal fat CSAs and muscle attenuation were not associated with mortality. CONCLUSIONS: The presence of sarcopenia assessed by CT is predictive of 90-day and 1-year mortality in patients undergoing surgery for long-bone metastases. This body composition measurement can be used as a novel imaging biomarker supplementing existing prognostic tools to optimize patient selection for surgery and improve shared decision making.
Subjects
Bone Neoplasms , Sarcopenia , Body Composition , Bone Neoplasms/complications , Bone Neoplasms/surgery , Humans , Muscle, Skeletal , Prognosis , Proportional Hazards Models , Retrospective Studies , Sarcopenia/complications
ABSTRACT
BACKGROUND: Incidental durotomy is an intraoperative complication in spine surgery that can lead to postoperative complications, increased length of stay, and higher healthcare costs. Natural language processing (NLP) is an artificial intelligence method that assists in understanding free-text notes and may be useful in the automated surveillance of adverse events in orthopaedic surgery. A previously developed NLP algorithm is highly accurate in the detection of incidental durotomy on internal validation and on external validation in an independent cohort from the same country. External validation in a cohort with linguistic differences, referred to as geographical validation, is required to assess the transportability of the developed algorithm. Ideally, the performance of a prediction model, here the NLP algorithm, is constant across geographic regions to ensure reproducibility and model validity. QUESTION/PURPOSE: Can we geographically validate an NLP algorithm for the automated detection of incidental durotomy across three independent cohorts from two continents? METHODS: Patients 18 years or older undergoing a primary procedure of (thoraco)lumbar spine surgery were included. In Massachusetts, between January 2000 and June 2018, 1000 patients were included from two academic and three community medical centers. In Maryland, between July 2016 and November 2018, 1279 patients were included from one academic center, and in Australia, between January 2010 and December 2019, 944 patients were included from one academic center. The authors retrospectively studied the free-text operative notes of included patients for the primary outcome, defined as intraoperative durotomy. Incidental durotomy occurred in 9% (93 of 1000), 8% (108 of 1279), and 6% (58 of 944) of the patients in the Massachusetts, Maryland, and Australia cohorts, respectively. No missing reports were observed.
Three datasets (Massachusetts, Australian, and combined Massachusetts and Australian) were divided into training and holdout test sets in an 80:20 ratio. An extreme gradient boosting (an efficient and flexible tree-based algorithm) NLP algorithm was individually trained on each training set, and the performance of the three NLP algorithms (respectively American, Australian, and combined) was assessed by discrimination via area under the receiver operating characteristic curves (AUC-ROC; this measures the model's ability to distinguish patients who obtained the outcomes from those who did not), calibration metrics (which plot the predicted and the observed probabilities) and Brier score (a composite of discrimination and calibration). In addition, the sensitivity (true positives, recall), specificity (true negatives), positive predictive value (also known as precision), negative predictive value, F1-score (composite of precision and recall), positive likelihood ratio, and negative likelihood ratio were calculated. RESULTS: The combined NLP algorithm (the combined Massachusetts and Australian data) achieved excellent performance on independent testing data from Australia (AUC-ROC 0.97 [95% confidence interval 0.87 to 0.99]), Massachusetts (AUC-ROC 0.99 [95% CI 0.80 to 0.99]) and Maryland (AUC-ROC 0.95 [95% CI 0.93 to 0.97]). The NLP developed based on the Massachusetts cohort had excellent performance in the Maryland cohort (AUC-ROC 0.97 [95% CI 0.95 to 0.99]) but worse performance in the Australian cohort (AUC-ROC 0.74 [95% CI 0.70 to 0.77]). CONCLUSION: We demonstrated the clinical utility and reproducibility of an NLP algorithm with combined datasets retaining excellent performance in individual countries relative to algorithms developed in the same country alone for detection of incidental durotomy. 
Further multi-institutional, international collaborations can facilitate the creation of universal NLP algorithms that improve the quality and safety of orthopaedic surgery globally. The combined NLP algorithm has been incorporated into a freely accessible web application that can be found at https://sorg-apps.shinyapps.io/nlp_incidental_durotomy/ . Clinicians and researchers can use the tool to help incorporate the model in evaluating spine registries or quality and safety departments to automate detection of incidental durotomy and optimize prevention efforts. LEVEL OF EVIDENCE: Level III, diagnostic study.
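As an illustration of the discrimination metric reported above, the AUC-ROC can be read as the probability that a randomly chosen patient with the outcome receives a higher predicted score than a randomly chosen patient without it. The following is a minimal sketch of that pairwise definition; the function name `auc_roc` and the toy scores and labels are hypothetical and are not data from the study.

```python
def auc_roc(scores, labels):
    """Pairwise (concordance) estimate of the area under the ROC curve.

    scores: predicted probabilities; labels: 1 = outcome present, 0 = absent.
    Ties between a positive and a negative case count as half-concordant.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    concordant = 0.0
    for sp in positives:
        for sn in negatives:
            if sp > sn:
                concordant += 1.0
            elif sp == sn:
                concordant += 0.5
    return concordant / (len(positives) * len(negatives))


# Hypothetical toy data (not from the study): six notes, three with durotomy.
toy_scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
toy_labels = [1, 1, 1, 0, 0, 0]
toy_auc = auc_roc(toy_scores, toy_labels)
print(f"AUC-ROC on toy data: {toy_auc:.3f}")
```

The quadratic loop is chosen for readability; for large cohorts a rank-based computation gives the same value more efficiently.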
Subjects
Artificial Intelligence , Natural Language Processing , Algorithms , Australia , Humans , Reproducibility of Results , Retrospective Studies
ABSTRACT
BACKGROUND: The Skeletal Oncology Research Group machine-learning algorithms (SORG-MLAs) estimate 90-day and 1-year survival in patients with long-bone metastases undergoing surgical treatment and have demonstrated good discriminatory ability on internal validation. However, the performance of a prediction model could potentially vary by race or region, and the SORG-MLA must be externally validated in an Asian cohort. Furthermore, the authors of the original developmental study did not consider the Eastern Cooperative Oncology Group (ECOG) performance status, a survival prognosticator repeatedly validated in other studies, in their algorithms because of missing data. QUESTIONS/PURPOSES: (1) Is the SORG-MLA generalizable to Taiwanese patients for predicting 90-day and 1-year mortality? (2) Is the ECOG score an independent factor associated with 90-day and 1-year mortality while controlling for SORG-MLA predictions? METHODS: All 356 patients who underwent surgery for long-bone metastases between 2014 and 2019 at one tertiary care center in Taiwan were included. Ninety-eight percent (349 of 356) of patients were of Han Chinese descent. The median (range) patient age was 61 years (25 to 95), 52% (184 of 356) were women, and the median BMI was 23 kg/m2 (13 to 39 kg/m2). The most common primary tumors were lung cancer (33% [116 of 356]) and breast cancer (16% [58 of 356]). Fifty-five percent (195 of 356) of patients presented with a complete pathologic fracture. Intramedullary nailing was the most commonly performed type of surgery (59% [210 of 356]), followed by plate screw fixation (23% [81 of 356]) and endoprosthetic reconstruction (18% [65 of 356]). Six patients were lost to follow-up within 90 days; 30 were lost to follow-up within 1 year. Eighty-five percent (301 of 356) of patients were followed until death or for at least 2 years. Survival was 82% (287 of 350) at 90 days and 49% (159 of 326) at 1 year. 
The model's performance metrics included discrimination (concordance index [c-index]), calibration (intercept and slope), and Brier score. In general, a c-index of 0.5 indicates random guess and a c-index of 0.8 denotes excellent discrimination. Calibration refers to the agreement between the predicted outcomes and the actual outcomes, with a perfect calibration having an intercept of 0 and a slope of 1. The Brier score of a prediction model must be compared with and ideally should be smaller than the score of the null model. A decision curve analysis was then performed for the 90-day and 1-year prediction models to evaluate their net benefit across a range of different threshold probabilities. A multivariate logistic regression analysis was used to evaluate whether the ECOG score was an independent prognosticator while controlling for the SORG-MLA's predictions. We did not perform retraining/recalibration because we were not trying to update the SORG-MLA algorithm in this study. RESULTS: The SORG-MLA had good discriminatory ability at both timepoints, with a c-index of 0.80 (95% confidence interval 0.74 to 0.86) for 90-day survival prediction and a c-index of 0.84 (95% CI 0.80 to 0.89) for 1-year survival prediction. However, the calibration analysis showed that the SORG-MLAs tended to underestimate Taiwanese patients' survival (90-day survival prediction: calibration intercept 0.78 [95% CI 0.46 to 1.10], calibration slope 0.74 [95% CI 0.53 to 0.96]; 1-year survival prediction: calibration intercept 0.75 [95% CI 0.49 to 1.00], calibration slope 1.22 [95% CI 0.95 to 1.49]). The Brier score of the 90-day and 1-year SORG-MLA prediction models was lower than their respective null model (0.12 versus 0.16 for 90-day prediction; 0.16 versus 0.25 for 1-year prediction), indicating good overall performance of SORG-MLAs at these two timepoints. 
Decision curve analysis showed SORG-MLAs provided net benefits when threshold probabilities ranged from 0.40 to 0.95 for 90-day survival prediction and from 0.15 to 1.0 for 1-year prediction. The ECOG score was an independent factor associated with 90-day mortality (odds ratio 1.94 [95% CI 1.01 to 3.73]) but not 1-year mortality (OR 1.07 [95% CI 0.53 to 2.17]) after controlling for SORG-MLA predictions for 90-day and 1-year survival, respectively. CONCLUSION: SORG-MLAs retained good discriminatory ability in Taiwanese patients with long-bone metastases, although their actual survival time was slightly underestimated. More international validation and incremental value studies that address factors such as the ECOG score are warranted to refine the algorithms, which can be freely accessed online at https://sorg-apps.shinyapps.io/extremitymetssurvival/. LEVEL OF EVIDENCE: Level III, therapeutic study.
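The decision curve analysis mentioned above compares a model's net benefit with default strategies at a given threshold probability. A minimal sketch of the standard net-benefit formula, NB = TP/n - (FP/n) x pt/(1 - pt), is given below; the function names and the toy data are hypothetical, and the sketch treats the predicted event generically as label 1 rather than reproducing the study's analysis.

```python
def net_benefit(probs, outcomes, threshold):
    """Net benefit of acting on predictions at or above `threshold`.

    probs: predicted probabilities of the event; outcomes: 1 = event, 0 = no event.
    """
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(outcomes)
    tp = sum(1 for p, y in zip(probs, outcomes) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, outcomes) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * threshold / (1 - threshold)


def net_benefit_treat_all(outcomes, threshold):
    """Net benefit of the default strategy that classifies everyone as positive."""
    prevalence = sum(outcomes) / len(outcomes)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Hypothetical toy data (not from the study).
toy_probs = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
toy_outcomes = [1, 1, 0, 1, 0, 0]
nb_model = net_benefit(toy_probs, toy_outcomes, 0.5)
nb_all = net_benefit_treat_all(toy_outcomes, 0.5)
print(f"model: {nb_model:.3f}, treat all: {nb_all:.3f}, treat none: 0.000")
```

A model is said to provide net benefit at a threshold when its value exceeds both the treat-all value and zero (the treat-none strategy).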
Subjects
Bone Neoplasms/mortality , Bone Neoplasms/secondary , Machine Learning , Adult , Aged , Aged, 80 and over , Bone Neoplasms/surgery , Extremities/pathology , Extremities/surgery , Female , Humans , Male , Middle Aged , Postoperative Period , Predictive Value of Tests , Prognosis , Taiwan
ABSTRACT
BACKGROUND: Patients with bone metastases often are unable to complete quality of life (QoL) questionnaires, and cohabitants (such as spouses, domestic partners, offspring older than 18 years, or other people who live with the patient) could be a reliable alternative. However, the extent of reliability in this complicated patient population remains undefined, and the influence of the cohabitant's condition on their assessment of the patient's QoL is unknown. QUESTIONS/PURPOSES: (1) Do QoL scores, measured by the 5-level EuroQol-5D (EQ-5D-5L) version and the Patient-reported Outcomes Measurement Information System (PROMIS) version 1.0 in three domains (anxiety, pain interference, and depression), reported by patients differ markedly from scores as assessed by their cohabitants? (2) Do cohabitants' PROMIS-Depression scores correlate with differences in measured QoL results? METHODS: This cross-sectional study included patients and cohabitants older than 18 years of age. Patients included those with presence of histologically confirmed bone metastases (including lymphoma and multiple myeloma), and cohabitants must have been present at the clinic visit. Patients were eligible for inclusion in the study regardless of comorbidities, prognosis, prior surgery, or current treatment. Between June 1, 2016 and March 1, 2017 and between October 1, 2017 and February 26, 2018, all 96 eligible patients were approached, of whom 49% (47) met the selection criteria and were willing to participate. The included 47 patient-cohabitant pairs independently completed the EQ-5D-5L and the eight-item PROMIS for three domains (anxiety, pain, and depression) with respect to the patients' symptoms. The cohabitants also completed the four-item PROMIS-Depression survey with respect to their own symptoms. 
RESULTS: There were no clinically important differences between the scores of patients and their cohabitants for all questionnaires, and the agreement between patient and cohabitant scores was moderate to strong (Spearman correlation coefficients ranging from 0.52 to 0.72 on the four questionnaires; all p values < 0.05). However, despite the good agreement in QoL scores, an increased cohabitant's depression score was correlated with an overestimation of the patient's symptom burden for the anxiety and depression domains (weak Spearman correlation coefficient of 0.33 [95% confidence interval 0.08 to 0.58]; p = 0.01 and moderate Spearman correlation coefficient of 0.52 [95% CI 0.29 to 0.74]; p < 0.01, respectively). CONCLUSION: The present findings support that cohabitants might be reliable raters of the QoL of patients with bone metastases. However, if a patient's cohabitant has depression, the cohabitant may overestimate a patient's symptoms in emotional domains such as anxiety and depression, warranting further research that includes cohabitants with and without depression to elucidate the effect of depression on the level of agreement. For now, clinicians may want to reconsider using the cohabitant's judgement if depression is suspected. CLINICAL RELEVANCE: These findings suggest that a cohabitant's impressions of a patient's quality of life are, in most instances, accurate; this is potentially helpful in situations where the patient cannot weigh in. Future studies should employ longitudinal designs to see how or whether our findings change over time and with disease progression, and how specific interventions-like different chemotherapeutic regimens or surgery-may factor in.
Subjects
Adult Children/psychology , Anxiety/diagnosis , Bone Neoplasms/diagnosis , Cancer Pain/diagnosis , Depression/diagnosis , Mental Health , Quality of Life , Spouses/psychology , Surveys and Questionnaires , Aged , Anxiety/physiopathology , Anxiety/psychology , Bone Neoplasms/physiopathology , Bone Neoplasms/psychology , Bone Neoplasms/secondary , Cancer Pain/physiopathology , Cancer Pain/psychology , Cross-Sectional Studies , Depression/physiopathology , Depression/psychology , Female , Health Status , Humans , Male , Middle Aged , Pain Measurement , Patient Reported Outcome Measures , Predictive Value of Tests , Reproducibility of Results
ABSTRACT
Background and purpose - Advancements in software and hardware have enabled the rise of clinical prediction models based on machine learning (ML) in orthopedic surgery. Given their growing popularity and their likely implementation in clinical practice we evaluated which outcomes these new models have focused on and what methodologies are being employed. Material and methods - We performed a systematic search in PubMed, Embase, and Cochrane Library for studies published up to June 18, 2020. Studies reporting on non-ML prediction models or non-orthopedic outcomes were excluded. After screening 7,138 studies, 59 studies reporting on 77 prediction models were included. We extracted data regarding outcome, study design, and reported performance metrics. Results - Of the 77 identified ML prediction models the most commonly reported outcome domain was medical management (17/77). Spinal surgery was the most commonly involved orthopedic subspecialty (28/77). The most frequently employed algorithm was neural networks (42/77). Median size of datasets was 5,507 (IQR 635-26,364). The median area under the curve (AUC) was 0.80 (IQR 0.73-0.86). Calibration was reported for 26 of the models and 14 provided decision-curve analysis. Interpretation - ML prediction models have been developed for a wide variety of topics in orthopedics. Topics regarding medical management were the most commonly studied. Heterogeneity between studies is based on study size, algorithm, and time-point of outcome. Calibration and decision-curve analysis were generally poorly reported.
Subjects
Clinical Decision-Making , Machine Learning , Neural Networks, Computer , Orthopedic Procedures , Predictive Value of Tests , Humans
ABSTRACT
Background and purpose - External validation of machine learning (ML) prediction models is an essential step before clinical application. We assessed the proportion, performance, and transparent reporting of externally validated ML prediction models in orthopedic surgery, using the Transparent Reporting for Individual Prognosis or Diagnosis (TRIPOD) guidelines. Material and methods - We performed a systematic search using synonyms for every orthopedic specialty, ML, and external validation. The proportion was determined by using 59 ML prediction models with only internal validation in orthopedic surgical outcome published up until June 18, 2020, previously identified by our group. Model performance was evaluated using discrimination, calibration, and decision-curve analysis. The TRIPOD guidelines assessed transparent reporting. Results - We included 18 studies externally validating 10 different ML prediction models of the 59 available ML models after screening 4,682 studies. All external validations identified in this review retained good discrimination. Other key performance measures were provided in only 3 studies, rendering overall performance evaluation difficult. The overall median TRIPOD completeness was 61% (IQR 43-89), with 6 items being reported in less than 4/18 of the studies. Interpretation - Most current predictive ML models are not externally validated. The 18 available external validation studies were characterized by incomplete reporting of performance measures, limiting a transparent examination of model performance. Further prospective studies are needed to validate or refute the myriad of predictive ML models in orthopedics while adhering to existing guidelines. This ensures clinicians can take full advantage of validated and clinically implementable ML decision tools.
Subjects
Decision Support Techniques , Machine Learning/standards , Models, Statistical , Orthopedic Procedures , Humans , Treatment Outcome , Validation Studies as Topic
ABSTRACT
BACKGROUND: The widespread use of electronic patient-generated health data has led to unprecedented opportunities for automated extraction of clinical features from free-text medical notes. However, processing this rich resource of data for clinical and research purposes depends on labor-intensive and potentially error-prone manual review. The aim of this study was to develop a natural language processing (NLP) algorithm for binary classification (single metastasis versus two or more metastases) in bone scintigraphy reports of patients undergoing surgery for bone metastases. MATERIAL AND METHODS: Bone scintigraphy reports of patients undergoing surgery for bone metastases were each labeled by three independent reviewers using a binary classification (single metastasis versus two or more metastases) to establish a ground truth. A stratified 80:20 split was used to develop and test an extreme-gradient boosting supervised machine learning NLP algorithm. RESULTS: A total of 704 free-text bone scintigraphy reports from 704 patients were included in this study and 617 (88%) had multiple bone metastases. In the independent test set (n = 141) not used for model development, the NLP algorithm achieved an AUC-ROC of 0.97 (95% confidence interval [CI], 0.92-0.99) for classification of multiple bone metastases and an AUC-PRC of 0.99 (95% CI, 0.99-0.99). At a threshold of 0.90, the NLP algorithm correctly identified multiple bone metastases in 117 of the 124 patients who had multiple bone metastases in the testing cohort (sensitivity 0.94) and yielded 3 false positives (specificity 0.82). At the same threshold, the NLP algorithm had a positive predictive value of 0.97 and an F1-score of 0.96. CONCLUSIONS: NLP has the potential to automate clinical data extraction from free-text radiology notes in orthopedics, thereby optimizing the speed, accuracy, and consistency of clinical chart review.
Pending external validation, the NLP algorithm developed in this study may be implemented as a means to aid researchers in tackling large amounts of data.
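The threshold-based metrics reported above follow directly from the counts given in the abstract. The sketch below recomputes them, assuming the test set of 141 reports contained 124 with multiple metastases (117 detected) and 17 without (3 false positives); the function name `confusion_metrics` is hypothetical, and the negative predictive value it returns is not reported in the abstract.

```python
def confusion_metrics(tp, fn, fp, tn):
    """Standard binary classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # recall
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)                  # precision
    npv = tn / (tn + fn)
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f1": f1,
    }


# Counts derived from the abstract: n = 141, 124 with multiple metastases,
# 117 of them identified, and 3 false positives among the remaining 17.
tp, fn = 117, 124 - 117
fp, tn = 3, (141 - 124) - 3
metrics = confusion_metrics(tp, fn, fp, tn)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Under these assumed counts, the sensitivity, specificity, positive predictive value, and F1-score agree with the reported 0.94, 0.82, 0.97, and 0.96 to within rounding.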
Subjects
Algorithms , Natural Language Processing , Cohort Studies , Humans , Predictive Value of Tests , Radionuclide Imaging
ABSTRACT
BACKGROUND: Machine learning (ML) is a subdomain of artificial intelligence that enables computers to abstract patterns from data without explicit programming. A myriad of impactful ML applications already exists in orthopaedics ranging from predicting infections after surgery to diagnostic imaging. However, no systematic reviews that we know of have compared, in particular, the performance of ML models with that of clinicians in musculoskeletal imaging to provide an up-to-date summary regarding the extent of applying ML to imaging diagnoses. By doing so, this review delves into where current ML developments stand in aiding orthopaedists in assessing musculoskeletal images. QUESTIONS/PURPOSES: This systematic review aimed (1) to compare performance of ML models versus clinicians in detecting, differentiating, or classifying orthopaedic abnormalities on imaging by (A) accuracy, sensitivity, and specificity, (B) input features (for example, plain radiographs, MRI scans, ultrasound), (C) clinician specialties, and (2) to compare the performance of clinician-aided versus unaided ML models. METHODS: A systematic review was performed in PubMed, Embase, and the Cochrane Library for studies published up to October 1, 2019, using synonyms for machine learning and all potential orthopaedic specialties. We included all studies that compared ML models head-to-head against clinicians in the binary detection of abnormalities in musculoskeletal images. After screening 6531 studies, we ultimately included 12 studies. We conducted quality assessment using the Methodological Index for Non-randomized Studies (MINORS) checklist. All 12 studies were of comparable quality, and they all clearly included six of the eight critical appraisal items (study aim, input feature, ground truth, ML versus human comparison, performance metric, and ML model description). 
This justified summarizing the findings in a quantitative form by calculating the median absolute improvement of the ML models compared with clinicians for the following metrics of performance: accuracy, sensitivity, and specificity. RESULTS: ML models provided, in aggregate, only very slight improvements in diagnostic accuracy and sensitivity compared with clinicians working alone and were on par in specificity (3% (interquartile range [IQR] -2.0% to 7.5%), 0.06% (IQR -0.03 to 0.14), and 0.00 (IQR -0.048 to 0.048), respectively). Inputs used by the ML models were plain radiographs (n = 8), MRI scans (n = 3), and ultrasound examinations (n = 1). Overall, ML models outperformed clinicians more when interpreting plain radiographs than when interpreting MRIs (17 of 34 and 3 of 16 performance comparisons, respectively). Orthopaedists and radiologists performed similarly to ML models, while ML models mostly outperformed other clinicians (outperformance in 7 of 19, 7 of 23, and 6 of 10 performance comparisons, respectively). Two studies evaluated the performance of clinicians aided and unaided by ML models; both demonstrated considerable improvements in ML-aided clinician performance by reporting a 47% decrease of misinterpretation rate (95% confidence interval [CI] 37 to 54; p < 0.001) and a mean increase in specificity of 0.048 (95% CI 0.029 to 0.068; p < 0.001) in detecting abnormalities on musculoskeletal images. CONCLUSIONS: At present, ML models have comparable performance to clinicians in assessing musculoskeletal images. ML models may enhance the performance of clinicians as a technical supplement rather than as a replacement for clinical intelligence. Future ML-related studies should emphasize how ML models can complement clinicians, instead of determining the overall superiority of one versus the other. 
This can be accomplished by improving transparent reporting, diminishing bias, determining the feasibility of implementation in the clinical setting, and appropriately tempering conclusions. LEVEL OF EVIDENCE: Level III, diagnostic study.
Subjects
Clinical Competence , Machine Learning , Magnetic Resonance Imaging , Musculoskeletal Diseases/diagnostic imaging , Musculoskeletal System/diagnostic imaging , Orthopedic Surgeons , Radiographic Image Interpretation, Computer-Assisted , Ultrasonography , Diagnosis, Differential , Humans , Pattern Recognition, Automated , Predictive Value of Tests , Reproducibility of Results , Visual Perception
ABSTRACT
BACKGROUND: The Skeletal Oncology Research Group (SORG) machine learning algorithm for predicting survival in patients with chondrosarcoma was developed using data from the Surveillance, Epidemiology, and End Results (SEER) registry. This algorithm was externally validated on a dataset of patients from the United States in an earlier study, where it demonstrated generally good performance but overestimated 5-year survival. In addition, this algorithm has not yet been validated in patients outside the United States; doing so would be important because external validation is necessary as algorithm performance may be misleading when applied in different populations. QUESTIONS/PURPOSES: Does the SORG algorithm retain validity in patients who underwent surgery for primary chondrosarcoma outside the United States, specifically in Italy? METHODS: A total of 737 patients were treated for chondrosarcoma between January 2000 and October 2014 at the Italian tertiary care center which was used for international validation. We excluded patients whose first surgical procedure was performed elsewhere (n = 25), patients who underwent nonsurgical treatment (n = 27), patients with a chondrosarcoma of the soft tissue or skull (n = 60), and patients with peripheral, periosteal, or mesenchymal chondrosarcoma (n = 161). Thus, 464 patients were ultimately included in this external validation study, as the earlier performed SEER study was used as the training set. Therefore, this study-unlike most of this type-does not have a training and validation set. Although the earlier study overestimated 5-year survival, we did not modify the algorithm in this report, as this is the first international validation and the prior performance in the single-institution validation study from the United States may have been driven by a small sample or non-generalizable patterns related to its single-center setting. 
Variables needed for the SORG algorithm were manually collected from electronic medical records. These included sex, age, histologic subtype, tumor grade, tumor size, tumor extension, and tumor location. By inputting these variables into the algorithm, we calculated the predicted probabilities of survival for each patient. The performance of the SORG algorithm was assessed in this study through discrimination (the ability of a model to distinguish between the two classes of a binary outcome), calibration (the agreement of observed and predicted outcomes), overall performance (the accuracy of predictions), and decision curve analysis (whether decisions made with the model are better than decisions made without it). For discrimination, the c-statistic (commonly known as the area under the receiver operating characteristic curve for binary classification) was calculated; this ranges from 0.5 (no better than chance) to 1.0 (excellent discrimination). The agreement between predicted and observed outcomes was visualized with a calibration plot, and the calibration slope and intercept were calculated. Perfect calibration results in a slope of 1 and an intercept of 0. For overall performance, the Brier score and the null-model Brier score were calculated. The Brier score ranges from 0 (perfect prediction) to 1 (poorest prediction). Appropriate interpretation of the Brier score requires comparison with the null-model Brier score. The null-model Brier score is the score for an algorithm that predicts a probability equal to the population prevalence of the outcome for every patient. A decision curve analysis was performed to compare the potential net benefit of the algorithm versus other means of decision support, such as treating all or none of the patients.
There were several differences between this study and the earlier SEER study, and such differences are important because they help us to determine the performance of the algorithm in a group different from the initial study population. In this study from Italy, 5-year survival was different from the earlier SEER study (71% [319 of 450 patients] versus 76% [1131 of 1487 patients]; p = 0.03). There were more patients with dedifferentiated chondrosarcoma than in the earlier SEER study (25% [118 of 464 patients] versus 8.5% [131 of 1544 patients]; p < 0.001). In addition, in this study patients were older, tumor size was larger, and there were higher proportions of high-grade tumors than the earlier SEER study (age: 56 years [interquartile range {IQR} 42 to 67] versus 52 years [IQR 40 to 64]; p = 0.007; tumor size: 80 mm [IQR 50 to 120] versus 70 mm [IQR 42 to 105]; p < 0.001; tumor grade: 22% [104 of 464 had Grade 1], 42% [196 of 464 had Grade 2], and 35% [164 of 464 had Grade 3] versus 41% [592 of 1456 had Grade 1], 40% [588 of 1456 had Grade 2], and 19% [276 of 1456 had Grade 3]; p ≤ 0.001). RESULTS: Validation of the SORG algorithm in a primarily Italian population achieved a c-statistic of 0.86 (95% confidence interval 0.82 to 0.89), suggesting good-to-excellent discrimination. The calibration plot showed good agreement between the predicted probability and observed survival in the probability thresholds of 0.8 to 1.0. With predicted survival probabilities lower than 0.8, however, the SORG algorithm underestimated the observed proportion of patients with 5-year survival, reflected in the overall calibration intercept of 0.82 (95% CI 0.67 to 0.98) and calibration slope of 0.68 (95% CI 0.42 to 0.95). The Brier score for 5-year survival was 0.15, compared with a null-model Brier of 0.21. The algorithm showed a favorable decision curve analysis in the validation cohort. 
CONCLUSIONS: The SORG algorithm to predict 5-year survival for patients with chondrosarcoma held good discriminative ability and overall performance on international external validation; however, it underestimated 5-year survival for patients with predicted probabilities from 0 to 0.8 because the calibration plot was not perfectly aligned for the observed outcomes, which resulted in a maximum underestimation of 20%. The differences may reflect the baseline differences noted between the two study populations. The overall performance of the algorithm supports the utility of the algorithm and validation presented here. The freely available digital application for the algorithm is available here: https://sorg-apps.shinyapps.io/extremitymetssurvival/. LEVEL OF EVIDENCE: Level III, prognostic study.
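The Brier score and its null-model counterpart described in the methods above can be sketched in a few lines. The example assumes the null model is evaluated on the 450 patients with known 5-year status (319 survivors), as stated in the abstract; the function names are hypothetical, and the per-patient predictions needed for the model's own Brier score are not available here, so only the null-model value is recomputed.

```python
def brier_score(probs, outcomes):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    if len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must have the same length")
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(outcomes)


def null_brier_score(outcomes):
    """Brier score of a model that predicts the outcome prevalence for everyone."""
    prevalence = sum(outcomes) / len(outcomes)
    return brier_score([prevalence] * len(outcomes), outcomes)


# Assumption: 319 of 450 patients were alive at 5 years (from the abstract).
five_year_outcomes = [1] * 319 + [0] * 131
null_score = null_brier_score(five_year_outcomes)
print(f"null-model Brier score: {null_score:.3f}")
```

Because the null model predicts a constant prevalence p, its Brier score reduces to p(1 - p), which for this cohort rounds to the reported 0.21; a model is informative only to the extent that its own Brier score falls below that value.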
Subjects
Bone Neoplasms/mortality , Bone Neoplasms/surgery , Chondrosarcoma/mortality , Chondrosarcoma/surgery , Machine Learning , Adult , Female , Humans , Italy , Male , Middle Aged , Predictive Value of Tests
ABSTRACT
BACKGROUND: We developed a machine learning algorithm to predict the survival of patients with chondrosarcoma. The algorithm demonstrated excellent discrimination and calibration on internal validation in a derivation cohort based on data from the Surveillance, Epidemiology, and End Results (SEER) registry. However, the algorithm has not been validated in an independent external dataset. QUESTIONS/PURPOSES: Does the Skeletal Oncology Research Group (SORG) algorithm accurately predict 5-year survival in an independent patient population surgically treated for chondrosarcoma? METHODS: The SORG algorithm was developed using the SEER registry, which contains demographic data, tumor characteristics, treatment, and outcome values, and includes approximately 30% of the cancer patients in the United States. The SEER registry was ideal for creating the derivation cohort, and consequently the SORG algorithm, because of the high number of eligible patients and the availability of most (explanatory) variables of interest. Between 1992 and 2013, 326 patients were treated surgically for extracranial chondrosarcoma of the bone at two tertiary care referral centers. Of those, 179 were accounted for at a minimum of 5 years after diagnosis in a clinical note at one of the two institutions, unless they died earlier, and were included in the validation cohort. In all, 147 (45%) did not meet the minimum 5 years of followup at the institution and were not included in the validation of the SORG algorithm. The outcome (survival at 5 years) was checked for all 326 patients in the Social Security death index, and all 326 were included in the supplemental validation cohort to also ascertain validity for patients with less than 5 years of institutional followup. Variables used in the SORG algorithm to predict 5-year survival, including sex, age, histologic subtype, tumor grade, tumor size, tumor extension, and tumor location, were collected manually from medical records.
The tumor characteristics were collected from the postoperative musculoskeletal pathology report. Predicted probabilities of 5-year survival were calculated for each patient in the validation cohort using the SORG algorithm, followed by an assessment of performance using the same metrics as used for internal validation, namely discrimination, calibration, and overall performance. Discrimination was calculated using the concordance statistic (or the area under the receiver operating characteristic [ROC] curve) to determine how well the algorithm discriminates between outcomes; it ranges from 0.5 (no better than a coin toss) to 1.0 (perfect discrimination). Calibration was assessed using the calibration slope and intercept from a calibration plot to measure the agreement between predicted and observed outcomes. A perfect calibration plot should show a 45° upward line. Overall performance was determined using the Brier score, ranging from 0 (excellent prediction) to 1 (worst prediction). The Brier score was compared with the null-model Brier score, which showed the performance of a model that ignored all the covariates. A Brier score lower than the null-model Brier score indicated better performance of the algorithm. For the external validation, an F1-score was added to measure the overall accuracy of the algorithm, which ranges between 0 (total failure of an algorithm) and 1 (perfect algorithm). The 5-year survival was lower in the validation cohort than it was in the derivation cohort from SEER (61.5% [110 of 179] versus 76% [1131 of 1544]; p < 0.001). This difference was driven by a higher proportion of dedifferentiated chondrosarcoma in the institutional population than in the derivation cohort (27% [49 of 179] versus 9% [131 of 1544]; p < 0.001). Patients in the validation cohort also had larger tumor sizes, higher grades, and more nonextremity tumor locations than did those in the derivation cohort.
These differences between the study groups emphasize that the external validation was performed in a cohort that differed not only in patient population but also in disease characteristics. Within the subpopulations of patients with conventional chondrosarcomas and those with dedifferentiated chondrosarcomas, 5-year survival did not differ between the two cohorts. RESULTS: The concordance statistic for the validation cohort was 0.87 (95% CI, 0.80-0.91). Evaluation of the algorithm's calibration in the institutional population resulted in a calibration slope of 0.97 (95% CI, 0.68-1.3) and calibration intercept of -0.58 (95% CI, -0.97 to -0.20). Finally, on overall performance, the algorithm had a Brier score of 0.152 compared with a null-model Brier score of 0.237, indicating a high level of overall performance. The F1-score was 0.836. For the supplementary validation in the total of 326 patients, the SORG algorithm had a concordance statistic of 0.89 (95% CI, 0.85-0.93). The calibration slope was 1.13 (95% CI, 0.87-1.39) and the calibration intercept was -0.26 (95% CI, -0.57 to 0.06). The Brier score was 0.11, with a null-model Brier score of 0.19. The F1-score was 0.901. CONCLUSIONS: On external validation, the SORG algorithm retained good discriminative ability and overall performance but overestimated 5-year survival in patients surgically treated for chondrosarcoma. This internet-based tool can help guide patient counseling and shared decision making. LEVEL OF EVIDENCE: Level III, prognostic study.
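The performance metrics described in this abstract can be illustrated with a minimal Python sketch. It is not the SORG implementation; the function names and the toy predictions are hypothetical, and only the count of 110 survivors among 179 patients comes from the abstract. The null model is assumed to predict the observed event rate for every patient.

```python
from typing import Sequence


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(outcomes)


def null_brier_score(outcomes: Sequence[int]) -> float:
    """Brier score of a model that predicts the observed event rate for everyone."""
    rate = sum(outcomes) / len(outcomes)
    return brier_score([rate] * len(outcomes), outcomes)


def c_statistic(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Share of (event, non-event) pairs ranked correctly; ties count as half."""
    events = [p for p, y in zip(probs, outcomes) if y == 1]
    non_events = [p for p, y in zip(probs, outcomes) if y == 0]
    score = 0.0
    for e in events:
        for n in non_events:
            score += 1.0 if e > n else 0.5 if e == n else 0.0
    return score / (len(events) * len(non_events))


# Hypothetical toy predictions (not study data).
toy_probs = [0.9, 0.4, 0.6, 0.2]
toy_outcomes = [1, 1, 0, 0]
print(c_statistic(toy_probs, toy_outcomes))
print(brier_score(toy_probs, toy_outcomes))

# Reported validation cohort: 110 of 179 patients alive at 5 years.
cohort = [1] * 110 + [0] * 69
print(round(null_brier_score(cohort), 3))
```

Under that assumption the null-model Brier score reduces to p(1 - p) for survival rate p, which for 110 of 179 rounds to 0.237, in line with the null-model value reported above.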
Subjects
Algorithms , Bone Neoplasms/mortality , Chondrosarcoma/mortality , Adult , Aged , Female , Humans , Male , Middle Aged , Prognosis , Retrospective Studies , Survival Rate , Time Factors
ABSTRACT
The purpose of this study was to assess the value of body composition measures obtained from opportunistic abdominal computed tomography (CT) for predicting hospital length of stay (LOS), 30-day postoperative complications, and reoperations in patients undergoing surgery for spinal metastases. A total of 196 patients underwent CT of the abdomen within three months of surgery for spinal metastases. Automated body composition segmentation and quantification of the cross-sectional areas (CSA) of abdominal visceral and subcutaneous adipose tissue and abdominal skeletal muscle were performed. Overall, 31% (61) of patients had postoperative complications within 30 days, and 16% (31) of patients underwent reoperation. Lower muscle CSA was associated with increased postoperative complications within 30 days (OR [95% CI] = 0.99 [0.98-0.99], p = 0.03). On multivariate analysis, lower muscle CSA remained associated with an increased postoperative complication rate after controlling for albumin, ASIA score, previous systemic therapy, and thoracic metastases (OR [95% CI] = 0.99 [0.98-0.99], p = 0.047). LOS and reoperations were not associated with any body composition measures. Low muscle mass may serve as a biomarker for the prediction of complications in patients with spinal metastases. The routine assessment of muscle mass on opportunistic CTs may help to predict outcomes in these patients.
ABSTRACT
BACKGROUND: The outcome differences following surgery for an impending versus a completed pathological fracture have not been clearly defined. The purpose of the present study was to assess differences in outcomes following the surgical treatment of impending versus completed pathological fractures in patients with long-bone metastases in terms of (1) 90-day and 1-year survival and (2) intraoperative blood loss, perioperative blood transfusion, anesthesia time, duration of hospitalization, 30-day postoperative systemic complications, and reoperations. METHODS: We retrospectively performed a matched cohort study utilizing a database of 1,064 patients who had undergone operative treatment for 462 impending and 602 completed metastatic long-bone fractures. After matching on 22 variables, including primary tumor, visceral metastases, and surgical treatment, 270 impending pathological fractures were matched to 270 completed pathological fractures. The primary outcome was assessed with the Cox proportional hazard model. The secondary outcomes were assessed with the McNemar test and the Wilcoxon signed-rank test. RESULTS: The 90-day survival rate did not differ between the groups (HR, 1.13 [95% CI, 0.81 to 1.56]; p = 0.48), but the 1-year survival rate was worse for completed pathological fractures (46% versus 38%) (HR, 1.28 [95% CI, 1.02 to 1.61]; p = 0.03). With regard to secondary outcomes, completed pathological fractures were associated with higher intraoperative estimated blood loss (p = 0.03), a higher rate of perioperative blood transfusions (p = 0.01), longer anesthesia time (p = 0.04), and more reoperations (OR, 2.50 [95% CI, 1.92 to 7.86]; p = 0.03); no differences were found in terms of the rate of 30-day postoperative complications or the duration of hospitalization. 
CONCLUSIONS: Patients undergoing surgery for impending pathological fractures had lower 1-year mortality rates and better secondary outcomes as compared with patients undergoing surgery for completed pathological fractures when accounting for 22 covariates through propensity matching. Patients with an impending pathological fracture appear to benefit from prophylactic stabilization as stabilizing a completed pathological fracture seems to be associated with increased mortality, blood loss, rate of blood transfusions, duration of surgery, and reoperation risk. LEVEL OF EVIDENCE: Prognostic Level III. See Instructions for Authors for a complete description of levels of evidence.
Subjects
Bone Neoplasms/surgery , Fractures, Spontaneous/surgery , Aged , Bone Neoplasms/complications , Bone Neoplasms/mortality , Cohort Studies , Databases, Factual , Female , Fractures, Spontaneous/etiology , Fractures, Spontaneous/mortality , Humans , Male , Middle Aged , Retrospective Studies , Survival Rate , Treatment Outcome
ABSTRACT
INTRODUCTION: Body composition assessed using opportunistic CT has been recently identified as a predictor of outcome in patients with cancer. The purpose of this study was to determine whether the cross-sectional area (CSA) and the attenuation of abdominal subcutaneous adipose tissue, visceral adipose tissue (VAT), and paraspinous and abdominal muscles are predictors of length of hospital stay, 30-day postoperative complications, and revision surgery in patients treated for long bone metastases. METHODS: A retrospective database of patients who underwent surgery for long bone metastases from 1999 to 2017 was used to identify 212 patients who underwent preoperative abdominal CT. CSA and attenuation measurements for subcutaneous adipose tissue, VAT, and muscles were taken at the level of L4 with the aid of an in-house segmentation algorithm. Bivariate and multivariate linear and logistic regression models were created to determine associations between body composition measurements and outcomes while controlling for confounders, including primary tumor, metastasis location, and preoperative albumin. RESULTS: On multivariate analysis, increased VAT CSA (regression coefficient [r] [95% confidence interval {CI}], 0.01 [0.01 to 0.02]; P < 0.01) and decreased muscle attenuation (r [95% CI], -0.07 [-0.14 to -0.01]; P = 0.04) were associated with an increased length of hospital stay. In bivariate analysis, increased muscle CSA was associated with an increased chance of revision surgery (odds ratio [95% CI], 1.02 [1.01 to 1.03]; P = 0.04). No body composition measurements were associated with postoperative complications within 30 days. DISCUSSION: Body composition measurements assessed using opportunistic CT predict adverse postoperative outcomes in patients who undergo surgery for long bone metastases.
Subjects
Body Composition , Bone Neoplasms , Bone Neoplasms/diagnostic imaging , Bone Neoplasms/surgery , Humans , Intra-Abdominal Fat/diagnostic imaging , Postoperative Complications/etiology , Retrospective Studies
ABSTRACT
BACKGROUND CONTEXT: Although survival of patients with spinal metastases has improved over the last decades due to advances in multi-modal therapy, there are currently no reliable predictors of mortality. Body composition measurements obtained using computed tomography (CT) have been recently proposed as biomarkers for survival in patients with and without cancer. Patients with cancer routinely undergo CT for staging or surveillance of therapy. Body composition assessed using opportunistic CTs might be used to determine survival in patients with spinal metastases. PURPOSE: The purpose of this study was to determine the value of body composition measures obtained on opportunistic abdomen CTs to predict 90-day and 1-year mortality in patients with spinal metastases undergoing surgery. We hypothesized that low muscle and abdominal fat mass were positive predictors of mortality. STUDY DESIGN: Retrospective study at a single tertiary care center in the United States. PATIENT SAMPLE: This retrospective study included 196 patients from 2001 to 2016 who were 18 years of age or older, underwent surgical treatment for spinal metastases, and had a preoperative CT of the abdomen within three months prior to surgery. OUTCOME MEASURES: Ninety-day and 1-year mortality by any cause. METHODS: Quantification of cross-sectional areas (CSA) and CT attenuation of abdominal subcutaneous adipose tissue (SAT), visceral adipose tissue (VAT), and paraspinous and abdominal skeletal muscle was performed on CT images at the level of L4 using an in-house automated algorithm. Sarcopenia was determined by total muscle CSA (cm²) divided by height squared (m²) with cutoff values of <52.4 cm²/m² for men and <38.5 cm²/m² for women. Bivariate and multivariate Cox proportional-hazard analyses were used to determine the associations between body composition measures and 90-day and 1-year mortality. RESULTS: The median age was 62 years (interquartile range=53-70).
The 90-day mortality rate was 24% and the 1-year mortality rate was 54%. The presence of sarcopenia was associated with an increased 1-year mortality rate of 66% compared with a 1-year mortality rate of 41% in patients without sarcopenia (hazard ratio [HR], 1.68; 95% confidence interval [CI], 1.08-2.61; p=.02) after adjusting for various clinical factors including primary tumor type, ECOG performance status, additional metastases, neurologic status, and systemic therapy. Additional analysis showed an association between sarcopenia and increased 1-year mortality when controlling for the prognostic modified Bauer score (HR, 1.58; 95% CI, 1.04-2.40; p=.03). Neither abdominal fat CSAs nor muscle attenuation was independently associated with mortality. CONCLUSIONS: The presence of sarcopenia is associated with an increased risk of 1-year mortality for patients surgically treated for spinal metastases. Sarcopenia retained an independent association with mortality when controlling for the prognostic modified Bauer score. This implies that body composition measurements such as sarcopenia could serve as novel biomarkers for prediction of mortality and may supplement other existing prognostic tools to improve shared decision making for patients with spinal metastases who are contemplating surgical treatment.
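The sarcopenia definition given in this abstract (total muscle CSA divided by height squared, with sex-specific cutoffs) can be written out as a short Python sketch. The cutoffs are those stated in the abstract; the function names and the two example patients are hypothetical and are not study data.

```python
# Sex-specific cutoffs in cm^2/m^2, as stated in the abstract.
SMI_CUTOFFS = {"male": 52.4, "female": 38.5}


def skeletal_muscle_index(muscle_csa_cm2: float, height_m: float) -> float:
    """Total muscle cross-sectional area (cm^2) divided by height squared (m^2)."""
    return muscle_csa_cm2 / height_m ** 2


def is_sarcopenic(muscle_csa_cm2: float, height_m: float, sex: str) -> bool:
    """True when the index falls below the cutoff for the given sex."""
    return skeletal_muscle_index(muscle_csa_cm2, height_m) < SMI_CUTOFFS[sex]


# Hypothetical patients for illustration only.
print(round(skeletal_muscle_index(150.0, 1.75), 1))  # man, 150 cm^2 at 1.75 m
print(is_sarcopenic(150.0, 1.75, "male"))
print(is_sarcopenic(110.0, 1.62, "female"))
```

For the hypothetical man the index is about 49.0 cm²/m², below the 52.4 cutoff, so he would be classified as sarcopenic; the hypothetical woman, at about 41.9 cm²/m², would not.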
Subjects
Sarcopenia , Spinal Neoplasms , Adolescent , Adult , Body Composition , Female , Humans , Male , Middle Aged , Muscle, Skeletal/diagnostic imaging , Muscle, Skeletal/pathology , Prognosis , Retrospective Studies , Sarcopenia/complications , Sarcopenia/diagnostic imaging , Spinal Neoplasms/complications , Spinal Neoplasms/diagnostic imaging , Spinal Neoplasms/surgery , Tomography, X-Ray Computed
ABSTRACT
STUDY DESIGN: This was a systematic review and meta-analysis. OBJECTIVE: A systematic review and meta-analysis was conducted to assess the quality of life (QoL) after open surgery for spinal metastases, and how surgery affects physical, social/family, emotional, and functional well-being. SUMMARY OF BACKGROUND DATA: It remains questionable to what extent open surgery improves QoL for metastatic spinal disease; it would be useful to quantify the magnitude and duration of QoL benefits, if any, after surgery for spinal metastases. MATERIALS AND METHODS: Included were studies measuring QoL before and after nonpercutaneous, open surgery for spinal metastases for various indications including pain, spinal cord compression, instability, or tumor control. A random-effects model assessed standardized mean differences (SMDs) of summary QoL scores between baseline and 1, 3, 6, or 9-12 months after surgery. RESULTS: The review yielded 10 studies for data extraction. The pooled QoL summary score improved from baseline to 1 month (SMD=1.09, P<0.001), to 3 months (SMD=1.28, P<0.001), to 6 months (SMD=1.21, P<0.001), and to 9-12 months (SMD=1.08, P=0.001). The surgery improved physical well-being during the first 3 months (SMD=0.94, P=0.022), improved emotional (SMD=1.19, P=0.004) and functional well-being (SMD=1.08, P=0.005) during the first 6 months, and only improved social/family well-being at month 6 (SMD=0.28, P=0.001). CONCLUSIONS: The surgery improved QoL for patients with spinal metastases, and rapidly improved physical, emotional, and functional well-being; it had minimal effect on social/family well-being. However, choosing the optimal candidate for surgical intervention in the setting of spinal metastases remains paramount; otherwise, postoperative morbidity and complications may outweigh the intended benefits of surgery.
Future research should report clear definitions of selection criteria and surgical indication and provide stratified QoL results by indication and clinical characteristics such as primary tumor type, preoperative Karnofsky, and Bilsky scores to elucidate the optimal candidate for surgical intervention.
Subjects
Neoplasms , Spinal Cord Compression , Spinal Diseases , Humans , Quality of Life
ABSTRACT
BACKGROUND CONTEXT: Preoperative embolization (PE) reduces intraoperative blood loss during surgery for spinal metastases of hypervascular primary tumors such as thyroid and renal cell tumors. However, most spinal metastases originate from primary breast, prostate, and lung tumors and it remains unclear whether these and other spinal metastases benefit from PE. PURPOSE: To assess the (1) efficacy of PE on the amount of intraoperative blood loss and safety in patients with spinal metastases originating from non-hypervascular primary tumors, and (2) secondary outcomes including perioperative allogeneic blood transfusion, anesthesia time, hospitalization, postoperative complications within 30 days, reoperation, 90-day mortality, and 1-year mortality. STUDY DESIGN: Retrospective propensity-score matched, case-control study at 2 academic tertiary medical centers. PATIENT SAMPLE: Patients 18 years of age or older undergoing surgery for spinal metastases originating from primary non-thyroid, non-renal cell, and non-hepatocellular tumors between January 1, 2002 and December 31, 2016 were included. OUTCOME MEASURES: The primary outcomes were estimated amount of intraoperative blood loss and complications attributable to PE, such as neurologic injury, wound infection, thrombosis, or dissection. The secondary outcomes included perioperative allogeneic blood transfusion, anesthesia time, hospitalization, postoperative complications within 30 days, reoperation, 90-day mortality, and 1-year mortality. METHODS: In total, 495 patients were identified, of whom 54 (11%) underwent PE. After propensity score matching on 21 variables, including primary tumor, number of spinal levels, and surgical treatment, 53 non-PE patients were matched to 53 PE patients. Matching was judged adequate by comparing the matched variables, testing the standardized mean differences (<0.25), and inspecting kernel density plots.
The degree of embolization was noted to be complete, until stasis, or successful in 43 (80%) patients. RESULTS: Intraoperative blood loss did not differ between both groups with a median blood loss in liters of 0.6 (IQR, 0.4-1.2) for non-PE patients and 0.9 (IQR, 0.6-1.2) for PE patients (p=.32). No complications occurred during embolization or the time between embolization and surgery. No differences were found in terms of the secondary outcomes. CONCLUSIONS: Our data suggest that, although no complications occurred and the embolization procedure can be considered safe, patients with non-hypervascular spinal metastases might not benefit from PE. A larger, prospective study could confirm or refute these study findings and aid in elucidating a subset of spinal metastases that might benefit from PE.
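The balance check mentioned in the methods (standardized mean differences below 0.25 after matching) can be sketched in Python as follows. This is an illustrative assumption about how such a check is commonly computed for a continuous covariate, using the pooled standard deviation of the two groups; the abstract does not state the exact formula the authors used, and the function names and age values are hypothetical.

```python
import math
import statistics
from typing import Sequence


def standardized_mean_difference(treated: Sequence[float],
                                 control: Sequence[float]) -> float:
    """Difference in group means divided by the pooled standard deviation.

    Assumes at least one group has non-zero variance.
    """
    pooled_sd = math.sqrt(
        (statistics.variance(treated) + statistics.variance(control)) / 2
    )
    return (statistics.mean(treated) - statistics.mean(control)) / pooled_sd


def is_balanced(treated: Sequence[float], control: Sequence[float],
                threshold: float = 0.25) -> bool:
    """True when the absolute SMD is below the threshold (0.25 in the abstract)."""
    return abs(standardized_mean_difference(treated, control)) < threshold


# Hypothetical ages in matched PE and non-PE groups (not study data).
pe_ages = [60, 62, 64, 66, 68]
non_pe_ages = [60, 62, 65, 66, 68]
print(round(standardized_mean_difference(pe_ages, non_pe_ages), 3))
print(is_balanced(pe_ages, non_pe_ages))
```

In practice the same check would be repeated for each of the matched variables, with categorical variables needing a proportion-based version of the formula.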
Subjects
Embolization, Therapeutic , Kidney Neoplasms , Spinal Neoplasms , Adolescent , Adult , Blood Loss, Surgical/prevention & control , Case-Control Studies , Embolization, Therapeutic/adverse effects , Embolization, Therapeutic/methods , Humans , Kidney Neoplasms/complications , Kidney Neoplasms/surgery , Male , Postoperative Complications , Preoperative Care/methods , Propensity Score , Prospective Studies , Retrospective Studies , Spinal Neoplasms/secondary , Treatment Outcome
ABSTRACT
Machine learning (ML) studies are becoming increasingly popular in orthopedics but lack a critical appraisal of their adherence to peer-reviewed guidelines. The objective of this review was to (1) evaluate quality and transparent reporting of ML prediction models in orthopedic surgery based on the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) statement, and (2) assess risk of bias with the Prediction model Risk Of Bias ASsessment Tool (PROBAST). A systematic review was performed to identify all ML prediction studies published in orthopedic surgery through June 18, 2020. After screening 7138 studies, 59 studies met the study criteria and were included. Two reviewers independently extracted data, and discrepancies were resolved by discussion with at least two additional reviewers present. Across all studies, the overall median completeness for the TRIPOD checklist was 53% (interquartile range 47%-60%). The overall risk of bias was low in 44% (n = 26), high in 41% (n = 24), and unclear in 15% (n = 9). High overall risk of bias was driven by incomplete reporting of performance measures, inadequate handling of missing data, and use of small datasets with inadequate outcome numbers. Although the number of ML studies in orthopedic surgery is increasing rapidly, over 40% of the existing models are at high risk of bias. Furthermore, over half incompletely reported their methods and/or performance measures. Until these issues are adequately addressed to give patients and providers trust in ML models, a considerable gap remains between the development of ML prediction models and their implementation in orthopedic practice.
Subjects
Orthopedic Procedures , Orthopedics , Bias , Humans , Machine Learning , Prognosis
ABSTRACT
BACKGROUND: Anterior lumbar spine surgery (ALSS) requires mobilization of the great vessels, resulting in a high risk of iatrogenic vascular injury (VI). It remains unclear whether VI is associated with increased risk of postoperative complications and other related adverse outcomes. PURPOSE: The purpose of this study was to (1) assess the incidence of postoperative complications attributable to VI during ALSS, and (2) assess outcomes secondary to VI such as procedural blood loss, transfusion of blood products, length of stay (LOS), and in-hospital mortality. STUDY DESIGN: Retrospective propensity-score matched, case-control study at 2 academic and 3 community medical centers. PATIENT SAMPLE: Patients 18 years of age or older undergoing ALSS between January 1, 2000 and July 31, 2019 were included in this analysis. OUTCOME MEASURES: The primary outcome was the incidence of postoperative complications attributable to VI, such as venous thromboembolism, compartment syndrome, transfusion reaction, limb ischemia, and reoperations. The secondary outcomes included estimated operative blood loss (milliliters), transfused blood products, LOS (days), and in-hospital mortality. METHODS: In total, 1,035 patients were identified, of whom 75 (7.2%) had a VI. For comparative analyses, the 75 VI patients were paired with 75 comparable non-VI patients by propensity-score matching. The adequacy of the matching was assessed by testing the standardized mean differences (SMD) between the VI and non-VI groups (an SMD >0.25 was taken to indicate imbalance). RESULTS: Two patients (2.7%) had VI-related postoperative complications in the studied period, which consisted of two deep venous thromboses (DVTs) occurring on days 3 and 7 postoperatively. Both DVTs were located in the distal left common iliac vein (CIV). The VIs these patients sustained were to the distal inferior vena cava and the left CIV, respectively.
Neither patient developed additional complications as a consequence of the DVT; however, both required systemic anticoagulation and placement of an inferior vena cava filter. There was no statistical difference with the non-VI group, in which no instances (0%) of postoperative complications were reported (p=.157). No differences were found in LOS or in-hospital mortality between the two groups (p=.157 and p=.999, respectively). Intraoperative blood loss and blood transfusion were both higher in the VI group than in the non-VI group (650 mL, interquartile range [IQR] 300-1400 vs. 150 mL, IQR 50-425, p≤.001; 0 units, IQR 0-3 vs. 0 units, IQR 0-1, p=.012, respectively). CONCLUSION: This study found a low number of serious postoperative complications related to VI in ALSS. In addition, these complications were not significantly different between the VI and matched non-VI ALSS cohorts. Although not significant, the observed DVT incidence of 2.7% after VI in ALSS warrants vigilance and preventive measures during the postoperative course of these patients.
Subjects
Vascular System Injuries , Adolescent , Adult , Case-Control Studies , Humans , Iatrogenic Disease/epidemiology , Length of Stay , Postoperative Complications/epidemiology , Postoperative Period , Retrospective Studies , Vascular System Injuries/epidemiology , Vascular System Injuries/etiology
ABSTRACT
BACKGROUND CONTEXT: Accurately predicting the survival of patients with spinal metastases is important for guiding surgical intervention. The SORG machine-learning (ML) algorithm for the 90-day and one-year mortality of patients with metastatic cancer to the spine has been validated multiple times, with a high degree of accuracy in both internal and external validation studies. However, prior external validations were conducted using patient groups located on the east coast of the United States, representing a generally homogeneous population. The aim of this study was to externally validate the SORG algorithms with a Taiwanese population. STUDY DESIGN/SETTING: Retrospective study at a single tertiary care center in Taiwan. PATIENT SAMPLE: Four hundred and twenty-seven patients who underwent surgery for metastatic spine disease from November 1, 2010 to December 31, 2018. OUTCOME MEASURES: 90-day and one-year mortality. METHODS: The baseline characteristics of our validation cohort were compared with those of the previously published developmental and external validation cohorts. Discrimination (c-statistic and receiver operating characteristic curve), calibration (calibration plot, intercept, and slope), overall performance (Brier score), and decision curve analysis were used to assess the performance of the SORG ML algorithms in this cohort. RESULTS: Ninety-day and one-year mortality rates were 110 of 427 (26%) and 256 of 427 (60%), respectively. The external validation cohort and the developmental cohort differed in body mass index (BMI), preoperative performance status, American Spinal Injury Association impairment scale, primary tumor histology, and several laboratory measurements.
The SORG ML algorithms for 90-day and 1-year mortality demonstrated a high level of discriminative ability (c-statistics of 0.73 [95% confidence interval {CI}, 0.67-0.78] and 0.74 [95% CI, 0.69-0.79], respectively) and overall performance, and had a positive net benefit throughout the range of threshold probabilities in decision curve analysis. The algorithm for 1-year mortality had a calibration intercept of 0.08, representing good calibration. However, the 90-day mortality algorithm underestimated mortality for the lowest predicted probabilities, with an overall intercept of 0.81. CONCLUSIONS: The SORG algorithms for predicting 90-day and 1-year mortality in patients with spinal metastatic disease generally performed well on international external validation in a predominantly Taiwanese population. However, 90-day mortality was underestimated in this group. Whether this inconsistency was due to different primary tumor characteristics, body mass index, selection bias, or other factors remains unclear, and may be better understood with further validation studies that use international and/or diverse populations.
Subjects
Algorithms , Machine Learning , Humans , Retrospective Studies , Spine , Taiwan/epidemiology
ABSTRACT
BACKGROUND: Intraoperative vascular injury (VI) may be an unavoidable complication of anterior lumbar spine surgery; however, vascular injury has implications for quality and safety reporting as this intraoperative complication may result in serious bleeding, thrombosis, and postoperative stricture. PURPOSE: The purpose of this study was to (1) develop machine learning algorithms for preoperative prediction of VI and (2) develop natural language processing (NLP) algorithms for automated surveillance of intraoperative VI from free-text operative notes. PATIENT SAMPLE: Adult patients, 18 years of age or older, undergoing anterior lumbar spine surgery at two academic and three community medical centers were included in this analysis. OUTCOME MEASURES: The primary outcome was unintended VI during anterior lumbar spine surgery. METHODS: Manual review of free-text operative notes was used to identify patients who had unintended VI. The available population was split into training and testing cohorts. Five machine learning algorithms were developed for preoperative prediction of VI. An NLP algorithm was trained for automated detection of intraoperative VI from free-text operative notes. Performance of the NLP algorithm was compared to Current Procedural Terminology and International Classification of Diseases codes. RESULTS: In all, 1035 patients underwent anterior lumbar spine surgery and the rate of intraoperative VI was 7.2% (n=75). Variables used for preoperative prediction of VI were age, male sex, body mass index, diabetes, L4-L5 exposure, and surgery for infection (discitis, osteomyelitis). The best performing machine learning algorithm achieved a c-statistic of 0.73 for preoperative prediction of VI (https://sorg-apps.shinyapps.io/lumbar_vascular_injury/). For automated detection of intraoperative VI from free-text notes, the NLP algorithm achieved a c-statistic of 0.92.
The NLP algorithm identified 18 of the 21 patients (sensitivity 0.86) who had a VI, whereas Current Procedural Terminology and International Classification of Diseases codes identified 6 of the 21 (sensitivity 0.29) patients. At this threshold, the NLP algorithm had a specificity of 0.93, negative predictive value of 0.99, positive predictive value of 0.51, and F1-score of 0.64. CONCLUSION: Relying on administrative procedural and diagnosis codes may underestimate the rate of unintended intraoperative VI in anterior lumbar spine surgery. External and prospective validation of the algorithms presented here may improve quality and safety reporting.
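The sensitivity and F1-score figures reported in this abstract can be reproduced with a short Python sketch. The counts (18 of 21, 6 of 21) and the rounded positive predictive value and sensitivity (0.51, 0.86) come from the abstract; the function names are hypothetical, and the F1 value here is derived from those rounded inputs rather than from the underlying confusion matrix, which the abstract does not give.

```python
def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Fraction of true cases that were detected."""
    return true_positives / (true_positives + false_negatives)


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision (positive predictive value) and recall (sensitivity)."""
    return 2 * precision * recall / (precision + recall)


# NLP algorithm: detected 18 of 21 vascular injuries in the test set.
print(round(sensitivity(18, 3), 2))    # 0.86

# Administrative codes: detected 6 of the same 21 injuries.
print(round(sensitivity(6, 15), 2))    # 0.29

# F1 from the rounded PPV (0.51) and sensitivity (0.86) reported above.
print(round(f1_score(0.51, 0.86), 2))  # 0.64
```

The modest F1-score despite high sensitivity follows from the positive predictive value of 0.51: because VI is uncommon, roughly half of the notes flagged by the algorithm would still need manual confirmation.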