ABSTRACT
OBJECTIVE: Neighborhood-level resource disadvantage has previously been shown to predict extent of resection, oncological follow-up, adjuvant treatment, and clinical trial participation for malignancies, including glioblastoma. The authors aimed to characterize the association between neighborhood disadvantage and long-term outcomes after spine tumor surgery. METHODS: The authors analyzed all patients who underwent surgery for primary or secondary (all metastatic pathologies) spine tumors at a single spinal oncology specialty center in the United States from 2015 to 2022. The Area Deprivation Index (ADI), a validated composite of 17 social determinants of health variables that ranges continuously from 0% (higher advantage) to 100% (higher disadvantage), was used to quantify neighborhood disadvantage. Patient addresses were matched to ADI on the basis of the census block of residence. The study population was then dichotomized into advantaged (ADI 0%-33%) and disadvantaged (ADI 34%-100%) cohorts. The primary endpoint was functional status, as defined by Eastern Cooperative Oncology Group (ECOG) Performance Status Scale grade, with secondary endpoints including inpatient outcomes, mortality, readmissions, reoperations, and clinical research participation. Multivariable logistic, gamma log-link, and Cox regression models were adjusted for 14 confounders, including patient and oncological characteristics, general and tumor-related presenting severity, and treatment. RESULTS: In total, 237 patients (mean age 53.9 years) underwent spine tumor surgery from 2015 to 2022; 57.0% had primary tumors and 43.0% had secondary tumors, and 55.3% (n = 131) were classified by ADI into the disadvantaged cohort. This cohort had higher rates of ambulation deficits on presentation (39.1% vs 23.5%, p = 0.015) and nonelective surgery (35.1% vs 23.6%, p = 0.030). 
Postoperatively, disadvantaged patients exhibited higher odds of residual tumor (OR 2.55, p = 0.026), especially for secondary tumors (OR 4.92, p = 0.045). Patients from disadvantaged neighborhoods additionally exhibited significantly higher odds of poor functional status at follow-up (OR 3.94, p = 0.002). Postoperative survival was 74.7% (mean follow-up 17.6 months), with the disadvantaged cohort experiencing significantly shorter survival (HR 1.92, p = 0.049). Moreover, this population had higher odds of readmission (OR 1.92, p = 0.046) and, for primary tumors, reoperation (OR 9.26, p = 0.005). Elective participation in prospective clinical research was lower among the disadvantaged cohort (OR 0.45, p = 0.016). CONCLUSIONS: Neighborhood disadvantage predicts higher rates of residual tumor, readmission, and reoperation, as well as poorer functional status, shorter postoperative survival, and decreased elective research participation. The ADI may be used to risk stratify spine oncology patients and guide targeted interventions to ameliorate neurosurgical disparities and to reduce barriers to research participation.
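The cohort assignment above is a simple threshold rule on the ADI percentile. The Python sketch below assumes each patient's ADI is stored as an integer percentile from 0 to 100; the function names `adi_cohort` and `cohort_share` are hypothetical and are not taken from the study. The placeholder ADI values are chosen only so that the cohort sizes match the reported 131 of 237 disadvantaged patients.

```python
from collections import Counter

# Cut points from the abstract: ADI 0-33 -> advantaged, 34-100 -> disadvantaged.
# Assumption: ADI is stored as one integer percentile per patient.
ADVANTAGED_MAX = 33


def adi_cohort(adi_percentile: int) -> str:
    """Return the cohort label for a single ADI percentile."""
    if not 0 <= adi_percentile <= 100:
        raise ValueError(f"ADI percentile out of range: {adi_percentile}")
    return "advantaged" if adi_percentile <= ADVANTAGED_MAX else "disadvantaged"


def cohort_share(adi_values: list[int]) -> dict[str, float]:
    """Return the percentage of patients assigned to each cohort."""
    counts = Counter(adi_cohort(v) for v in adi_values)
    total = len(adi_values)
    return {label: 100 * n / total for label, n in counts.items()}


if __name__ == "__main__":
    # Placeholder ADI values: 131 disadvantaged and 106 advantaged patients.
    example = [80] * 131 + [10] * 106
    shares = cohort_share(example)
    print(f"Disadvantaged: {shares['disadvantaged']:.1f}%")
```

Because the cut is applied before modeling, every downstream odds ratio and hazard ratio in this abstract compares these two groups rather than treating ADI as a continuous predictor.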
ABSTRACT
OBJECTIVE: Earlier research has demonstrated that social determinants of health (SDoH) impact neurosurgical access and outcomes, but these trends are less well characterized for spine tumors than for intracranial tumors. The authors aimed to elucidate the association between SDoH and outcomes for a nationwide cohort of spine tumor surgery admissions. METHODS: The authors identified all admissions with a spine tumor diagnosis in the National Inpatient Sample (NIS) from 2002 to 2019. Four SDoH were analyzed: race and ethnicity, insurance, household income, and safety-net hospital (SNH) treatment. Hospitals in the top quartile of safety-net burden (the percentage of patients who were uninsured or covered by Medicaid) were categorized as SNHs. Multivariable regression queried the association between 22 variables and 5 perioperative outcomes: mortality, discharge disposition, complications, length of stay (LOS), and hospitalization costs. Interaction term analysis with hospitalization year was used to assess longitudinal changes in outcome disparities. Finally, the authors constructed random forest machine learning models to assess the impact of SDoH variables on prognostic accuracy and to quantify the relative importance of predictors for disposition. RESULTS: Of 6,593,392 total admissions with spine tumors, 219,380 (3.3%) underwent surgery. Non-White race (OR 0.80-0.91, p < 0.001) and nonprivate insurance (OR 0.76-0.83, p < 0.001) were associated with lower odds of receiving surgery. Among surgical admissions, presenting severity, including myelopathy and plegia, was elevated among non-White, nonprivate insurance, and low-income admissions (all p < 0.001). Black race (OR 0.70, p < 0.001), Medicare (OR 0.70, p < 0.001), Medicaid (OR 0.90, p < 0.001), and lower income (OR 0.88-0.93, all p < 0.001) were associated with decreased odds of favorable discharge disposition. 
Increased LOS and costs were observed among non-White (+6%-10% in LOS and +5%-9% in costs, both p < 0.001) and Medicaid (+16% in LOS and +6% in costs, both p < 0.001) admissions. SNH treatment was also associated with higher mortality (OR 1.49, p < 0.001) and complication (OR 1.20, p < 0.001) rates. From 2002 to 2019, disposition improved annually for Medicaid patients (OR 1.03 per year, p = 0.022) but worsened for Black patients (OR 0.98 per year, p = 0.046). Random forest models identified household income as the most important predictor of discharge disposition. CONCLUSIONS: For spine tumor admissions, SDoH predicted surgical intervention, presenting severity, and perioperative outcomes. Over 2 decades, disparities improved for Medicaid patients but worsened for Black patients. Finally, SDoH significantly improve prognostic accuracy for outcomes after spine tumor surgery. Further study toward ameliorating patient disparities for this population is warranted.
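The SNH definition above reduces to a top-quartile cut on a hospital-level ratio. The following Python sketch uses hypothetical hospital counts and hypothetical function names. The abstract does not specify how the NIS fields were combined or which quantile method was used, so the `method="inclusive"` choice and the `>=` boundary rule are assumptions made for illustration.

```python
import statistics

# Hypothetical hospital-level counts: (uninsured or Medicaid patients, all patients).
# The authors' exact burden computation is not given in the abstract; this only
# illustrates flagging hospitals at or above the third quartile of burden.


def safety_net_burden(counts: dict[str, tuple[int, int]]) -> dict[str, float]:
    """Share of each hospital's patients who are uninsured or covered by Medicaid."""
    return {hospital: low / total for hospital, (low, total) in counts.items()}


def flag_safety_net_hospitals(counts: dict[str, tuple[int, int]]) -> set[str]:
    """Return hospitals whose burden is at or above the third quartile (Q3)."""
    burden = safety_net_burden(counts)
    # quantiles(n=4) returns the three quartile cut points; index 2 is Q3.
    q3 = statistics.quantiles(burden.values(), n=4, method="inclusive")[2]
    return {hospital for hospital, b in burden.items() if b >= q3}


if __name__ == "__main__":
    # Eight hypothetical hospitals with burdens of 10%, 20%, ..., 80%.
    example = {name: (10 * (i + 1), 100) for i, name in enumerate("ABCDEFGH")}
    print(sorted(flag_safety_net_hospitals(example)))
```

With eight evenly spaced hospitals, the two highest-burden hospitals fall in the top quartile; with real data, ties at the cut point and the choice of quantile method can shift a few hospitals across the boundary.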
ABSTRACT
OBJECTIVE: Contemporary management of sacral chordomas requires maximizing the potential for recurrence-free and overall survival while minimizing treatment morbidity. En bloc resection can be performed at various levels of the sacrum, with tumor location and volume ultimately dictating the necessary extent of resection and subsequent tissue reconstruction. Because resection involving the upper sacrum can be highly destabilizing, instrumentation and reconstruction are additional key considerations. The primary aim of this study was to survey the surgical approaches used for managing primary sacral chordoma according to the location of lumbosacral spine involvement, through a narrative review of the literature and an examination of the authors' institutional case series. METHODS: The authors performed a narrative review of pertinent literature regarding reconstruction and complication avoidance techniques following en bloc resection of primary sacral tumors, supplemented by a contemporary series of 11 cases from their cohort. Relevant surgical anatomy, advances in instrumentation and reconstruction techniques, intraoperative imaging and navigation, soft-tissue reconstruction, and wound complication avoidance are also discussed. RESULTS: The literature review identified several surgical approaches used for management of primary sacral chordoma localized to low sacral levels (mid-S2 and below), high sacral levels (involving upper S2 and above), and high sacral levels with lumbar involvement. In the contemporary case series, the majority of cases (8/11) presented as low sacral tumors that did not require instrumentation. A minority required more extensive instrumentation and reconstruction: 2 tumors involved upper S2 and/or S1, and 1 tumor extended into the lower lumbar spine. En bloc resection was successfully achieved in 10 of 11 cases, and a colostomy was required in 2 cases because of rectal involvement. 
All 11 cases underwent musculocutaneous flap wound closure by plastic surgery, with none experiencing wound complications requiring revision. CONCLUSIONS: The modern management of sacral chordoma involves a multidisciplinary team of surgeons and intraoperative technologies to minimize surgical morbidity while optimizing oncological outcomes through en bloc resection. Most cases present with lower sacral tumors not requiring instrumentation, but stabilizing instrumentation and lumbosacral reconstruction are often required in upper sacral and lumbosacral cases. Among efforts to minimize wound-related complications, musculocutaneous flap closure stands out as an evidence-based measure that may mitigate risk.
Subjects
Chordoma, Sacrum, Spinal Neoplasms, Humans, Chordoma/surgery, Chordoma/diagnostic imaging, Chordoma/pathology, Sacrum/surgery, Sacrum/diagnostic imaging, Spinal Neoplasms/surgery, Spinal Neoplasms/diagnostic imaging, Spinal Neoplasms/pathology, Male, Middle Aged, Female, Aged, Adult, Plastic Surgery Procedures/methods
ABSTRACT
BACKGROUND AND OBJECTIVES: Interest surrounding generative large language models (LLMs) has grown rapidly. Although ChatGPT (GPT-3.5), a general LLM, has shown near-passing performance on medical student board examinations, the performance of ChatGPT or its successor GPT-4 on specialized examinations and the factors affecting accuracy remain unclear. This study aims to assess the performance of ChatGPT and GPT-4 on a 500-question mock neurosurgical written board examination. METHODS: The Self-Assessment Neurosurgery Examinations (SANS) American Board of Neurological Surgery Self-Assessment Examination 1 was used to evaluate ChatGPT and GPT-4. Questions were in single-best-answer, multiple-choice format. χ², Fisher exact, and univariable logistic regression tests were used to assess performance differences in relation to question characteristics. RESULTS: ChatGPT (GPT-3.5) and GPT-4 achieved scores of 73.4% (95% CI: 69.3%-77.2%) and 83.4% (95% CI: 79.8%-86.5%), respectively, relative to the user average of 72.8% (95% CI: 68.6%-76.6%). Both LLMs exceeded last year's passing threshold of 69%. Although ChatGPT and question bank users scored equivalently (P = .963), GPT-4 outperformed both (both P < .001). GPT-4 correctly answered every question that ChatGPT answered correctly, as well as 37.6% (50/133) of the questions that ChatGPT answered incorrectly. Among 12 question categories, GPT-4 significantly outperformed users in each but performed comparably with ChatGPT in 3 (functional, other general, and spine) and outperformed both users and ChatGPT for tumor questions. Increased word count (odds ratio = 0.89 of answering a question correctly per +10 words) and higher-order problem-solving (odds ratio = 0.40, P = .009) were associated with lower accuracy for ChatGPT, but not for GPT-4 (both P > .005). 
Multimodal input was not available at the time of this study; hence, on questions with image content, ChatGPT and GPT-4 answered 49.5% and 56.8% of questions correctly based on context clues alone. CONCLUSION: LLMs achieved passing scores on a mock 500-question neurosurgical written board examination, with GPT-4 significantly outperforming ChatGPT.
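The reported counts can be cross-checked with a little arithmetic: 73.4% of 500 is 367 correct answers, leaving 133 misses, and because GPT-4 answered every ChatGPT-correct question correctly plus 50 of the 133 misses, GPT-4's total is 417/500 = 83.4%. The Python sketch below reproduces that check and computes a Wilson score interval for each score. The abstract does not state which confidence interval method was used, so the Wilson bounds here (about 69.4%-77.1% for ChatGPT) are an assumed, illustrative choice and differ slightly from the published ones. It also rescales the word-count odds ratio, assuming the log-odds are linear in word count; the function names are hypothetical.

```python
import math

Z_95 = 1.959963984540054  # two-sided 95% standard normal quantile


def wilson_interval(correct: int, total: int, z: float = Z_95) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (one common 95% CI choice)."""
    p = correct / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half_width = (z / denom) * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2))
    return center - half_width, center + half_width


def scale_odds_ratio(or_per_step: float, step: float, delta: float) -> float:
    """Rescale an odds ratio reported per `step` units to a change of `delta` units.

    Assumes the log-odds are linear in the predictor, as in a logistic model
    with an untransformed continuous term.
    """
    return or_per_step ** (delta / step)


if __name__ == "__main__":
    n_questions = 500
    chatgpt_correct = 367                            # 73.4% of 500
    chatgpt_missed = n_questions - chatgpt_correct   # 133, the reported denominator
    gpt4_correct = chatgpt_correct + 50              # 417, i.e. 83.4%

    for name, k in [("ChatGPT", chatgpt_correct), ("GPT-4", gpt4_correct)]:
        lo, hi = wilson_interval(k, n_questions)
        print(f"{name}: {k / n_questions:.1%} (95% CI {lo:.1%}-{hi:.1%})")

    print(f"Questions missed by ChatGPT: {chatgpt_missed}")
    # OR 0.89 per +10 words corresponds to 0.89**5 per +50 words under linearity.
    print(f"OR per +50 words: {scale_odds_ratio(0.89, 10, 50):.2f}")
```

Under the linearity assumption, a question 50 words longer would carry roughly 0.56 times the odds of a correct ChatGPT answer.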
Subjects
Neurosurgery, Humans, Neurosurgical Procedures, Odds Ratio, Self-Assessment, Spine
ABSTRACT
Importance: Use of lumbar fusion has increased substantially over the last 2 decades. For patients with lumbar stenosis and degenerative spondylolisthesis, 2 landmark prospective randomized clinical trials (RCTs) published in the New England Journal of Medicine in 2016 did not find clear evidence in favor of decompression with fusion over decompression alone in this population. Objective: To assess the national use of decompression with fusion vs decompression alone for the surgical treatment of lumbar stenosis and degenerative spondylolisthesis from 2016 to 2019. Design, Setting, and Participants: This retrospective cohort study included 121 745 hospitalized adult patients (aged ≥18 years) undergoing 1-level decompression alone or decompression with fusion for the management of lumbar stenosis and degenerative spondylolisthesis from January 1, 2016, to December 31, 2019. All data were obtained from the National Inpatient Sample (NIS). Analyses were conducted, reviewed, or updated on June 9, 2023. Main Outcome and Measure: The primary outcome of this study was the use of decompression with fusion vs decompression alone. For the secondary outcome, multivariable logistic regression analysis was used to evaluate factors associated with the decision to perform decompression with fusion vs decompression alone. Results: Among 121 745 eligible hospitalized patients (mean age, 65.2 years [95% CI, 65.0-65.4 years]; 96 645 of 117 640 [82.2%] non-Hispanic White) with lumbar stenosis and degenerative spondylolisthesis, 21 230 (17.4%) underwent decompression alone, and 100 515 (82.6%) underwent decompression with fusion. The proportion of patients undergoing decompression alone decreased from 2016 (7625 of 23 405 [32.6%]) to 2019 (3560 of 37 215 [9.6%]), whereas the proportion of patients undergoing decompression with fusion increased over the same period (from 15 780 of 23 405 [67.4%] in 2016 to 33 655 of 37 215 [90.4%] in 2019). 
In univariable analysis, patients undergoing decompression alone differed significantly from those undergoing decompression with fusion with regard to age (mean, 68.6 years [95% CI, 68.2-68.9 years] vs 64.5 years [95% CI, 64.3-64.7 years]; P < .001), insurance status (eg, Medicare: 13 725 of 21 205 [64.7%] vs 53 320 of 100 420 [53.1%]; P < .001), All Patient Refined Diagnosis Related Group risk of death (eg, minor risk: 16 900 [79.6%] vs 83 730 [83.3%]; P < .001), and hospital region of the country (eg, South: 7030 [33.1%] vs 38 905 [38.7%]; Midwest: 4470 [21.1%] vs 23 360 [23.2%]; P < .001 for both comparisons). In multivariable logistic regression analysis, older age (adjusted odds ratio [AOR], 0.96 per year; 95% CI, 0.95-0.96 per year), year after 2016 (AOR, 1.76 per year; 95% CI, 1.69-1.85 per year), self-pay insurance status (AOR, 0.59; 95% CI, 0.36-0.95), medium hospital size (AOR, 0.77; 95% CI, 0.67-0.89), large hospital size (AOR, 0.76; 95% CI, 0.67-0.86), and highest median income quartile by patient residence zip code (AOR, 0.79; 95% CI, 0.70-0.89) were associated with lower odds of undergoing decompression with fusion. Conversely, hospital region in the Midwest (AOR, 1.34; 95% CI, 1.14-1.57) or South (AOR, 1.32; 95% CI, 1.14-1.54) was associated with higher odds of undergoing decompression with fusion. Decompression with fusion vs decompression alone was associated with longer length of stay (mean, 2.96 days [95% CI, 2.92-3.01 days] vs 2.55 days [95% CI, 2.49-2.62 days]; P < .001), higher total admission costs (mean, $30 288 [95% CI, $29 386-$31 189] vs $16 190 [95% CI, $15 189-$17 191]; P < .001), and higher total admission charges (mean, $121 892 [95% CI, $119 566-$124 219] vs $82 197 [95% CI, $79 745-$84 648]; P < .001). 
Conclusions and Relevance: In this cohort study, despite 2 prospective RCTs that demonstrated the noninferiority of decompression alone compared with decompression with fusion, use of decompression with fusion relative to decompression alone increased from 2016 to 2019. A variety of patient- and hospital-level factors were associated with surgical procedure choice. These results suggest the findings of 2 major RCTs have not yet produced changes in surgical practice patterns and deserve renewed focus.
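The year-by-year shift reported above can be reproduced from the published counts. This Python sketch uses only the 2016 and 2019 counts given in the abstract; the function names are hypothetical. It also computes a crude (unadjusted) odds ratio of fusion in 2019 vs 2016 as an illustration of how an odds ratio arises from a 2×2 table. That crude two-year figure is not comparable with the reported AOR of 1.76 per year, which is adjusted for covariates and modeled across all study years.

```python
# Counts reported in the abstract: (decompression alone, decompression with fusion).
COUNTS = {
    2016: (7625, 15780),
    2019: (3560, 33655),
}


def share_decompression_alone(year: int) -> float:
    """Proportion of 1-level cases treated with decompression alone in a given year."""
    alone, fusion = COUNTS[year]
    return alone / (alone + fusion)


def crude_fusion_odds_ratio(later: int, earlier: int) -> float:
    """Unadjusted odds ratio of fusion (vs decompression alone), later vs earlier year."""
    alone_later, fusion_later = COUNTS[later]
    alone_earlier, fusion_earlier = COUNTS[earlier]
    return (fusion_later / alone_later) / (fusion_earlier / alone_earlier)


if __name__ == "__main__":
    for year in sorted(COUNTS):
        alone, fusion = COUNTS[year]
        print(f"{year}: n={alone + fusion}, decompression alone "
              f"{share_decompression_alone(year):.1%}")
    print(f"Crude OR of fusion, 2019 vs 2016: {crude_fusion_odds_ratio(2019, 2016):.2f}")
```

The computed shares (32.6% in 2016 and 9.6% in 2019) match the abstract, and the yearly totals match the reported denominators of 23 405 and 37 215.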
Subjects
Spondylolisthesis, Adult, Humans, Adolescent, Aged, Constriction, Pathologic, Inpatients, Diagnosis-Related Groups, Decompression
ABSTRACT
BACKGROUND AND OBJECTIVES: General large language models (LLMs), such as ChatGPT (GPT-3.5), have demonstrated the capability to pass multiple-choice medical board examinations. However, the comparative accuracy of different LLMs, and LLM performance on assessments composed predominantly of higher-order management questions, are poorly understood. We aimed to assess the performance of 3 LLMs (GPT-3.5, GPT-4, and Google Bard) on a question bank designed specifically for neurosurgery oral board examination preparation. METHODS: The 149-question Self-Assessment Neurosurgery Examination Indications Examination was used to query LLM accuracy. Questions were entered in single-best-answer, multiple-choice format. χ², Fisher exact, and univariable logistic regression tests assessed differences in performance by question characteristics. RESULTS: On a question bank with predominantly higher-order questions (85.2%), ChatGPT (GPT-3.5) and GPT-4 answered 62.4% (95% CI: 54.1%-70.1%) and 82.6% (95% CI: 75.2%-88.1%) of questions correctly, respectively. By contrast, Bard scored 44.2% (66/149, 95% CI: 36.2%-52.6%). GPT-3.5 and GPT-4 demonstrated significantly higher scores than Bard (both P < .01), and GPT-4 outperformed GPT-3.5 (P = .023). Among 6 subspecialties, GPT-4 had significantly higher accuracy in the Spine category relative to GPT-3.5 and in 4 categories relative to Bard (all P < .01). Incorporation of higher-order problem solving was associated with lower question accuracy for GPT-3.5 (odds ratio [OR] = 0.80, P = .042) and Bard (OR = 0.76, P = .014), but not GPT-4 (OR = 0.86, P = .085). GPT-4's performance on imaging-related questions surpassed GPT-3.5's (68.6% vs 47.1%, P = .044) and was comparable with Bard's (68.6% vs 66.7%, P = 1.000). However, GPT-4 demonstrated significantly lower rates of "hallucination" on imaging-related questions than both GPT-3.5 (2.3% vs 57.1%, P < .001) and Bard (2.3% vs 27.3%, P = .002). 
Lack of a question text description predicted significantly higher odds of hallucination for GPT-3.5 (OR = 1.45, P = .012) and Bard (OR = 2.09, P < .001). CONCLUSION: On a question bank of predominantly higher-order management case scenarios for neurosurgery oral board preparation, GPT-4 achieved a score of 82.6%, outperforming ChatGPT and Google Bard.