Search | VHL Regional Portal

1.

Can Large Language Models (LLMs) Predict the Appropriate Treatment of Acute Hip Fractures in Older Adults? Comparing Appropriate Use Criteria With Recommendations From ChatGPT.

Nietsch, Katrina S; Shrestha, Nancy; Mazudie Ndjonko, Laura C; Ahmed, Wasil; Mejia, Mateo Restrepo; Zaidat, Bashar; Ren, Renee; Duey, Akiro H; Li, Samuel Q; Kim, Jun S; Hidden, Krystin A; Cho, Samuel K.

J Am Acad Orthop Surg Glob Res Rev ; 8(8)2024 Aug 01.

Article in English | MEDLINE | ID: mdl-39137403

ABSTRACT

BACKGROUND: Acute hip fractures are a public health problem affecting primarily older adults. Chat Generative Pretrained Transformer may be useful in providing appropriate clinical recommendations for beneficial treatment. OBJECTIVE: To evaluate the accuracy of Chat Generative Pretrained Transformer (ChatGPT)-4.0 by comparing its appropriateness scores for acute hip fractures with the American Academy of Orthopaedic Surgeons (AAOS) Appropriate Use Criteria given 30 patient scenarios. "Appropriateness" indicates the expected health benefits of treatment exceed the expected negative consequences by a wide margin. METHODS: Using the AAOS Appropriate Use Criteria as the benchmark, numerical scores from 1 to 9 assessed appropriateness. For each patient scenario, ChatGPT-4.0 was asked to assign an appropriateness score for six treatments to manage acute hip fractures. RESULTS: Thirty patient scenarios were evaluated for 180 paired scores. Comparing ChatGPT-4.0 with AAOS scores, there was a positive correlation for multiple cannulated screw fixation, total hip arthroplasty, hemiarthroplasty, and long cephalomedullary nails. Statistically significant differences were observed only between scores for long cephalomedullary nails. CONCLUSION: ChatGPT-4.0 scores were not concordant with AAOS scores, overestimating the appropriateness of total hip arthroplasty, hemiarthroplasty, and long cephalomedullary nails, and underestimating the other three. ChatGPT-4.0 was inadequate in selecting an appropriate treatment deemed acceptable, most reasonable, and most likely to improve patient outcomes.

Subject(s)

Hip Fractures , Humans , Hip Fractures/surgery , Aged , Female , Male , Aged, 80 and over , Arthroplasty, Replacement, Hip , Hemiarthroplasty , Practice Guidelines as Topic , Acute Disease , Language

2.

Bibliometric Patent Review of Minimally Invasive Spine Surgery.

Zaidat, Bashar; Ahmed, Wasil; Song, Junho; Maza, Noor; Shrestha, Nancy; Rajjoub, Rami; Etigunta, Suhas; Kim, Jun S; Cho, Samuel K.

Clin Spine Surg ; 2024 Aug 02.

Article in English | MEDLINE | ID: mdl-39092883

ABSTRACT

STUDY DESIGN: This study analyzes patents associated with minimally invasive spine surgery (MISS) found on the Lens open online platform. OBJECTIVE: The goal of this research was to provide an overview of the most referenced patents in the field of MISS and to uncover patterns in the evolution and categorization of these patents. SUMMARY OF BACKGROUND DATA: MISS has rapidly progressed, with a core focus on minimizing surgical damage, preserving the natural anatomy, and enabling swift recovery, all while achieving outcomes that rival traditional open surgery. While prior studies have primarily concentrated on MISS outcomes, the analysis of MISS patents has been limited. METHODS: To conduct this study, we used the Lens platform to search for patents that included the terms "minimally invasive" and "spine" in their titles, abstracts, or claims. We then categorized these patents and identified the top 100 with the most forward citations. We further classified these patents into 4 categories: Spinal Stabilization Systems, Joint Implants or Procedures, Screw Delivery System or Method, and Access and Surgical Pathway Formation. RESULTS: Five hundred two MISS patents were identified initially, and 276 were retained following a screening process. Among the top 100 patents, the majority had active legal status. The largest category within the top 100 patents was Access and Surgical Pathway Formation, closely followed by Spinal Stabilization Systems and Joint Implants or Procedures. The smallest category was Screw Delivery System or Method. Notably, the majority of the top 100 patents had priority years falling between 2000 and 2009, indicating a moderate positive correlation between patent rank and priority year. CONCLUSIONS: Thus far, patents related to Access and Surgical Pathway Formation have laid the foundation for subsequent innovations in Spinal Stabilization Systems and Screw Technology. 
This study serves as a valuable resource for guiding future innovations in this rapidly evolving field.

3.

Explainable Machine Learning Approach to Prediction of Prolonged Intensive Care Unit Stay in Adult Spinal Deformity Patients: Machine Learning Outperforms Logistic Regression.

Zaidat, Bashar; Kurapatti, Mark; Gal, Jonathan S; Cho, Samuel K; Kim, Jun S.

Global Spine J ; : 21925682241277771, 2024 Aug 21.

Article in English | MEDLINE | ID: mdl-39169510

ABSTRACT

STUDY DESIGN: Retrospective cohort study. OBJECTIVES: Prolonged ICU stay is a driver of higher costs and inferior outcomes in Adult Spinal Deformity (ASD) patients. Machine learning (ML) models have recently been seen as a viable method of predicting pre-operative risk but are often 'black boxes' that do not fully explain the decision-making process. This study aims to demonstrate that ML can achieve predictive power similar to or greater than that of traditional statistical methods while following traditional clinical decision-making processes. METHODS: Five ML models (Decision Tree, Random Forest, Support Vector Classifier, GradBoost, and a CNN) were trained on data collected from a large urban academic center to predict whether prolonged ICU stay would be required post-operatively. A total of 535 patients who underwent posterior fusion or combined fusion for treatment of ASD were included in each model with a 70-20-10 train-test-validation split. Further analysis was performed using Shapley Additive Explanation (SHAP) values to provide insight into each model's decision-making process. RESULTS: The models' Area Under the Receiver Operating Curve (AUROC) ranged from 0.67 to 0.83. The Random Forest model achieved the highest score. The model considered length of surgery, complications, and estimated blood loss to be the greatest predictors of prolonged ICU stay based on SHAP values. CONCLUSIONS: We developed an ML model that was able to predict whether prolonged ICU stay was required in ASD patients. Further SHAP analysis demonstrated our model aligned with traditional clinical thinking. Thus, ML models have strong potential to assist with risk stratification and more effective and cost-efficient care.
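For readers unfamiliar with the AUROC metric reported above, the following minimal sketch shows one common way to compute it: as the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case. The function name `auroc` and the toy labels and scores are illustrative assumptions and are not taken from the study, which does not describe its implementation.

```python
def auroc(labels, scores):
    """Probability that a random positive case outscores a random negative case.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data (hypothetical, not from the study): 1 = prolonged ICU stay
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(auroc(toy_labels, toy_scores))  # 0.75
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which gives some context for the 0.67 to 0.83 range in the abstract.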

4.

Comparison of biportal endoscopic and microscopic tubular paraspinal approach for foraminal and extraforaminal lumbar disc herniation.

Kang, Min-Seok; Hwang, Jae-Yeun; Park, Sang-Min; Yang, Jae-Hyuk; You, Ki-Han; Hong, Seok-Ho; Cho, Samuel K; Park, Hyun-Jin.

J Neurosurg Spine ; : 1-10, 2024 Jul 19.

Article in English | MEDLINE | ID: mdl-39029114

ABSTRACT

OBJECTIVE: Foraminal and extraforaminal lumbar disc herniation (FELDH) is an important pathological condition that can lead to lumbar radiculopathy. The paraspinal muscle-splitting approach introduced by Reulen and Wiltse is a reasonable surgical technique. Minimally invasive procedures using a tubular retractor system have also been introduced. However, surgical treatment is considered more challenging for FELDH than for central or subarticular lumbar disc herniations (LDHs). Some researchers have proposed uniportal extraforaminal endoscopic lumbar discectomy through a posterolateral approach as an alternative for FELDH, but heterogeneous clinical results have been reported. Recently, the biportal endoscopic (BE) paraspinal approach has been suggested as an alternative. The aim of this study was to compare the clinical outcomes of BE and microscopic tubular (MT) paraspinal approaches for decompressive foraminotomy and lumbar discectomy (paraLD) in patients with FELDH. METHODS: Ninety-one consecutive patients with unilateral lumbar radiculopathy and FELDH underwent paraLD. Demographic and perioperative data were collected. Clinical outcomes were evaluated using the visual analog scale (VAS) for back and leg pain, the Oswestry Disability Index (ODI) for spinal disability, and the modified Macnab criteria for patient satisfaction. Postoperative complications and reoperation rates were also evaluated. RESULTS: In total, 76 patients were included in the final analysis. Among them, 43 underwent BE paraLD (group A) and the remaining 33 underwent MT paraLD (group B). The demographic and preoperative data were not statistically different between the groups. All patients showed significant improvements in VAS back, VAS leg, and ODI scores compared with baseline values (p < 0.05). The improvement in VAS back scores was significantly better in group A than in group B on postoperative day 2 (p < 0.001). 
However, all clinical parameters were comparable between the two groups after postoperative year 1 (p > 0.05). According to the modified Macnab criteria, 86.1% and 72.7% of the patients had excellent or good outcomes in groups A and B, respectively. No intergroup differences were observed (p = 0.367). In addition, there were no differences in the total operation time or amount of surgical drainage. Postoperative complications were not significantly different between the two groups (p = 0.301); however, reoperation rates were significantly higher in group B (p = 0.035). CONCLUSIONS: BE paraLD is an effective treatment for FELDH and is an alternative to MT paraLD. In particular, BE paraLD has advantages of early improvement in postoperative back pain and low reoperation rates.

5.

Identification of doping suspicions through artificial intelligence-powered analysis on athlete's performance passport in female weightlifting.

Ryoo, Hyunji; Cho, Samuel; Oh, Taehan; Kim, YuSik; Suh, Sang-Hoon.

Front Physiol ; 15: 1344340, 2024.

Article in English | MEDLINE | ID: mdl-38938745

ABSTRACT

Introduction: Doping remains a persistent concern in sports, compromising fair competition. The Athlete Biological Passport (ABP) has been a standard anti-doping measure, but confounding factors challenge its effectiveness. Our study introduces an artificial intelligence-driven approach for identifying potential doping suspicions, utilizing the Athlete's Performance Passport (APP), which integrates both demographic profiles and performance data, among elite female weightlifters. Methods: Analyzing publicly available performance data in female weightlifting from 1998 to 2020, along with demographic information, encompassing 17,058 entities, we categorized weightlifters by age, body weight (BW) class, and performance levels. Documented anti-doping rule violation (ADRV) cases were also retained. We employed AI-powered algorithms, including XGBoost, Multilayer Perceptron (MLP), and an Ensemble model, which integrates XGBoost and MLP, to identify doping suspicions based on the dataset we obtained. Results: Our findings suggest a potential doping inclination in female weightlifters in their mid-twenties, and the sanctioned prevalence was the highest in the top 1% performance level and then decreased thereafter. Performance profiles and sanction trends across age groups and BW classes reveal consistently superior performances in sanctioned cases. The Ensemble model showcased impressive predictive performance, achieving a 53.8% prediction rate among the weightlifters sanctioned in the 2008, 2012, and 2016 Olympics. This demonstrated the practical application of the Athlete's Performance Passport (APP) in identifying potential doping suspicions. Discussion: Our study pioneers an AI-driven APP approach in anti-doping, offering a proactive and efficient methodology. 
The APP, coupled with advanced AI algorithms, holds promise in revolutionizing the efficiency and objectivity of doping tests, providing a novel avenue for enhancing anti-doping measures in elite female weightlifting and potentially extending to diverse sports. We also address the limitation of a constrained set of APPs, advocating for the development of a more accessible and enriched APP system for robust anti-doping practices.

6.

An analysis of ChatGPT recommendations for the diagnosis and treatment of cervical radiculopathy.

Hoang, Timothy; Liou, Lathan; Rosenberg, Ashley M; Zaidat, Bashar; Duey, Akiro H; Shrestha, Nancy; Ahmed, Wasil; Tang, Justin; Kim, Jun S; Cho, Samuel K.

J Neurosurg Spine ; 41(3): 385-395, 2024 Sep 01.

Article in English | MEDLINE | ID: mdl-38941643

ABSTRACT

OBJECTIVE: The objective of this study was to assess the safety and accuracy of ChatGPT recommendations in comparison to the evidence-based guidelines from the North American Spine Society (NASS) for the diagnosis and treatment of cervical radiculopathy. METHODS: ChatGPT was prompted with questions from the 2011 NASS clinical guidelines for cervical radiculopathy and evaluated for concordance. Selected key phrases within the NASS guidelines were identified. Completeness was measured as the number of overlapping key phrases between ChatGPT responses and NASS guidelines divided by the total number of key phrases. A senior spine surgeon evaluated the ChatGPT responses for safety and accuracy. ChatGPT responses were further evaluated on their readability, similarity, and consistency. Flesch Reading Ease scores and Flesch-Kincaid reading levels were measured to assess readability. The Jaccard Similarity Index was used to assess agreement between ChatGPT responses and NASS clinical guidelines. RESULTS: A total of 100 key phrases were identified across 14 NASS clinical guidelines. The mean completeness of ChatGPT-4 was 46%. ChatGPT-3.5 yielded a completeness of 34%. ChatGPT-4 outperformed ChatGPT-3.5 by a margin of 12 percentage points. ChatGPT-4.0 outputs had a mean Flesch reading score of 15.24, which is very difficult to read, requiring a college graduate education to understand. ChatGPT-3.5 outputs had a lower mean Flesch reading score of 8.73, indicating that they are even more difficult to read and require a professional education level to understand. However, both versions of ChatGPT were more accessible than NASS guidelines, which had a mean Flesch reading score of 4.58. Furthermore, with NASS guidelines as a reference, ChatGPT-3.5 registered a mean ± SD Jaccard Similarity Index score of 0.20 ± 0.078 while ChatGPT-4 had a mean of 0.18 ± 0.068. Based on physician evaluation, outputs from ChatGPT-3.5 and ChatGPT-4.0 were safe 100% of the time. 
Thirteen of 14 (92.8%) ChatGPT-3.5 responses and 14 of 14 (100%) ChatGPT-4.0 responses were in agreement with current best clinical practices for cervical radiculopathy according to a senior spine surgeon. CONCLUSIONS: ChatGPT models were able to provide safe and accurate but incomplete responses to NASS clinical guideline questions about cervical radiculopathy. Although the authors' results suggest that improvements are required before ChatGPT can be reliably deployed in a clinical setting, future versions of the LLM hold promise as an updated reference for guidelines on cervical radiculopathy. Future versions must prioritize accessibility and comprehensibility for a diverse audience.
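The three text metrics named in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions: the abstract does not say how key phrases were matched or how text was tokenized, so the case-insensitive substring match in `completeness` and the whitespace word sets in `jaccard` are assumptions, and the example sentences are hypothetical. The Flesch Reading Ease function uses the standard published formula and takes pre-counted words, sentences, and syllables.

```python
def completeness(response, key_phrases):
    """Share of guideline key phrases found in the response.

    Assumption: a phrase "overlaps" if it appears as a case-insensitive substring.
    """
    text = response.lower()
    hits = sum(1 for phrase in key_phrases if phrase.lower() in text)
    return hits / len(key_phrases)


def jaccard(text_a, text_b):
    """Jaccard index: shared words divided by all distinct words in either text.

    Assumption: words are lowercase, whitespace-separated tokens.
    """
    a = set(text_a.lower().split())
    b = set(text_b.lower().split())
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def flesch_reading_ease(words, sentences, syllables):
    """Standard Flesch Reading Ease formula; lower scores are harder to read."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


# Hypothetical guideline and response sentences, for illustration only
guideline = "mri is suggested to confirm nerve root compression"
response = "MRI is suggested for suspected nerve root compression"
phrases = ["MRI", "nerve root compression", "CT myelography"]

print(completeness(response, phrases))    # 2 of 3 phrases matched
print(jaccard(guideline, response))       # 6 shared words out of 10 distinct
print(flesch_reading_ease(100, 5, 150))   # score for a 100-word, 5-sentence text
```

Under this formula, scores below about 30 are conventionally read as very difficult text, which is consistent with how the abstract interprets the scores of 15.24, 8.73, and 4.58.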

Subject(s)

Radiculopathy , Humans , Radiculopathy/diagnosis , Practice Guidelines as Topic/standards , Cervical Vertebrae/surgery , Societies, Medical

7.

The Effect of Intraoperative Overdistraction on Subsidence Following Anterior Cervical Discectomy and Fusion.

Duey, Akiro H; Gonzalez, Christopher; Hoang, Timothy; Geng, Eric A; Ferriter, Pierce J; Rosenberg, Ashley M; Zaidat, Bashar; Zapolsky, Ivan J; Kim, Jun S; Cho, Samuel K.

Clin Spine Surg ; 2024 Jun 03.

Article in English | MEDLINE | ID: mdl-38828954

ABSTRACT

STUDY DESIGN: Retrospective cohort. OBJECTIVE: The purpose of this study was to evaluate the effect of overdistraction on interbody cage subsidence. BACKGROUND: Vertebral overdistraction due to the use of large intervertebral cage sizes may increase the risk of postoperative subsidence. METHODS: Patients who underwent anterior cervical discectomy and fusion between 2016 and 2021 were included. All measurements were performed using lateral cervical radiographs at 3 time points - preoperative, immediate postoperative, and final follow-up >6 months postoperatively. Anterior and posterior distraction were calculated by subtracting the preoperative disc height from the immediate postoperative disc height. Cage subsidence was calculated by subtracting the final follow-up postoperative disc height from the immediate postoperative disc height. Associations between anterior and posterior subsidence and distraction were determined using multivariable linear regression models. The analyses controlled for cage type, cervical level, sex, age, smoking status, and osteopenia. RESULTS: Sixty-eight patients and 125 fused levels were included in the study. Of the 68 fusions, 22 were single-level fusions, 35 were 2-level, and 11 were 3-level. The median final follow-up interval was 368 days (range: 181-1257 d). Anterior disc space subsidence was positively associated with anterior distraction (beta = 0.23; 95% CI: 0.08, 0.38; P = 0.004), and posterior disc space subsidence was positively associated with posterior distraction (beta = 0.29; 95% CI: 0.13, 0.45; P < 0.001). No significant associations between anterior distraction and posterior subsidence (beta = 0.07; 95% CI: -0.06, 0.20; P = 0.270) or posterior distraction and anterior subsidence (beta = 0.06; 95% CI: -0.14, 0.27; P = 0.541) were observed. CONCLUSIONS: We found that overdistraction of the disc space was associated with increased postoperative subsidence after anterior cervical discectomy and fusion. 
Surgeons should consider choosing a smaller cage size to avoid overdistraction and minimize postoperative subsidence.
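The two derived measurements defined in the methods above are simple differences of disc heights, sketched here for clarity. The function names, the millimeter unit, and the example heights are assumptions for illustration; the abstract does not state units or report individual measurements.

```python
def distraction(preop_height_mm, immediate_postop_height_mm):
    """Distraction: immediate postoperative disc height minus preoperative disc height."""
    return immediate_postop_height_mm - preop_height_mm


def subsidence(immediate_postop_height_mm, final_followup_height_mm):
    """Subsidence: immediate postoperative disc height minus final follow-up disc height."""
    return immediate_postop_height_mm - final_followup_height_mm


# Hypothetical anterior disc heights for one fused level (not study data)
preop, immediate_postop, final_followup = 5.0, 8.5, 7.0

print(distraction(preop, immediate_postop))          # 3.5
print(subsidence(immediate_postop, final_followup))  # 1.5
```

The same two functions would apply separately to anterior and posterior disc heights, matching the anterior and posterior analyses reported in the results.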

8.

Answer to the Letter to the Editor of G. Shen, et al. concerning "ChatGPT versus NASS clinical guidelines for degenerative spondylolisthesis: a comparative analysis" by Ahmed W, et al. (Eur Spine J [2024]: doi:10.1007/s00586-024-08198-6).

Ahmed, Wasil; Zaidat, Bashar; Duey, Akiro; Saturno, Michael; Cho, Samuel.

Eur Spine J ; 33(7): 2920, 2024 Jul.

Article in English | MEDLINE | ID: mdl-38695950

Subject(s)

Spondylolisthesis , Humans , Spondylolisthesis/surgery , Practice Guidelines as Topic

9.

Answer to the letter to the editor of A. Kleebayoon, et al. concerning "ChatGPT versus NASS clinical guidelines for degenerative spondylolisthesis: a comparative analysis" by Ahmed W, et al. (Eur Spine J [2024]: doi: 10.1007/s00586-024-08198-6).

Ahmed, Wasil; Zaidat, Bashar; Duey, Akiro; Saturno, Michael; Cho, Samuel.

Eur Spine J ; 33(6): 2537, 2024 Jun.

Article in English | MEDLINE | ID: mdl-38678131

Subject(s)

Spondylolisthesis , Humans , Spondylolisthesis/surgery , Practice Guidelines as Topic

10.

Performance of a Large Language Model in the Generation of Clinical Guidelines for Antibiotic Prophylaxis in Spine Surgery.

Zaidat, Bashar; Shrestha, Nancy; Rosenberg, Ashley M; Ahmed, Wasil; Rajjoub, Rami; Hoang, Timothy; Mejia, Mateo Restrepo; Duey, Akiro H; Tang, Justin E; Kim, Jun S; Cho, Samuel K.

Neurospine ; 21(1): 128-146, 2024 Mar.

Article in English | MEDLINE | ID: mdl-38569639

ABSTRACT

OBJECTIVE: Large language models, such as chat generative pre-trained transformer (ChatGPT), have great potential for streamlining medical processes and assisting physicians in clinical decision-making. This study aimed to assess the potential of ChatGPT's 2 models (GPT-3.5 and GPT-4.0) to support clinical decision-making by comparing its responses for antibiotic prophylaxis in spine surgery to accepted clinical guidelines. METHODS: ChatGPT models were prompted with questions from the North American Spine Society (NASS) Evidence-based Clinical Guidelines for Multidisciplinary Spine Care for Antibiotic Prophylaxis in Spine Surgery (2013). Its responses were then compared and assessed for accuracy. RESULTS: Of the 16 NASS guideline questions concerning antibiotic prophylaxis, 10 responses (62.5%) were accurate in ChatGPT's GPT-3.5 model and 13 (81%) were accurate in GPT-4.0. Twenty-five percent of GPT-3.5 answers were deemed as overly confident while 62.5% of GPT-4.0 answers directly used the NASS guideline as evidence for its response. CONCLUSION: ChatGPT demonstrated an impressive ability to accurately answer clinical questions. GPT-3.5 model's performance was limited by its tendency to give overly confident responses and its inability to identify the most significant elements in its responses. GPT-4.0 model's responses had higher accuracy and cited the NASS guideline as direct evidence many times. While GPT-4.0 is still far from perfect, it has shown an exceptional ability to extract the most relevant research available compared to GPT-3.5. Thus, while ChatGPT has shown far-reaching potential, scrutiny should still be exercised regarding its clinical use at this time.

11.

Surgeon Preference Regarding Wound Dressing Management in Lumbar Fusion Surgery: An AO Spine Global Cross-Sectional Study.

Ambrosio, Luca; Vadalà, Gianluca; Tavakoli, Javad; Scaramuzzo, Laura; Brodano, Giovanni Barbanti; Lewis, Stephen J; Kato, So; Cho, Samuel K; Yoon, S Tim; Kim, Ho-Joong; Gary, Matthew F; Denaro, Vincenzo.

Neurospine ; 21(1): 204-211, 2024 Mar.

Article in English | MEDLINE | ID: mdl-38569644

ABSTRACT

OBJECTIVE: To evaluate the global practice pattern of wound dressing use after lumbar fusion for degenerative conditions. METHODS: A survey issued by AO Spine Knowledge Forums Deformity and Degenerative was sent out to AO Spine members. The type of postoperative dressing employed, timing of initial dressing removal, and type of subsequent dressing applied were investigated. Differences in the type of surgery and regional distribution of surgeons' preferences were analyzed. RESULTS: Immediately after surgery, 60.6% utilized a dry dressing, 23.2% a plastic occlusive dressing, 5.7% glue, 6% a combination of glue and polyester mesh, 2.6% a wound vacuum, and 1.2% other dressings. The initial dressing was removed on postoperative day 1 (11.6%), 2 (39.2%), 3 (20.3%), 4 (1.7%), 5 (4.3%), 6 (0.4%), 7 or later (12.5%), or depending on drain removal (9.9%). Following initial dressing removal, 75.9% applied a dry dressing, 17.7% a plastic occlusive dressing, and 1.3% glue, while 12.1% used no dressing. The use of no additional coverage after initial dressing removal was significantly associated with a later dressing change (p < 0.001). Significant differences emerged after comparing dressing management among different AO Spine regions (p < 0.001). CONCLUSION: Most spine surgeons applied a dry or plastic occlusive dressing immediately after surgery. The first dressing was most frequently changed during the first 3 postoperative days and replaced with the same type of dressing. While dressing policies tended not to vary according to the type of surgery, regional differences suggest that actual practice may be based on personal experience rather than available evidence.

12.

ChatGPT versus NASS clinical guidelines for degenerative spondylolisthesis: a comparative analysis.

Ahmed, Wasil; Saturno, Michael; Rajjoub, Rami; Duey, Akiro H; Zaidat, Bashar; Hoang, Timothy; Restrepo Mejia, Mateo; Gallate, Zachary S; Shrestha, Nancy; Tang, Justin; Zapolsky, Ivan; Kim, Jun S; Cho, Samuel K.

Eur Spine J ; 2024 Mar 15.

Article in English | MEDLINE | ID: mdl-38489044

ABSTRACT

BACKGROUND CONTEXT: Clinical guidelines, developed in concordance with the literature, are often used to guide surgeons' clinical decision making. Recent advancements of large language models and artificial intelligence (AI) in the medical field come with exciting potential. OpenAI's generative AI model, known as ChatGPT, can quickly synthesize information and generate responses grounded in medical literature, which may prove to be a useful tool in clinical decision-making for spine care. The current literature has yet to investigate the ability of ChatGPT to assist clinical decision making with regard to degenerative spondylolisthesis. PURPOSE: The study aimed to compare ChatGPT's concordance with the recommendations set forth by The North American Spine Society (NASS) Clinical Guideline for the Diagnosis and Treatment of Degenerative Spondylolisthesis and assess ChatGPT's accuracy within the context of the most recent literature. METHODS: ChatGPT-3.5 and 4.0 were prompted with questions from the NASS Clinical Guideline for the Diagnosis and Treatment of Degenerative Spondylolisthesis, and their recommendations were graded as "concordant" or "nonconcordant" relative to those put forth by NASS. A response was considered "concordant" when ChatGPT generated a recommendation that accurately reproduced all major points made in the NASS recommendation. Any responses with a grading of "nonconcordant" were further stratified into two subcategories: "Insufficient" or "Over-conclusive," to provide further insight into grading rationale. Responses between GPT-3.5 and 4.0 were compared using Chi-squared tests. RESULTS: ChatGPT-3.5 answered 13 of NASS's 28 total clinical questions in concordance with NASS's guidelines (46.4%). 
Categorical breakdown is as follows: Definitions and Natural History (1/1, 100%), Diagnosis and Imaging (1/4, 25%), Outcome Measures for Medical Intervention and Surgical Treatment (0/1, 0%), Medical and Interventional Treatment (4/6, 66.7%), Surgical Treatment (7/14, 50%), and Value of Spine Care (0/2, 0%). When NASS indicated there was sufficient evidence to offer a clear recommendation, ChatGPT-3.5 generated a concordant response 66.7% of the time (6/9). However, ChatGPT-3.5's concordance dropped to 36.8% when asked clinical questions that NASS did not provide a clear recommendation on (7/19). A further breakdown of ChatGPT-3.5's nonconcordance with the guidelines revealed that a vast majority of its inaccurate recommendations were due to them being "over-conclusive" (12/15, 80%), rather than "insufficient" (3/15, 20%). ChatGPT-4.0 answered 19 (67.9%) of the 28 total questions in concordance with NASS guidelines (P = 0.177). When NASS indicated there was sufficient evidence to offer a clear recommendation, ChatGPT-4.0 generated a concordant response 66.7% of the time (6/9). ChatGPT-4.0's concordance held up at 68.4% when asked clinical questions that NASS did not provide a clear recommendation on (13/19, P = 0.104). CONCLUSIONS: This study sheds light on the duality of LLM applications within clinical settings: one of accuracy and utility in some contexts versus inaccuracy and risk in others. ChatGPT was concordant for most clinical questions NASS offered recommendations for. However, for questions NASS did not offer best practices, ChatGPT generated answers that were either too general or inconsistent with the literature, and even fabricated data/citations. Thus, clinicians should exercise extreme caution when attempting to consult ChatGPT for clinical recommendations, taking care to ensure its reliability within the context of recent literature.
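The Chi-squared comparisons reported above can be checked with a short calculation. The sketch below is an assumption about the exact variant used: the abstract says only "Chi-squared tests", and applying Yates's continuity correction to the 2x2 tables of concordant versus nonconcordant counts happens to reproduce the reported P values. The function name `yates_chi_squared` is hypothetical, and only the Python standard library is used.

```python
import math


def yates_chi_squared(a, b, c, d):
    """Yates-corrected chi-squared statistic and p-value (1 degree of freedom).

    The 2x2 table is [[a, b], [c, d]], for example
    [[concordant_3.5, nonconcordant_3.5], [concordant_4.0, nonconcordant_4.0]].
    """
    n = a + b + c + d
    corrected = max(abs(a * d - b * c) - n / 2, 0.0)
    chi2 = n * corrected ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of the chi-squared distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# All 28 questions: GPT-3.5 concordant on 13, GPT-4.0 concordant on 19
chi2_all, p_all = yates_chi_squared(13, 15, 19, 9)

# The 19 questions without a clear NASS recommendation: 7 versus 13 concordant
chi2_unclear, p_unclear = yates_chi_squared(7, 12, 13, 6)

print(round(p_all, 3), round(p_unclear, 3))
```

Under this assumption the two p-values round to 0.177 and 0.104, matching the abstract; without the continuity correction the first comparison would give a smaller p-value of roughly 0.105.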

13.

Can generative artificial intelligence pass the orthopaedic board examination?

Isleem, Ula N; Zaidat, Bashar; Ren, Renee; Geng, Eric A; Burapachaisri, Aonnicha; Tang, Justin E; Kim, Jun S; Cho, Samuel K.

J Orthop ; 53: 27-33, 2024 Jul.

Article in English | MEDLINE | ID: mdl-38450060

ABSTRACT

Background: Resident training programs in the US use the Orthopaedic In-Training Examination (OITE) developed by the American Academy of Orthopaedic Surgeons (AAOS) to assess the current knowledge of their residents and to identify the residents at risk of failing the American Board of Orthopaedic Surgery (ABOS) examination. Optimal strategies for OITE preparation are constantly being explored. There may be a role for Large Language Models (LLMs) in orthopaedic resident education. ChatGPT, an LLM launched in late 2022, has demonstrated the ability to produce accurate, detailed answers, potentially enabling it to aid in medical education and clinical decision-making. The purpose of this study is to evaluate the performance of ChatGPT on Orthopaedic In-Training Examinations using Self-Assessment Exams from the AAOS database and approved literature as a proxy for the Orthopaedic Board Examination. Methods: 301 SAE questions from the AAOS database and associated AAOS literature were input into ChatGPT's interface in a question and multiple-choice format and the answers were then analyzed to determine which answer choice was selected. A new chat was used for every question. All answers were recorded, categorized, and compared to the answer given by the OITE and SAE exams, noting whether the answer was right or wrong. Results: Of the 301 questions asked, ChatGPT was able to correctly answer 183 (60.8%) of them. The subjects with the highest percentage of correct questions were basic science (81%), oncology (72.7%), shoulder and elbow (71.9%), and sports (71.4%). The questions were further subdivided into 3 groups: those about management, diagnosis, or knowledge recall. There were 86 management questions and 47 were correct (54.7%), 45 diagnosis questions with 32 correct (71.7%), and 168 knowledge recall questions with 102 correct (60.7%). 
Conclusions: ChatGPT has the potential to provide orthopedic educators and trainees with accurate clinical conclusions for the majority of board-style questions, although its reasoning should be carefully analyzed for accuracy and clinical validity. As such, its usefulness in a clinical educational context is currently limited but rapidly evolving. Clinical relevance: ChatGPT can access a multitude of medical data and may help provide accurate answers to clinical questions.

14.

AO Spine Guideline for the Use of Osteobiologics (AOGO) in Anterior Cervical Discectomy and Fusion for Spinal Degenerative Cases.

Meisel, Hans Jörg; Jain, Amit; Wu, Yabin; Martin, Christopher T; Cabrera, Juan Pablo; Muthu, Sathish; Hamouda, Waeel O; Rodrigues-Pinto, Ricardo; Arts, Jacobus J; Viswanadha, Arun-Kumar; Vadalà, Gianluca; Vergroesen, Pieter-Paul A; Corluka, Stipe; Hsieh, Patrick C; Demetriades, Andreas K; Watanabe, Kota; Shin, John H; Riew, K Daniel; Papavero, Luca; Liu, Gabriel; Luo, Zhuojing; Ahuja, Sashin; Fekete, Tamás; Uz Zaman, Atiq; El-Sharkawi, Mohammad; Sakai, Daisuke; Cho, Samuel K; Wang, Jeffrey C; Yoon, Tim; Santesso, Nancy; Buser, Zorica.

Global Spine J ; 14(2_suppl): 6S-13S, 2024 Feb.

Article in English | MEDLINE | ID: mdl-38421322

ABSTRACT

STUDY DESIGN: Guideline. OBJECTIVES: To develop an international guideline (AOGO) about the use of osteobiologics in anterior cervical discectomy and fusion (ACDF) for treating degenerative spine conditions. METHODS: The guideline development process was guided by AO Spine Knowledge Forum Degenerative (KF Degen) and followed the Guideline International Network McMaster Guideline Development Checklist. The process involved 73 participants with expertise in degenerative spine diseases and surgery from 22 countries. Fifteen systematic reviews were conducted addressing respective key topics and evidence was collected. The methodologist compiled the evidence into GRADE Evidence-to-Decision frameworks. Guideline panel members judged the outcomes and other criteria and made the final recommendations through consensus. RESULTS: Five conditional recommendations were created. A conditional recommendation is about the use of allograft, autograft or a cage with an osteobiologic in primary ACDF surgery. Other conditional recommendations are about the use of osteobiologic for single- or multi-level ACDF, and for hybrid construct surgery. It is suggested that surgeons use other osteobiologics rather than human bone morphogenetic protein-2 (BMP-2) in common clinical situations. Surgeons are recommended to choose 1 graft over another or 1 osteobiologic over another primarily based on clinical situation, and the costs and availability of the materials. CONCLUSION: This AOGO guideline is the first to provide recommendations for the use of osteobiologics in ACDF. Despite the comprehensive searches for evidence, few studies were available, and most had small sample sizes and were primarily case series with inherent risks of bias. Therefore, high-quality clinical evidence is needed to improve the guideline.

15.

Reply to Letter-to-the-Editor on ChatGPT for the Diagnosis and Treatment of Low Back Pain: A Comparative Analysis.

Shrestha, Nancy; Cho, Samuel K.

Spine (Phila Pa 1976) ; 49(10): E152, 2024 May 15.

Article in English | MEDLINE | ID: mdl-38369782

Subject(s)

Artificial Intelligence, Low Back Pain, Humans, Low Back Pain/therapy, Low Back Pain/diagnosis

16.

Use of ChatGPT for Determining Clinical and Surgical Treatment of Lumbar Disc Herniation With Radiculopathy: A North American Spine Society Guideline Comparison.

Mejia, Mateo Restrepo; Arroyave, Juan Sebastian; Saturno, Michael; Ndjonko, Laura Chelsea Mazudie; Zaidat, Bashar; Rajjoub, Rami; Ahmed, Wasil; Zapolsky, Ivan; Cho, Samuel K.

Neurospine ; 21(1): 149-158, 2024 Mar.

Article in English | MEDLINE | ID: mdl-38291746

ABSTRACT

OBJECTIVE: Large language models such as Chat Generative Pre-trained Transformer (ChatGPT) have found success in various sectors, but their application in the medical field remains limited. This study aimed to assess the feasibility of using ChatGPT to provide accurate medical information to patients, specifically evaluating how well ChatGPT versions 3.5 and 4 aligned with the 2012 North American Spine Society (NASS) guidelines for lumbar disk herniation with radiculopathy. METHODS: ChatGPT's responses to questions based on the NASS guidelines were analyzed for accuracy. Three new categories (overconclusiveness, supplementary information, and incompleteness) were introduced to deepen the analysis. Overconclusiveness referred to recommendations not mentioned in the NASS guidelines, supplementary information denoted additional relevant details, and incompleteness indicated that crucial information from the NASS guidelines was omitted. RESULTS: Out of 29 clinical guidelines evaluated, ChatGPT-3.5 demonstrated accuracy in 15 responses (52%), while ChatGPT-4 achieved accuracy in 17 responses (59%). ChatGPT-3.5 was overconclusive in 14 responses (48%), while ChatGPT-4 exhibited overconclusiveness in 13 responses (45%). Additionally, ChatGPT-3.5 provided supplementary information in 24 responses (83%), and ChatGPT-4 provided supplementary information in 27 responses (93%). In terms of incompleteness, ChatGPT-3.5 displayed this in 11 responses (38%), while ChatGPT-4 showed incompleteness in 8 responses (23%). CONCLUSION: ChatGPT shows promise for clinical decision-making, but both patients and healthcare providers should exercise caution to ensure safety and quality of care. While these results are encouraging, further research is necessary to validate the use of large language models in clinical settings.

17.

Evaluating sheep hemoglobins with MD simulations as an animal model for sickle cell disease.

Kuczynski, Caroline E; Porada, Christopher D; Atala, Anthony; Cho, Samuel S; Almeida-Porada, Graça.

Sci Rep ; 14(1): 276, 2024 Jan 02.

Article in English | MEDLINE | ID: mdl-38168584

ABSTRACT

Sickle cell disease (SCD) affects millions worldwide, yet there are few therapeutic options. To develop effective treatments, preclinical models that recapitulate human physiology and SCD pathophysiology are needed. SCD arises from a single Glu-to-Val substitution at position 6 in the β subunit of hemoglobin (Hb), promoting Hb polymerization and subsequent disease. Sheep share important physiological and developmental characteristics with humans, including the same developmental pattern of fetal to adult Hb switching. Herein, we investigated whether introducing the SCD mutation into the sheep β-globin locus would recapitulate SCD's complex pathophysiology by generating high-quality SWISS-MODEL sheep Hb structures and performing MD simulations of normal/sickle human (huHbA/huHbS) and sheep (shHbB/shHbS) Hb, establishing how accurately shHbS mimics huHbS behavior. shHbS, like huHbS, remained stable with low RMSD, while huHbA and shHbB had higher and fluctuating RMSD. shHbB and shHbS also behaved identically to huHbA and huHbS with respect to β2-Glu6 and β1-Asp73 (β1-Asn72 in sheep) solvent interactions. These data demonstrate that introducing the single SCD-causing Glu-to-Val substitution into sheep β-globin causes alterations consistent with the Hb polymerization that drives RBC sickling, supporting the development of an SCD sheep model to pave the way for alternative cures for this debilitating, globally impactful disease.

Subject(s)

Sickle Cell Anemia, Hemoglobins, Adult, Humans, Animals, Sheep, Hemoglobins/genetics, Sickle Cell Anemia/therapy, Hemoglobin A, beta-Globins/genetics, Animal Models, Sickle Hemoglobin/genetics, Sickle Hemoglobin/chemistry
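
The abstract above compares structures by their RMSD over the course of MD simulations. As a rough illustration of what that quantity measures, the following minimal sketch computes the root-mean-square deviation between two sets of atomic coordinates. It assumes the two structures have already been superimposed and list the same atoms in the same order; the function name `rmsd` and the coordinate format are illustrative assumptions, not the authors' analysis pipeline.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally sized lists of (x, y, z) points.

    Assumption: both structures are already aligned and atoms are in matching order.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    # Sum of squared Euclidean distances between corresponding atoms
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))
```

In a trajectory analysis, a function like this would be evaluated for every frame against a reference structure; a low, flat curve corresponds to the "stable with low RMSD" behavior described in the abstract.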

18.

Performance of ChatGPT on NASS Clinical Guidelines for the Diagnosis and Treatment of Low Back Pain: A Comparison Study.

Shrestha, Nancy; Shen, Zekun; Zaidat, Bashar; Duey, Akiro H; Tang, Justin E; Ahmed, Wasil; Hoang, Timothy; Restrepo Mejia, Mateo; Rajjoub, Rami; Markowitz, Jonathan S; Kim, Jun S; Cho, Samuel K.

Spine (Phila Pa 1976) ; 49(9): 640-651, 2024 May 01.

Article in English | MEDLINE | ID: mdl-38213186

ABSTRACT

STUDY DESIGN: Comparative analysis. OBJECTIVE: To evaluate the ability of Chat Generative Pre-trained Transformer (ChatGPT) to predict appropriate clinical recommendations based on the most recent clinical guidelines for the diagnosis and treatment of low back pain. BACKGROUND: Low back pain is a very common and often debilitating condition that affects many people globally. ChatGPT is an artificial intelligence model that may be able to generate recommendations for low back pain. MATERIALS AND METHODS: Using the North American Spine Society Evidence-Based Clinical Guidelines as the gold standard, 82 clinical questions relating to low back pain were entered into ChatGPT (GPT-3.5) independently. For each question, we recorded ChatGPT's answer, then used a point-answer system (the point being the guideline recommendation and the answer being ChatGPT's response) and asked ChatGPT whether the point was mentioned in the answer to assess accuracy. This accuracy assessment was repeated for each question by guideline category with one change: ChatGPT was first prompted to answer as an experienced orthopedic surgeon. A two-sample proportion z test was used to assess any differences between the preprompt and postprompt scenarios with alpha = 0.05. RESULTS: ChatGPT's responses were accurate 65% of the time (72% postprompt, P = 0.41) for guidelines with clinical recommendations, 46% (58% postprompt, P = 0.11) for guidelines with insufficient or conflicting data, and 49% (16% postprompt, P = 0.003*) for guidelines with no adequate study to address the clinical question. For guidelines with insufficient or conflicting data, 44% (25% postprompt, P = 0.01*) of ChatGPT responses wrongly suggested that sufficient evidence existed. CONCLUSION: ChatGPT was able to produce a sufficient clinical guideline recommendation for low back pain, with overall improvements if initially prompted. However, it tended to wrongly suggest evidence and often failed to mention, especially postprompt, when there was not enough evidence to give an accurate recommendation.

Subject(s)

Low Back Pain, Orthopedic Surgeons, Humans, Low Back Pain/diagnosis, Low Back Pain/therapy, Artificial Intelligence, Spine
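
The abstract above compares preprompt and postprompt accuracy with a two-sample proportion z test. The following is a minimal sketch of the standard pooled version of that test, using only the Python standard library. The abstract reports percentages but not the underlying counts, so any counts passed to this function are hypothetical, and the function name `two_proportion_z_test` is an illustrative assumption rather than the authors' code.

```python
import math


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Return (z, two-sided p-value) for H0: the two underlying proportions are equal."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    # Pooled proportion under the null hypothesis
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("standard error is zero; the test is undefined for these counts")
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts for illustration only (not taken from the study)
z, p = two_proportion_z_test(30, 100, 50, 100)
significant = p < 0.05  # alpha = 0.05, as stated in the abstract
```

The pooled standard error is the usual choice when the null hypothesis is that both groups share one proportion; with small question counts per guideline category, an exact test might be preferred.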

19.

Reliable Prediction of Discharge Disposition Following Cervical Spine Surgery With Ensemble Machine Learning and Validation on a National Cohort.

Feng, Rui; Valliani, Aly A; Martini, Michael L; Gal, Jonathan S; Neifert, Sean N; Kim, Nora C; Geng, Eric A; Kim, Jun S; Cho, Samuel K; Oermann, Eric K; Caridi, John M.

Clin Spine Surg ; 37(1): E30-E36, 2024 Feb 01.

Article in English | MEDLINE | ID: mdl-38285429

ABSTRACT

STUDY DESIGN: A retrospective cohort study. OBJECTIVE: The purpose of this study is to develop a machine learning algorithm to predict nonhome discharge after cervical spine surgery that is validated and usable on a national scale to ensure generalizability and elucidate candidate drivers for prediction. SUMMARY OF BACKGROUND DATA: Excessive length of hospital stay can be attributed to delays in postoperative referrals to intermediate care rehabilitation centers or skilled nursing facilities. Accurate preoperative prediction of patients who may require access to these resources can facilitate a more efficient referral and discharge process, thereby reducing hospital and patient costs in addition to minimizing the risk of hospital-acquired complications. METHODS: Electronic medical records were retrospectively reviewed from a single-center data warehouse (SCDW) to identify patients undergoing cervical spine surgeries between 2008 and 2019 for machine learning algorithm development and internal validation. The National Inpatient Sample (NIS) database was queried to identify cervical spine fusion surgeries between 2009 and 2017 for external validation of algorithm performance. Gradient-boosted trees were constructed to predict nonhome discharge across patient cohorts. The area under the receiver operating characteristic curve (AUROC) was used to measure model performance. SHAP values were used to identify nonlinear risk factors for nonhome discharge and to interpret algorithm predictions. RESULTS: A total of 3523 cases of cervical spine fusion surgeries were included from the SCDW data set, and 311,582 cases were isolated from NIS. The model demonstrated robust prediction of nonhome discharge across all cohorts, achieving an AUROC of 0.87 (SD=0.01) on both the SCDW and nationwide NIS test sets. Anterior approach only, age, elective admission status, Medicare insurance status, and total Elixhauser Comorbidity Index score were the most important predictors of discharge destination. CONCLUSIONS: Machine learning algorithms reliably predict nonhome discharge across single-center and national cohorts and identify preoperative features of importance following cervical spine fusion surgery.

Subject(s)

Medicare, Patient Discharge, United States, Humans, Aged, Retrospective Studies, Machine Learning, Cervical Vertebrae/surgery
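
The abstract above reports model performance as AUROC. As a brief illustration of that metric, the sketch below computes AUROC directly from its pairwise definition: the probability that a randomly chosen positive case (here, nonhome discharge) receives a higher predicted score than a randomly chosen negative case, with ties counted as one half. The function name `auroc` and the example inputs are illustrative assumptions; this is not the authors' evaluation code, and the quadratic loop is only suitable for small samples.

```python
def auroc(labels, scores):
    """AUROC via pairwise comparison of positive (1) and negative (0) cases."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for pos_score in positives:
        for neg_score in negatives:
            if pos_score > neg_score:
                wins += 1.0  # positive case ranked above negative case
            elif pos_score == neg_score:
                wins += 0.5  # ties count as half
    return wins / (len(positives) * len(negatives))


# Hypothetical labels and predicted probabilities for illustration only
example_auc = auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which gives context for the 0.87 reported in the abstract.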

20.

ChatGPT and its Role in the Decision-Making for the Diagnosis and Treatment of Lumbar Spinal Stenosis: A Comparative Analysis and Narrative Review.

Rajjoub, Rami; Arroyave, Juan Sebastian; Zaidat, Bashar; Ahmed, Wasil; Mejia, Mateo Restrepo; Tang, Justin; Kim, Jun S; Cho, Samuel K.

Global Spine J ; 14(3): 998-1017, 2024 Apr.

Article in English | MEDLINE | ID: mdl-37560946

ABSTRACT

STUDY DESIGN: Comparative Analysis and Narrative Review. OBJECTIVE: To assess and compare ChatGPT's responses to the clinical questions and recommendations proposed by the 2011 North American Spine Society (NASS) Clinical Guideline for the Diagnosis and Treatment of Degenerative Lumbar Spinal Stenosis (LSS). We explore the advantages and disadvantages of ChatGPT's responses through an updated literature review on spinal stenosis. METHODS: We prompted ChatGPT with questions from the NASS Evidence-based Clinical Guidelines for LSS and compared its generated responses with the recommendations provided by the guidelines. A review of the literature on the diagnosis and treatment of lumbar spinal stenosis between January 2012 and April 2023 was performed via PubMed, OVID, and Cochrane. RESULTS: Fourteen questions proposed by the NASS guidelines for LSS were entered into ChatGPT, and its responses were directly compared with the recommendations offered by NASS. Three questions were on the definition and history of LSS, one on diagnostic tests, seven on non-surgical interventions, and three on surgical interventions. The review process identified 40 articles for inclusion, which helped corroborate or contradict the responses generated by ChatGPT. CONCLUSIONS: ChatGPT's responses were similar to findings in the current literature on LSS. These results demonstrate the potential for implementing ChatGPT into the spine surgeon's workplace as a means of supporting the decision-making process for LSS diagnosis and treatment. However, our narrative summary provides only a limited literature review, and additional research is needed to standardize our findings as a means of validating ChatGPT's use in the clinical space.
