A pilot study on the efficacy of GPT-4 in providing orthopedic treatment recommendations from MRI reports.

Truhn, Daniel; Weber, Christian D; Braun, Benedikt J; Bressem, Keno; Kather, Jakob N; Kuhl, Christiane; Nebelung, Sven

Affiliation

Truhn D; Department of Diagnostic and Interventional Radiology, University Hospital RWTH Aachen, Pauwels Street 30, 52074, Aachen, Germany.
Weber CD; Department of Orthopaedics and Trauma Surgery, University Hospital RWTH Aachen, Aachen, Germany.
Braun BJ; University Hospital Tuebingen on Behalf of the Eberhard-Karls-University Tuebingen, BG Hospital, Schnarrenbergstr. 95, Tübingen, Germany.
Bressem K; Department of Radiology, Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Hindenburgdamm 30, 12203, Berlin, Germany.
Kather JN; Else Kroener Fresenius Center for Digital Health, Technical University Dresden, Dresden, Germany.
Kuhl C; Department of Medicine I, University Hospital Dresden, Dresden, Germany.
Nebelung S; Department of Medicine III, University Hospital RWTH Aachen, Aachen, Germany.

Sci Rep; 13(1): 20159, 2023-11-17.

Article in English | MEDLINE | ID: mdl-37978240

ABSTRACT

Large language models (LLMs) have shown potential in various applications, including clinical practice. However, their accuracy and utility in providing treatment recommendations for orthopedic conditions remain to be investigated. Thus, this pilot study aims to evaluate the validity of treatment recommendations generated by GPT-4 for common knee and shoulder orthopedic conditions using anonymized clinical MRI reports. A retrospective analysis was conducted using 20 anonymized clinical MRI reports, with varying severity and complexity. Treatment recommendations were elicited from GPT-4 and evaluated by two board-certified specialty-trained senior orthopedic surgeons. Their evaluation focused on semiquantitative gradings of accuracy and clinical utility and potential limitations of the LLM-generated recommendations. GPT-4 provided treatment recommendations for 20 patients (mean age, 50 years ± 19 [standard deviation]; 12 men) with acute and chronic knee and shoulder conditions. The LLM produced largely accurate and clinically useful recommendations. However, limited awareness of a patient's overall situation, a tendency to incorrectly appreciate treatment urgency, and largely schematic and unspecific treatment recommendations were observed and may reduce its clinical usefulness. In conclusion, LLM-based treatment recommendations are largely adequate and not prone to 'hallucinations', yet inadequate in particular situations. Critical guidance by healthcare professionals is obligatory, and independent use by patients is discouraged, given the dependency on precise data input.

Subject(s)

Medicine; Musculoskeletal Diseases; Male; Humans; Middle Aged; Pilot Projects; Retrospective Studies; Language; Magnetic Resonance Imaging

Full text: 1 | Collection: 01-international | Database: MEDLINE | Main subject: Musculoskeletal Diseases / Medicine | Limits: Humans / Male / Middle Aged | Language: English | Journal: Sci Rep | Year: 2023 | Document type: Article | Country of affiliation: Germany
