Appropriateness and Reliability of an Online Artificial Intelligence Platform's Responses to Common Questions Regarding Distal Radius Fractures.
Christy, Michele; Morris, Marie T; Goldfarb, Charles A; Dy, Christopher J.
Affiliation
  • Christy M; Department of Orthopaedic Surgery, Washington University in St. Louis, St. Louis, MO.
  • Morris MT; Department of Orthopaedic Surgery, Washington University in St. Louis, St. Louis, MO.
  • Goldfarb CA; Department of Orthopaedic Surgery, Washington University in St. Louis, St. Louis, MO.
  • Dy CJ; Department of Orthopaedic Surgery, Washington University in St. Louis, St. Louis, MO. Electronic address: Dyc@wustl.edu.
J Hand Surg Am; 49(2): 91-98, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38069953
ABSTRACT

PURPOSE:

Chat Generative Pre-Trained Transformer (ChatGPT) is a novel artificial intelligence chatbot that is changing the way humans gather information online. The purpose of this study was to investigate ChatGPT's ability to appropriately and reliably answer common questions regarding distal radius fractures.

METHODS:

Thirty common questions regarding distal radius fractures were presented in an identical manner to the online ChatGPT-3.5 interface three separate times, yielding 90 unique responses because ChatGPT produces an original answer with each query. All responses were graded as "appropriate," "appropriate but incomplete," or "inappropriate" by a consensus discussion among three hand surgeon reviewers. The questions were additionally subcategorized into one of four domains based on Bloom's cognitive learning taxonomy, and descriptive statistics were reported.

RESULTS:

Seventy of the 90 total responses (78%) produced by ChatGPT were "appropriate," and 29 of the 30 questions (97%) had at least one response considered appropriate (of the three possible). However, only 17 of the 30 questions (57%) were answered appropriately on all three iterations. The test-retest reliability of ChatGPT was poor, with an intraclass correlation coefficient of 0.12. Finally, ChatGPT performed better on questions requiring lower-order thinking skills (Bloom's levels 1-3) than on level 4 questions.
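The test-retest reliability reported above can be estimated with a one-way random-effects intraclass correlation coefficient, ICC(1,1), treating each question as a subject and the three repeated queries as measurements. The sketch below is a minimal illustration of that computation; the grading data are hypothetical (2 = appropriate, 1 = appropriate but incomplete, 0 = inappropriate) and are not the study's actual ratings.

```python
import numpy as np

def icc_1_1(ratings: np.ndarray) -> float:
    """One-way random-effects ICC for an (n_subjects x k_repeats) array."""
    n, k = ratings.shape
    grand_mean = ratings.mean()
    row_means = ratings.mean(axis=1)
    # Between-subjects and within-subjects mean squares from one-way ANOVA.
    ms_between = k * ((row_means - grand_mean) ** 2).sum() / (n - 1)
    ms_within = ((ratings - row_means[:, None]) ** 2).sum() / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# Hypothetical grades for five questions, each asked three times.
grades = np.array([
    [2, 2, 2],
    [2, 1, 2],
    [1, 0, 2],
    [2, 2, 1],
    [0, 1, 1],
])
print(round(icc_1_1(grades), 2))  # prints 0.29
```

An ICC near 0, as in the study's 0.12, indicates that within-question variation across repeated queries is nearly as large as the variation between questions, i.e. little response stability.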

CONCLUSIONS:

This study found that although ChatGPT has the capability to answer common questions regarding distal radius fractures, caution should be taken before implementing its use, given ChatGPT's inconsistency in providing a complete and accurate response to the same question every time.

CLINICAL RELEVANCE:

As the popularity and technology of ChatGPT continue to grow, it is important to understand the potential and limitations of this platform to determine how it may best be implemented to improve patient care.

Full text: 1 | Collection: 01-international | Database: MEDLINE | Main subject: Surgeons / Wrist Fractures | Limits: Humans | Language: English | Journal: J Hand Surg Am | Year: 2024 | Document type: Article | Country of affiliation: United States