Evaluating ChatGPT's effectiveness and tendencies in Japanese internal medicine.

Kaneda, Yudai; Tayuinosho, Akari; Tomoyose, Rika; Takita, Morihito; Hamaki, Tamae; Tanimoto, Tetsuya; Ozaki, Akihiko

Affiliation

Kaneda Y; School of Medicine, Hokkaido University, Hokkaido, Japan.
Tayuinosho A; School of Medicine, Nagoya University, Aichi, Japan.
Tomoyose R; School of Medicine, Hokkaido University, Hokkaido, Japan.
Takita M; Department of Internal Medicine, Accessible Rail Medical Services Tetsuikai, Navitas Clinic Tachikawa, Tachikawa, Japan.
Hamaki T; Department of Internal Medicine, Accessible Rail Medical Services Tetsuikai, Navitas Clinic Shinjuku, Tokyo, Japan.
Tanimoto T; Internal Medicine, Accessible Rail Medical Services Tetsuikai, Navitas Clinic, Kawasaki, Kanagawa, Japan.
Ozaki A; Department of Breast Surgery, Jyoban Hospital of Tokiwa Foundation, Iwaki, Fukushima, Japan.

J Eval Clin Pract ; 30(6): 1017-1023, 2024 Sep.

Article in English | MEDLINE | ID: mdl-38764369

ABSTRACT

INTRODUCTION:

ChatGPT, a large language model, is a notable example of the potential of artificial intelligence (AI) in health care. However, its effectiveness in clinical settings, especially compared with human physicians, is not fully understood. This study evaluates ChatGPT's capabilities and limitations in answering questions designed for Japanese internal medicine specialists, aiming to clarify its accuracy and its tendencies in both correct and incorrect responses.

METHODS:

We collected ChatGPT's answers to four sets of self-training questions for internal medicine specialists in Japan, covering 2020 to 2023. We ran three trials on each set to evaluate its overall accuracy and its performance on nonimage questions. We then categorized the questions into two groups: those ChatGPT answered correctly in all three trials (Confirmed Correct Answer, CCA) and those it answered incorrectly in all three trials (Confirmed Incorrect Answer, CIA). For each group, we calculated the average accuracy rate and its 95% confidence interval from the actual performance of internal medicine physicians on each question, and tested whether the difference between the two groups was statistically significant. The same procedure was then applied to the nonimage subset of CCA and CIA questions.
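The grouping and summary step described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' analysis code: the toy data are hypothetical, `classify` and `mean_ci95` are illustrative names, and the 95% interval uses a normal approximation (mean ± 1.96 standard errors), which may differ from the method the study actually used. The significance test between groups is not reproduced because the abstract does not name it.

```python
from math import sqrt
from statistics import mean, stdev


def classify(trials):
    """Label a question from its three ChatGPT trial outcomes (True = correct)."""
    if all(trials):
        return "CCA"    # correct in every trial
    if not any(trials):
        return "CIA"    # incorrect in every trial
    return "mixed"      # answer varied between correct and incorrect


def mean_ci95(rates):
    """Mean physician accuracy with a normal-approximation 95% CI (assumption)."""
    m = mean(rates)
    half_width = 1.96 * stdev(rates) / sqrt(len(rates))
    return m, m - half_width, m + half_width


# Hypothetical toy data: ChatGPT outcomes over three trials per question,
# paired with the share of physicians who answered that question correctly.
questions = {
    "q1": ([True, True, True], 0.82),
    "q2": ([False, False, False], 0.41),
    "q3": ([True, False, True], 0.65),
    "q4": ([True, True, True], 0.90),
    "q5": ([False, False, False], 0.55),
}

groups = {"CCA": [], "CIA": [], "mixed": []}
for trials, physician_rate in questions.values():
    groups[classify(trials)].append(physician_rate)

for label in ("CCA", "CIA"):
    m, lo, hi = mean_ci95(groups[label])
    print(f"{label}: mean={m:.3f}, 95% CI=({lo:.3f}, {hi:.3f})")
```

Questions in the "mixed" group correspond to those whose answers varied across trials; they are excluded from the CCA/CIA comparison, as in the design described above.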

RESULTS:

ChatGPT's overall accuracy rate was 59.05%, rising to 65.76% for nonimage questions. For 24.87% of the questions, ChatGPT's answers varied between correct and incorrect across the three trials. Although ChatGPT exceeded the passing threshold on nonimage questions, its accuracy was lower than that of human specialists. Physicians' accuracy differed significantly between the CCA and CIA groups, and ChatGPT mirrored human physicians' patterns in responding to different question types.

CONCLUSION:

This study underscores both the potential utility and the limitations of ChatGPT in internal medicine. Although effective in some respects, its dependence on question type and context suggests that it should supplement, not replace, professional medical judgment. Further research is needed to integrate artificial intelligence tools such as ChatGPT more effectively into specialized medical practice.

Subjects

Artificial Intelligence; Internal Medicine; Clinical Competence/standards; Internal Medicine/education; Japan

Keywords

ChatGPT; Japan; internal medicine

Database: MEDLINE | Main subject: Artificial Intelligence / Internal Medicine | Country as subject: Asia | Language: English | Year of publication: 2024 | Document type: Article