Performance of ChatGPT incorporated chain-of-thought method in bilingual nuclear medicine physician board examinations.
Ting, Yu-Ting; Hsieh, Te-Chun; Wang, Yuh-Feng; Kuo, Yu-Chieh; Chen, Yi-Jin; Chan, Pak-Ki; Kao, Chia-Hung.
Affiliations
  • Ting YT; Department of Nuclear Medicine and PET Center, China Medical University Hospital, China Medical University, Taichung.
  • Hsieh TC; Department of Nuclear Medicine and PET Center, China Medical University Hospital, China Medical University, Taichung.
  • Wang YF; Department of Biomedical Imaging and Radiological Science, China Medical University, Taichung.
  • Kuo YC; Department of Nuclear Medicine, Taipei Veterans General Hospital, Taipei.
  • Chen YJ; Department of Biomedical Imaging and Radiological Sciences, National Yang Ming Chiao Tung University, Taipei.
  • Chan PK; Department of Medical Imaging and Radiological Technology, Yuanpei University of Medical Technology, Hsinchu.
  • Kao CH; Artificial Intelligence Center, China Medical University Hospital, China Medical University, Taichung.
Digit Health; 10: 20552076231224074, 2024.
Article in English | MEDLINE | ID: mdl-38188855
ABSTRACT

Objective:

This research explores the performance of ChatGPT, compared with human doctors, on a bilingual (Mandarin Chinese and English) medical specialty examination in nuclear medicine in Taiwan.

Methods:

The study employed the generative pre-trained transformer GPT-4 and integrated the chain-of-thought (COT) method to enhance performance by eliciting and explaining the model's reasoning process so that each question is answered in a coherent and logical manner. Questions from the Taiwanese Nuclear Medicine Specialty Exam served as the basis for testing. The research analyzed the correctness of AI responses in different sections of the exam and explored the influence of question length and language proportion on accuracy. A minimal sketch of this prompting setup is given below.
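The abstract does not reproduce the study's actual prompt or code, but the described setup (GPT-4 answering board-exam questions with a chain-of-thought instruction) can be sketched as follows. This is a minimal illustration assuming the OpenAI Python SDK; the prompt wording, system message, and `answer_with_cot` helper are assumptions, not the authors' implementation.

```python
# Minimal sketch of chain-of-thought (COT) prompting against GPT-4,
# assuming the OpenAI Python SDK. Prompt wording is illustrative only;
# the study's actual prompt is not given in the abstract.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def answer_with_cot(question: str) -> str:
    """Ask GPT-4 to reason step by step before committing to an answer."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "You are taking a bilingual (Mandarin Chinese and "
                        "English) nuclear medicine specialty board exam."},
            {"role": "user",
             "content": f"{question}\n\nLet's think step by step: explain "
                        "your reasoning, then state the final answer choice."},
        ],
        temperature=0,  # deterministic, grading-style runs
    )
    return response.choices[0].message.content


# Hypothetical usage with a single multiple-choice exam item:
# print(answer_with_cot("Which radiopharmaceutical ... (A) ... (B) ..."))
```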

Results:

AI, especially ChatGPT with COT, exhibited exceptional capabilities in theoretical knowledge, clinical medicine, and handling integrated questions, often surpassing or matching human doctors' performance. However, AI struggled with questions related to medical regulations. The analysis of question length showed that questions in the 109-163 word range yielded the highest accuracy. Moreover, a higher proportion of English words in a question improved both AI and human accuracy.
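As an illustration of the accuracy-by-length and accuracy-by-language analyses these results describe, the sketch below bins questions by word count and by English-word proportion and computes per-bin accuracy. The data here is synthetic and the column names are hypothetical; only the 109-163 word band comes from the reported results, the other bin edges are assumptions.

```python
# Sketch of the Results-style analyses on synthetic data. Column names
# (n_words, english_ratio, ai_correct) are hypothetical placeholders for
# the study's per-question records.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200  # synthetic stand-in for the exam item pool
df = pd.DataFrame({
    "n_words": rng.integers(20, 300, n),    # question word count
    "english_ratio": rng.uniform(0, 1, n),  # share of English words
    "ai_correct": rng.integers(0, 2, n),    # 1 = ChatGPT answered correctly
})

# Accuracy by question length. The 109-163 word band is the range the
# abstract reports as most accurate; the other edges are illustrative.
by_length = df.groupby(pd.cut(df["n_words"], [0, 54, 108, 163, 300]))
print(by_length["ai_correct"].mean())

# Accuracy by proportion of English words (quartiles, as an example).
by_lang = df.groupby(pd.qcut(df["english_ratio"], 4))
print(by_lang["ai_correct"].mean())
```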

Conclusions:

This research highlights the potential and challenges of AI in the medical field. ChatGPT demonstrates significant competence in various aspects of medical knowledge, though areas such as medical regulations require improvement. The study also suggests that AI may help in evaluating exam question difficulty and maintaining fairness in examinations. These findings shed light on AI's role in the medical field, with potential applications in healthcare education, exam preparation, and multilingual environments. Ongoing advancements are expected to further enhance AI's utility in the medical domain.
Full text: 1 Collection: 01-international Database: MEDLINE Language: English Journal: Digit Health Year: 2024 Document type: Article
