ChatGPT in Iranian medical licensing examination: evaluating the diagnostic accuracy and decision-making capabilities of an AI-based model.
Ebrahimian, Manoochehr; Behnam, Behdad; Ghayebi, Negin; Sobhrakhshankhah, Elham.
Affiliation
  • Ebrahimian M; Pediatric Surgery Research Center, Research Institute for Children's Health, Shahid Beheshti University of Medical Sciences, Tehran, Iran manoochehrebrahimian@gmail.com.
  • Behnam B; Gastrointestinal and Liver Disease Research Center, Iran University of Medical Sciences, Tehran, Iran.
  • Ghayebi N; School of Medicine, Shahid Beheshti University of Medical Sciences, Tehran, Iran.
  • Sobhrakhshankhah E; Gastrointestinal and Liver Disease Research Center, Iran University of Medical Sciences, Tehran, Iran.
BMJ Health Care Inform. 2023 Dec 11;30(1).
Article in English | MEDLINE | ID: mdl-38081765
ABSTRACT

INTRODUCTION:

Large language models such as ChatGPT have gained popularity for their ability to generate comprehensive responses to human queries. In medicine, ChatGPT has shown promise in applications ranging from diagnostics to decision-making. However, its performance on medical licensing examinations, and how that performance compares with random guessing, has not been extensively studied.

METHODS:

This study evaluated the performance of ChatGPT on the preinternship examination, a comprehensive medical assessment for medical students in Iran. The examination consisted of 200 multiple-choice questions categorised into basic science evaluation, diagnosis and decision-making. GPT-4 was used, and the questions were translated into English. Statistical analysis was conducted to assess ChatGPT's performance and to compare it with that of a random test group.

RESULTS:

ChatGPT performed exceptionally well, answering 68.5% of the questions correctly and significantly surpassing the pass mark of 45%. Its strongest performance was in decision-making questions, and it achieved a passing score in every specialty. ChatGPT also scored significantly higher than the random test group, demonstrating that its responses reflect accurate reasoning rather than chance.
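
The abstract does not state which statistical test was used. As a minimal sketch, assuming four-option multiple-choice questions (chance accuracy of 25%) and a one-sided exact binomial test, the reported 68.5% on 200 questions can be compared against both the 45% pass mark and chance performance; the chance level and the choice of test are assumptions for illustration, not details from the paper.

    from scipy.stats import binomtest

    n_questions = 200                        # total MCQs in the preinternship exam
    n_correct = round(0.685 * n_questions)   # 137 correct answers per the abstract

    # One-sided exact binomial test against the 45% pass mark
    vs_pass_mark = binomtest(n_correct, n_questions, p=0.45, alternative="greater")

    # One-sided test against random guessing; the 4-option (p = 0.25) chance level
    # is an assumption, as the abstract does not state the number of options
    vs_chance = binomtest(n_correct, n_questions, p=0.25, alternative="greater")

    print(f"vs 45% pass mark: p = {vs_pass_mark.pvalue:.2e}")
    print(f"vs 25% chance:    p = {vs_chance.pvalue:.2e}")

Under these assumptions both p-values are far below conventional significance thresholds, which is consistent with the abstract's claim that ChatGPT significantly exceeded both the pass mark and the random test group.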

CONCLUSION:

This study highlights the potential of ChatGPT in medical licensing examinations and its clear advantage over random guessing. However, ChatGPT still falls short of human physicians in diagnostic accuracy and decision-making. Its outputs should therefore be used with caution and verified by human experts to ensure patient safety and avoid errors in medical practice.

Full text: 1 | Collection: 01-internacional | Database: MEDLINE | Main subject: Physicians / Patient Safety | Limits: Humans | Country/Region as subject: Asia | Language: English | Journal: BMJ Health Care Inform | Year: 2023 | Type: Article | Affiliation country: Iran
