ChatGPT for generating multiple-choice questions: Evidence on the use of artificial intelligence in automatic item generation for a rational pharmacotherapy exam.
Kiyak, Yavuz Selim; Coskun, Özlem; Budakoglu, Isil Irem; Uluoglu, Canan.
Affiliation
  • Kiyak YS; Department of Medical Education and Informatics, Faculty of Medicine, Gazi University, Ankara, Turkey. yskiyak@gazi.edu.tr.
  • Coskun Ö; Gazi Üniversitesi Hastanesi E Blok 9, Kat 06500 Besevler, Ankara, Turkey. yskiyak@gazi.edu.tr.
  • Budakoglu II; Department of Medical Education and Informatics, Faculty of Medicine, Gazi University, Ankara, Turkey.
  • Uluoglu C; Department of Medical Education and Informatics, Faculty of Medicine, Gazi University, Ankara, Turkey.
Eur J Clin Pharmacol ; 80(5): 729-735, 2024 May.
Article in En | MEDLINE | ID: mdl-38353690
ABSTRACT

PURPOSE:

Artificial intelligence, specifically large language models such as ChatGPT, offers valuable potential benefits in question (item) writing. This study aimed to determine the feasibility of generating case-based multiple-choice questions using ChatGPT in terms of item difficulty and discrimination levels.

METHODS:

This study involved 99 fourth-year medical students who participated in a rational pharmacotherapy clerkship based on the WHO 6-Step Model. In response to a prompt that we provided, ChatGPT generated ten case-based multiple-choice questions on hypertension. Following an expert panel review, two of these questions were incorporated into a medical school exam without any changes. After the exam was administered, we evaluated their psychometric properties, including item difficulty, item discrimination (point-biserial correlation), and the functionality of the options.

RESULTS:

Both questions exhibited acceptable point-biserial correlations (0.41 and 0.39), exceeding the 0.30 threshold. However, one question had three non-functional options (options chosen by fewer than 5% of the exam participants), while the other had none.
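The item statistics reported here (item difficulty, point-biserial discrimination, and option functionality) can be computed directly from exam response data. The following is an illustrative Python sketch, not the authors' analysis code; the function names and data layout are our own assumptions.

```python
import math

def item_stats(correct, total_scores):
    """Compute item difficulty and point-biserial discrimination.

    correct: list of 0/1 item scores, one per examinee.
    total_scores: list of total exam scores, one per examinee.
    """
    n = len(correct)
    n_correct = sum(correct)
    # Item difficulty: proportion of examinees answering correctly.
    p = n_correct / n
    # Point-biserial correlation between the item score and the total score.
    mean_t = sum(total_scores) / n
    sd_t = math.sqrt(sum((t - mean_t) ** 2 for t in total_scores) / n)
    mean_1 = sum(t for c, t in zip(correct, total_scores) if c) / n_correct
    mean_0 = sum(t for c, t in zip(correct, total_scores) if not c) / (n - n_correct)
    r_pb = (mean_1 - mean_0) / sd_t * math.sqrt(p * (1 - p))
    return p, r_pb

def nonfunctional_options(choices, options, threshold=0.05):
    """Return options selected by fewer than 5% of examinees."""
    n = len(choices)
    return [o for o in options if choices.count(o) / n < threshold]
```

For example, with five examinees where the three correct responders also scored highest overall, `item_stats([1, 1, 1, 0, 0], [10, 9, 8, 3, 2])` yields a difficulty of 0.6 and a high point-biserial correlation, and an option picked by no one would be flagged as non-functional.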

CONCLUSIONS:

The findings showed that the questions can effectively differentiate between high- and low-performing students, which also points to the potential of ChatGPT as an artificial intelligence tool in test development. Future studies may use the prompt to generate items in order to enhance the external validity of the results by gathering data from diverse institutions and settings.

Full text: 1 Collection: 01-internacional Database: MEDLINE Main subject: Students, Medical / Hypertension Type of study: Prognostic_studies Limits: Humans Language: En Journal: Eur J Clin Pharmacol / Eur. j. clin. pharmacol / European journal of clinical pharmacology Year: 2024 Document type: Article
