Examining the Threat of ChatGPT to the Validity of Short Answer Assessments in an Undergraduate Medical Program.
Morjaria, Leo; Burns, Levi; Bracken, Keyna; Ngo, Quang N; Lee, Mark; Levinson, Anthony J; Smith, John; Thompson, Penelope; Sibbald, Matthew.
Affiliation
  • Morjaria L; Michael G. DeGroote School of Medicine, McMaster University, Hamilton, Ontario, Canada.
  • Burns L; Michael G. DeGroote School of Medicine, McMaster University, Hamilton, Ontario, Canada.
  • Bracken K; Michael G. DeGroote School of Medicine, McMaster University, Hamilton, Ontario, Canada.
  • Ngo QN; McMaster Education Research, Innovation and Theory (MERIT) Program, McMaster University, Hamilton, Ontario, Canada.
  • Lee M; Michael G. DeGroote School of Medicine, McMaster University, Hamilton, Ontario, Canada.
  • Levinson AJ; McMaster Education Research, Innovation and Theory (MERIT) Program, McMaster University, Hamilton, Ontario, Canada.
  • Smith J; McMaster Education Research, Innovation and Theory (MERIT) Program, McMaster University, Hamilton, Ontario, Canada.
  • Thompson P; Michael G. DeGroote School of Medicine, McMaster University, Hamilton, Ontario, Canada.
  • Sibbald M; Michael G. DeGroote School of Medicine, McMaster University, Hamilton, Ontario, Canada.
J Med Educ Curric Dev; 10: 23821205231204178, 2023.
Article in English | MEDLINE | ID: mdl-37780034
ABSTRACT

OBJECTIVES:

ChatGPT is an artificial intelligence model that can interpret free-text prompts and return detailed, human-like responses across a wide range of subjects. This study evaluated the extent of the threat posed by ChatGPT to the validity of short-answer assessment problems used to examine pre-clerkship medical students in our undergraduate medical education program.

METHODS:

Forty problems used in prior student assessments were retrieved and stratified by level of Bloom's Taxonomy. Thirty of these problems were submitted to ChatGPT-3.5; for the remaining 10, we retrieved past responses from minimally passing students. Six tutors graded each of the 40 responses. Performance on student-generated and ChatGPT-generated answers, compared both in aggregate and by Bloom's level of cognitive reasoning, was analyzed using t-tests, ANOVA, Cronbach's alpha, and Cohen's d. Scores for ChatGPT-generated responses were also compared to historical class average performance.
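The analysis described above can be approximated with standard statistical libraries. The Python sketch below is illustrative only and is not part of the original article: the score arrays, Bloom's groupings, and tutor ratings are hypothetical stand-ins for the data held by the authors, and the exact tests used in the study may differ in detail.

```python
# Illustrative sketch only: hypothetical grading data standing in for the
# tutor scores described in METHODS. The actual analysis may differ.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical mean scores (out of 5), one value per problem.
chatgpt_scores = rng.normal(3.3, 0.9, size=30)   # 30 problems answered by ChatGPT
student_scores = rng.normal(2.4, 0.7, size=10)   # 10 minimally passing student answers

# Two-sample t-test comparing ChatGPT against minimally passing students.
t_stat, p_val = stats.ttest_ind(chatgpt_scores, student_scores, equal_var=False)

# Cohen's d from a pooled standard deviation.
n1, n2 = len(chatgpt_scores), len(student_scores)
pooled_sd = np.sqrt(((n1 - 1) * chatgpt_scores.var(ddof=1) +
                     (n2 - 1) * student_scores.var(ddof=1)) / (n1 + n2 - 2))
cohens_d = (chatgpt_scores.mean() - student_scores.mean()) / pooled_sd

# One-way ANOVA across hypothetical Bloom's Taxonomy groupings of the ChatGPT scores.
bloom_groups = np.array_split(chatgpt_scores, 3)
f_stat, p_anova = stats.f_oneway(*bloom_groups)

# Cronbach's alpha for consistency across six tutors (hypothetical ratings,
# treating the tutors as "items" rating each of the 40 responses).
ratings = rng.normal(3.0, 0.8, size=(40, 6))
k = ratings.shape[1]
alpha = k / (k - 1) * (1 - ratings.var(axis=0, ddof=1).sum()
                       / ratings.sum(axis=1).var(ddof=1))

print(f"t = {t_stat:.2f}, p = {p_val:.3f}, d = {cohens_d:.2f}, "
      f"F = {f_stat:.2f}, p(ANOVA) = {p_anova:.3f}, alpha = {alpha:.2f}")
```

Cronbach's alpha is computed here from the standard item-variance formula; whether the authors applied it to tutor agreement in exactly this way is an assumption of the sketch.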

RESULTS:

ChatGPT-generated responses received a mean score of 3.29 out of 5 (n = 30, 95% CI 2.93-3.65), compared to 2.38 for a group of students meeting minimum passing marks (n = 10, 95% CI 1.94-2.82), representing higher performance (P = .008, η² = 0.169). ChatGPT was nevertheless outperformed by historical class average scores on the same 30 problems (mean 3.67, P = .018), which include all past responses regardless of student performance level. There was no statistically significant trend in performance across levels of Bloom's Taxonomy.
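As a rough consistency check (not reported in the article itself), the stated η² and sample sizes can be related to the reported P value under the assumption that the comparison behaves like a two-group t-type test with n1 + n2 - 2 = 38 degrees of freedom. The snippet below is a back-of-envelope sketch under that assumption, not a reproduction of the authors' analysis.

```python
# For two independent groups, eta-squared relates to the t statistic by
# eta2 = t^2 / (t^2 + df). Back-calculate the implied |t| and two-tailed p
# from the reported eta2 = 0.169 with df = 30 + 10 - 2 = 38 (assumption).
from scipy import stats

eta2, df = 0.169, 30 + 10 - 2
t = (eta2 * df / (1 - eta2)) ** 0.5
p_two_tailed = 2 * stats.t.sf(t, df)
print(f"implied |t| ~ {t:.2f}, two-tailed p ~ {p_two_tailed:.3f}")  # ~2.78, ~0.008
```

Under these assumptions the implied P value is close to the reported .008, suggesting the summary statistics are internally consistent.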

CONCLUSION:

While ChatGPT was able to pass short answer assessment problems spanning the pre-clerkship curriculum, it outperformed only underperforming students. Notably, in several cases tutors were convinced that ChatGPT-generated responses had been written by students. Risks to assessment validity include uncertainty in identifying struggling students and an inability to intervene in a timely manner. ChatGPT's performance on problems placing increasing demands on cognitive reasoning warrants further research.

Full text: 1 Database: MEDLINE Study type: Prognostic_studies Language: En Journal: J Med Educ Curric Dev Year: 2023 Document type: Article
