Two artificial intelligence models underperform on examinations in a veterinary curriculum.
J Am Vet Med Assoc ; 262(5): 692-697, 2024 May 01.
Article in En | MEDLINE | ID: mdl-38382193
ABSTRACT

OBJECTIVE:

Advancements in artificial intelligence (AI) and large language models have rapidly opened new possibilities for education and knowledge dissemination across many domains. Our understanding of the knowledge of these models, such as ChatGPT, in the medical and veterinary sciences is still in its nascent stage. Educators urgently need to understand these models in order to unleash student potential, promote responsible use, and align AI models with educational goals and learning objectives. The objectives of this study were to evaluate the knowledge level and consistency of responses of 2 ChatGPT platforms, GPT-3.5 and GPT-4.0.

SAMPLE:

A total of 495 multiple-choice and true/false questions from 15 courses used in the assessment of third-year veterinary students at a single veterinary institution were included in this study.

METHODS:

The questions were manually entered 3 times into each platform, and answers were recorded. These answers were then compared against those provided by the faculty members coordinating the courses.
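
The study entered questions manually; purely as an illustration, the sketch below shows how the same protocol (each question submitted 3 times per model, answers scored against the faculty key) could be automated. It assumes the openai Python client; the model names, prompt wording, and helper functions are assumptions, not details from the paper.

# Hypothetical sketch of the grading protocol; not the authors' code.
# Assumes the `openai` Python package and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

def ask(model: str, question: str) -> str:
    # Submit one multiple-choice or true/false question and return the reply.
    resp = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system",
             "content": "Answer with only the letter of the correct choice."},
            {"role": "user", "content": question},
        ],
    )
    return resp.choices[0].message.content.strip()

def score(model: str, questions: list[str], key: list[str], trials: int = 3) -> float:
    # Fraction correct over `trials` repeated entries per question,
    # mirroring the study's 3 manual entries per platform.
    correct = total = 0
    for q, answer in zip(questions, key):
        for _ in range(trials):
            correct += ask(model, q).upper().startswith(answer.upper())
            total += 1
    return correct / total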

RESULTS:

GPT-3.5 achieved an overall performance score of 55%, whereas GPT-4.0 achieved a significantly (P < .05) higher score of 77%. Importantly, the performance scores of both platforms were significantly (P < .05) below that of the veterinary students (86%).

CLINICAL RELEVANCE:

Findings of this study suggested that veterinary educators and students retrieving information from these AI-based platforms should do so with caution.
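
The abstract reports P < .05 but does not name the statistical test. As one standard way to compare two proportions, the sketch below runs a chi-square test on a 2 × 2 contingency table, with counts reconstructed from the reported rates over 495 questions × 3 entries = 1,485 responses per model; both the choice of test and the reconstructed counts are assumptions, not figures from the paper.

# Hypothetical check of the GPT-3.5 vs GPT-4.0 comparison; not the authors' analysis.
from scipy.stats import chi2_contingency

n = 495 * 3  # responses per model (3 entries per question)
gpt35_correct = round(0.55 * n)
gpt40_correct = round(0.77 * n)

table = [
    [gpt35_correct, n - gpt35_correct],  # GPT-3.5: correct, incorrect
    [gpt40_correct, n - gpt40_correct],  # GPT-4.0: correct, incorrect
]
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.1f}, p = {p:.3g}")  # p falls far below .05, consistent with the paper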

Full text: 1 | Database: MEDLINE | Language: En | Journal: J Am Vet Med Assoc | Year: 2024 | Type: Article
