Can a Machine Ace the Test? Assessing GPT-4.0's Precision in Plastic Surgery Board Examinations.
Plast Reconstr Surg Glob Open; 11(12): e5448, 2023 Dec.
Article in English | MEDLINE | ID: mdl-38111723
ABSTRACT
Background:
As artificial intelligence makes rapid inroads across various fields, its value in medical education is becoming increasingly evident. This study evaluates the performance of the GPT-4.0 large language model in responding to plastic surgery board examination questions and explores its potential as a learning tool.
Methods:
We used a selection of 50 questions from 19 chapters of a widely used plastic surgery reference. Responses generated by the GPT-4.0 model were assessed on four parameters: accuracy, clarity, completeness, and conciseness. Correlation analyses were conducted to ascertain the relationship between these parameters and the overall performance of the model.
Results:
GPT-4.0 showed strong performance, with high mean scores for accuracy (2.88), clarity (3.00), completeness (2.88), and conciseness (2.92) on a three-point scale. Completeness of the model's responses was significantly correlated with accuracy (P < 0.0001), whereas no significant correlation was found between accuracy and clarity or between accuracy and conciseness. Performance variability across chapters indicates potential limitations of the model in dealing with certain complex topics in plastic surgery.
Conclusions:
The GPT-4.0 model exhibits considerable potential as an auxiliary tool for preparation for plastic surgery board examinations. Despite a few identified limitations, the generally high scores on key parameters suggest the model's ability to provide responses that are accurate, clear, complete, and concise. Future research should focus on enhancing the performance of artificial intelligence models in complex medical topics, further improving their applicability in medical education.
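The correlation analysis described in the Methods could be carried out along the following lines. This is a minimal illustrative sketch: the score lists and the `pearson_r` helper are invented for demonstration and are not the study's actual data or code.

```python
# Hypothetical sketch of correlating two per-question rating parameters
# (e.g. accuracy vs. completeness, each scored 1-3 per question).
# All scores below are invented illustration data.
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

accuracy     = [3, 3, 2, 3, 1, 3, 2, 3]  # invented example ratings
completeness = [3, 3, 2, 3, 1, 3, 3, 3]  # invented example ratings

r = pearson_r(accuracy, completeness)
print(f"Pearson r = {r:.3f}")
```

In practice, a significance test (e.g. via `scipy.stats.pearsonr`) would accompany the coefficient to obtain the P value reported in the Results.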