Performance of ChatGPT and Bard in self-assessment questions for nephrology board renewal.
Noda, Ryunosuke; Izaki, Yuto; Kitano, Fumiya; Komatsu, Jun; Ichikawa, Daisuke; Shibagaki, Yugo.
Affiliation
  • Noda R; Division of Nephrology and Hypertension, Department of Internal Medicine, St. Marianna University School of Medicine, 2-16-1 Sugao, Miyamae-Ku, Kawasaki, Kanagawa, 216-8511, Japan. nodaryu00@gmail.com.
  • Izaki Y; Division of Nephrology and Hypertension, Department of Internal Medicine, St. Marianna University School of Medicine, 2-16-1 Sugao, Miyamae-Ku, Kawasaki, Kanagawa, 216-8511, Japan.
  • Kitano F; Division of Nephrology and Hypertension, Department of Internal Medicine, St. Marianna University School of Medicine, 2-16-1 Sugao, Miyamae-Ku, Kawasaki, Kanagawa, 216-8511, Japan.
  • Komatsu J; Division of Nephrology and Hypertension, Department of Internal Medicine, St. Marianna University School of Medicine, 2-16-1 Sugao, Miyamae-Ku, Kawasaki, Kanagawa, 216-8511, Japan.
  • Ichikawa D; Division of Nephrology and Hypertension, Department of Internal Medicine, St. Marianna University School of Medicine, 2-16-1 Sugao, Miyamae-Ku, Kawasaki, Kanagawa, 216-8511, Japan.
  • Shibagaki Y; Division of Nephrology and Hypertension, Department of Internal Medicine, St. Marianna University School of Medicine, 2-16-1 Sugao, Miyamae-Ku, Kawasaki, Kanagawa, 216-8511, Japan.
Clin Exp Nephrol ; 28(5): 465-469, 2024 May.
Article in English | MEDLINE | ID: mdl-38353783
ABSTRACT

BACKGROUND:

Large language models (LLMs) have driven recent advances in artificial intelligence. While LLMs perform well on general medical examinations, their performance in specialized areas such as nephrology remains unclear. This study evaluated ChatGPT and Bard for potential applications in nephrology.

METHODS:

Ninety-nine questions from the Self-Assessment Questions for Nephrology Board Renewal (2018-2022) were presented to two versions of ChatGPT (GPT-3.5 and GPT-4) and to Bard. We calculated correct answer rates overall, for each year, and by question category, and checked whether they exceeded the pass criterion. These rates were compared with those of nephrology residents.

RESULTS:

The overall correct answer rates for GPT-3.5, GPT-4, and Bard were 31.3% (31/99), 54.5% (54/99), and 32.3% (32/99), respectively; GPT-4 thus significantly outperformed both GPT-3.5 (p < 0.01) and Bard (p < 0.01). GPT-4 met the passing criterion in three of the five years, only barely clearing the minimum threshold in two of them. GPT-4 performed significantly better than GPT-3.5 and Bard on problem-solving, clinical, and non-image questions. GPT-4's performance fell between that of third- and fourth-year nephrology residents.
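The reported rates follow directly from the correct-answer counts out of 99 questions. The abstract does not state which statistical test produced the p-values; as an illustrative sketch only, a two-proportion z-test on the reported counts yields a result consistent with the stated p < 0.01 for GPT-4 versus GPT-3.5 (the test choice here is an assumption, not the authors' method):

```python
import math

def rate(correct, total):
    """Percentage correct, rounded to one decimal place."""
    return round(100 * correct / total, 1)

def two_proportion_z(x1, n1, x2, n2):
    """Two-sided two-proportion z-test; returns (z, p)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value from the normal tail
    return z, p

# Correct-answer counts reported in the abstract (each out of 99 questions)
scores = {"GPT-3.5": 31, "GPT-4": 54, "Bard": 32}
rates = {model: rate(c, 99) for model, c in scores.items()}
print(rates)  # {'GPT-3.5': 31.3, 'GPT-4': 54.5, 'Bard': 32.3}

z, p = two_proportion_z(scores["GPT-4"], 99, scores["GPT-3.5"], 99)
print(f"GPT-4 vs GPT-3.5: z = {z:.2f}, p = {p:.4f}")
```

On these counts the test gives p well below 0.01, matching the direction and significance level reported, though the study's actual analysis may have used a different test.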

CONCLUSIONS:

GPT-4 outperformed GPT-3.5 and Bard and met the Nephrology Board renewal standards in certain years, albeit marginally. These results highlight both the potential and the limitations of LLMs in nephrology. As LLMs advance, nephrologists should understand their performance characteristics for future applications.

Full text: 1 Collection: 01-international Database: MEDLINE Main subject: Self-Assessment (Psychology) / Nephrology Study type: Guideline / Prognostic_studies Limit: Humans Language: English Journal: Clin Exp Nephrol Journal subject: NEPHROLOGY Year: 2024 Document type: Article Affiliation country: Japan
