Comparing the performance of ChatGPT GPT-4, Bard, and Llama-2 in the Taiwan Psychiatric Licensing Examination and in differential diagnosis with multi-center psychiatrists.
Li, Dian-Jeng; Kao, Yu-Chen; Tsai, Shih-Jen; Bai, Ya-Mei; Yeh, Ta-Chuan; Chu, Che-Sheng; Hsu, Chih-Wei; Cheng, Szu-Wei; Hsu, Tien-Wei; Liang, Chih-Sung; Su, Kuan-Pin.
Affiliation
  • Li DJ; Department of Addiction Science, Kaohsiung Municipal Kai-Syuan Psychiatric Hospital, Kaohsiung, Taiwan.
  • Kao YC; Department of Nursing, Meiho University, Pingtung, Taiwan.
  • Tsai SJ; Department of Psychiatry, Tri-Service General Hospital, National Defense Medical Center, Taipei, Taiwan.
  • Bai YM; Department of Psychiatry, Tri-Service General Hospital, Beitou branch, Taipei, Taiwan.
  • Yeh TC; Department of Psychiatry, Taipei Veterans General Hospital, Taipei, Taiwan.
  • Chu CS; Department of Psychiatry, College of Medicine, National Yang Ming Chiao Tung University, Taipei, Taiwan.
  • Hsu CW; Department of Psychiatry, Taipei Veterans General Hospital, Taipei, Taiwan.
  • Cheng SW; Department of Psychiatry, College of Medicine, National Yang Ming Chiao Tung University, Taipei, Taiwan.
  • Hsu TW; Institute of Brain Science, National Yang Ming Chiao Tung University, Taipei, Taiwan.
  • Liang CS; Department of Psychiatry, Tri-Service General Hospital, National Defense Medical Center, Taipei, Taiwan.
  • Su KP; Center for Geriatric and Gerontology, Kaohsiung Veterans General Hospital, Kaohsiung, Taiwan.
Psychiatry Clin Neurosci ; 78(6): 347-352, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38404249
ABSTRACT

AIM:

Large language models (LLMs) have been suggested to play a role in medical education and medical practice. However, their potential application in the psychiatric domain has not been well studied.

METHOD:

In the first step, we compared the performance of ChatGPT GPT-4, Bard, and Llama-2 in the 2022 Taiwan Psychiatric Licensing Examination conducted in traditional Mandarin. In the second step, we compared the scores of these three LLMs with those of 24 experienced psychiatrists in 10 advanced clinical scenario questions designed for psychiatric differential diagnosis.

RESULT:

Only GPT-4 passed the 2022 Taiwan Psychiatric Licensing Examination, scoring 69 (a score of ≥ 60 is considered a passing grade), while Bard scored 36 and Llama-2 scored 25. GPT-4 outperformed Bard and Llama-2, especially in the areas of 'Pathophysiology & Epidemiology' (χ² = 22.4, P < 0.001) and 'Psychopharmacology & Other therapies' (χ² = 15.8, P < 0.001). In the differential diagnosis, the mean score of the 24 experienced psychiatrists (mean 6.1, standard deviation 1.9) was higher than the scores of GPT-4 (5), Bard (3), and Llama-2 (1).
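
As a rough illustration of how the reported numbers relate to each other, the sketch below checks each model's exam score against the passing grade and expresses each model's differential-diagnosis score as a distance from the psychiatrists' mean in standard-deviation units. This is not the authors' analysis: the z-score framing and the names `passed` and `z_score` are assumptions introduced here, and only the numeric values come from the abstract.

```python
# Illustrative sketch only; not the statistical method used in the study.
# All numeric values below are taken from the abstract.

PASSING_GRADE = 60
EXAM_SCORES = {"GPT-4": 69, "Bard": 36, "Llama-2": 25}

PSYCHIATRIST_MEAN = 6.1  # mean score of the 24 psychiatrists
PSYCHIATRIST_SD = 1.9    # standard deviation of their scores
DIAGNOSIS_SCORES = {"GPT-4": 5, "Bard": 3, "Llama-2": 1}


def passed(score, threshold=PASSING_GRADE):
    """Return True if the exam score meets the passing grade (>= threshold)."""
    return score >= threshold


def z_score(score, mean=PSYCHIATRIST_MEAN, sd=PSYCHIATRIST_SD):
    """Hypothetical helper: distance from the psychiatrists' mean in SD units."""
    return (score - mean) / sd


if __name__ == "__main__":
    for model in EXAM_SCORES:
        status = "pass" if passed(EXAM_SCORES[model]) else "fail"
        z = z_score(DIAGNOSIS_SCORES[model])
        print(f"{model}: exam {EXAM_SCORES[model]} ({status}), "
              f"diagnosis z = {z:.2f}")
```

Under this framing, GPT-4's score of 5 lies within one standard deviation of the psychiatrists' mean, while the scores of Bard and Llama-2 lie more than one standard deviation below it.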

CONCLUSION:

Compared to Bard and Llama-2, GPT-4 demonstrated superior abilities in identifying psychiatric symptoms and making clinical judgments. In addition, GPT-4's ability for differential diagnosis closely approached that of the experienced psychiatrists. Among the three LLMs, GPT-4 showed promising potential as a valuable tool in psychiatric practice.

Full text: 1 | Collection: 01-international | Database: MEDLINE | Main subject: Psychiatry | Limits: Adult / Humans | Country/Region as subject: Asia | Language: En | Journal: Psychiatry Clin Neurosci | Journal subject: NEUROLOGY / PSYCHIATRY | Year: 2024 | Document type: Article | Affiliation country: Taiwan
