Diagnostic accuracy of vision-language models on Japanese diagnostic radiology, nuclear medicine, and interventional radiology specialty board examinations.

Oura, Tatsushi; Tatekawa, Hiroyuki; Horiuchi, Daisuke; Matsushita, Shu; Takita, Hirotaka; Atsukawa, Natsuko; Mitsuyama, Yasuhito; Yoshida, Atsushi; Murai, Kazuki; Tanaka, Rikako; Shimono, Taro; Yamamoto, Akira; Miki, Yukio; Ueda, Daiju

Affiliation

Oura T; Department of Diagnostic and Interventional Radiology, Graduate School of Medicine, Osaka Metropolitan University, 1-4-3, Asahi-machi, Abeno-ku, Osaka, 545-8585, Japan.
Tatekawa H; Department of Diagnostic and Interventional Radiology, Graduate School of Medicine, Osaka Metropolitan University, 1-4-3, Asahi-machi, Abeno-ku, Osaka, 545-8585, Japan. htatekawa@omu.ac.jp.
Horiuchi D; Department of Diagnostic and Interventional Radiology, Graduate School of Medicine, Osaka Metropolitan University, 1-4-3, Asahi-machi, Abeno-ku, Osaka, 545-8585, Japan.
Matsushita S; Department of Diagnostic and Interventional Radiology, Graduate School of Medicine, Osaka Metropolitan University, 1-4-3, Asahi-machi, Abeno-ku, Osaka, 545-8585, Japan.
Takita H; Department of Diagnostic and Interventional Radiology, Graduate School of Medicine, Osaka Metropolitan University, 1-4-3, Asahi-machi, Abeno-ku, Osaka, 545-8585, Japan.
Atsukawa N; Department of Diagnostic and Interventional Radiology, Graduate School of Medicine, Osaka Metropolitan University, 1-4-3, Asahi-machi, Abeno-ku, Osaka, 545-8585, Japan.
Mitsuyama Y; Department of Diagnostic and Interventional Radiology, Graduate School of Medicine, Osaka Metropolitan University, 1-4-3, Asahi-machi, Abeno-ku, Osaka, 545-8585, Japan.
Yoshida A; Department of Nuclear Medicine, Graduate School of Medicine, Osaka Metropolitan University, Osaka, Japan.
Murai K; Department of Diagnostic and Interventional Radiology, Graduate School of Medicine, Osaka Metropolitan University, 1-4-3, Asahi-machi, Abeno-ku, Osaka, 545-8585, Japan.
Tanaka R; Department of Diagnostic and Interventional Radiology, Graduate School of Medicine, Osaka Metropolitan University, 1-4-3, Asahi-machi, Abeno-ku, Osaka, 545-8585, Japan.
Shimono T; Department of Diagnostic and Interventional Radiology, Graduate School of Medicine, Osaka Metropolitan University, 1-4-3, Asahi-machi, Abeno-ku, Osaka, 545-8585, Japan.
Yamamoto A; Department of Diagnostic and Interventional Radiology, Graduate School of Medicine, Osaka Metropolitan University, 1-4-3, Asahi-machi, Abeno-ku, Osaka, 545-8585, Japan.
Miki Y; Department of Diagnostic and Interventional Radiology, Graduate School of Medicine, Osaka Metropolitan University, 1-4-3, Asahi-machi, Abeno-ku, Osaka, 545-8585, Japan.
Ueda D; Department of Diagnostic and Interventional Radiology, Graduate School of Medicine, Osaka Metropolitan University, 1-4-3, Asahi-machi, Abeno-ku, Osaka, 545-8585, Japan.

Jpn J Radiol; 2024 Jul 20.

Article in En | MEDLINE | ID: mdl-39031270

ABSTRACT

PURPOSE:

The performance of vision-language models (VLMs) with image interpretation capabilities, such as GPT-4 omni (GPT-4o), GPT-4 vision (GPT-4V), and Claude-3, has not been compared and remains unexplored in specialized radiological fields, including nuclear medicine and interventional radiology. This study aimed to evaluate and compare the diagnostic accuracy of various VLMs, including GPT-4 + GPT-4V, GPT-4o, Claude-3 Sonnet, and Claude-3 Opus, using Japanese diagnostic radiology, nuclear medicine, and interventional radiology (JDR, JNM, and JIR, respectively) board certification tests.

MATERIALS AND METHODS:

In total, 383 questions from the JDR test (358 images), 300 from the JNM test (92 images), and 322 from the JIR test (96 images) from 2019 to 2023 were consecutively collected. The accuracy rates of the GPT-4 + GPT-4V, GPT-4o, Claude-3 Sonnet, and Claude-3 Opus were calculated for all questions or questions with images. The accuracy rates of the VLMs were compared using McNemar's test.
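
As an illustration of the paired comparison described above, the following is a minimal sketch of McNemar's test in Python. The abstract does not state which variant of the test was used, so the exact (binomial) two-sided form shown here is an assumption. The function names and the toy answer vectors are hypothetical and are not study data.

```python
from math import comb


def accuracy(correct):
    """Fraction of questions answered correctly (list of booleans)."""
    return sum(correct) / len(correct)


def discordant_counts(correct_a, correct_b):
    """Count questions that exactly one of the two models answered correctly."""
    if len(correct_a) != len(correct_b):
        raise ValueError("both models must be scored on the same questions")
    only_a = sum(1 for a, b in zip(correct_a, correct_b) if a and not b)
    only_b = sum(1 for a, b in zip(correct_a, correct_b) if b and not a)
    return only_a, only_b


def mcnemar_exact(only_a, only_b):
    """Two-sided exact McNemar p-value from the two discordant counts.

    Assumption: under the null hypothesis, each discordant question is
    equally likely to favor either model (binomial with p = 0.5).
    """
    n = only_a + only_b
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    k = min(only_a, only_b)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical per-question results for two models on the same 8 questions.
model_a = [True, True, True, True, True, True, False, True]
model_b = [False, False, False, False, False, True, False, True]

only_a, only_b = discordant_counts(model_a, model_b)
print(accuracy(model_a), accuracy(model_b), mcnemar_exact(only_a, only_b))
```

Only the discordant questions (answered correctly by one model but not the other) enter the test, which is why a paired test is appropriate when every model answers the same question set.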

RESULTS:

GPT-4o demonstrated the highest accuracy rates across all evaluations with the JDR (all questions, 49%; questions with images, 48%), JNM (all questions, 64%; questions with images, 59%), and JIR tests (all questions, 43%; questions with images, 34%), followed by Claude-3 Opus with the JDR (all questions, 40%; questions with images, 38%), JNM (all questions, 42%; questions with images, 43%), and JIR tests (all questions, 40%; questions with images, 30%). For all questions, McNemar's test showed that GPT-4o significantly outperformed the other VLMs (all P < 0.007), except for Claude-3 Opus in the JIR test. For questions with images, GPT-4o outperformed the other VLMs in the JDR and JNM tests (all P < 0.001), except Claude-3 Opus in the JNM test.

CONCLUSION:

GPT-4o achieved the highest accuracy rates, both for all questions and for questions with images, on the JDR, JNM, and JIR board certification tests.

Key words

Certification tests; Diagnostic radiology; Interventional radiology; Large language models; Nuclear medicine; Vision-language models

Full text: 1 Collection: 01-internacional Database: MEDLINE Language: En Journal: Jpn J Radiol Journal subject: Diagnostic Imaging / Radiology / Radiotherapy Year: 2024 Document type: Article Affiliation country: Japan