Capability of GPT-4V(ision) in the Japanese National Medical Licensing Examination: Evaluation Study.
Nakao, Takahiro; Miki, Soichiro; Nakamura, Yuta; Kikuchi, Tomohiro; Nomura, Yukihiro; Hanaoka, Shouhei; Yoshikawa, Takeharu; Abe, Osamu.
Affiliations
  • Nakao T; Department of Computational Diagnostic Radiology and Preventive Medicine, The University of Tokyo Hospital, Bunkyo-ku, Tokyo, Japan.
  • Miki S; Department of Computational Diagnostic Radiology and Preventive Medicine, The University of Tokyo Hospital, Bunkyo-ku, Tokyo, Japan.
  • Nakamura Y; Department of Computational Diagnostic Radiology and Preventive Medicine, The University of Tokyo Hospital, Bunkyo-ku, Tokyo, Japan.
  • Kikuchi T; Department of Computational Diagnostic Radiology and Preventive Medicine, The University of Tokyo Hospital, Bunkyo-ku, Tokyo, Japan.
  • Nomura Y; Department of Radiology, School of Medicine, Jichi Medical University, Shimotsuke, Tochigi, Japan.
  • Hanaoka S; Department of Computational Diagnostic Radiology and Preventive Medicine, The University of Tokyo Hospital, Bunkyo-ku, Tokyo, Japan.
  • Yoshikawa T; Center for Frontier Medical Engineering, Chiba University, Inage-ku, Chiba, Japan.
  • Abe O; Department of Radiology, The University of Tokyo Hospital, Bunkyo-ku, Tokyo, Japan.
JMIR Med Educ; 10: e54393, 2024 Mar 12.
Article in English | MEDLINE | ID: mdl-38470459
ABSTRACT

BACKGROUND:

Previous research applying large language models (LLMs) to medicine has focused on text-based information. Recently, multimodal variants of LLMs have acquired the capability of recognizing images.

OBJECTIVE:

We aim to evaluate the image recognition capability of generative pretrained transformer (GPT)-4V, a recent multimodal LLM developed by OpenAI, in the medical field by testing how visual information affects its performance in answering questions from the 117th Japanese National Medical Licensing Examination.

METHODS:

We focused on 108 questions that included 1 or more images and presented GPT-4V with each question under 2 conditions: (1) with both the question text and the associated images and (2) with the question text only. We then compared the accuracy between the 2 conditions using the exact McNemar test.
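The exact McNemar test named above depends only on the discordant pairs, that is, the questions answered correctly under one condition but not the other. The following is a minimal sketch of that calculation, not the authors' code. The abstract does not report the discordant counts, so `b` and `c` below are hypothetical values chosen only so that their difference matches the 5-question gap between the 2 conditions; the function name `mcnemar_exact` is likewise an assumption.

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value.

    b: questions correct with images but wrong with text only
    c: questions wrong with images but correct with text only
    Under the null hypothesis, b follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    k = min(b, c)
    # Probability of a split at least as lopsided as the observed one, in one tail
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    # Double the tail for a two-sided test and cap at 1
    return min(1.0, 2 * tail)


# HYPOTHETICAL discordant counts (not reported in the abstract)
b = 7
c = 12
p_value = mcnemar_exact(b, c)
print(f"P={p_value:.2f}")
```

Concordant pairs (questions answered correctly or incorrectly under both conditions) do not enter the calculation, which is why the test suits a paired design in which the same 108 questions are presented twice.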

RESULTS:

Among the 108 questions with images, GPT-4V's accuracy was 68% (73/108) when presented with images and 72% (78/108) when presented without images (P=.36). For the 2 question categories, clinical and general, accuracy with versus without images was 71% (70/98) versus 78% (76/98; P=.21) and 30% (3/10) versus 20% (2/10; P≥.99), respectively.
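As a quick arithmetic check, the reported percentages can be recomputed from the counts given above. This is only an illustrative sketch; the dictionary layout and the helper name `percent` are assumptions, while the counts are taken directly from the abstract.

```python
# Counts as reported in the abstract: (correct with images, correct with text only, total)
reported = {
    "all": (73, 78, 108),
    "clinical": (70, 76, 98),
    "general": (3, 2, 10),
}


def percent(correct: int, total: int) -> int:
    """Accuracy as a whole-number percentage."""
    return round(100 * correct / total)


for category, (with_images, text_only, total) in reported.items():
    print(category, percent(with_images, total), percent(text_only, total))

# The clinical and general subsets should add up to the overall figures
subset_sum = tuple(
    reported["clinical"][i] + reported["general"][i] for i in range(3)
)
print(subset_sum == reported["all"])
```

The subset counts sum to the overall counts, so the category breakdown is internally consistent with the headline result.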

CONCLUSIONS:

The additional information from the images did not significantly improve the performance of GPT-4V in the Japanese National Medical Licensing Examination.

Full text: 1 Collections: 01-international Database: MEDLINE Main subject: Licensure / Medicine Country/Region as subject: Asia Language: English Journal: JMIR Med Educ Year of publication: 2024 Document type: Article Country of affiliation: Japan Country of publication: Canada
