Assessing Generative Pretrained Transformers (GPT) in Clinical Decision-Making: Comparative Analysis of GPT-3.5 and GPT-4.

Lahat, Adi; Sharif, Kassem; Zoabi, Narmin; Shneor Patt, Yonatan; Sharif, Yousra; Fisher, Lior; Shani, Uria; Arow, Mohamad; Levin, Roni; Klang, Eyal

Affiliation

Lahat A; Department of Gastroenterology, Chaim Sheba Medical Center, Affiliated with Tel Aviv University, Ramat Gan, Israel.
Sharif K; Department of Gastroenterology, Samson Assuta Ashdod Medical Center, Affiliated with Ben Gurion University of the Negev, Be'er Sheva, Israel.
Zoabi N; Department of Gastroenterology, Chaim Sheba Medical Center, Affiliated with Tel Aviv University, Ramat Gan, Israel.
Shneor Patt Y; Department of Internal Medicine B, Sheba Medical Centre, Tel Aviv, Israel.
Sharif Y; Department of Gastroenterology, Chaim Sheba Medical Center, Affiliated with Tel Aviv University, Ramat Gan, Israel.
Fisher L; Department of Internal Medicine B, Sheba Medical Centre, Tel Aviv, Israel.
Shani U; Department of Internal Medicine C, Hadassah Medical Center, Jerusalem, Israel.
Arow M; Department of Internal Medicine B, Sheba Medical Centre, Tel Aviv, Israel.
Levin R; Department of Internal Medicine B, Sheba Medical Centre, Tel Aviv, Israel.
Klang E; Department of Internal Medicine B, Sheba Medical Centre, Tel Aviv, Israel.

J Med Internet Res; 26: e54571, 2024 Jun 27.

Article in English | MEDLINE | ID: mdl-38935937

ABSTRACT

BACKGROUND:

Artificial intelligence, particularly chatbot systems, is becoming an instrumental tool in health care, aiding clinical decision-making and patient engagement.

OBJECTIVE:

This study aims to analyze the performance of ChatGPT-3.5 and ChatGPT-4 in addressing complex clinical and ethical dilemmas and to illustrate their potential role in health care decision-making, comparing ratings from senior physicians and residents and across specific question types.

METHODS:

A total of 4 specialized physicians formulated 176 real-world clinical questions. A total of 8 senior physicians and residents assessed responses from GPT-3.5 and GPT-4 on a 1-5 scale across 5 categories: accuracy, relevance, clarity, utility, and comprehensiveness. Evaluations were conducted within internal medicine, emergency medicine, and ethics. Comparisons were made globally, between seniors and residents, and across question classifications.
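The grouping described above (overall per model, then per model and rater seniority) can be sketched as a simple aggregation. This is a minimal illustration only: the ratings below are hypothetical placeholders, not data from the study, and the `summarize` helper and its field names are assumptions introduced here rather than the authors' analysis code.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical ratings, NOT data from the study:
# (model, rater_group, category, score on the 1-5 scale)
ratings = [
    ("GPT-4", "senior", "accuracy", 5),
    ("GPT-4", "resident", "accuracy", 4),
    ("GPT-4", "senior", "clarity", 5),
    ("GPT-4", "resident", "clarity", 4),
    ("GPT-3.5", "senior", "accuracy", 4),
    ("GPT-3.5", "resident", "accuracy", 3),
    ("GPT-3.5", "senior", "clarity", 4),
    ("GPT-3.5", "resident", "clarity", 3),
]

# Position of each grouping field inside a rating tuple.
FIELD_INDEX = {"model": 0, "group": 1, "category": 2}


def summarize(rows, key_fields):
    """Group scores by the given fields and return {key: (mean, sd, n)}."""
    buckets = defaultdict(list)
    for row in rows:
        key = tuple(row[FIELD_INDEX[field]] for field in key_fields)
        buckets[key].append(row[3])
    return {
        key: (mean(scores), stdev(scores) if len(scores) > 1 else 0.0, len(scores))
        for key, scores in buckets.items()
    }


overall = summarize(ratings, ["model"])            # global comparison
by_group = summarize(ratings, ["model", "group"])  # seniors vs residents

for key, (avg, sd, n) in sorted(by_group.items()):
    print(key, f"mean={avg:.2f}", f"sd={sd:.2f}", f"n={n}")
```

Passing `["model", "category"]` as `key_fields` would give the per-dimension breakdown in the same way. Significance testing between groups is deliberately left out, since the abstract does not state which test was used.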

RESULTS:

Both GPT models received high mean scores (4.4, SD 0.8 for GPT-4 and 4.1, SD 1.0 for GPT-3.5). GPT-4 outperformed GPT-3.5 across all rating dimensions, with seniors consistently rating responses higher than residents for both models. Specifically, seniors rated GPT-4 as more beneficial and complete than residents did (mean 4.6 vs 4.0 and 4.6 vs 4.1, respectively; P<.001), and rated GPT-3.5 similarly (mean 4.1 vs 3.7 and 3.9 vs 3.5, respectively; P<.001). Ethical queries received the highest ratings for both models, with mean scores reflecting consistency across accuracy and completeness criteria. Distinctions among question types were significant, particularly for the GPT-4 mean scores in completeness across emergency, internal, and ethical questions (4.2, SD 1.0; 4.3, SD 0.8; and 4.5, SD 0.7, respectively; P<.001), and for GPT-3.5 on the accuracy, benefit, and completeness dimensions.

CONCLUSIONS:

ChatGPT's potential to assist physicians with medical issues is promising, with prospects to enhance diagnostics, treatments, and ethics. While integration into clinical workflows may be valuable, it must complement, not replace, human expertise. Continued research is essential to ensure safe and effective implementation in clinical environments.

Subject(s)

Clinical Decision-Making; Humans; Artificial Intelligence

Keywords

AI; ChatGPT; ED physician; EM medicine; ML; NLP; algorithm; algorithms; artificial intelligence; bioethics; chat-GPT; chat-bot; chat-bots; chatbot; chatbots; emergency doctor; emergency medicine; emergency physician; ethical; ethical dilemma; ethical dilemmas; ethics; internal medicine; machine learning; natural language processing; practical model; practical models; predictive analytics; predictive model; predictive models; predictive system

Full text: 1 | Collection: 01-international | Database: MEDLINE | Main subject: Clinical Decision-Making | Limit: Humans | Language: En | Journal: J Med Internet Res | Journal subject: MEDICAL INFORMATICS | Year: 2024 | Document type: Article | Country of affiliation: Israel