Large language models' responses to liver cancer surveillance, diagnosis, and management questions: accuracy, reliability, readability.
Abdom Radiol (NY); 49(12): 4286-4294, 2024 Dec.
Article | English | MEDLINE | ID: mdl-39088019
ABSTRACT
PURPOSE:
To assess the accuracy, reliability, and readability of publicly available large language models in answering fundamental questions on hepatocellular carcinoma diagnosis and management.
METHODS:
Twenty questions on liver cancer diagnosis and management were each asked in triplicate to ChatGPT-3.5 (OpenAI), Gemini (Google), and Bing (Microsoft). Responses were assessed by six fellowship-trained physicians from three academic liver transplant centers who actively diagnose and/or treat liver cancer. Each response was scored as accurate (score 1; all information is true and relevant), inadequate (score 0; all information is true, but does not fully answer the question or includes irrelevant information), or inaccurate (score -1; any information is false). Means with standard deviations were recorded. A question was considered answered accurately as a whole if its mean score was > 0, and reliably if the mean score was > 0 for every response to that question. Readability was quantified using the Flesch Reading Ease Score and the Flesch-Kincaid Grade Level. Readability and accuracy across the 60 responses per chatbot were compared using one-way ANOVA with Tukey's multiple comparisons test.
RESULTS:
Of the twenty questions, ChatGPT answered nine (45%), Gemini 12 (60%), and Bing six (30%) accurately; however, only six (30%), eight (40%), and three (15%), respectively, were answered both accurately and reliably. There were no significant differences in accuracy among the chatbots. ChatGPT responses were the least readable (mean Flesch Reading Ease Score 29; college graduate level), followed by Gemini (30; college) and Bing (40; college; p < 0.001).
CONCLUSION:
Large language models provide complex responses to basic questions on hepatocellular carcinoma diagnosis and management that are seldom accurate, reliable, or readable.
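The scoring rule described in the Methods can be sketched as follows. This is a minimal Python illustration, not the authors' code: the function name `classify_question` and the input layout (one list of rater scores per repeated response) are hypothetical, and the sketch assumes that "reliable" means the mean score of each of the three repeated responses to a question was > 0.

```python
from statistics import mean

# Per-rater scores as defined in the study
ACCURATE, INADEQUATE, INACCURATE = 1, 0, -1


def classify_question(responses: list[list[int]]) -> tuple[bool, bool]:
    """Return (accurate, reliable) for one question (hypothetical helper).

    `responses` holds one inner list per repeated response to the question
    (three in the study), each containing every rater's score (six raters).
    """
    # Accurate as a whole: mean of all scores for the question is > 0
    all_scores = [score for scores in responses for score in scores]
    accurate = mean(all_scores) > 0
    # Reliable (assumed reading): each repeated response has a mean score > 0
    reliable = all(mean(scores) > 0 for scores in responses)
    return accurate, reliable


# Example: three responses to one question, six raters each
question = [
    [ACCURATE, ACCURATE, ACCURATE, INADEQUATE, ACCURATE, INACCURATE],        # mean 0.5
    [ACCURATE, INADEQUATE, ACCURATE, ACCURATE, ACCURATE, ACCURATE],          # mean 0.83
    [INACCURATE, INACCURATE, INADEQUATE, ACCURATE, INADEQUATE, INADEQUATE],  # mean -0.17
]
print(classify_question(question))  # (True, False)
```

Under this reading, and with the same number of raters for every response, a reliable question is necessarily also accurate as a whole, which is consistent with each chatbot having fewer accurate-and-reliable answers than accurate ones.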
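The two readability indices named in the Methods are standard formulas over word, sentence, and syllable counts. The sketch below assumes those counts are already available (syllable counting is tool-dependent, and the abstract does not say which software was used). The function names are hypothetical, and the grade bands follow the commonly cited Flesch interpretation table, which matches the labels reported in the Results (29 is college graduate level; 30 and 40 are college level).

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease: higher scores mean easier text (roughly 0-100)."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level: approximate US school grade needed."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


# Commonly used Flesch Reading Ease bands: (lower bound, reading level)
FRE_BANDS = [
    (90, "5th grade"),
    (80, "6th grade"),
    (70, "7th grade"),
    (60, "8th-9th grade"),
    (50, "10th-12th grade"),
    (30, "college"),
]


def flesch_band(score: float) -> str:
    """Map a Flesch Reading Ease score to its conventional reading level."""
    for lower_bound, level in FRE_BANDS:
        if score >= lower_bound:
            return level
    return "college graduate"  # scores below 30


# Example: 100 words, 5 sentences, 150 syllables
fre = flesch_reading_ease(100, 5, 150)
print(round(fre, 1), flesch_band(fre))  # 59.6 10th-12th grade
print(round(flesch_kincaid_grade(100, 5, 150), 2))  # 9.91
```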
Database: MEDLINE
Main subject: Carcinoma, Hepatocellular / Comprehension / Liver Neoplasms
Limits: Humans
Language: English
Journal: Abdom Radiol (NY)
Year: 2024
Document type: Article
Country of affiliation: United States