Comparison of Large Language Models in Answering Immuno-Oncology Questions: A Cross-Sectional Study.
Iannantuono, Giovanni Maria; Bracken-Clarke, Dara; Karzai, Fatima; Choo-Wosoba, Hyoyoung; Gulley, James L; Floudas, Charalampos S.
Affiliations
  • Iannantuono GM; Genitourinary Malignancies Branch, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA.
  • Bracken-Clarke D; Center for Immuno-Oncology, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA.
  • Karzai F; Genitourinary Malignancies Branch, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA.
  • Choo-Wosoba H; Biostatistics and Data Management Section, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA.
  • Gulley JL; Center for Immuno-Oncology, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA.
  • Floudas CS; Center for Immuno-Oncology, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA.
Oncologist; 29(5): 407-414, 2024 May 03.
Article in English | MEDLINE | ID: mdl-38309720
ABSTRACT

BACKGROUND:

The capability of large language models (LLMs) to understand and generate human-readable text has prompted investigation of their potential as educational and management tools for patients with cancer and for healthcare providers.

MATERIALS AND METHODS:

We conducted a cross-sectional study evaluating the ability of ChatGPT-4, ChatGPT-3.5, and Google Bard to answer questions across 4 domains of immuno-oncology (Mechanisms, Indications, Toxicities, and Prognosis). We generated 60 open-ended questions (15 per domain), submitted them manually to each LLM, and collected the responses on June 30, 2023. Two reviewers evaluated each answer independently.
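For illustration only, the sketch below shows one way this study layout could be organized in code: 60 questions across the 4 domains, 3 models, and two independent reviewer ratings per answer. All names and fields are hypothetical and are not taken from the authors' materials.

```python
# Hypothetical sketch of the study layout, not the authors' actual
# materials: 60 open-ended questions, 15 per immuno-oncology domain,
# each (question, model) answer rated independently by two reviewers.
from dataclasses import dataclass, field

DOMAINS = ["Mechanisms", "Indications", "Toxicities", "Prognosis"]
QUESTIONS_PER_DOMAIN = 15
MODELS = ["ChatGPT-4", "ChatGPT-3.5", "Google Bard"]

@dataclass
class Rating:
    reviewer: str
    accuracy: str      # e.g., "fully correct", "partially correct", "incorrect"
    relevance: str     # e.g., "highly relevant"
    readability: str   # e.g., "highly readable"

@dataclass
class Record:
    domain: str
    question_id: str
    model: str
    answered: bool = False      # some questions went unanswered
    reproducible: bool = False  # same answer on repeat submission
    ratings: list[Rating] = field(default_factory=list)

# One record per (question, model) pair: 60 questions x 3 models = 180 rows.
records = [
    Record(domain, f"{domain}-Q{i + 1}", model)
    for domain in DOMAINS
    for i in range(QUESTIONS_PER_DOMAIN)
    for model in MODELS
]
assert len(records) == 180
```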

RESULTS:

ChatGPT-4 and ChatGPT-3.5 answered all questions, whereas Google Bard answered only 53.3% (P < .0001). The proportion of questions with reproducible answers was higher for ChatGPT-4 (95%) and ChatGPT-3.5 (88.3%) than for Google Bard (50%) (P < .0001). In terms of accuracy, the proportions of answers deemed fully correct were 75.4%, 58.5%, and 43.8% for ChatGPT-4, ChatGPT-3.5, and Google Bard, respectively (P = .03). The proportions of responses deemed highly relevant were 71.9%, 77.4%, and 43.8%, respectively (P = .04). Regarding readability, the proportion of answers deemed highly readable was higher for ChatGPT-4 (98.1%) and ChatGPT-3.5 (100%) than for Google Bard (87.5%) (P = .02).
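The abstract does not name the statistical tests behind these P values. As a hedged worked example only, the snippet below runs a pairwise Fisher's exact test on the answer counts implied by the reported response rates (60/60 for ChatGPT-4 versus 32/60, i.e., 53.3%, for Google Bard).

```python
# Illustrative only: the abstract does not state which statistical tests
# produced the reported P values. This applies a pairwise Fisher's exact
# test to the answer counts implied by the percentages above.
from scipy.stats import fisher_exact

# Rows: ChatGPT-4, Google Bard; columns: answered, not answered.
# ChatGPT-4 answered 60/60; Google Bard answered 53.3% of 60 = 32/60.
table = [[60, 0],
         [32, 28]]

_, p_value = fisher_exact(table, alternative="two-sided")
print(f"P = {p_value:.1e}")  # far below .0001, in line with the reported P < .0001
```

With a zero cell in the table, the odds ratio is undefined, but the exact P value is still computed and lands well below the reported threshold.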

CONCLUSION:

ChatGPT-4 and ChatGPT-3.5 are potentially powerful tools in immuno-oncology, whereas Google Bard demonstrated relatively poorer performance. However, the risk of inaccuracy or incompleteness in the responses was evident in all 3 LLMs, highlighting the importance of expert-driven verification of the outputs returned by these technologies.

Full text: 1 Database: MEDLINE Main subject: Neoplasms Study type: Observational_studies / Prevalence_studies / Risk_factors_studies Limit: Humans Language: En Journal: Oncologist Journal subject: NEOPLASMS Year: 2024 Document type: Article Country of affiliation: United States
