Benchmarking ChatGPT-4 on a radiation oncology in-training exam and Red Journal Gray Zone cases: potentials and challenges for AI-assisted medical education and decision making in radiation oncology.

Huang, Yixing; Gomaa, Ahmed; Semrau, Sabine; Haderlein, Marlen; Lettmaier, Sebastian; Weissmann, Thomas; Grigo, Johanna; Tkhayat, Hassen Ben; Frey, Benjamin; Gaipl, Udo; Distel, Luitpold; Maier, Andreas; Fietkau, Rainer; Bert, Christoph; Putz, Florian

Affiliation

Huang Y; Department of Radiation Oncology, University Hospital Erlangen, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany.
Gomaa A; Comprehensive Cancer Center Erlangen-EMN (CCC ER-EMN), Erlangen, Germany.
Semrau S; Department of Radiation Oncology, University Hospital Erlangen, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany.
Haderlein M; Comprehensive Cancer Center Erlangen-EMN (CCC ER-EMN), Erlangen, Germany.
Lettmaier S; Department of Radiation Oncology, University Hospital Erlangen, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany.
Weissmann T; Comprehensive Cancer Center Erlangen-EMN (CCC ER-EMN), Erlangen, Germany.
Grigo J; Department of Radiation Oncology, University Hospital Erlangen, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany.
Tkhayat HB; Comprehensive Cancer Center Erlangen-EMN (CCC ER-EMN), Erlangen, Germany.
Frey B; Department of Radiation Oncology, University Hospital Erlangen, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany.
Gaipl U; Comprehensive Cancer Center Erlangen-EMN (CCC ER-EMN), Erlangen, Germany.
Distel L; Department of Radiation Oncology, University Hospital Erlangen, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany.
Maier A; Comprehensive Cancer Center Erlangen-EMN (CCC ER-EMN), Erlangen, Germany.
Fietkau R; Department of Radiation Oncology, University Hospital Erlangen, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany.
Bert C; Comprehensive Cancer Center Erlangen-EMN (CCC ER-EMN), Erlangen, Germany.
Putz F; Department of Radiation Oncology, University Hospital Erlangen, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany.

Front Oncol; 13: 1265024, 2023.

Article in English | MEDLINE | ID: mdl-37790756

Abstract

Purpose: The potential of large language models for education and decision-making in medicine has been demonstrated by their decent scores on medical exams such as the United States Medical Licensing Examination (USMLE) and the MedQA exam. This work aims to evaluate the performance of ChatGPT-4 in the specialized field of radiation oncology.

Methods: The 38th American College of Radiology (ACR) radiation oncology in-training (TXIT) exam and the 2022 Red Journal Gray Zone cases were used to benchmark the performance of ChatGPT-4. The TXIT exam contains 300 questions covering various topics in radiation oncology. The 2022 Gray Zone collection contains 15 complex clinical cases.

Results: On the TXIT exam, ChatGPT-3.5 and ChatGPT-4 achieved scores of 62.05% and 78.77%, respectively, highlighting the advantage of the newer ChatGPT-4 model. Based on the TXIT exam, ChatGPT-4's strong and weak areas in radiation oncology were identified to some extent. Specifically, according to the ACR knowledge domains, ChatGPT-4 demonstrates better knowledge of statistics, CNS & eye, pediatrics, biology, and physics than of bone & soft tissue and gynecology. Regarding clinical care paths, ChatGPT-4 performs better in diagnosis, prognosis, and toxicity than in brachytherapy and dosimetry. It lacks proficiency in the in-depth details of clinical trials. For the Gray Zone cases, ChatGPT-4 is able to suggest a personalized treatment approach for each case with high correctness and comprehensiveness. Importantly, for many cases it provides novel treatment aspects that none of the human experts suggested.

Conclusion: Both evaluations demonstrate the potential of ChatGPT-4 for medical education of the general public and cancer patients, as well as its potential to aid clinical decision-making, although it has limitations in certain domains. Owing to the risk of hallucinations, content generated by models such as ChatGPT must be verified for accuracy.

Keywords

Gray Zone; artificial intelligence; clinical decision support (CDS); large language model; natural language processing; radiotherapy

Collection: 01-internacional | Database: MEDLINE | Study type: Guideline / Prognostic_studies | Language: English | Journal: Front Oncol | Year: 2023 | Document type: Article | Country of affiliation: Germany