Evaluating the performance of large language models in haematopoietic stem cell transplantation decision-making.
Br J Haematol; 204(4): 1523-1528, 2024 Apr.
Article | En | MEDLINE | ID: mdl-38070128
In a first-of-its-kind study, we assessed the capabilities of large language models (LLMs) in making complex decisions in haematopoietic stem cell transplantation. We evaluated not only Generative Pre-trained Transformer 4 (GPT-4) but also two other artificial intelligence models, PaLM 2 and Llama-2. Using detailed haematological histories that included clinical, molecular and donor data, we conducted a triple-blind survey to compare LLMs with haematology residents. We found that residents significantly outperformed LLMs (p = 0.02), particularly in transplant eligibility assessment (p = 0.01). Our triple-blind methodology aimed to mitigate potential biases in evaluating LLMs and revealed both their promise and limitations in deciphering complex haematological clinical scenarios.
Full text: 1
Databases: MEDLINE
Main subject: Artificial Intelligence / Hematopoietic Stem Cell Transplantation
Limits: Humans
Language: En
Journal: Br J Haematol
Year: 2024
Document type: Article
Affiliation country: Italy