Can large language models reason about medical questions?

Liévin, Valentin; Hother, Christoffer Egeberg; Motzfeldt, Andreas Geert; Winther, Ole


Affiliation

Liévin V; Section for Cognitive Systems, Technical University of Denmark, Anker Engelunds Vej 101, 2800 Kongens Lyngby, Denmark.
Hother CE; FindZebra, Rådvadsvej 36, 2400 Copenhagen, Denmark.
Motzfeldt AG; Department of Clinical Immunology, Copenhagen University Hospital, Rigshospitalet, Inge Lehmanns Vej 107, 2100 Copenhagen, Denmark.
Winther O; Section for Cognitive Systems, Technical University of Denmark, Anker Engelunds Vej 101, 2800 Kongens Lyngby, Denmark.

Patterns (N Y) ; 5(3): 100943, 2024 Mar 08.

Article in English | MEDLINE | ID: mdl-38487804

ABSTRACT

Although large language models often produce impressive outputs, it remains unclear how they perform in real-world scenarios requiring strong reasoning skills and expert domain knowledge. We set out to investigate whether closed- and open-source models (GPT-3.5, Llama 2, etc.) can be applied to answer and reason about difficult real-world-based questions. We focus on three popular medical benchmarks (MedQA-US Medical Licensing Examination [USMLE], MedMCQA, and PubMedQA) and multiple prompting scenarios: chain of thought (CoT; think step by step), few-shot, and retrieval augmentation. Based on an expert annotation of the generated CoTs, we found that InstructGPT can often read, reason, and recall expert knowledge. Last, by leveraging advances in prompt engineering (few-shot and ensemble methods), we demonstrated that GPT-3.5 not only yields calibrated predictive distributions but also reaches the passing score on three datasets: MedQA-USMLE (60.2%), MedMCQA (62.7%), and PubMedQA (78.2%). Open-source models are closing the gap: Llama 2 70B also passed the MedQA-USMLE with 62.5% accuracy.
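
The abstract's link between ensembling and calibrated predictive distributions can be illustrated with a small sketch. This is not the authors' code: it assumes that several CoT completions have been sampled for one multiple-choice question and that each has already been parsed into an option letter. The function names (vote_distribution, ensemble_predict, expected_calibration_error), the four-option answer set, and the 10-bin calibration measure are hypothetical illustrative choices, and the paper's actual ensembling and calibration procedure may differ.

```python
from collections import Counter

OPTIONS = ("A", "B", "C", "D")  # assumed answer set, as in a 4-option MCQ


def vote_distribution(sampled_answers, options=OPTIONS):
    """Hypothetical helper: turn k sampled answer letters into an
    empirical predictive distribution over the answer options."""
    counts = Counter(a for a in sampled_answers if a in options)
    total = sum(counts.values())
    if total == 0:
        # No parsable answers: fall back to a uniform distribution.
        return {o: 1.0 / len(options) for o in options}
    return {o: counts[o] / total for o in options}


def ensemble_predict(sampled_answers, options=OPTIONS):
    """Majority vote over sampled answers; ties go to the earliest option."""
    dist = vote_distribution(sampled_answers, options)
    best = max(options, key=lambda o: dist[o])  # max keeps the first maximum
    return best, dist


def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned expected calibration error: the weighted gap between
    mean confidence and accuracy within equal-width confidence bins."""
    n = len(confidences)
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        idx = [i for i, c in enumerate(confidences)
               if lo < c <= hi or (b == 0 and c == 0.0)]
        if not idx:
            continue
        acc = sum(correct[i] for i in idx) / len(idx)
        conf = sum(confidences[i] for i in idx) / len(idx)
        ece += len(idx) / n * abs(acc - conf)
    return ece
```

For example, five sampled answers ["B", "B", "C", "B", "A"] give option B a vote share of 0.6, which serves as the ensemble's confidence. Across a benchmark, a low expected calibration error would mean that such vote shares track the observed accuracy.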

Keywords

GPT-3.5; Llama 2; MedQA; large language models; machine learning; medical; open source; prompt engineering; question answering; uncertainty quantification

Full text: 1 | Collection: 01-internacional | Database: MEDLINE | Language: English | Journal: Patterns (N Y) | Year: 2024 | Document type: Article | Country of affiliation: Denmark | Country of publication: United States