Large Language Model-Based Evaluation of Medical Question Answering Systems: Algorithm Development and Case Study.

Reichenpfader, Daniel; Rösslhuemer, Philipp; Denecke, Kerstin

Affiliation

Reichenpfader D; Bern University of Applied Sciences, Biel/Bienne, Switzerland.
Rösslhuemer P; Department of Diagnostic, Interventional and Pediatric Radiology, Bern University Hospital, University of Bern, Bern, Switzerland.
Denecke K; Bern University of Applied Sciences, Biel/Bienne, Switzerland.

Stud Health Technol Inform ; 313: 22-27, 2024 Apr 26.

Article in English | MEDLINE | ID: mdl-38682499

ABSTRACT

BACKGROUND:

Healthcare systems are increasingly resource constrained, leaving less time for important patient-provider interactions. Conversational agents (CAs) could be used to support the provision of information and to answer patients' questions. However, information must be accessible to a variety of patient populations, which requires understanding questions expressed at different language levels.

METHODS:

This study describes the use of Large Language Models (LLMs) to evaluate predefined medical content in CAs across patient populations. These simulated populations are characterized by a range of health literacy levels. The evaluation framework includes both fully automated and semi-automated procedures to assess the performance of a CA.

RESULTS:

A case study in the domain of mammography shows that LLMs can simulate questions from different patient populations. However, the accuracy of the answers provided varies depending on the level of health literacy.

CONCLUSIONS:

Our scalable evaluation framework enables the simulation of patient populations with different health literacy levels and helps to evaluate domain-specific CAs, thus promoting their integration into clinical practice. Future research aims to extend the framework to CAs without predefined content and to apply LLMs to adapt medical information to the specific (health) literacy level of the user.

Subjects

Algorithms; Health Literacy; Humans; Natural Language Processing; Mammography; Physician-Patient Relations

Keywords

Algorithms; Consumer Health Information; Conversational Agents; Large Language Model; Natural Language Processing

Full text: 1 | Collections: 01-international | Database: MEDLINE | Main subject: Algorithms / Health Literacy | Limit: Humans | Language: English | Publication year: 2024 | Document type: Article
