Comparative Evaluation of Pre-Trained Language Models for Biomedical Information Retrieval.
Stud Health Technol Inform; 316: 827-831, 2024 Aug 22.
Article in English | MEDLINE | ID: mdl-39176920
ABSTRACT
Finding relevant information in the biomedical literature increasingly depends on efficient information retrieval (IR) algorithms. Cross-Encoders, Sentence-BERT, and ColBERT are algorithms based on pre-trained language models that use nuanced but computable vector representations of search queries and documents for IR applications. Here we investigate how well these vectorization algorithms estimate relevance labels of biomedical documents for search queries using the OHSUMED dataset. For our evaluation, we compared the computed scores to the provided labels using boxplots and Spearman's rank correlations. According to these metrics, we found that Sentence-BERT moderately outperformed the alternative vectorization algorithms and that additional fine-tuning based on a subset of OHSUMED labels yielded little additional benefit. Future research might aim to develop a larger dedicated dataset in order to optimize such methods more systematically, and to evaluate the corresponding functions in IR tools with end-users.
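The abstract's evaluation compares model-computed relevance scores against gold relevance labels using Spearman's rank correlation. The following is a minimal, self-contained sketch of that comparison step, not the authors' code: the scores and labels below are hypothetical illustrative values, and the correlation is implemented from the standard rank-based definition rather than via a library.

```python
# Sketch: comparing model-computed relevance scores against relevance labels
# with Spearman's rank correlation. All numeric values are hypothetical.

def rank(values):
    """Assign 1-based average ranks, handling ties (as Spearman's rho requires)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average rank for the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho = Pearson correlation computed on the ranks."""
    rx, ry = rank(x), rank(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical query-document similarity scores from a bi-encoder, paired with
# OHSUMED-style relevance labels (0 = not, 1 = partially, 2 = definitely relevant).
scores = [0.82, 0.35, 0.64, 0.12, 0.71]
labels = [2, 0, 1, 0, 2]
print(round(spearman(scores, labels), 3))  # → 0.949
```

A higher rho indicates that the model's score ordering agrees more closely with the human relevance ordering, which is how the abstract's per-method comparison can be read.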
Full text: 1
Collections: 01-internacional
Database: MEDLINE
Main subject: Algorithms / Natural Language Processing / Information Storage and Retrieval
Limits: Humans
Language: English
Publication year: 2024
Document type: Article