Comparative Evaluation of Pre-Trained Language Models for Biomedical Information Retrieval.
Weber, Franziska; Toddenroth, Dennis.
Affiliation
  • Weber F; Medical Informatics, University Erlangen-Nuremberg, Germany.
  • Toddenroth D; Medical Informatics, University Erlangen-Nuremberg, Germany.
Stud Health Technol Inform; 316: 827-831, 2024 Aug 22.
Article in En | MEDLINE | ID: mdl-39176920
ABSTRACT
Finding relevant information in the biomedical literature increasingly depends on efficient information retrieval (IR) algorithms. Cross-Encoders, Sentence-BERT, and ColBERT are algorithms based on pre-trained language models that use nuanced but computable vector representations of search queries and documents for IR applications. Here we investigate how well these vectorization algorithms estimate the relevance labels of biomedical documents for search queries using the OHSUMED dataset. For our evaluation, we compared the computed scores to the provided relevance labels using boxplots and Spearman's rank correlations. According to these metrics, we found that Sentence-BERT moderately outperformed the alternative vectorization algorithms and that additional fine-tuning based on a subset of OHSUMED labels yielded little additional benefit. Future research might aim to develop a larger dedicated dataset in order to optimize such methods more systematically, and to evaluate the corresponding functions in IR tools with end-users.
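The abstract describes scoring query-document pairs with pre-trained vectorization models and comparing the scores against graded relevance labels via Spearman's rank correlation. The sketch below illustrates that evaluation pattern for the Sentence-BERT case only; the checkpoint name, query, documents, and labels are illustrative assumptions, not taken from the paper.

```python
# Illustrative sketch: score query-document pairs with a pre-trained
# Sentence-BERT model and compare the scores to relevance labels using
# Spearman's rank correlation. Model name and example data are assumptions.
from sentence_transformers import SentenceTransformer, util
from scipy.stats import spearmanr

# Any pre-trained Sentence-BERT checkpoint; the paper does not name one here.
model = SentenceTransformer("all-MiniLM-L6-v2")

query = "treatment of community-acquired pneumonia in adults"
documents = [
    "Antibiotic therapy for community-acquired pneumonia: a review.",
    "Surgical management of hip fractures in elderly patients.",
    "Empirical treatment strategies for adult pneumonia in outpatient care.",
]
# Graded relevance labels as in OHSUMED (0 = not, 1 = partially,
# 2 = definitely relevant); these values are made up for illustration.
labels = [2, 0, 1]

# Encode query and documents into dense vectors and score by cosine similarity.
query_emb = model.encode(query, convert_to_tensor=True)
doc_embs = model.encode(documents, convert_to_tensor=True)
scores = util.cos_sim(query_emb, doc_embs)[0].tolist()

# Rank correlation between the model's scores and the provided labels.
rho, p_value = spearmanr(scores, labels)
print(f"Spearman's rho = {rho:.2f} (p = {p_value:.3f})")
```

In the paper's setup, the same comparison would be repeated over the OHSUMED query-document pairs for each of the three vectorization approaches.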

Full text: 1 Collections: 01-international Database: MEDLINE Main subject: Algorithms / Natural Language Processing / Information Storage and Retrieval Limit: Humans Language: En Year of publication: 2024 Document type: Article