Evaluation of large language models performance against humans for summarizing MRI knee radiology reports: A feasibility study.

López-Úbeda, Pilar; Martín-Noguerol, Teodoro; Díaz-Angulo, Carolina; Luna, Antonio

Affiliation

López-Úbeda P; Natural Language Processing Unit, Health Time, Jaén, Spain. Electronic address: p.lopez@htmedica.com.
Martín-Noguerol T; MRI Unit, Radiology Department, Health Time, Jaén, Spain. Electronic address: t.martin.f@htmedica.com.
Díaz-Angulo C; MRI Unit, Radiology Department, Health Time, Gijón, Spain. Electronic address: c.diaz@htmedica.com.
Luna A; MRI Unit, Radiology Department, Health Time, Jaén, Spain. Electronic address: aluna70@htmedica.com.

Int J Med Inform ; 187: 105443, 2024 Jul.

Article in English | MEDLINE | ID: mdl-38615509

ABSTRACT

OBJECTIVES:

This study addresses the critical need for accurate summarization in radiology by comparing various Large Language Model (LLM)-based approaches for automatic summary generation. With the increasing volume of patient information, accurately and concisely conveying radiological findings becomes crucial for effective clinical decision-making. Minor inaccuracies in summaries can lead to significant consequences, highlighting the need for reliable automated summarization tools.

METHODS:

We employed two language models - Text-to-Text Transfer Transformer (T5) and Bidirectional and Auto-Regressive Transformers (BART) - in both fine-tuned and zero-shot learning scenarios and compared them with a Recurrent Neural Network (RNN). Additionally, we conducted a comparative analysis of 100 MRI report summaries, using expert human judgment and criteria such as coherence, relevance, fluency, and consistency, to evaluate the models against the original radiologist summaries. To facilitate this, we compiled a dataset of 15,508 retrospective knee Magnetic Resonance Imaging (MRI) reports from our Radiology Information System (RIS), focusing on the findings section to predict the radiologist's summary.

RESULTS:

The fine-tuned models outperformed the RNN and also performed better than their zero-shot variants. Specifically, the T5 model achieved a ROUGE-L score of 0.638. In the radiologist reader study, the summaries produced by this model were found to be very similar to those produced by a radiologist, with about 70% similarity in fluency and consistency between the T5-generated summaries and the original ones.
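For readers unfamiliar with the metric, the following is a minimal sketch of how a ROUGE-L score can be computed from the longest common subsequence (LCS) of two texts. It is illustrative only: the abstract does not state which ROUGE implementation, tokenization, or F-measure weighting the authors used, so the whitespace tokenization, lowercasing, and balanced F1 below are assumptions, and the function names and example sentences are hypothetical.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                # Tokens match: extend the LCS found without either token.
                curr.append(prev[j - 1] + 1)
            else:
                # No match: carry over the best LCS seen so far.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference, candidate):
    """Sentence-level ROUGE-L as a balanced F1 (assumed variant)."""
    # Assumption: lowercase, whitespace tokenization.
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of candidate tokens in the LCS
    recall = lcs / len(ref)      # share of reference tokens in the LCS
    return 2 * precision * recall / (precision + recall)


# Hypothetical sentences, not taken from the study's dataset.
reference = "medial meniscus tear with joint effusion"
candidate = "medial meniscus tear and effusion"
print(round(rouge_l_f1(reference, candidate), 3))  # 0.727
```

In this example the LCS has 4 tokens, giving a precision of 4/5 and a recall of 4/6, so the F1 is 8/11. A corpus-level figure such as the 0.638 reported above would typically be an average of such scores over all test reports, although the abstract does not specify the aggregation.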

CONCLUSIONS:

Technological advances, especially in NLP and LLMs, hold great promise for improving and streamlining the summarization of radiological findings, thus providing valuable assistance to radiologists in their work.

Subjects

Feasibility Studies; Magnetic Resonance Imaging; Natural Language Processing; Neural Networks, Computer; Humans; Radiology Information Systems; Knee/diagnostic imaging; Retrospective Studies

Keywords

Human expert evaluation; Knee MRI reports; Large Language Model; Natural Language Processing; Radiology report summarization

Full text: 1 | Collections: 01-international | Database: MEDLINE | Main subject: Natural Language Processing / Magnetic Resonance Imaging / Feasibility Studies / Neural Networks, Computer | Limits: Humans | Language: En | Journal: Int J Med Inform | Journal subject: MEDICAL INFORMATICS | Year of publication: 2024 | Document type: Article | Country of publication: Ireland
