Interactive dual-stream contrastive learning for radiology report generation.
Zhang, Ziqi; Jiang, Ailian.
Affiliation
  • Zhang Z; College of Computer Science and Technology, Taiyuan University of Technology, Taiyuan 030600, China.
  • Jiang A; College of Computer Science and Technology, Taiyuan University of Technology, Taiyuan 030600, China. Electronic address: ailianjiang@126.com.
J Biomed Inform ; 157: 104718, 2024 Sep.
Article in En | MEDLINE | ID: mdl-39209086
ABSTRACT
Radiology report generation automates diagnostic narrative synthesis from medical imaging data. Current report generation methods primarily employ knowledge graphs for image enhancement, neglecting the interpretability and guiding function of the knowledge graphs themselves. Additionally, few approaches leverage the stable modal alignment information from multimodal pre-trained models to facilitate the generation of radiology reports. We propose Terms-Guided Radiology Report Generation (TGR), a simple and practical model for generating reports guided primarily by anatomical terms. Specifically, we utilize a dual-stream visual feature extraction module, composed of a detail extraction module and a frozen multimodal pre-trained model, to separately extract visual detail features and semantic features. Furthermore, a Visual Enhancement Module (VEM) is proposed to further enrich the visual features, thereby facilitating the generation of a list of anatomical terms. We integrate anatomical terms with image features and perform contrastive learning against frozen text embeddings, exploiting the stable feature space of these embeddings to further strengthen modal alignment. Our model also accepts manual input, enabling it to generate a list of organs for specifically focused abnormal areas or to produce more accurate single-sentence descriptions based on selected anatomical terms. Comprehensive experiments demonstrate the effectiveness of our method on report generation tasks: our TGR-S model reduces training parameters by 38.9% while performing comparably to current state-of-the-art models, and our TGR-B model exceeds the best baseline models across multiple metrics.
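The sketch below illustrates, under stated assumptions, the dual-stream encoding and contrastive alignment idea described in the abstract: a trainable detail stream, a frozen pre-trained semantic stream, a fusion step standing in for the Visual Enhancement Module, and a symmetric InfoNCE loss against frozen text embeddings. All module names, dimensions, and the placeholder frozen encoder are hypothetical and are not the authors' implementation.

```python
# Hedged sketch of a dual-stream visual encoder with contrastive alignment.
# Module names, dimensions, and the frozen encoder are assumptions for
# illustration only; they do not reproduce the TGR model.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DetailExtractor(nn.Module):
    """Trainable stream: extracts fine-grained visual detail features."""
    def __init__(self, dim=512):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(128, dim),
        )

    def forward(self, x):
        return self.backbone(x)

class DualStreamEncoder(nn.Module):
    """Fuses the trainable detail stream with a frozen pre-trained semantic
    stream, then enriches the fused features (a stand-in for the VEM)."""
    def __init__(self, frozen_encoder, dim=512):
        super().__init__()
        self.detail = DetailExtractor(dim)
        self.frozen = frozen_encoder
        for p in self.frozen.parameters():   # keep the pre-trained stream fixed
            p.requires_grad = False
        self.enhance = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(),
                                     nn.Linear(dim, dim))

    def forward(self, images):
        detail = self.detail(images)
        with torch.no_grad():                # frozen semantic features
            semantic = self.frozen(images)
        return self.enhance(torch.cat([detail, semantic], dim=-1))

def contrastive_loss(img_feats, txt_feats, temperature=0.07):
    """Symmetric InfoNCE loss between image features and frozen text embeddings."""
    img = F.normalize(img_feats, dim=-1)
    txt = F.normalize(txt_feats, dim=-1)
    logits = img @ txt.t() / temperature
    targets = torch.arange(len(img), device=img.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

if __name__ == "__main__":
    # Placeholder frozen "pre-trained" encoder; in the paper this role is
    # played by a multimodal pre-trained model.
    frozen = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                           nn.Linear(3, 512))
    model = DualStreamEncoder(frozen)
    images = torch.randn(4, 3, 224, 224)
    frozen_text = torch.randn(4, 512)        # stands in for frozen text embeddings
    loss = contrastive_loss(model(images), frozen_text)
    print(loss.item())
```

The frozen stream is wrapped in `torch.no_grad()` and has `requires_grad` disabled, so only the detail extractor and the fusion layers are updated, mirroring the abstract's use of a stable, frozen feature space for alignment.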
Full text: 1 Collections: 01-international Database: MEDLINE Main subject: Natural Language Processing Limit: Humans Language: En Journal: J Biomed Inform Journal subject: Medical Informatics Year of publication: 2024 Document type: Article Country of affiliation: China