Results 1 - 20 of 50
1.
Med Biol Eng Comput ; 2024 Oct 03.
Article in English | MEDLINE | ID: mdl-39358488

ABSTRACT

Heart failure represents the ultimate stage in the progression of diverse cardiac ailments. Throughout the management of heart failure, physicians require observation of medical imagery to formulate therapeutic regimens for patients. Automated report generation technology serves as a tool aiding physicians in patient management. However, previous studies failed to generate targeted reports for specific diseases. To produce high-quality medical reports with greater relevance across diverse conditions, we introduce an automatic report generation model HF-CMN, tailored to heart failure. Firstly, the generated report includes comprehensive information pertaining to heart failure gleaned from chest radiographs. Additionally, we construct a storage query matrix grouping based on a multi-label type, enhancing the accuracy of our model in aligning images with text. Experimental results demonstrate that our method can generate reports strongly correlated with heart failure and outperforms most other advanced methods on benchmark datasets MIMIC-CXR and IU X-Ray. Further analysis confirms that our method achieves superior alignment between images and texts, resulting in higher-quality reports.

2.
Ann Nucl Med ; 2024 Sep 25.
Article in English | MEDLINE | ID: mdl-39320419

ABSTRACT

This review explores the potential applications of Large Language Models (LLMs) in nuclear medicine, especially nuclear medicine examinations such as PET and SPECT, reviewing recent advancements in both fields. Despite the rapid adoption of LLMs in various medical specialties, their integration into nuclear medicine has not yet been sufficiently explored. We first discuss the latest developments in nuclear medicine, including new radiopharmaceuticals, imaging techniques, and clinical applications. We then analyze how LLMs are being utilized in radiology, particularly in report generation, image interpretation, and medical education. We highlight the potential of LLMs to enhance nuclear medicine practices, such as improving report structuring, assisting in diagnosis, and facilitating research. However, challenges remain, including the need for improved reliability, explainability, and bias reduction in LLMs. The review also addresses the ethical considerations and potential limitations of AI in healthcare. In conclusion, LLMs have significant potential to transform existing frameworks in nuclear medicine, making it a critical area for future research and development.

3.
Bioengineering (Basel) ; 11(9)2024 Sep 01.
Article in English | MEDLINE | ID: mdl-39329632

ABSTRACT

Breast cancer is the most prevalent cancer among women worldwide. B-mode ultrasound (US) is essential for early detection, offering high sensitivity and specificity without radiation exposure. This study introduces a semi-automatic method to streamline breast US report generation, aiming to reduce the burden on radiologists. Our method synthesizes comprehensive breast US reports by combining information extracted from radiologists' annotations during routine screenings with the analysis results of deep learning algorithms on multimodal US images. Key modules in our method include image classification using visual features (ICVF), type classification via deep learning (TCDL), and automatic report structuring and compilation (ARSC). Experiments showed that the proposed method reduced the average report generation time to 3.8 min compared to manual processes, even when using relatively low-spec hardware. Generated reports perfectly matched ground-truth reports for suspicious masses, without a single failure on our evaluation datasets. Additionally, the deep-learning-based algorithm, utilizing DenseNet-121 as its core model, achieved an overall accuracy of 0.865, precision of 0.868, recall of 0.847, F1-score of 0.856, and area under the receiver operating characteristic curve of 0.92 in classifying tissue stiffness in breast US shear-wave elastography (SWE-mode) images. These improvements not only streamline the report generation process but also allow radiologists to dedicate more time and focus to patient care, ultimately enhancing clinical outcomes and patient satisfaction.
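For illustration only, the following sketch shows how the classification metrics reported above (accuracy, precision, recall, F1, ROC-AUC) are typically computed; the labels and probabilities are made-up placeholders, not data from the study.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score, roc_auc_score)

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])                   # ground-truth stiffness labels (placeholder)
y_prob = np.array([0.9, 0.2, 0.7, 0.6, 0.4, 0.1, 0.8, 0.3])   # model probabilities (placeholder)
y_pred = (y_prob >= 0.5).astype(int)                           # thresholded predictions

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall   :", recall_score(y_true, y_pred))
print("F1-score :", f1_score(y_true, y_pred))
print("ROC-AUC  :", roc_auc_score(y_true, y_prob))
```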

4.
Res Sq ; 2024 Aug 30.
Article in English | MEDLINE | ID: mdl-39257991

ABSTRACT

Purpose: Radiology report generation, which translates radiological images into precise and clinically relevant descriptions, faces a data imbalance challenge: medical tokens appear less frequently than regular tokens, and normal entries significantly outnumber abnormal ones. However, very few studies consider these imbalance issues, let alone the two imbalance factors jointly. Methods: In this study, we propose a Joint Imbalance Adaptation (JIMA) model that promotes task robustness by leveraging token and label imbalance. JIMA predicts entity distributions from images and generates reports based on these distributions and image features. We employ a hard-to-easy learning strategy that mitigates overfitting to frequent labels and tokens, thereby encouraging the model to focus more on rare labels and clinical tokens. Results: JIMA shows notable improvements (16.75% - 50.50% on average) across evaluation metrics on the IU X-ray and MIMIC-CXR datasets. Our ablation analysis shows that JIMA's improved handling of infrequent tokens and abnormal labels accounts for the major contribution. Human evaluation and case study experiments further validate that JIMA can generate more clinically accurate reports. Conclusion: Data imbalance (e.g., infrequent tokens and abnormal labels) leads to the underperformance of radiology report generation. Our curriculum learning strategy successfully reduces the impact of data imbalance by reducing overfitting on frequent patterns and underfitting on infrequent patterns. While data imbalance remains challenging, our approach opens new directions for the generation task.
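As a rough sketch of the general idea (not the JIMA implementation), one common way to counter token imbalance is to weight the cross-entropy loss by inverse token frequency and ramp that weighting in over training with a curriculum-style schedule; the schedule direction, normalization, and warm-up length below are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def imbalance_aware_loss(logits, targets, token_freq, step, warmup_steps=10_000):
    """logits: (batch, seq, vocab); targets: (batch, seq); token_freq: (vocab,) corpus counts."""
    inv_freq = 1.0 / (token_freq.float() + 1.0)       # rare tokens receive larger weights
    inv_freq = inv_freq / inv_freq.mean()             # normalize weights around 1.0
    alpha = min(1.0, step / warmup_steps)             # curriculum mixing coefficient (assumed schedule)
    weights = (1.0 - alpha) + alpha * inv_freq        # uniform -> inverse-frequency weighting
    return F.cross_entropy(logits.view(-1, logits.size(-1)),
                           targets.view(-1), weight=weights)
```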

5.
J Biomed Inform ; 157: 104718, 2024 Sep.
Article in English | MEDLINE | ID: mdl-39209086

ABSTRACT

Radiology report generation automates diagnostic narrative synthesis from medical imaging data. Current report generation methods primarily employ knowledge graphs for image enhancement, neglecting the interpretability and guiding function of the knowledge graphs themselves. Additionally, few approaches leverage the stable modal alignment information from multimodal pre-trained models to facilitate the generation of radiology reports. We propose Terms-Guided Radiology Report Generation (TGR), a simple and practical model for generating reports guided primarily by anatomical terms. Specifically, we utilize a dual-stream visual feature extraction module, comprising a detail extraction module and a frozen multimodal pre-trained model, to separately extract visual detail features and semantic features. Furthermore, a Visual Enhancement Module (VEM) is proposed to further enrich the visual features, thereby facilitating the generation of a list of anatomical terms. We integrate anatomical terms with image features and apply contrastive learning against frozen text embeddings, utilizing the stable feature space of these embeddings to further boost modal alignment. Our model also supports manual input, enabling it to generate a list of organs for specifically targeted abnormal areas or to produce more accurate single-sentence descriptions based on selected anatomical terms. Comprehensive experiments demonstrate the effectiveness of our method in report generation tasks: our TGR-S model reduces training parameters by 38.9% while performing comparably to current state-of-the-art models, and our TGR-B model exceeds the best baseline models across multiple metrics.
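A minimal sketch, not the TGR implementation, of the general mechanism referred to above: a symmetric InfoNCE-style contrastive loss aligning trainable image features with embeddings from a frozen text encoder.

```python
import torch
import torch.nn.functional as F

def contrastive_alignment_loss(image_feats, text_embeds, temperature=0.07):
    """image_feats: (N, d) trainable; text_embeds: (N, d) from a frozen text encoder."""
    img = F.normalize(image_feats, dim=-1)
    txt = F.normalize(text_embeds.detach(), dim=-1)   # frozen side: no gradient flows into text
    logits = img @ txt.t() / temperature               # (N, N) image-text similarity matrix
    labels = torch.arange(img.size(0), device=img.device)
    loss_i2t = F.cross_entropy(logits, labels)         # match each image to its paired text
    loss_t2i = F.cross_entropy(logits.t(), labels)     # and each text to its paired image
    return 0.5 * (loss_i2t + loss_t2i)
```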


Subjects
Natural Language Processing, Humans, Radiology/education, Radiology/methods, Algorithms, Machine Learning, Semantics, Radiology Information Systems, Diagnostic Imaging/methods
6.
Acad Radiol ; 2024 Aug 13.
Article in English | MEDLINE | ID: mdl-39142976

ABSTRACT

RATIONALE AND OBJECTIVES: The process of generating radiology reports is often time-consuming and labor-intensive, prone to incompleteness, heterogeneity, and errors. By employing natural language processing (NLP)-based techniques, this study explores the potential for enhancing the efficiency of radiology report generation through the capabilities of ChatGPT (Generative Pre-trained Transformer), a prominent large language model (LLM). MATERIALS AND METHODS: Using a sample of 1000 records from the Medical Information Mart for Intensive Care (MIMIC) Chest X-ray Database, this investigation employed Claude.ai to extract initial radiological report keywords. ChatGPT then generated radiology reports using a consistent 3-step prompt template outline. Various lexical and sentence similarity techniques were employed to evaluate the correspondence between the AI assistant-generated reports and reference reports authored by medical professionals. RESULTS: Results showed varying performance among NLP models, with BART (Bidirectional and Auto-Regressive Transformers) and XLM (Cross-lingual Language Model) displaying high proficiency (mean similarity scores up to 99.3%), closely mirroring physician reports. Conversely, DeBERTa (Decoding-enhanced BERT with disentangled attention) and sequence-matching models scored lower, indicating less alignment with medical language. In the Impression section, the word-embedding model excelled with a mean similarity of 84.4%, while others, such as the Jaccard index, showed lower performance. CONCLUSION: Overall, the study highlights significant variations across NLP models in their ability to generate radiology reports consistent with medical professionals' language. Pairwise comparisons and Kruskal-Wallis tests confirmed these differences, emphasizing the need for careful selection and evaluation of NLP models in radiology report generation. This research underscores the potential of ChatGPT to streamline and improve the radiology reporting process, with implications for enhancing efficiency and accuracy in clinical practice.
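For illustration, two of the simpler similarity measures mentioned above: the token-level Jaccard index and a plain sequence ratio. This is not the study's evaluation pipeline, and the embedding-based scores it reports require the respective models; the example reports below are invented.

```python
from difflib import SequenceMatcher

def jaccard_similarity(a: str, b: str) -> float:
    """Token-set Jaccard index between two report strings."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if (ta | tb) else 1.0

reference = "No acute cardiopulmonary abnormality."
generated = "No acute cardiopulmonary process is identified."
print("Jaccard:", round(jaccard_similarity(reference, generated), 3))
print("SequenceMatcher:", round(SequenceMatcher(None, reference, generated).ratio(), 3))
```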

7.
Med Image Anal ; 97: 103264, 2024 Oct.
Article in English | MEDLINE | ID: mdl-39013207

ABSTRACT

Natural Image Captioning (NIC) is an interdisciplinary research area that lies at the intersection of Computer Vision (CV) and Natural Language Processing (NLP). Several works have been presented on the subject, ranging from early template-based approaches to more recent deep learning-based methods. This paper conducts a survey of NIC, focusing especially on its applications to Medical Image Captioning (MIC) and Diagnostic Captioning (DC) in the field of radiology. A review of the state of the art is conducted, summarizing key research works in NIC and DC to provide a broad overview of the subject. These works include existing NIC and MIC models, datasets, evaluation metrics, and previous reviews in the specialized literature. The reviewed works are thoroughly analyzed and discussed, highlighting the limitations of existing approaches and their potential implications for real clinical practice. Similarly, potential future research lines are outlined on the basis of the detected limitations.


Subjects
Natural Language Processing, Humans, Radiology Information Systems, Deep Learning, Diagnostic Imaging/methods, Computer-Assisted Image Processing/methods, Computer-Assisted Image Interpretation/methods
8.
J Comput Biol ; 31(6): 486-497, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38837136

ABSTRACT

Automatic radiology report generation is a necessary development of artificial intelligence technology in healthcare. This technology serves to aid doctors in producing comprehensive diagnostic reports, alleviating the burdensome workloads of medical professionals. However, generating radiological reports poses some challenges: (1) visual and textual data biases and (2) the long-distance dependency problem. To tackle these issues, we design a visual recalibration and gating enhancement network (VRGE), which is composed of a visual recalibration module and a gating enhancement module (GEM). Specifically, the visual recalibration module enhances the recognition of abnormal features in lesion areas of medical images. The GEM dynamically adjusts the contextual information in the report by introducing gating mechanisms, focusing on capturing professional medical terminology in medical text reports. We conducted extensive experiments on the public IU X-Ray dataset to show that VRGE outperforms existing models.


Subjects
Artificial Intelligence, Humans, Radiology/methods, Algorithms
9.
J Imaging Inform Med ; 2024 Jun 03.
Article in English | MEDLINE | ID: mdl-38831189

ABSTRACT

A radiology report plays a crucial role in guiding patient treatment, but writing these reports is a time-consuming task that demands a radiologist's expertise. In response to this challenge, researchers in the subfields of artificial intelligence for healthcare have explored techniques for automatically interpreting radiographic images and generating free-text reports, while much of the research on medical report creation has focused on image captioning methods without adequately addressing particular report aspects. This study introduces a Conditional Self Attention Memory-Driven Transformer model for generating radiological reports. The model operates in two phases: initially, a multi-label classification model, utilizing ResNet152 v2 as an encoder, is employed for feature extraction and multiple disease diagnosis. In the second phase, the Conditional Self Attention Memory-Driven Transformer serves as a decoder, utilizing self-attention memory-driven transformers to generate text reports. Comprehensive experimentation was conducted to compare existing and proposed techniques based on Bilingual Evaluation Understudy (BLEU) scores from BLEU-1 to BLEU-4. The model outperforms other state-of-the-art techniques, achieving BLEU-1 (0.475), BLEU-2 (0.358), BLEU-3 (0.229), and BLEU-4 (0.165) scores. This study's findings can alleviate radiologists' workloads and enhance clinical workflows by introducing an autonomous radiological report generation system.
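A small example (not the paper's evaluation harness) of how BLEU-1 through BLEU-4 are typically computed for a generated report against a reference, using NLTK with the uniform weight vectors that define each order; the sentences are invented.

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = "the heart size is normal and the lungs are clear".split()
candidate = "heart size is normal lungs are clear".split()
smooth = SmoothingFunction().method1          # avoids zero scores for short sentences

for n in range(1, 5):
    weights = tuple([1.0 / n] * n)            # BLEU-n: uniform weights over 1..n-grams
    score = sentence_bleu([reference], candidate, weights=weights,
                          smoothing_function=smooth)
    print(f"BLEU-{n}: {score:.3f}")
```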

10.
Bioengineering (Basel) ; 11(4)2024 Apr 03.
Article in English | MEDLINE | ID: mdl-38671773

ABSTRACT

Deep learning is revolutionizing radiology report generation (RRG) with the adoption of vision encoder-decoder (VED) frameworks, which transform radiographs into detailed medical reports. Traditional methods, however, often generate reports of limited diversity and struggle with generalization. Our research introduces reinforcement learning and text augmentation to tackle these issues, significantly improving report quality and variability. By employing RadGraph as a reward metric and innovating in text augmentation, we surpass existing benchmarks on metrics such as BLEU4, ROUGE-L, F1CheXbert, and RadGraph, setting new standards for report accuracy and diversity on the MIMIC-CXR and Open-i datasets. Our VED model achieves F1-scores of 66.2 for CheXbert and 37.8 for RadGraph on the MIMIC-CXR dataset, and 54.7 and 45.6, respectively, on Open-i. These outcomes represent a significant breakthrough in the RRG field. The findings and implementation of the proposed approach, aimed at enhancing diagnostic precision and radiological interpretations in clinical settings, are publicly available on GitHub to encourage further advancements in the field.
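A simplified sketch of self-critical policy-gradient training with a report-level reward, which is the general recipe behind using a scorer such as RadGraph as a reward signal; `reward_fn` is a placeholder for any report-vs-reference scorer, and this is not the authors' implementation.

```python
import torch

def self_critical_loss(sample_logprobs, sampled_reports, greedy_reports,
                       reference_reports, reward_fn):
    """sample_logprobs: (batch,) summed log-probabilities of the sampled reports."""
    rewards = torch.tensor([reward_fn(s, r)
                            for s, r in zip(sampled_reports, reference_reports)])
    baselines = torch.tensor([reward_fn(g, r)
                              for g, r in zip(greedy_reports, reference_reports)])
    advantage = (rewards - baselines).to(sample_logprobs.device)
    # REINFORCE with the greedy decode as baseline: push up sampled reports that beat it.
    return -(advantage * sample_logprobs).mean()
```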

11.
Artif Intell Med ; 151: 102846, 2024 05.
Article in English | MEDLINE | ID: mdl-38547777

ABSTRACT

BACKGROUND AND OBJECTIVES: Generating coherent reports from medical images is an important task for reducing doctors' workload. Unlike traditional image captioning tasks, the task of medical image report generation faces more challenges. Current models for generating reports from medical images often fail to characterize some abnormal findings, and some models generate reports with low quality. In this study, we propose a model to generate high-quality reports from medical images. METHODS: In this paper, we propose a model called Hybrid Discriminator Generative Adversarial Network (HDGAN), which combines Generative Adversarial Network (GAN) with Reinforcement Learning (RL). The HDGAN model consists of a generator, a one-sentence discriminator, and a one-word discriminator. Specifically, the RL reward signals are judged on the one-sentence discriminator and one-word discriminator separately. The one-sentence discriminator can better learn sentence-level structural information, while the one-word discriminator can learn word diversity information effectively. RESULTS: Our approach performs better on the IU-X-ray and COV-CTR datasets than the baseline models. For the ROUGE metric, our method outperforms the state-of-the-art model by 0.36 on the IU-X-ray, 0.06 on the MIMIC-CXR and 0.156 on the COV-CTR. CONCLUSIONS: The compositional framework we proposed can generate more accurate medical image reports at different levels.


Subjects
Deep Learning, Diagnostic Imaging, Computer-Assisted Image Processing, Neural Networks (Computer), Datasets as Topic, Diagnostic Imaging/methods, Computer-Assisted Image Processing/methods, Thoracic Radiography, Thorax/diagnostic imaging, Humans
12.
Front Radiol ; 4: 1339612, 2024.
Article in English | MEDLINE | ID: mdl-38426080

ABSTRACT

Image-to-text radiology report generation aims to automatically produce radiology reports that describe the findings in medical images. Most existing methods focus solely on the image data, disregarding the other patient information accessible to radiologists. In this paper, we present a novel multi-modal deep neural network framework for generating chest x-ray reports by integrating structured patient data, such as vital signs and symptoms, alongside unstructured clinical notes. We introduce a conditioned cross-multi-head attention module to fuse these heterogeneous data modalities, bridging the semantic gap between visual and textual data. Experiments demonstrate substantial improvements from using additional modalities compared to relying on images alone. Notably, our model achieves the highest reported performance on the ROUGE-L metric compared to relevant state-of-the-art models in the literature. Furthermore, we employed both human evaluation and clinical semantic similarity measurement alongside word-overlap metrics to improve the depth of quantitative analysis. A human evaluation, conducted by a board-certified radiologist, confirms the model's accuracy in identifying high-level findings; however, it also highlights that further improvement is needed to capture nuanced details and clinical context.
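A generic cross-attention fusion sketch in PyTorch, illustrating how image features can attend over embeddings of structured data and clinical notes; the module name, dimensions, and residual layout are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    def __init__(self, dim=512, num_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, image_tokens, context_tokens):
        """image_tokens: (B, Ni, d); context_tokens: (B, Nc, d) from vitals/notes."""
        fused, _ = self.attn(query=image_tokens, key=context_tokens, value=context_tokens)
        return self.norm(image_tokens + fused)   # residual connection over the visual stream

fusion = CrossModalFusion()
img = torch.randn(2, 49, 512)    # e.g. a flattened 7x7 visual feature grid
ctx = torch.randn(2, 16, 512)    # e.g. embedded vital signs and note tokens
print(fusion(img, ctx).shape)    # torch.Size([2, 49, 512])
```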

13.
Comput Med Imaging Graph ; 113: 102342, 2024 04.
Article in English | MEDLINE | ID: mdl-38309174

ABSTRACT

Medical image reports are integral to clinical decision-making and patient management. Despite their importance, the confidentiality and private nature of medical data pose significant issues for the sharing and analysis of medical image data. This paper addresses these concerns by introducing a multimodal federated learning-based methodology for medical image reporting. This methodology harnesses distributed computing to co-train models across various medical institutions. Under the federated learning framework, every medical institution can train the model locally and aggregate the updated model parameters to curate a top-tier medical image report model. Initially, we advocate for an architecture facilitating multimodal federated learning, including model creation, parameter consolidation, and algorithm enhancement steps. In the model selection phase, we introduce a deep learning-based strategy that utilizes multimodal data for training to produce medical image reports. In the parameter aggregation phase, the federated averaging algorithm is applied to amalgamate the model parameters trained by each institution, leading to a comprehensive global model. In addition, we introduce an evidence-based optimization algorithm built upon the federated averaging algorithm. The efficacy of the proposed architecture and scheme is showcased through a series of experiments. Our experimental results validate the proficiency of the proposed multimodal federated learning approach in generating medical image reports. Compared to conventional centralized learning methods, our proposal not only enhances the protection of patient confidentiality but also improves the accuracy and overall quality of medical image reports. Through this research, we offer a novel solution for the privacy issues linked with sharing and analyzing medical data. Expected to assume a crucial role in medical image report generation and other medical applications, the multimodal federated learning method is set to deliver more precise, efficient, and privacy-preserving medical services for healthcare professionals and patients.
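A bare-bones sketch of federated averaging (FedAvg), the textbook form of the parameter aggregation step described above: each institution's locally trained weights are combined into a global model, weighted by the amount of local data. This is not the paper's code, and the sample counts in the usage line are placeholders.

```python
import torch

def fed_avg(state_dicts, num_samples):
    """state_dicts: list of model.state_dict() from each site; num_samples: list of ints."""
    total = float(sum(num_samples))
    global_state = {}
    for key in state_dicts[0]:
        # Data-size-weighted average of each parameter tensor across sites.
        global_state[key] = sum(
            sd[key].float() * (n / total) for sd, n in zip(state_dicts, num_samples)
        )
    return global_state

# Usage: global_model.load_state_dict(fed_avg(local_states, [1200, 800, 500]))
```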


Subjects
Algorithms, Medical Records, Humans
14.
Phys Med Biol ; 69(6)2024 Mar 12.
Article in English | MEDLINE | ID: mdl-38373345

ABSTRACT

Objective. Generally, due to a lack of explainability, radiomics based on deep learning has been perceived as a black-box solution for radiologists. Automatic generation of diagnostic reports is a semantic approach to enhance the explanation of deep learning radiomics (DLR). Approach. In this paper, we propose a novel model called radiomics-reporting network (Radioport), which incorporates text attention. This model aims to improve the interpretability of DLR in mammographic calcification diagnosis. Firstly, it employs convolutional neural networks to extract visual features as radiomics for multi-category classification based on breast imaging reporting and data system. Then, it builds a mapping between these visual features and textual features to generate diagnostic reports, incorporating an attention module for improved clarity. Main results. To demonstrate the effectiveness of our proposed model, we conducted experiments on a breast calcification dataset comprising mammograms and diagnostic reports. The results demonstrate that our model can: (i) semantically enhance the interpretability of DLR; and, (ii) improve the readability of generated medical reports. Significance. Our interpretable textual model can explicitly simulate the mammographic calcification diagnosis process.


Subjects
Deep Learning, Radiomics, Neural Networks (Computer), Mammography/methods, Research Report
15.
JMIR Form Res ; 8: e32690, 2024 Feb 08.
Article in English | MEDLINE | ID: mdl-38329788

ABSTRACT

BACKGROUND: The automatic generation of radiology reports, which seeks to create a free-text description from a clinical radiograph, is emerging as a pivotal intersection between clinical medicine and artificial intelligence. Leveraging natural language processing technologies can accelerate report creation, enhancing health care quality and standardization. However, most existing studies have not yet fully tapped into the combined potential of advanced language and vision models. OBJECTIVE: The purpose of this study was to explore the integration of pretrained vision-language models into radiology report generation. This would enable the vision-language model to automatically convert clinical images into high-quality textual reports. METHODS: In our research, we introduced a radiology report generation model named ClinicalBLIP, building upon the foundational InstructBLIP model and refining it using clinical image-to-text data sets. A multistage fine-tuning approach via low-rank adaptation was proposed to deepen the semantic comprehension of the visual encoder and the large language model for clinical imagery. Furthermore, prior knowledge was integrated through prompt learning to enhance the precision of the reports generated. Experiments were conducted on both the IU X-RAY and MIMIC-CXR data sets, with ClinicalBLIP compared to several leading methods. RESULTS: Experimental results revealed that ClinicalBLIP obtained superior scores of 0.570/0.365 and 0.534/0.313 on the IU X-RAY/MIMIC-CXR test sets for the Metric for Evaluation of Translation with Explicit Ordering (METEOR) and the Recall-Oriented Understudy for Gisting Evaluation (ROUGE) evaluations, respectively. This performance notably surpasses that of existing state-of-the-art methods. Further evaluations confirmed the effectiveness of the multistage fine-tuning and the integration of prior information, leading to substantial improvements. CONCLUSIONS: The proposed ClinicalBLIP model demonstrated robustness and effectiveness in enhancing clinical radiology report generation, suggesting significant promise for real-world clinical applications.
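A minimal sketch of the low-rank adaptation (LoRA) idea underlying the multistage fine-tuning described above: a frozen pretrained linear layer is augmented with a small trainable low-rank update. The rank, scaling, and layer sizes are illustrative assumptions, not ClinicalBLIP's configuration.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank=8, alpha=16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                 # freeze the pretrained weight
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)          # start as a zero (identity) update
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * self.lora_b(self.lora_a(x))

layer = LoRALinear(nn.Linear(768, 768))
print(layer(torch.randn(1, 768)).shape)             # torch.Size([1, 768])
```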

16.
Sheng Wu Yi Xue Gong Cheng Xue Za Zhi ; 41(1): 60-69, 2024 Feb 25.
Article in Chinese | MEDLINE | ID: mdl-38403605

ABSTRACT

The task of automatic generation of medical image reports faces various challenges, such as diverse disease types and a lack of professionalism and fluency in report descriptions. To address these issues, this paper proposes a memory-driven multimodal medical imaging report generation method (mMIRmd). Firstly, a hierarchical vision transformer using shifted windows (Swin-Transformer) is utilized to extract multi-perspective visual features of patient medical images, and semantic features of textual medical history information are extracted using bidirectional encoder representations from transformers (BERT). Subsequently, the visual and semantic features are integrated to enhance the model's ability to recognize different disease types. Furthermore, a medical text pre-trained word vector dictionary is employed to encode labels of visual features, thereby enhancing the professionalism of the generated reports. Finally, a memory-driven module is introduced in the decoder, addressing long-distance dependencies in medical image data. This study is validated on the chest X-ray dataset collected at Indiana University (IU X-Ray) and the Medical Information Mart for Intensive Care Chest X-ray (MIMIC-CXR) released by the Massachusetts Institute of Technology and Massachusetts General Hospital. Experimental results indicate that the proposed method can better focus on the affected areas, improve the accuracy and fluency of report generation, and assist radiologists in quickly completing medical image report writing.


Subjects
Critical Care, Electric Power Supplies, Humans, Semantics, Technology
17.
Comput Methods Programs Biomed ; 244: 107979, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38113805

ABSTRACT

BACKGROUND AND OBJECTIVES: The automatic generation of medical image diagnostic reports can assist doctors in reducing their workload and improving the efficiency and accuracy of diagnosis. However, most existing report generation models suffer from weak correlation between generated words and a lack of contextual information during the report generation process. METHODS: To address these problems, we propose an Attention-Enhanced Relational Memory Network (AERMNet) model, in which the relational memory module is continuously updated by the words generated in the previous time step to strengthen the correlation between words in the generated medical image report, and a double LSTM with an interaction module reduces the loss of context information and makes full use of feature information. Thus, more accurate disease information can be generated by AERMNet for medical image reports. RESULTS: Experimental results on four medical datasets, Fetal Heart (FH), Ultrasound, IU X-Ray and MIMIC-CXR, show that our proposed method outperforms some of the previous models with respect to language generation metrics (CIDEr improving by 2.4% on FH, BLEU-1 improving by 2.4% on Ultrasound, CIDEr improving by 16.4% on IU X-Ray, BLEU-2 improving by 9.7% on MIMIC-CXR). CONCLUSIONS: This work promotes the development of medical image report generation and expands the prospects of computer-aided diagnosis applications. Our code is released at https://github.com/llttxx/AERMNET.


Subjects
Benchmarking, Physicians, Humans, Computer-Assisted Diagnosis, Language, Medical Records, Computer-Assisted Image Processing
18.
Comput Med Imaging Graph ; 111: 102320, 2024 01.
Article in English | MEDLINE | ID: mdl-38134726

ABSTRACT

Medical imaging, specifically chest X-ray image analysis, is a crucial component of early disease detection and screening in healthcare. Deep learning techniques, such as convolutional neural networks (CNNs), have emerged as powerful tools for computer-aided diagnosis (CAD) in chest X-ray image analysis. These techniques have shown promising results in automating tasks such as classification, detection, and segmentation of abnormalities in chest X-ray images, with the potential to surpass human radiologists. In this review, we provide an overview of the importance of chest X-ray image analysis, historical developments, impact of deep learning techniques, and availability of labeled databases. We specifically focus on advancements and challenges in radiology report generation using deep learning, highlighting potential future advancements in this area. The use of deep learning for report generation has the potential to reduce the burden on radiologists, improve patient care, and enhance the accuracy and efficiency of chest X-ray image analysis in medical imaging.


Subjects
Deep Learning, Humans, X-Rays, Neural Networks (Computer), Thorax, Computer-Assisted Diagnosis/methods
19.
Med Image Anal ; 91: 102982, 2024 Jan.
Article in English | MEDLINE | ID: mdl-37837692

ABSTRACT

Medical report generation can be treated as a process of doctors observing, understanding, and describing images from different perspectives. Following this process, this paper proposes a Transformer-based Semantic Query learning paradigm (TranSQ). Briefly, the paradigm learns an intention embedding set, makes a semantic query against the visual features, generates intent-compliant sentence candidates, and forms a coherent report. We apply a bipartite matching mechanism during training to realize the dynamic correspondence between the intention embeddings and the sentences, inducing medical concepts into the observation intentions. Experimental results on two major radiology reporting datasets (i.e., IU X-ray and MIMIC-CXR) demonstrate that our model outperforms state-of-the-art models in terms of generation effectiveness and clinical efficacy. In addition, comprehensive ablation experiments fully validate the TranSQ model's innovation and interpretability. The code is available at https://github.com/zjukongming/TranSQ.
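An illustration of the bipartite matching step described above: given a cost matrix between intention queries and report sentences (random here), the Hungarian algorithm yields an optimal one-to-one assignment. The cost definition in TranSQ itself may differ; the sizes and costs below are placeholders.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

num_queries, num_sentences = 6, 4
cost = np.random.rand(num_queries, num_sentences)     # e.g. 1 - similarity between query and sentence

query_idx, sentence_idx = linear_sum_assignment(cost)  # minimizes total assignment cost
for q, s in zip(query_idx, sentence_idx):
    print(f"intention query {q} -> sentence {s} (cost {cost[q, s]:.2f})")
```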


Subjects
Learning, Semantics, Humans, X-Rays, Radiography, Logic
20.
Phys Med Biol ; 69(4)2024 Feb 05.
Article in English | MEDLINE | ID: mdl-38157546

ABSTRACT

Objective. Automatic radiology report generation is booming due to its huge application potential for the healthcare industry. However, existing computer vision and natural language processing approaches to this problem are limited in two respects. First, when extracting image features, most of them neglect multi-view reasoning in vision and model a single-view structure of medical images, such as a space view or channel view. However, clinicians rely on multi-view imaging information for comprehensive judgment in daily clinical diagnosis. Second, when generating reports, they overlook context reasoning with multi-modal information and focus on purely textual optimization using retrieval-based methods. We aim to address these two issues by proposing a model that better simulates clinicians' perspectives and generates more accurate reports. Approach. Given the above limitation in feature extraction, we propose a globally-intensive attention (GIA) module in the medical image encoder to simulate and integrate multi-view vision perception. GIA aims to learn three types of vision perception: depth view, space view, and pixel view. On the other hand, to address the above problem in report generation, we explore how to involve multi-modal signals in generating precisely matched reports, i.e., how to integrate previously predicted words with region-aware visual content in next-word prediction. Specifically, we design a visual knowledge-guided decoder (VKGD), which can adaptively consider how much the model needs to rely on visual information and previously predicted text to assist next-word prediction. Hence, our final intensive vision-guided network framework includes a GIA-guided visual encoder and the VKGD. Main results. Experiments on two commonly used datasets, IU X-RAY and MIMIC-CXR, demonstrate the superior ability of our method compared with other state-of-the-art approaches. Significance. Our model explores the potential of simulating clinicians' perspectives and automatically generates more accurate reports, which promotes the exploration of medical automation and intelligence.
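A schematic gate, written as an assumption rather than the published VKGD module, showing how a decoder step can adaptively weigh visual context against the representation of previously predicted words when predicting the next word; the module name and dimensions are illustrative.

```python
import torch
import torch.nn as nn

class VisualTextGate(nn.Module):
    def __init__(self, dim=512):
        super().__init__()
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, visual_ctx, text_ctx):
        """visual_ctx, text_ctx: (B, d) context vectors at the current decoding step."""
        g = torch.sigmoid(self.gate(torch.cat([visual_ctx, text_ctx], dim=-1)))
        return g * visual_ctx + (1.0 - g) * text_ctx   # per-dimension mixing of the two sources

gate = VisualTextGate()
mixed = gate(torch.randn(2, 512), torch.randn(2, 512))
print(mixed.shape)                                      # torch.Size([2, 512])
```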


Subjects
Radiology, Radiography, Visual Perception, Automation