Your browser doesn't support javascript.
loading
Token-Mixer: Bind Image and Text in One Embedding Space for Medical Image Reporting.
IEEE Trans Med Imaging ; PP2024 Jun 11.
Article en En | MEDLINE | ID: mdl-38861436
ABSTRACT
Medical image reporting focused on automatically generating the diagnostic reports from medical images has garnered growing research attention. In this task, learning cross-modal alignment between images and reports is crucial. However, the exposure bias problem in autoregressive text generation poses a notable challenge, as the model is optimized by a word-level loss function using the teacher-forcing strategy. To this end, we propose a novel Token-Mixer framework that learns to bind image and text in one embedding space for medical image reporting. Concretely, Token-Mixer enhances the cross-modal alignment by matching image-to-text generation with text-to-text generation that suffers less from exposure bias. The framework contains an image encoder, a text encoder and a text decoder. In training, images and paired reports are first encoded into image tokens and text tokens, and these tokens are randomly mixed to form the mixed tokens. Then, the text decoder accepts image tokens, text tokens or mixed tokens as prompt tokens and conducts text generation for network optimization. Furthermore, we introduce a tailored text decoder and an alternative training strategy that well integrate with our Token-Mixer framework. Extensive experiments across three publicly available datasets demonstrate Token-Mixer successfully enhances the image-text alignment and thereby attains a state-of-the-art performance. Related codes are available at https//github.com/yangyan22/Token-Mixer.

Texto completo: 1 Colección: 01-internacional Base de datos: MEDLINE Idioma: En Revista: IEEE Trans Med Imaging Año: 2024 Tipo del documento: Article

Texto completo: 1 Colección: 01-internacional Base de datos: MEDLINE Idioma: En Revista: IEEE Trans Med Imaging Año: 2024 Tipo del documento: Article
...