Multi-grained visual pivot-guided multi-modal neural machine translation with text-aware cross-modal contrastive disentangling.
Guo, Junjun; Su, Rui; Ye, Junjie.
Affiliations
  • Guo J; Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, Yunnan, 650500, China; Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming, Yunnan, 650500, China. Electronic address: guojjgb@163.com.
  • Su R; Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, Yunnan, 650500, China; Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming, Yunnan, 650500, China.
  • Ye J; School of Information Science & Engineering, Yunnan University, Kunming, Yunnan, 650221, China. Electronic address: yejunjie_cdx@163.com.
Neural Netw. 2024 Oct; 178: 106403.
Article in English | MEDLINE | ID: mdl-38815470
ABSTRACT
The goal of multi-modal neural machine translation (MNMT) is to incorporate language-agnostic visual information into text to enhance the performance of machine translation. However, due to the inherent differences between images and text, these two modalities inevitably suffer from semantic mismatch problems. To tackle this issue, this paper adopts a multi-grained visual pivot-guided multi-modal fusion strategy with cross-modal contrastive disentangling to eliminate the linguistic gaps between different languages. By using the disentangled multi-grained visual information as a cross-lingual pivot, we can enhance the alignment between different languages and improve the performance of MNMT. We first introduce text-guided stacked cross-modal disentangling modules to progressively disentangle the image into two types of visual information: MT-related visual information and background information. Then we effectively integrate these two kinds of multi-grained visual elements to assist target sentence generation. Extensive experiments on four benchmark MNMT datasets are conducted, and the results demonstrate that our proposed approach achieves significant improvements over other state-of-the-art (SOTA) approaches on all test sets. The in-depth analysis highlights the benefits of text-guided cross-modal disentangling and visual pivot-based multi-modal fusion strategies in MNMT. We release the code at https://github.com/nlp-mnmt/ConVisPiv-MNMT.
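The sketch below illustrates the general idea the abstract describes: using text to split image features into an MT-related part and a background part, with an InfoNCE-style contrastive loss pulling the MT-related visual features toward the text. It is a minimal assumption-laden illustration, not the authors' released code; all module, function, and variable names (TextGuidedDisentangler, contrastive_disentangling_loss, etc.) are hypothetical.

```python
# Minimal PyTorch sketch of text-guided cross-modal contrastive disentangling.
# Assumed shapes: text (B, T, D) token features, image (B, R, D) region features.
# Names are illustrative, not from the paper's repository.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TextGuidedDisentangler(nn.Module):
    def __init__(self, dim: int = 512, heads: int = 8):
        super().__init__()
        # Text tokens query image regions to extract MT-related visual content.
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.to_background = nn.Linear(dim, dim)

    def forward(self, text: torch.Tensor, image: torch.Tensor):
        # MT-related features: image content attended to by the text.
        mt_related, _ = self.cross_attn(query=text, key=image, value=image)
        # Background: image content not explained by the text-guided part.
        background = self.to_background(image - mt_related.mean(1, keepdim=True))
        return mt_related, background

def contrastive_disentangling_loss(mt_related, background, text, tau=0.07):
    # Pool sequences to single vectors and L2-normalize.
    v_mt = F.normalize(mt_related.mean(1), dim=-1)       # (B, D)
    v_bg = F.normalize(background.mean(1), dim=-1)       # (B, D)
    t = F.normalize(text.mean(1), dim=-1)                # (B, D)
    # InfoNCE: each text matches its own MT-related visual vector; other batch
    # samples and the sample's own background vector serve as negatives.
    logits_pos = v_mt @ t.t() / tau                      # (B, B)
    logits_neg = (v_bg * t).sum(-1, keepdim=True) / tau  # (B, 1) hard negative
    logits = torch.cat([logits_pos, logits_neg], dim=1)
    labels = torch.arange(t.size(0), device=t.device)
    return F.cross_entropy(logits, labels)
```

In this reading, the disentangled MT-related features would serve as the cross-lingual pivot fused into the translation decoder, while the background branch absorbs visual content irrelevant to the source sentence; the stacked, multi-grained version in the paper would repeat this separation over several layers or granularities.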
Subjects
Keywords

Full text: 1 Database: MEDLINE Main subject: Neural Networks, Computer Language: English Publication year: 2024 Document type: Article