An error analysis for image-based multi-modal neural machine translation.
Calixto, Iacer; Liu, Qun.
Affiliation
  • Calixto I; 1University of Amsterdam, ILLC, Science Park, Amsterdam, Netherlands.
  • Liu Q; Huawei Noah's Ark Lab, Hong Kong, Hong Kong.
Mach Transl; 33(1): 155-177, 2019.
Article in English | MEDLINE | ID: mdl-31281206
ABSTRACT
In this article, we conduct an extensive quantitative error analysis of different multi-modal neural machine translation (MNMT) models which integrate visual features into different parts of both the encoder and the decoder. We investigate the scenario where models are trained on an in-domain training data set of parallel sentence pairs with images. We analyse two different types of MNMT models, which use global and local image features: the former encode an image globally, i.e. there is one feature vector representing an entire image, whereas the latter encode spatial information, i.e. there are multiple feature vectors, each encoding a different portion of the image. We conduct an error analysis of translations generated by different MNMT models as well as text-only baselines, where we study how multi-modal models compare when translating both visual and non-visual terms. In general, we find that the additional multi-modal signals consistently improve translations, even more so when using simpler MNMT models that use global visual features. We also find that not only are translations of terms with a strong visual connotation improved, but almost all kinds of errors decrease when using multi-modal models.
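The distinction between the two feature types in the abstract can be sketched concretely. The snippet below is a minimal illustration, not taken from the paper: the grid size (14x14) and dimensionalities (1024/2048-style CNN features) are assumptions based on common CNN-encoder conventions, and average-pooling is just one plausible way a global vector relates to a spatial grid.

```python
import numpy as np

# Hypothetical illustration of the two visual feature types compared in
# the article. Shapes are assumed, not taken from the paper.
rng = np.random.default_rng(0)

# Local ("spatial") features: one vector per image region,
# e.g. a 14x14 grid of region vectors from a CNN encoder.
spatial_feats = rng.standard_normal((14 * 14, 1024))  # 196 region vectors

# Global feature: a single vector representing the entire image;
# here obtained by average-pooling the spatial grid (one common choice).
global_feat = spatial_feats.mean(axis=0)

print(spatial_feats.shape)  # (196, 1024) -> multiple vectors, spatial info
print(global_feat.shape)    # (1024,)     -> one vector for the whole image
```

A decoder attending over `spatial_feats` can ground individual target words in image regions, whereas `global_feat` can only condition the model on the image as a whole.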
Full text: 1 | Database: MEDLINE | Language: English | Year of publication: 2019 | Document type: Article
