NMN-VD: A Neural Module Network for Visual Dialog.
Sensors (Basel). 2021 Jan 30; 21(3).
Article in English | MEDLINE | ID: mdl-33573265
ABSTRACT
Visual dialog demonstrates several important aspects of multimodal artificial intelligence; however, it is hindered by visual grounding and visual coreference resolution problems. To overcome these problems, we propose a novel neural module network for visual dialog (NMN-VD). NMN-VD is an efficient question-customized modular network model that, after analyzing an input question, assembles only the modules required to answer it. In particular, the model includes a Refer module that uses a reference pool to effectively locate the visual region indicated by a pronoun, thereby addressing visual coreference resolution, an important challenge in visual dialog. The proposed NMN-VD model also includes a method for distinguishing impersonal pronouns, which do not require visual coreference resolution, from general pronouns and handling them accordingly. Furthermore, the model includes a new Compare module that effectively handles the comparison questions found in visual dialogs, as well as a Find module that applies a triple-attention mechanism to solve visual grounding problems between the question and the image. Experiments conducted on a large-scale benchmark dataset verify the efficacy and high performance of our proposed NMN-VD model.
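The question-customized modular pipeline described in the abstract can be sketched roughly as follows. The module names (Find, Refer, Compare) mirror the abstract, but every function body, embedding, and similarity score here is an illustrative stand-in, not the authors' implementation: plain dot-product attention replaces the paper's triple-attention mechanism, and the reference pool is a simple list of (query, attention-map) pairs.

```python
import numpy as np

# Illustrative sketch only: a question-customized modular network with a
# reference pool for pronoun resolution. All shapes, embeddings, and
# scoring functions are assumptions, not the paper's actual design.
H, W, D = 4, 4, 8                                   # image feature grid
rng = np.random.default_rng(0)
image = rng.standard_normal((H, W, D))

def find(image, query_vec):
    """Ground a noun phrase as an attention map over the image grid.
    (Stands in for the paper's triple-attention Find module.)"""
    scores = image @ query_vec                      # (H, W)
    weights = np.exp(scores - scores.max())
    return weights / weights.sum()

def refer(reference_pool, pronoun_vec):
    """Resolve a general pronoun by retrieving the most similar cached
    attention map from the reference pool."""
    keys = np.stack([key for key, _ in reference_pool])
    sims = keys @ pronoun_vec
    return reference_pool[int(np.argmax(sims))][1]

def compare(att_a, att_b, image):
    """Toy Compare module: contrast pooled features of two attended regions."""
    feat_a = (att_a[..., None] * image).sum(axis=(0, 1))
    feat_b = (att_b[..., None] * image).sum(axis=(0, 1))
    return feat_a - feat_b

# Turn 1: "What is the dog doing?" -> the controller assembles Find,
# and the resulting attention is cached in the reference pool.
dog_vec = rng.standard_normal(D)
pool = []
att_dog = find(image, dog_vec)
pool.append((dog_vec, att_dog))

# Turn 2: "Is it brown?" -> "it" is a general (not impersonal) pronoun,
# so the controller assembles Refer instead of Find.
it_vec = dog_vec + 0.01 * rng.standard_normal(D)    # pronoun embedding near "dog"
att_it = refer(pool, it_vec)
assert np.allclose(att_it, att_dog)                 # coreference resolved
```

In the paper's framing, the key design choice this sketch illustrates is that the module layout is chosen per question: grounding questions route through Find, pronoun-bearing follow-ups through Refer, and comparison questions additionally through Compare.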