Rich Visual Knowledge-Based Augmentation Network for Visual Question Answering.

Zhang, Liyang; Liu, Shuaicheng; Liu, Donghao; Zeng, Pengpeng; Li, Xiangpeng; Song, Jingkuan; Gao, Lianli

IEEE Trans Neural Netw Learn Syst; 32(10): 4362-4373, 2021 Oct.

Article in English | MEDLINE | ID: mdl-32941156

ABSTRACT

Visual question answering (VQA), which requires understanding an image and the questions paired with it, has developed rapidly with the progress of deep learning in related fields such as natural language processing and computer vision. Existing works rely heavily on the knowledge contained in the data set. However, some questions require more specialized cues beyond data set knowledge to be answered correctly. To address this issue, we propose a novel framework for VQA, the knowledge-based augmentation network (KAN), which introduces object-related open-domain knowledge to assist question answering. Concretely, we extract more visual information from images and introduce a knowledge graph to provide the common sense or experience needed for the reasoning process. For these two augmented inputs, we design an attention module that adjusts itself to the specific question, so that the importance of external knowledge relative to detected objects is balanced adaptively. Extensive experiments show that KAN achieves state-of-the-art performance on three challenging VQA data sets, i.e., VQA v2, VQA-CP v2, and FVQA. In addition, our open-domain knowledge also benefits VQA baselines. Code is available at https://github.com/yyyanglz/KAN.
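
The abstract does not give the equations of KAN's question-adaptive attention module. Purely as an illustration of the general idea, the following sketch assumes a common pattern: question-guided softmax attention applied separately to detected-object features and to knowledge-graph features, followed by a sigmoid gate computed from the question that decides how much each source contributes. All names (`attend`, `fuse`, `gate_weights`, `gate_bias`) and the toy vectors are hypothetical; this is not the authors' published implementation.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(question: Vector, features: List[Vector]) -> Vector:
    """Question-guided attention (assumed form): score each feature by its dot
    product with the question embedding, softmax the scores, and return the
    attention-weighted sum of the features."""
    weights = softmax([dot(question, f) for f in features])
    dim = len(features[0])
    return [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]


def fuse(question: Vector, objects: List[Vector], knowledge: List[Vector],
         gate_weights: Vector, gate_bias: float = 0.0) -> Tuple[float, Vector]:
    """Hypothetical adaptive fusion: a question-conditioned gate g in (0, 1)
    weights the attended object summary by g and the attended knowledge
    summary by 1 - g."""
    v_obj = attend(question, objects)
    v_kg = attend(question, knowledge)
    g = 1.0 / (1.0 + math.exp(-(dot(gate_weights, question) + gate_bias)))
    fused = [g * o + (1.0 - g) * k for o, k in zip(v_obj, v_kg)]
    return g, fused


if __name__ == "__main__":
    q = [1.0, 0.0]                      # toy question embedding
    objects = [[1.0, 0.0], [0.0, 1.0]]  # toy detected-object features
    knowledge = [[0.5, 0.5]]            # toy knowledge-graph feature
    g, fused = fuse(q, objects, knowledge, gate_weights=[0.0, 0.0])
    print(round(g, 2), [round(x, 4) for x in fused])
```

With zero gate weights the gate sits at 0.5, so both sources contribute equally; in a trained model the gate parameters would be learned so that questions needing outside facts lean on the knowledge summary. Learned projections and realistic feature sizes are omitted here to keep the sketch dependency-free.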

Database: MEDLINE | Language: English | Year of publication: 2021 | Document type: Article