Mining core information by evaluating semantic importance for unpaired image captioning.

Wei, Jiahui; Li, Zhixin; Zhang, Canlong; Ma, Huifang

Affiliation

Wei J; Key Lab of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin, 541004, China; Guangxi Key Lab of Multi-source Information Mining and Security, Guangxi Normal University, Guilin 541004, China. Electronic address: weijh@stu.gxnu.edu.cn.
Li Z; Key Lab of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin, 541004, China; Guangxi Key Lab of Multi-source Information Mining and Security, Guangxi Normal University, Guilin 541004, China. Electronic address: lizx@gxnu.edu.cn.
Zhang C; Key Lab of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin, 541004, China; Guangxi Key Lab of Multi-source Information Mining and Security, Guangxi Normal University, Guilin 541004, China. Electronic address: clzhang@gxnu.edu.cn.
Ma H; College of Computer Science and Engineering, Northwest Normal University, Lanzhou 730070, China. Electronic address: mahuifang@nwnu.edu.cn.

Neural Netw ; 179: 106519, 2024 Nov.

Article in English | MEDLINE | ID: mdl-39024704

ABSTRACT

Recently, exciting progress has been made in research on supervised image captioning. However, manually annotated image-annotation pairs are difficult and expensive to obtain, so unpaired image captioning has become an emerging challenge. This paper proposes Mining Core Information by Evaluating Semantic Importance (MCIESI), a method that performs image captioning using unpaired images and sentences. The main difference from existing methods is that MCIESI focuses on mining the information in the image that should be described and embodying it in generated natural language that conforms to human thinking. To achieve this goal, we use scene graphs to represent the semantics of images and evaluate the importance of objects and interaction relationships to mine the core information in images, which is then encouraged to appear in the generated sentences through a semantic constraint. Combined with a grammatical constraint, implemented through adversarial training with a corpus of real sentences, and a relative constraint, implemented with a triplet loss, the generator is trained to produce semantically plausible and grammatically correct sentences. Extensive experiments verify the effectiveness of MCIESI.
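The abstract names a triplet loss as the relative constraint but does not give its form. As an illustration only, the following is a minimal sketch of a generic margin-based triplet loss; the cosine distance, the margin value, and the roles of the anchor, positive, and negative embeddings are assumptions made here and are not taken from the paper.

```python
import math


def cosine_distance(a, b):
    """Cosine distance (1 - cosine similarity) between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Generic margin-based triplet loss.

    Hypothetical roles (assumed, not from the paper): `anchor` could be an
    image embedding, `positive` a matching sentence embedding, and `negative`
    a mismatched one. The loss is zero once the positive is closer to the
    anchor than the negative by at least `margin`.
    """
    d_pos = cosine_distance(anchor, positive)
    d_neg = cosine_distance(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)
```

In this sketch the loss vanishes when the positive is already much closer to the anchor than the negative, and grows when the ordering is violated, which is the sense in which such a term imposes a relative rather than absolute constraint.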

Subject(s)

Data Mining; Natural Language Processing; Semantics; Data Mining/methods; Humans; Neural Networks, Computer; Algorithms; Image Processing, Computer-Assisted/methods

Keywords

Generative adversarial training; Mining core information; Transformer; Unpaired image captioning

Full text: 1 Collection: 01-internacional Database: MEDLINE Main subject: Semantics / Natural Language Processing / Data Mining Limits: Humans Language: English Journal: Neural Netw Journal subject: NEUROLOGY Year: 2024 Document type: Article
