Mining core information by evaluating semantic importance for unpaired image captioning.
Neural Netw; 179: 106519, 2024 Nov.
Article | En | MEDLINE | ID: mdl-39024704
ABSTRACT
Recently, exciting progress has been made in supervised image captioning. However, manually annotated image-annotation pairs are difficult and expensive to obtain, so unpaired image captioning has become an emerging challenge. This paper proposes Mining Core Information by Evaluating Semantic Importance (MCIESI), a method for image captioning that uses unpaired images and sentences. The main difference from existing methods is that MCIESI focuses on mining the information in an image that should be described and embodies it in generated natural language that conforms to human thinking. To achieve this goal, we use scene graphs to represent the semantics of images and evaluate the importance of objects and interaction relationships to mine the core information in images, which is then encouraged to be embodied in the generated sentences through a semantic constraint. Combined with a grammatical constraint using adversarial training with a real sentence corpus and a relative constraint using a triplet loss, the generator is trained to generate semantically plausible and grammatically correct sentences. Extensive experiments verify the effectiveness of MCIESI.
Database: MEDLINE
Main subject: Semantics / Natural Language Processing / Data Mining
Limits: Humans
Language: En
Journal: Neural Netw
Journal subject: NEUROLOGY
Year: 2024
Document type: Article