Graph-based social relation inference with multi-level conditional attention.
Neural Netw; 173: 106216, 2024 May.
Article in En | MEDLINE | ID: mdl-38442650
ABSTRACT
Social relation inference intrinsically requires high-level semantic understanding. To accurately infer the relations between persons in images, one needs not only to understand the scenes and objects in the images, but also to adaptively attend to important clues. Unlike prior works that classify social relations using attention on detected objects, we propose a MUlti-level Conditional Attention (MUCA) mechanism for social relation inference, which attends to scenes, objects and human interactions conditioned on each person pair. We then develop a transformer-style network to realize the MUCA mechanism. The resulting network, named Graph-based Relation Inference Transformer (GRIT), consists of two modules: a Conditional Query Module (CQM) and a Relation Attention Module (RAM). Specifically, we design a graph-based CQM that generates informative relation queries for all person pairs by fusing local features and global context for each pair. Moreover, RAM takes full advantage of transformer-style networks to apply multi-level attention when classifying social relations. To the best of our knowledge, GRIT is the first method to infer social relations with multi-level conditional attention. GRIT is end-to-end trainable and significantly outperforms existing methods on two benchmark datasets, e.g., with performance improvements of 7.8% on PIPA and 9.6% on PISC.
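The idea of a pair-conditioned query attending over multi-level features can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `pair_query` and `conditional_attention`, the averaging used to fuse features, the single attention head, and the toy 4-dimensional vectors are all assumptions made for illustration. The abstract states only that the CQM is graph-based and that the RAM is transformer-style.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pair_query(feat_a, feat_b, global_ctx):
    """Hypothetical stand-in for the CQM: fuse the local features of two
    persons with a global context vector by simple averaging."""
    return [(a + b + g) / 3.0 for a, b, g in zip(feat_a, feat_b, global_ctx)]


def conditional_attention(query, tokens):
    """Scaled dot-product attention of one pair query over feature tokens.
    Returns the attention weights and the weighted sum of the tokens."""
    d = len(query)
    scores = [
        sum(q * t for q, t in zip(query, tok)) / math.sqrt(d) for tok in tokens
    ]
    weights = softmax(scores)
    attended = [
        sum(w * tok[i] for w, tok in zip(weights, tokens)) for i in range(d)
    ]
    return weights, attended


# Toy inputs (assumed values, not data from the paper).
person_a = [1.0, 0.0, 0.0, 0.0]
person_b = [0.0, 1.0, 0.0, 0.0]
global_ctx = [0.0, 0.0, 1.0, 0.0]
tokens = [
    [0.0, 0.0, 1.0, 0.0],  # scene-level token
    [1.0, 1.0, 0.0, 0.0],  # object-level token
    [0.0, 0.0, 0.0, 1.0],  # interaction-level token
]

query = pair_query(person_a, person_b, global_ctx)
weights, attended = conditional_attention(query, tokens)
```

Because the query is built from a specific person pair, a different pair would produce different attention weights over the same scene, object and interaction tokens; that is the sense in which the attention is conditional here.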
Full text: 1
Collection: 01-internacional
Database: MEDLINE
Main subject: Knowledge / Benchmarking
Limits: Humans
Language: En
Journal: Neural Netw / Neural netw / Neural networks
Journal subject: Neurology
Year: 2024
Document type: Article
Country of publication: United States