Cascaded Parsing of Human-Object Interaction Recognition.
IEEE Trans Pattern Anal Mach Intell; 44(6): 2827-2840, 2022 Jun.
Article in English | MEDLINE | ID: mdl-33400648
This paper addresses the task of detecting and recognizing human-object interactions (HOI) in images. Considering the intrinsic complexity and structural nature of the task, we introduce a cascaded parsing network (CP-HOI) for multi-stage, structured HOI understanding. At each cascade stage, an instance detection module progressively refines HOI proposals and feeds them into a structured interaction reasoning module. Each of the two modules is also connected to its counterpart in the previous stage, enabling efficient cross-stage information propagation. The structured interaction reasoning module is built upon a graph parsing neural network (GPNN), which efficiently models potential HOI structures as graphs and mines rich context for comprehensive relation understanding. In particular, GPNN infers a parse graph that i) interprets meaningful HOI structures through a learnable adjacency matrix, and ii) predicts action (edge) labels. Within an end-to-end, message-passing framework, GPNN blends learning and inference, iteratively parsing HOI structures and reasoning about HOI representations (i.e., instance and relation features). Going beyond bounding-box-level relation detection, we extend our framework to fine-grained, pixel-wise relation segmentation, which offers a new perspective on relation modeling. A preliminary version of our CP-HOI model won 1st place in the ICCV 2019 Person in Context Challenge in both relation detection and relation segmentation. In addition, CP-HOI shows promising results on two popular HOI recognition benchmarks, V-COCO and HICO-DET.
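The abstract describes GPNN as alternating between inferring a soft adjacency matrix over human and object nodes and passing messages along it. The minimal Python sketch below illustrates only that alternation, under loudly stated assumptions: the link function (a sigmoid of the dot product of node features), the message (an adjacency-weighted sum of neighbour features) and the update (a tanh of the sum) are hypothetical stand-ins with no learned parameters. It is not the authors' GPNN, and it omits the action-label readout and the cascade stages.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def link_function(h: List[Vector]) -> Matrix:
    """Soft adjacency A[v][w] = sigmoid(<h_v, h_w>) for v != w, 0 on the diagonal.

    Hypothetical stand-in for a learnable link function: no trained weights are used.
    """
    n = len(h)
    return [[0.0 if v == w else sigmoid(dot(h[v], h[w])) for w in range(n)]
            for v in range(n)]


def message_passing_step(h: List[Vector], adj: Matrix) -> List[Vector]:
    """Each node sums neighbour features weighted by the soft adjacency, then applies tanh."""
    n, d = len(h), len(h[0])
    new_h = []
    for v in range(n):
        # Message: adjacency-weighted sum of all other nodes' features (assumed form).
        msg = [sum(adj[v][w] * h[w][k] for w in range(n)) for k in range(d)]
        # Update: combine the node's own state with its message (assumed form).
        new_h.append([math.tanh(h[v][k] + msg[k]) for k in range(d)])
    return new_h


def parse_graph(h: List[Vector], iterations: int = 3) -> Tuple[Matrix, List[Vector]]:
    """Alternate structure inference (adjacency) and feature refinement."""
    adj = link_function(h)
    for _ in range(iterations):
        h = message_passing_step(h, adj)
        adj = link_function(h)  # re-infer the structure from the refined features
    return adj, h


if __name__ == "__main__":
    # Toy graph with made-up 2-d features: node 0 plays a "human", nodes 1-2 "objects".
    nodes = [[1.0, 0.5], [0.8, 0.4], [-0.6, 0.9]]
    adj, feats = parse_graph(nodes, iterations=2)
    for row in adj:
        print([round(x, 3) for x in row])
```

Re-inferring the adjacency after every update mirrors the abstract's point that parsing the structure and refining the representations happen jointly rather than in separate passes. In a trained model, both the link and update functions would carry learned parameters.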
Database:
MEDLINE
Main subject:
Algorithms / Neural Networks, Computer
Language:
English
Year of publication:
2022
Document type:
Article