Tracklet Pair Proposal and Context Reasoning for Video Scene Graph Generation.
Sensors (Basel); 21(9). 2021 May 02.
Article | En | MEDLINE | ID: mdl-34063299
Video scene graph generation (VidSGG), which constructs scene graphs from videos to enable deeper visual scene understanding, is a challenging task. Segment-based and sliding-window-based methods have been proposed for this task, but each has certain limitations. This study proposes a novel deep neural network model, VSGG-Net, for video scene graph generation. The model uses a sliding-window scheme to detect object tracklets of various lengths throughout the entire video. In particular, it introduces a new tracklet pair proposal method that evaluates the relatedness of object tracklet pairs using a pretrained neural network and statistical information. To exploit spatio-temporal context effectively, the model performs low-level visual context reasoning with a spatio-temporal context graph and a graph neural network, followed by high-level semantic context reasoning. To improve detection performance for sparse relationships, it applies a class weighting technique that increases the weights of sparse relationship classes. Experiments on the benchmark datasets VidOR and VidVRD demonstrate the effectiveness and high performance of the proposed model.
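The abstract does not specify how the class weighting for sparse relationships is computed. A common way to realize such a technique is inverse-frequency weighting of the relationship classes in a cross-entropy loss, as in the minimal sketch below. The class names, counts, and normalization are hypothetical illustrations, not the authors' exact formulation.

import torch
import torch.nn as nn

# Hypothetical relationship-class counts from a training set (invented numbers):
# frequent classes such as "next_to" dominate, sparse ones such as "ride" are rare.
class_counts = torch.tensor([50000.0, 12000.0, 800.0, 150.0])  # next_to, in_front_of, hold, ride

# Inverse-frequency weights, normalized so the average weight is 1:
# sparse classes receive weights above 1, frequent classes below 1.
weights = class_counts.sum() / (len(class_counts) * class_counts)

# Weighted cross-entropy: misclassifying a sparse relationship costs more.
criterion = nn.CrossEntropyLoss(weight=weights)

# Toy usage: logits for a batch of 3 tracklet pairs over the 4 relationship classes.
logits = torch.randn(3, 4)
targets = torch.tensor([3, 0, 2])  # ground-truth relationship indices
loss = criterion(logits, targets)
print(loss.item())

Under this kind of weighting, the gradient contribution of rare relationship classes is scaled up, which is one standard way to counter the long-tailed relationship distribution described in the abstract.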
Full text:
1
Collection:
01-internacional
Database:
MEDLINE
Language:
En
Journal:
Sensors (Basel)
Year:
2021
Document type:
Article