Search | BVS Search Portal

HyperSOR: Context-Aware Graph Hypernetwork for Salient Object Ranking.

Qiao, Minglang; Xu, Mai; Jiang, Lai; Lei, Peng; Wen, Shijie; Chen, Yunjin; Sigal, Leonid.

IEEE Trans Pattern Anal Mach Intell ; 46(9): 5873-5889, 2024 Sep.

Article in English | MEDLINE | ID: mdl-38381637

ABSTRACT

Salient object ranking (SOR) aims to segment the salient objects in an image and simultaneously predict their saliency rankings, according to how human attention shifts across objects. Existing SOR approaches mainly focus on object-based attention, e.g., the semantics and appearance of objects. However, we find that scene context plays a vital role in SOR: the saliency ranking of the same object varies considerably across scenes. In this paper, we thus make the first attempt at explicitly learning scene context for SOR. Specifically, we establish a large-scale SOR dataset of 24,373 images with rich context annotations, i.e., scene graphs, segmentation, and saliency rankings. Inspired by the analysis of our dataset, we propose a novel graph hypernetwork, named HyperSOR, for context-aware SOR. In HyperSOR, an initial graph module is developed to segment objects and construct an initial graph by considering both geometric and semantic information. Then, a scene graph generation module with a multi-path graph attention mechanism is designed to learn semantic relationships among objects based on the initial graph. Finally, a saliency ranking prediction module dynamically adopts the learned scene context through a novel graph hypernetwork to infer the saliency rankings. Experimental results show that HyperSOR significantly improves SOR performance.
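The hypernetwork idea in the last step can be illustrated with a toy example. The following is only a minimal sketch of the general technique (a network whose output is the weights of another layer), not the HyperSOR architecture: the linear hypernetwork, the two-dimensional context and feature vectors, and all function names and numeric values are hypothetical and chosen for illustration.

```python
def generate_weights(hyper_matrix, hyper_bias, context):
    """Hypothetical linear hypernetwork: map a scene-context vector to the
    weights of a scoring layer (one weight per object feature)."""
    return [
        sum(m * c for m, c in zip(row, context)) + b
        for row, b in zip(hyper_matrix, hyper_bias)
    ]


def rank_objects(object_features, weights):
    """Score each object with the generated weights and return 1-based
    saliency ranks (rank 1 = highest score)."""
    scores = [sum(w * x for w, x in zip(weights, feats)) for feats in object_features]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0] * len(scores)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks


# Illustrative values only (not taken from the paper).
HYPER_MATRIX = [[1.0, -1.0], [-1.0, 1.0]]
HYPER_BIAS = [0.0, 0.0]
OBJECTS = [[0.9, 0.1], [0.2, 0.8]]  # the same two objects in both scenes

ranks_scene_a = rank_objects(OBJECTS, generate_weights(HYPER_MATRIX, HYPER_BIAS, [1.0, 0.0]))
ranks_scene_b = rank_objects(OBJECTS, generate_weights(HYPER_MATRIX, HYPER_BIAS, [0.0, 1.0]))
print(ranks_scene_a, ranks_scene_b)
```

Because the scoring weights are produced from the context vector rather than fixed, the same object features receive different ranks under the two contexts, which mirrors the abstract's observation that ranking depends on the scene.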

Predicting Head Movement in Panoramic Video: A Deep Reinforcement Learning Approach.

Xu, Mai; Song, Yuhang; Wang, Jianyi; Qiao, Minglang; Huo, Liangyu; Wang, Zulin.

IEEE Trans Pattern Anal Mach Intell ; 41(11): 2693-2708, 2019 Nov.

Article in English | MEDLINE | ID: mdl-30047871

ABSTRACT

Panoramic video provides an immersive and interactive experience by enabling humans to control the field of view (FoV) through head movement (HM). Thus, HM plays a key role in modeling human attention on panoramic video. This paper establishes a database of subjects' HM data collected on panoramic video sequences. From this database, we find that the HM data are highly consistent across subjects. Furthermore, we find that deep reinforcement learning (DRL) can be applied to predict HM positions by maximizing the reward of imitating human HM scanpaths through the agent's actions. Based on our findings, we propose a DRL-based HM prediction (DHP) approach with offline and online versions, called offline-DHP and online-DHP. In offline-DHP, multiple DRL workflows are run to determine potential HM positions at each panoramic frame. Then, a heat map of the potential HM positions, named the HM map, is generated as the output of offline-DHP. In online-DHP, the next HM position of one subject is estimated given the currently observed HM position, which is achieved by developing a DRL algorithm upon the learned offline-DHP model. Finally, the experiments validate that our approach is effective in both offline and online prediction of HM positions for panoramic video, and that the learned offline-DHP model can improve the performance of online-DHP.
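The step that turns potential HM positions into an HM map can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's procedure: the DRL workflows are not implemented, the input positions are hypothetical (longitude, latitude) pairs in degrees, and the Gaussian smoothing, the grid size, and the planar distance in degrees (with longitude wrap-around instead of a true great-circle distance) are assumptions made for brevity.

```python
import math


def hm_map(positions, width=36, height=18, sigma_deg=10.0):
    """Build a normalized heat map on an equirectangular grid by placing a
    Gaussian at each (longitude, latitude) position, both in degrees."""
    grid = [[0.0] * width for _ in range(height)]
    for row in range(height):
        lat = 90.0 - (row + 0.5) * 180.0 / height  # latitude of the cell center
        for col in range(width):
            lon = -180.0 + (col + 0.5) * 360.0 / width  # longitude of the cell center
            for p_lon, p_lat in positions:
                d_lon = abs(lon - p_lon)
                d_lon = min(d_lon, 360.0 - d_lon)  # longitude wraps around at +/-180
                d_lat = lat - p_lat
                grid[row][col] += math.exp(-(d_lon ** 2 + d_lat ** 2) / (2.0 * sigma_deg ** 2))
    total = sum(sum(r) for r in grid)
    return [[v / total for v in r] for r in grid]


# Hypothetical outputs of three workflows for one frame.
example_positions = [(5.0, 5.0), (5.0, 5.0), (15.0, 5.0)]
example_map = hm_map(example_positions)
```

The map sums to 1, so each cell can be read as the estimated share of attention falling in that region of the frame.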

Subject(s)

Deep Learning; Head Movements/physiology; Image Processing, Computer-Assisted/methods; Models, Statistical; Video Recording; Adolescent; Adult; Algorithms; Female; Humans; Male; Young Adult
