Ranking-Based Salient Object Detection and Depth Prediction for Shallow Depth-of-Field.

Xian, Ke; Peng, Juewen; Zhang, Chao; Lu, Hao; Cao, Zhiguo.

Sensors (Basel); 21(5), 2021 Mar 05.

Article in English | MEDLINE | ID: mdl-33807770

ABSTRACT

Rendering a shallow depth-of-field (DoF), which focuses on the region of interest by blurring out the rest of the image, is a challenging problem in computer vision and computational photography. It can be achieved either by adjusting the parameters (e.g., aperture and focal length) of a single-lens reflex camera or by computational techniques. In this paper, we investigate the latter, i.e., we explore a computational method to render shallow DoF. Previous methods rely on either portrait segmentation, which applies only to portrait photos, or stereo sensing, which requires stereo inputs. To address these issues, we study the problem of rendering shallow DoF from an arbitrary image. In particular, we propose a method that consists of a salient object detection (SOD) module, a monocular depth prediction (MDP) module, and a DoF rendering module. The SOD module determines the focal plane, while the MDP module controls the degree of blur. Specifically, we introduce a label-guided ranking loss for both salient object detection and depth prediction. For salient object detection, the label-guided ranking loss comprises two terms: (i) a heterogeneous ranking loss that encourages the sampled salient pixels to differ from background pixels; (ii) a homogeneous ranking loss that penalizes inconsistency among salient pixels or among background pixels. For depth prediction, the label-guided ranking loss mainly relies on multilevel structural information, i.e., from low-level edge maps to high-level object instance masks. In addition, we introduce a SOD- and depth-aware blur rendering method to generate shallow DoF images. Comprehensive experiments demonstrate the effectiveness of our proposed method.
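The two-term ranking loss for salient object detection can be illustrated with a small sketch. The abstract does not give the exact formulation, so everything below is an assumption: the logistic form of the heterogeneous term, the squared-difference form of the homogeneous term, uniform random pair sampling, and all function names are hypothetical choices made for illustration, not the authors' implementation.

```python
import math
import random


def heterogeneous_term(p_salient, p_background):
    """Assumed logistic ranking penalty: small when the salient score
    exceeds the background score, large when the order is reversed."""
    return math.log1p(math.exp(-(p_salient - p_background)))


def homogeneous_term(p_a, p_b):
    """Assumed squared difference: penalizes inconsistent scores
    for two pixels that share the same label."""
    return (p_a - p_b) ** 2


def label_guided_ranking_loss(pred, label, num_pairs=100, seed=0):
    """Average pairwise loss over randomly sampled pixel pairs.

    pred  -- flat list of predicted saliency scores
    label -- flat list of ground-truth labels (1 = salient, 0 = background)
    """
    rng = random.Random(seed)
    n = len(pred)
    total = 0.0
    for _ in range(num_pairs):
        i, j = rng.randrange(n), rng.randrange(n)
        if label[i] == label[j]:
            # Same label: scores should agree.
            total += homogeneous_term(pred[i], pred[j])
        elif label[i] == 1:
            # i is salient, j is background.
            total += heterogeneous_term(pred[i], pred[j])
        else:
            # j is salient, i is background.
            total += heterogeneous_term(pred[j], pred[i])
    return total / num_pairs
```

Under these assumptions, a prediction that separates salient from background pixels and is uniform within each group yields a lower loss than one that mixes them; a real training setup would compute the same quantities on tensors with a numerically stable softplus.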

NVDS⁺: Towards Efficient and Versatile Neural Stabilizer for Video Depth Estimation.

Wang, Yiran; Shi, Min; Li, Jiaqi; Hong, Chaoyi; Huang, Zihao; Peng, Juewen; Cao, Zhiguo; Zhang, Jianming; Xian, Ke; Lin, Guosheng.

IEEE Trans Pattern Anal Mach Intell; PP, 2024 Oct 08.

Article in English | MEDLINE | ID: mdl-39378259

ABSTRACT

Video depth estimation aims to infer temporally consistent depth. One approach is to finetune a single-image model on each video with geometry constraints, which is inefficient and lacks robustness. An alternative is learning to enforce consistency from data, which requires well-designed models and sufficient video depth data. To address both challenges, we introduce NVDS⁺, which stabilizes inconsistent depth estimated by various single-image models in a plug-and-play manner. We also build a large-scale Video Depth in the Wild (VDW) dataset, which contains 14,203 videos with over two million frames, making it the largest natural-scene video depth dataset. Additionally, a bidirectional inference strategy is designed to improve consistency by adaptively fusing forward and backward predictions. We instantiate a model family ranging from small to large scales for different applications. The method is evaluated on the VDW dataset and three public benchmarks. To further demonstrate its versatility, we extend NVDS⁺ to video semantic segmentation and several downstream applications such as bokeh rendering, novel view synthesis, and 3D reconstruction. Experimental results show that our method achieves significant improvements in consistency, accuracy, and efficiency. Our work serves as a solid baseline and data foundation for learning-based video depth estimation. Code and dataset are available at: https://github.com/RaymondWang987/NVDS.
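The idea of fusing forward and backward predictions can be sketched as a per-pixel weighted average. The abstract does not say how the adaptive weights are computed, so the weights here are simply passed in as a hypothetical input, and the function name is likewise an assumption rather than part of the released code.

```python
def fuse_bidirectional(forward, backward, weight_forward):
    """Per-pixel convex combination of two depth predictions.

    forward, backward -- flat lists of depth values for the same frame,
                         obtained by processing the video in each direction
    weight_forward    -- hypothetical per-pixel weights in [0, 1]; how they
                         are chosen adaptively is not stated in the abstract
    """
    fused = []
    for f, b, w in zip(forward, backward, weight_forward):
        if not 0.0 <= w <= 1.0:
            raise ValueError("weights must lie in [0, 1]")
        fused.append(w * f + (1.0 - w) * b)
    return fused
```

A weight of 0.5 averages the two directions, while a weight of 1.0 keeps the forward prediction unchanged.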
