1.
IEEE Trans Image Process ; 32: 6426-6440, 2023.
Article in English | MEDLINE | ID: mdl-37966926

ABSTRACT

The increasing demand for immersive experience has greatly promoted research on the quality assessment of Light Field Images (LFIs). In this paper, we propose an efficient deep discrepancy measuring framework for full-reference light field image quality assessment. The main idea of the proposed framework is to efficiently evaluate the quality degradation of distorted LFIs by measuring the discrepancy between reference and distorted LFI patches. First, a patch generation module is proposed to extract spatio-angular patches and sub-aperture patches from LFIs, which greatly reduces the computational cost. Then, we design a hierarchical discrepancy network based on convolutional neural networks to extract hierarchical discrepancy features between reference and distorted spatio-angular patches. In addition, local discrepancy features between reference and distorted sub-aperture patches are extracted as complementary features. The angular-dominant hierarchical discrepancy features and the spatial-dominant local discrepancy features are then combined to evaluate patch quality. Finally, the scores of all patches are pooled to obtain the overall quality of the distorted LFI. To the best of our knowledge, the proposed framework is the first patch-based full-reference light field image quality assessment metric based on deep learning. Experimental results on four representative LFI datasets show that our framework achieves superior performance and lower computational complexity compared to other state-of-the-art metrics.
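The patch-generation-and-pooling pipeline described in this abstract can be sketched as follows. The patch size, the plain sub-aperture patching, and the simple mean pooling are illustrative assumptions, not the paper's exact design:

```python
import numpy as np

def extract_patches(lfi, patch=32):
    """Split a sub-aperture image stack of shape (U, V, H, W) into spatial
    patches. Illustrative only: the paper extracts both spatio-angular and
    sub-aperture patches; here we show plain spatial patching."""
    u, v, h, w = lfi.shape
    patches = []
    for y in range(0, h - patch + 1, patch):
        for x in range(0, w - patch + 1, patch):
            patches.append(lfi[:, :, y:y + patch, x:x + patch])
    return patches

def pool_quality(patch_scores):
    """Pool per-patch quality scores into an overall LFI score (mean pooling assumed)."""
    return float(np.mean(patch_scores))

lfi = np.random.rand(5, 5, 64, 64)   # toy 5x5 angular grid of 64x64 views
patches = extract_patches(lfi)
print(len(patches))                   # 2x2 grid of 32x32 patches -> 4
```

In the actual framework each patch pair (reference, distorted) would be scored by the hierarchical discrepancy network rather than by a hand-crafted rule.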

2.
IEEE Trans Image Process ; 32: 5031-5045, 2023.
Article in English | MEDLINE | ID: mdl-37347635

ABSTRACT

Semantic segmentation assigns a category to each pixel and has achieved great success in the supervised setting. However, it fails to generalize well to new domains due to the domain gap. Domain adaptation is a popular way to address this issue, but it requires target data and cannot handle unavailable domains. In domain generalization (DG), the model is trained without target data and aims to generalize well to new, unavailable domains. Recent works reveal that shape recognition is beneficial for generalization, but this remains underexplored in semantic segmentation. Meanwhile, object shapes also differ across domains, a discrepancy often ignored by existing works. Thus, we propose a Shape-Invariant Learning (SIL) framework that focuses on learning shape-invariant representations for better generalization. Specifically, we first define the structural edge, which considers both the object boundary and the inner structure of the object to provide more discriminative cues. Then, a shape perception learning strategy, comprising a texture feature discrepancy reduction loss and a structural feature discrepancy enlargement loss, is proposed to enhance the shape perception ability of the model by embedding the structural edge as a shape prior. Finally, we use shape deformation augmentation to generate samples with the same content but different shapes. Essentially, our SIL framework performs implicit shape distribution alignment at the domain level to learn shape-invariant representations. Extensive experiments show that our SIL framework achieves state-of-the-art performance.
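The pairing of a discrepancy-reduction term with a discrepancy-enlargement term can be sketched as below. The L2 discrepancy measure and the margin formulation of the enlargement loss are assumptions for illustration; the paper's exact loss functions may differ:

```python
import numpy as np

def l2_discrepancy(a, b):
    """Mean squared distance between two feature maps."""
    return float(np.mean((a - b) ** 2))

def shape_perception_loss(tex_a, tex_b, struct_a, struct_b, margin=1.0):
    """Toy version of a shape perception objective: pull one pair of
    features together (discrepancy reduction) while pushing the other pair
    apart up to a margin (discrepancy enlargement)."""
    reduce_term = l2_discrepancy(tex_a, tex_b)
    enlarge_term = max(0.0, margin - l2_discrepancy(struct_a, struct_b))
    return reduce_term + enlarge_term
```

The margin keeps the enlargement term bounded, so the model is not rewarded for pushing features apart without limit, a standard device in contrastive-style losses.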

3.
Article in English | MEDLINE | ID: mdl-34252028

ABSTRACT

Independent mobility poses a great challenge to visually impaired individuals. This paper proposes a novel system for understanding dynamic crosswalk scenes, which detects key objects, such as the crosswalk, vehicles, and pedestrians, and identifies the pedestrian traffic light status. Based on this scene understanding, the system indicates to the visually impaired user where and when to cross the road. The proposed system is implemented on a head-mounted mobile device (SensingAI G1) equipped with an Intel RealSense camera and a cellphone, and conveys surrounding scene information to visually impaired individuals through audio signals. To validate the performance of the proposed system, we build a crosswalk scene understanding dataset containing three sub-datasets: a pedestrian traffic light dataset with 7447 images, a dataset of key objects on the crossroad with 1006 images, and a crosswalk dataset with 3336 images. Extensive experiments demonstrate that the proposed system is robust and outperforms state-of-the-art approaches. An experiment conducted with visually impaired subjects shows that the system is practically useful.


Subjects
Pedestrians; Visually Impaired Persons; Accidents, Traffic; Humans
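The "where and when to cross" indication described in the abstract above can be sketched as a simple decision rule over the detector outputs. The rule below is purely illustrative; the real system fuses detections from the head-mounted camera and is far more involved:

```python
def crossing_advice(light_status, crosswalk_detected, vehicles_in_path):
    """Toy decision rule combining the three detection outputs the paper
    names: crosswalk, vehicles, and pedestrian traffic light status."""
    if not crosswalk_detected:
        return "no crosswalk detected"
    if light_status == "green" and not vehicles_in_path:
        return "cross now"
    return "wait"

print(crossing_advice("green", True, False))  # cross now
```

In practice such a rule would be evaluated per frame and smoothed over time before being spoken to the user as an audio cue.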
4.
IEEE Trans Image Process ; 30: 4084-4098, 2021.
Article in English | MEDLINE | ID: mdl-33819155

ABSTRACT

Current state-of-the-art two-stage detectors heavily rely on region proposals to guide accurate object detection. In previous region proposal approaches, the interaction between different functional modules is only weakly correlated, which limits or degrades their performance. In this paper, we propose a novel two-stage strong correlation learning framework, abbreviated SC-RPN, which aims to establish stronger relationships among the different modules in the region proposal task. First, we propose a lightweight IoU-Mask branch that predicts an intersection-over-union (IoU) mask and refines region classification scores, preventing high-quality region proposals from being filtered out. Furthermore, a sampling strategy named Size-Aware Dynamic Sampling (SADS) is proposed to ensure sampling consistency between different stages. In addition, a point-based representation is exploited to generate region proposals with stronger fitting ability. Without bells and whistles, SC-RPN achieves an AR1000 14.5% higher than that of the Region Proposal Network (RPN), surpassing all existing region proposal approaches. We also integrate SC-RPN into Fast R-CNN and Faster R-CNN to test its effectiveness on the object detection task; the experiments show gains of 3.2% and 3.8% mAP, respectively, over the original models.
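The IoU quantity that the IoU-Mask branch learns to predict is the standard box overlap measure, which can be computed directly as below. This is the textbook definition, not the paper's learned branch:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

Ranking proposals by a predicted IoU rather than by the raw classification score is what keeps well-localized but low-scoring proposals from being filtered out.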

5.
IEEE Trans Image Process ; 27(4): 1652-1664, 2018 Apr.
Article in English | MEDLINE | ID: mdl-29324418

ABSTRACT

Benefiting from multi-view video plus depth and depth-image-based rendering technologies, only a limited number of views of a real 3-D scene need to be captured, compressed, and transmitted. However, the quality assessment of synthesized views is very challenging, since view synthesis and depth map compression inevitably produce new types of distortions that are inherently different from texture coding errors, and the corresponding original views (reference views) are usually not available. Thus, full-reference quality metrics cannot be used for synthesized views. In this paper, we propose a novel no-reference image quality assessment method for 3-D synthesized views, called NIQSV+. This blind metric evaluates the quality of synthesized views by measuring the typical synthesis distortions, namely blurry regions, black holes, and stretching, with access to neither the reference image nor the depth map. To evaluate the performance of the proposed method, we compare it with four full-reference 3-D (synthesized-view-dedicated) metrics, five full-reference 2-D metrics, and three no-reference 2-D metrics. In terms of correlation with subjective scores, our experimental results show that the proposed no-reference metric approaches the best of the state-of-the-art full-reference and no-reference 3-D metrics, and significantly outperforms the widely used no-reference and full-reference 2-D metrics. In terms of approximating human ranking, the proposed metric achieves the best performance in the experimental test.
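Of the three distortion types the metric targets, black holes (disoccluded regions the renderer could not fill) are the simplest to illustrate. A crude proxy is the fraction of near-black pixels in the synthesized view; the threshold and the use of a plain ratio are assumptions for illustration, not NIQSV+'s actual measurement:

```python
import numpy as np

def black_hole_ratio(img, threshold=5):
    """Fraction of near-black pixels in a grayscale synthesized view,
    a crude proxy for the 'black hole' disocclusion artifact."""
    return float(np.mean(img < threshold))

view = np.full((4, 4), 255, dtype=np.uint8)
view[0, 0] = 0  # one disoccluded pixel out of 16
print(black_hole_ratio(view))  # 0.0625
```

A blind metric would combine such per-distortion measurements (holes, blur, stretching) into a single score calibrated against subjective opinions.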
