Results 1 - 13 of 13
1.
Appl Opt ; 62(34): 9057-9065, 2023 Dec 01.
Article in English | MEDLINE | ID: mdl-38108742

ABSTRACT

To improve the accuracy of saliency detection in challenging scenes involving small objects, multiple objects, and blur, we propose a light field saliency detection method based on two-way focal stack fusion. The first way extracts latent depth features by calculating the transmittance of the focal stack, which avoids interference from out-of-focus regions. The second way analyzes the focus distribution and calculates the background probability of each slice, which distinguishes the foreground from the background. Extracting the latent cues of the focal stack in these two complementary ways improves saliency detection in complex scenes. Finally, a multi-layer cellular automaton optimizer incorporates compactness, focus, center prior, and depth features to produce the final saliency map. Comparison and ablation experiments verify the effectiveness of the proposed method: it performs well in challenging scenarios, outperforms state-of-the-art methods, and confirms that the depth and focus cues of the focal stack can enhance the performance of previous methods.
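As a rough illustration of the two-way idea (not the authors' transmittance model, which the abstract does not detail), the sketch below fuses a per-slice focus cue with a background probability derived from the in-focus slice index; the Laplacian focus measure, the far-equals-background assumption, and the synthetic stack are all illustrative.

```python
# Minimal sketch of two-way focal-stack cue fusion: way 1 uses a per-slice
# focus measure as a stand-in for the depth cue, way 2 turns the in-focus
# slice index into a rough background probability; the two are fused.
import numpy as np
from scipy.ndimage import laplace, uniform_filter

def fuse_focal_stack(stack):
    """stack: (n_slices, H, W) float array, near-to-far focus order."""
    # Way 1: local Laplacian energy as a focus measure for each slice.
    focus = np.stack([uniform_filter(laplace(s) ** 2, size=9) for s in stack])
    depth_idx = focus.argmax(axis=0)                  # in-focus slice per pixel
    depth = depth_idx / max(len(stack) - 1, 1)        # crude normalized depth
    # Way 2: far-focused pixels are treated as likely background (assumption).
    bg_prob = depth
    saliency = focus.max(axis=0) * (1.0 - bg_prob)    # fuse focus and background cues
    return (saliency - saliency.min()) / (np.ptp(saliency) + 1e-12)

stack = np.random.rand(5, 64, 64)                     # synthetic stand-in data
print(fuse_focal_stack(stack).shape)                  # (64, 64)
```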

2.
Entropy (Basel) ; 25(9)2023 Sep 15.
Article in English | MEDLINE | ID: mdl-37761635

ABSTRACT

An abundance of features in the light field has been shown to be useful for saliency detection in complex scenes. However, bottom-up saliency detection models are limited in their ability to exploit light field features. In this paper, we propose a light field saliency detection method focused on depth-induced saliency, which explores the interactions between different cues more deeply. First, we localize a rough saliency region based on the compactness of color and depth. Then, the relationships among depth, focus, and salient objects are carefully investigated, and the focus cue of the focal stack is used to highlight foreground objects. Meanwhile, the depth cue is used to refine the coarse salient objects. Furthermore, considering the consistency of color smoothing and depth space, an optimization model referred to as color- and depth-induced cellular automata is improved to increase the accuracy of the saliency maps. Finally, to avoid interference from redundant information, the mean absolute error is chosen as the filter indicator to obtain the best results. Experimental results on three public light field datasets show that the proposed method performs favorably against state-of-the-art conventional light field saliency detection approaches, and even against deep-learning-based ones.
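To make the cellular-automaton optimization step concrete, here is a minimal single-layer update on a superpixel graph; the Gaussian affinity over toy color-plus-depth features and the coherence squashing range are illustrative choices, not the paper's exact formulation.

```python
# Minimal sketch of a cellular-automaton saliency update: each superpixel's
# score is pulled toward its feature-space neighbors over several rounds.
import numpy as np

def cellular_automaton(saliency, features, steps=10, sigma=0.2):
    """saliency: (n,) initial scores; features: (n, d) color+depth descriptors."""
    d2 = ((features[:, None, :] - features[None, :, :]) ** 2).sum(-1)
    F = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(F, 0.0)
    F /= F.sum(axis=1, keepdims=True) + 1e-12          # row-normalized impact matrix
    # Coherence: cells dissimilar from all neighbors trust their own state more.
    c = 1.0 / (F.max(axis=1) + 1e-12)
    c = 0.6 + 0.2 * (c - c.min()) / (np.ptp(c) + 1e-12)  # squash into [0.6, 0.8]
    s = saliency.copy()
    for _ in range(steps):                              # synchronous updates
        s = c * s + (1.0 - c) * (F @ s)
    return s

s0 = np.random.rand(50)
feats = np.random.rand(50, 4)                           # e.g. mean Lab color + depth
print(cellular_automaton(s0, feats)[:5])
```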

3.
Entropy (Basel) ; 23(6)2021 Jun 18.
Article in English | MEDLINE | ID: mdl-34207229

ABSTRACT

Multiview video plus depth is one of the mainstream representations of 3D scenes in emerging free-viewpoint video, which generates virtual 3D synthesized images through a depth-image-based rendering (DIBR) technique. However, inaccurate depth maps and imperfect DIBR techniques produce various geometric distortions that seriously degrade the user's visual experience. An effective 3D synthesized image quality assessment (IQA) metric can simulate human visual perception and determine the application feasibility of the synthesized content. In this paper, a no-reference IQA metric based on visual-entropy-guided multi-layer feature analysis for 3D synthesized images is proposed. According to their energy entropy, the geometric distortions are divided into two visual attention layers, namely, a bottom-up layer and a top-down layer. On the bottom-up layer, the feature of salient distortion is measured by regional proportion plus a transition threshold. In parallel, on the top-down layer, the key distribution regions of insignificant geometric distortion are extracted by a relative total variation model, and their features are measured by the interaction of decentralized and concentrated attention. By integrating the features of both layers, a more visually perceptive quality evaluation model is built. Experimental results show that the proposed method is superior to state-of-the-art methods in assessing the quality of 3D synthesized images.
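One plausible reading of the entropy-guided split is sketched below: block-wise Shannon entropy of the gradient-energy histogram decides which blocks go to which attention layer. The block size, bin count, and threshold are illustrative assumptions, not values from the paper.

```python
# Minimal sketch of splitting an image into two attention layers by block
# energy entropy: low-entropy blocks (concentrated energy) are assigned to
# the bottom-up (salient-distortion) layer, the rest to the top-down layer.
import numpy as np

def energy_entropy_layers(img, block=16, thresh=3.0):
    gy, gx = np.gradient(img.astype(float))
    energy = gx ** 2 + gy ** 2
    H, W = img.shape
    mask = np.zeros((H // block, W // block), dtype=bool)
    for i in range(mask.shape[0]):
        for j in range(mask.shape[1]):
            e = energy[i*block:(i+1)*block, j*block:(j+1)*block].ravel()
            p, _ = np.histogram(e, bins=32)
            p = p / max(p.sum(), 1)
            ent = -np.sum(p[p > 0] * np.log2(p[p > 0]))   # Shannon entropy
            mask[i, j] = ent < thresh                     # low entropy -> bottom-up
    return mask

img = np.random.rand(64, 64)
print(energy_entropy_layers(img).sum(), "bottom-up blocks")
```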

4.
Entropy (Basel) ; 22(2)2020 Feb 07.
Article in English | MEDLINE | ID: mdl-33285965

ABSTRACT

With the wide application of three-dimensional (3D) meshes in intelligent manufacturing, digital animation, virtual reality, digital cities, and other fields, more and more processing techniques are being developed for 3D meshes, including watermarking, compression, and simplification, which inevitably introduce various distortions. Therefore, evaluating the visual quality of 3D meshes is becoming an important problem, and effective tools for blind 3D mesh quality assessment are needed. In this paper, we propose a new blind mesh quality assessment method based on graph spectral entropy and spatial features, called BMQA-GSES. A 3D mesh can be represented as a graph signal: in the graph spectral domain, the Gaussian curvature signal of the 3D mesh is first converted with the graph Fourier transform (GFT), and then the smoothness and information entropy of the amplitude features are extracted to evaluate the distortion. In the spatial domain, four well-performing spatial features are combined to describe the concave-convex and structural information of the 3D mesh. All extracted features are fused by random forest regression to predict the objective quality score of the 3D mesh. Experiments performed on public databases show that the proposed BMQA-GSES method correlates well with human visual perception and achieves competitive scores compared with state-of-the-art quality assessment methods.
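The graph-spectral side of this pipeline is standard enough to sketch: project a per-vertex signal onto the Laplacian eigenbasis (the GFT) and take the entropy of the spectral energy. The random graph and signal below are placeholders for a real mesh and its Gaussian curvature.

```python
# Minimal sketch of graph spectral entropy for a per-vertex mesh signal.
import numpy as np

def graph_spectral_entropy(adj, signal):
    """adj: (n, n) symmetric 0/1 adjacency; signal: (n,) vertex values."""
    deg = np.diag(adj.sum(axis=1))
    L = deg - adj                                   # combinatorial graph Laplacian
    _, U = np.linalg.eigh(L)                        # GFT basis (Laplacian eigenvectors)
    coeffs = U.T @ signal                           # graph Fourier transform
    p = coeffs ** 2 / (np.sum(coeffs ** 2) + 1e-12) # spectral energy distribution
    return -np.sum(p[p > 0] * np.log2(p[p > 0]))    # spectral entropy

rng = np.random.default_rng(0)
A = (rng.random((30, 30)) < 0.2).astype(float)
A = np.triu(A, 1); A = A + A.T                      # symmetric, no self-loops
curvature = rng.standard_normal(30)                 # placeholder for Gaussian curvature
print(graph_spectral_entropy(A, curvature))
```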

5.
Entropy (Basel) ; 22(8)2020 Jul 31.
Article in English | MEDLINE | ID: mdl-33286621

ABSTRACT

High dynamic range (HDR) images are well suited to capturing all parts of a natural scene because of their wider brightness range relative to traditional low dynamic range (LDR) images. However, visualizing HDR images on common LDR displays additionally requires tone-mapping operators (TMOs), which inevitably degrade visual quality, especially in bright and dark regions. To evaluate the performance of different TMOs accurately, this paper proposes a blind tone-mapped image quality assessment method based on regional sparse response and aesthetics (RSRA-BTMI), which considers the influence of detail information and color on the human visual system. Specifically, for the detail loss in a tone-mapped image (TMI), multiple dictionaries are first designed for the different brightness regions and the whole TMI. Then, regional sparse atoms aggregated by local entropy and global reconstruction residuals are used to characterize the regional and global detail distortion in the TMI, respectively. In addition, a few efficient aesthetic features are extracted to measure the color unnaturalness of the TMI. Finally, all extracted features are linked with the relevant subjective scores to conduct quality regression via random forest. Experimental results on the ESPL-LIVE HDR database demonstrate that the proposed RSRA-BTMI method is superior to existing state-of-the-art blind TMI quality assessment methods.
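The sparse-response idea can be sketched generically: code a patch over a dictionary with orthogonal matching pursuit and use the reconstruction residual as a detail-distortion feature. The random dictionary below stands in for the paper's learned per-brightness-region dictionaries.

```python
# Minimal sketch: OMP sparse coding and the residual as a distortion feature.
import numpy as np

def omp(D, y, k):
    """Orthogonal matching pursuit: approximate y with k atoms of D."""
    resid, idx = y.copy(), []
    for _ in range(k):
        idx.append(int(np.argmax(np.abs(D.T @ resid))))   # best-correlated atom
        sub = D[:, idx]
        coef, *_ = np.linalg.lstsq(sub, y, rcond=None)    # re-fit on chosen atoms
        resid = y - sub @ coef
    return idx, coef, resid

rng = np.random.default_rng(1)
D = rng.standard_normal((64, 256))
D /= np.linalg.norm(D, axis=0)                            # unit-norm atoms
patch = rng.standard_normal(64)                           # a flattened 8x8 patch
_, _, r = omp(D, patch, k=8)
print("residual energy:", float(r @ r))                   # detail-distortion feature
```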

6.
Appl Opt ; 57(4): 839-848, 2018 Feb 01.
Article in English | MEDLINE | ID: mdl-29400748

ABSTRACT

The practical applications of full-reference image quality assessment (IQA) methods are limited. Here, we propose a new no-reference quality assessment method for high-dynamic-range (HDR) images. First, tensor decomposition is used to generate three feature maps of an HDR image, capturing its color and structure information. Second, because the first feature map contains the image's main energy and important structural information, manifold learning is applied to it to find the inherent geometric structure of the high-dimensional data in a low-dimensional manifold, and the corresponding multi-scale manifold structure features are extracted. For the second and third feature maps, multi-scale contrast features are extracted, as they reflect the perceived detail contrast of the HDR image. Finally, the extracted features are aggregated by support vector regression to obtain the objective quality prediction score of the HDR image. Experimental results on the Nantes database show that the proposed method is superior to representative full- and no-reference methods, even outperforming the full-reference HDR IQA method HDR-VDP-2.2, and has higher consistency with human visual perception.
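The final aggregation step is a common pattern worth showing: support vector regression maps the extracted features to a quality score. The random features and scores below are placeholders for real training data.

```python
# Minimal sketch of SVR-based feature aggregation for quality prediction.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(2)
X_train = rng.random((100, 12))        # e.g. multi-scale manifold + contrast features
y_train = rng.random(100) * 5          # subjective quality scores (MOS)

model = SVR(kernel="rbf", C=1.0, epsilon=0.1)
model.fit(X_train, y_train)
print(model.predict(rng.random((1, 12))))   # objective quality prediction
```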

7.
Appl Opt ; 56(30): 8547-8554, 2017 Oct 20.
Article in English | MEDLINE | ID: mdl-29091638

ABSTRACT

Quality prediction of virtual views is important for free-viewpoint video systems and can be used as feedback to improve the performance of depth video coding and virtual-view rendering. In this paper, an efficient virtual-view peak signal-to-noise ratio (PSNR) prediction method is proposed. First, the effect of depth distortion on virtual-view quality is analyzed in detail, and a depth distortion tolerance (DDT) model that determines the DDT range is presented. Next, the DDT model is used to predict the virtual-view quality. Finally, a support vector machine (SVM) is used to train the virtual-view quality prediction model. Experimental results show that the Spearman's rank correlation coefficient and root mean square error between the actual PSNR and the PSNR predicted by the DDT model are 0.8750 and 0.6137 on average, and by the SVM prediction model 0.9109 and 0.5831. The computational complexity of the SVM method is lower than that of the DDT model and of state-of-the-art methods.
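The two figures of merit quoted above are easy to reproduce on synthetic data; the sketch below computes Spearman's rank correlation coefficient and the RMSE between actual and predicted PSNR values (the values themselves are toy data).

```python
# Minimal sketch of the SROCC/RMSE evaluation between actual and predicted PSNR.
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(3)
actual_psnr = rng.uniform(28, 42, size=50)
predicted_psnr = actual_psnr + rng.normal(0, 0.6, size=50)   # toy predictions

srocc, _ = spearmanr(actual_psnr, predicted_psnr)
rmse = float(np.sqrt(np.mean((actual_psnr - predicted_psnr) ** 2)))
print(f"SROCC={srocc:.4f}, RMSE={rmse:.4f}")
```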

8.
Appl Opt ; 55(35): 10084-10091, 2016 Dec 10.
Article in English | MEDLINE | ID: mdl-27958425

ABSTRACT

High dynamic range (HDR) images can only be backward-compatible with existing low dynamic range (LDR) imaging systems after being processed by tone-mapping operators. Hence, the quality assessment (QA) of tone-mapped HDR images has become an important and challenging issue in HDR imaging research. In this paper, we propose a naturalness index for predicting the quality of a tone-mapped image. First, we extract statistical features from the tone-mapped image's luminance values and use them to evaluate brightness naturalness without reference information. Meanwhile, we use perceptual color, image contrast, and detail information to represent the image content and predict their naturalness qualities, respectively. The four naturalness components are then combined to yield the overall naturalness quality of the tone-mapped image. Experimental results on a publicly available database demonstrate that, compared with a traditional LDR image QA method and a leading tone-mapped image QA method, the proposed method performs better in evaluating the quality of tone-mapped images.
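A statistical brightness-naturalness score in this spirit can be sketched by scoring the luminance mean and contrast against priors fitted on natural LDR images; the Gaussian and Beta parameters below are illustrative assumptions, not the paper's values.

```python
# Minimal sketch of a brightness-naturalness component for a tone-mapped image.
import numpy as np
from scipy.stats import norm, beta

def brightness_naturalness(luminance):
    m = luminance.mean() / 255.0
    d = luminance.std() / 255.0
    p_mean = norm.pdf(m, loc=0.45, scale=0.12)               # assumed prior on mean
    p_std = beta.pdf(np.clip(d, 1e-3, 1 - 1e-3), 3.0, 6.0)   # assumed prior on contrast
    # Normalize by the maxima of both priors so the score lies in [0, 1].
    return p_mean * p_std / (norm.pdf(0.45, 0.45, 0.12) * beta.pdf(2/7, 3.0, 6.0))

lum = np.random.default_rng(4).uniform(0, 255, (64, 64))
print(brightness_naturalness(lum))
```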

9.
ScientificWorldJournal ; 2014: 136854, 2014.
Article in English | MEDLINE | ID: mdl-24737956

ABSTRACT

To compress stereo video effectively, this paper proposes a novel macroblock (MB) level rate control method based on binocular perception. A binocular just-noticeable difference (BJND) model based on parallax matching is first used to describe binocular perception. Then, the proposed rate control method is applied to stereo video coding at four levels, namely, the view level, group-of-pictures (GOP) level, frame level, and MB level. At the view level, different proportions of the bitrate are allocated to the left and right views of the stereo video according to a pre-computed rate allocation proportion. At the GOP level, the total number of bits allocated to each GOP is computed and the initial quantization parameter of each GOP is set. At the frame level, the target bits allocated to each frame are computed. At the MB level, a visual perception factor, measured by the BJND value of the MB, is used to adjust the MB-level bit allocation so that the rate control results are in line with human visual characteristics. Experimental results show that, compared with other methods, the proposed method controls the bitrate more accurately and achieves better subjective stereo video quality.
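The MB-level step can be sketched as a perceptually weighted redistribution of a frame's bit budget; the inverse-BJND weighting rule below is an illustrative assumption (MBs that tolerate more distortion get fewer bits), not the paper's exact formula.

```python
# Minimal sketch of BJND-weighted macroblock bit allocation within one frame.
import numpy as np

def allocate_mb_bits(frame_budget, bjnd, alpha=1.0):
    """frame_budget: total bits for the frame; bjnd: (n_mb,) BJND per MB."""
    w = 1.0 / (1.0 + alpha * bjnd)          # higher BJND -> more masking -> fewer bits
    w /= w.sum()
    return np.round(frame_budget * w).astype(int)

bjnd = np.random.default_rng(5).uniform(0.5, 4.0, size=16)  # toy BJND values
bits = allocate_mb_bits(frame_budget=24000, bjnd=bjnd)
print(bits, bits.sum())
```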


Subjects
Algorithms, Biomimetics/methods, Data Compression/methods, Three-Dimensional Imaging/methods, Computer-Assisted Signal Processing, Video Recording/methods, Visual Perception, Humans
10.
Neural Netw ; 173: 106156, 2024 May.
Article in English | MEDLINE | ID: mdl-38340468

ABSTRACT

Multispectral object detection (MOD), which incorporates additional information from thermal images into object detection (OD) to cope robustly with complex illumination conditions, has garnered significant attention. However, existing MOD methods demand a considerable amount of annotated data for training. Inspired by the concept of few-shot learning, we propose a novel task called few-shot multispectral object detection (FSMOD) that aims to accomplish MOD using only a few annotated examples from each category. Specifically, we first design a cross-modality interaction (CMI) module, which leverages different attention mechanisms to exchange information between the visible and thermal modalities during backbone feature extraction. Guided by this interaction process, the detector extracts modality-specific backbone features with better discrimination. To improve the few-shot learning ability of the detector, we also design a semantic prototype metric (SPM) loss that integrates semantic knowledge, i.e., word embeddings, into the optimization of the embedding space; semantic knowledge provides a stable category representation when visual information is insufficient. Extensive experiments on the customized FSMOD dataset demonstrate that the proposed method achieves state-of-the-art performance.
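One plausible reading of the SPM idea is sketched below: class prototypes are word embeddings, a visual feature is scored by softmax over its negative squared distances to the prototypes, and the loss is the resulting cross-entropy. This is an interpretation for illustration, not the paper's exact loss.

```python
# Minimal sketch of a semantic prototype metric loss over word-embedding prototypes.
import numpy as np

def spm_loss(visual_feat, prototypes, label, temperature=1.0):
    """visual_feat: (d,); prototypes: (n_classes, d) word embeddings; label: int."""
    d2 = ((prototypes - visual_feat) ** 2).sum(axis=1)   # squared distances
    logits = -d2 / temperature
    logits -= logits.max()                               # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum())
    return -log_prob[label]                              # cross-entropy on distances

rng = np.random.default_rng(6)
protos = rng.standard_normal((5, 300))                   # e.g. 300-d word embeddings
feat = protos[2] + 0.1 * rng.standard_normal(300)        # feature near class 2
print(spm_loss(feat, protos, label=2))
```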


Subjects
Intelligence, Semantics, Knowledge, Learning, Lighting
11.
IEEE Trans Image Process ; 30: 2364-2377, 2021.
Article in English | MEDLINE | ID: mdl-33481711

ABSTRACT

A 360-degree image can be represented in different formats, such as the equirectangular projection (ERP) image, viewport images, or the spherical image, depending on the processing procedure and application. Accordingly, 360-degree image quality assessment (360-IQA) can be performed on any of these formats. However, the performance of 360-IQA on the ERP image is not equivalent to that on the viewport images or spherical image, owing to the over-sampling and the resulting obvious geometric distortion of the ERP image. This imbalance problem poses a challenge to ERP-based applications, such as 360-degree image/video compression and assessment. In this paper, we propose a new blind 360-IQA framework to handle this imbalance. In the proposed framework, cubemap projection (CMP), with its six inter-related faces, is used to realize omnidirectional viewing of the 360-degree image. A multi-distortion visual attention quality dataset for 360-degree images is first established as a benchmark for analyzing the performance of objective 360-IQA methods. Then, the perception-driven blind 360-IQA framework is built on the six cubemap faces, taking human attention behavior into account to improve its effectiveness: the cubemap quality feature subset of the CMP image is obtained first, and attention feature matrices and subsets are additionally calculated to describe human visual behavior. Experimental results show that the proposed framework outperforms state-of-the-art IQA methods, and cross-dataset validation also verifies its effectiveness. In addition, the framework can be combined with new quality feature extraction methods to further improve 360-IQA performance, demonstrating that it is effective in 360-IQA and has good potential for future applications.
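The ERP-to-CMP resampling underlying this framework can be sketched for a single face: cast rays through the face grid, convert to longitude/latitude, and look up the ERP pixel. The axis convention (+X as the front face) and nearest-neighbor lookup are simplifying assumptions; real pipelines use bilinear filtering over all six faces.

```python
# Minimal sketch of extracting one cubemap (CMP) face from an ERP image.
import numpy as np

def erp_to_front_face(erp, face_size=64):
    """erp: (H, W) equirectangular image -> (face_size, face_size) front face."""
    H, W = erp.shape
    v, u = np.meshgrid(np.linspace(-1, 1, face_size),
                       np.linspace(-1, 1, face_size), indexing="ij")
    x, y, z = np.ones_like(u), u, -v              # rays through the +X face
    r = np.sqrt(x**2 + y**2 + z**2)
    lon = np.arctan2(y, x)                        # longitude in [-pi, pi]
    lat = np.arcsin(z / r)                        # latitude in [-pi/2, pi/2]
    col = ((lon / (2 * np.pi) + 0.5) * (W - 1)).round().astype(int)
    row = ((0.5 - lat / np.pi) * (H - 1)).round().astype(int)
    return erp[row, col]                          # nearest-neighbor sampling

erp = np.random.rand(128, 256)                    # toy ERP luminance image
print(erp_to_front_face(erp).shape)               # (64, 64)
```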

12.
IEEE Trans Image Process ; 30: 641-656, 2021.
Article in English | MEDLINE | ID: mdl-33186115

ABSTRACT

High Efficiency Video Coding (HEVC) significantly improves compression efficiency compared with the preceding H.264/Advanced Video Coding (AVC), but at the cost of extremely high computational complexity. Hence, it is challenging to realize live video applications on low-delay, power-constrained devices such as smart mobile devices. In this article, we propose an online learning-based multi-stage complexity control method for live video coding. The proposed method consists of three stages: multi-accuracy coding unit (CU) decision, multi-stage complexity allocation, and coding tree unit (CTU) level complexity control. The encoding complexity can thus be accurately matched to the computing capability of the device by replacing the traditional brute-force search with the proposed algorithm, which properly determines the optimal CU size. Specifically, the multi-accuracy CU decision model is obtained by an online learning approach to accommodate the different characteristics of input videos. In addition, multi-stage complexity allocation is implemented to reasonably distribute the complexity budget across coding levels. To achieve a good trade-off between complexity control and rate-distortion (RD) performance, CTU-level complexity control is proposed to select the optimal accuracy of the CU decision model. Experimental results show that the proposed algorithm can accurately control the coding complexity from 100% down to 40%, and that it outperforms state-of-the-art algorithms in terms of both complexity control accuracy and RD performance.
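The CTU-level control loop can be sketched as a greedy budget tracker: each CTU picks the most accurate (most expensive) CU decision model the remaining budget affords. The cost table and budget units are illustrative assumptions, not measurements from the paper.

```python
# Minimal sketch of CTU-level complexity control against a running budget.
import numpy as np

MODEL_COST = {"high": 1.0, "medium": 0.7, "low": 0.4}   # relative CU-decision cost

def control_ctus(n_ctu, budget_ratio):
    budget = n_ctu * budget_ratio          # e.g. 0.6 means 60% of full-search cost
    spent, choices = 0.0, []
    for i in range(n_ctu):
        per_ctu = (budget - spent) / (n_ctu - i)   # re-allocate leftover each CTU
        acc = ("high" if per_ctu >= MODEL_COST["high"] else
               "medium" if per_ctu >= MODEL_COST["medium"] else "low")
        spent += MODEL_COST[acc]
        choices.append(acc)
    return choices, spent / n_ctu

choices, achieved = control_ctus(n_ctu=100, budget_ratio=0.6)
print(choices[:8], f"achieved complexity: {achieved:.2f}")
```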

13.
PLoS One ; 12(4): e0175798, 2017.
Article in English | MEDLINE | ID: mdl-28445489

ABSTRACT

A well-performing video quality assessment (VQA) method should be consistent with the human visual system for better prediction accuracy. In this paper, we propose a VQA method using motion-compensated temporal filtering (MCTF) and manifold feature similarity. Specifically, a group of frames (GoF) is first decomposed into a temporal high-pass component (HPC) and a temporal low-pass component (LPC) by MCTF. Manifold feature learning (MFL) and phase congruency (PC) are then used to predict the quality of the temporal LPC and HPC, respectively. The quality measures of the LPC and the HPC are combined into a GoF quality, and a temporal pooling strategy subsequently integrates the GoF qualities into an overall video quality. The proposed method appropriately processes temporal information through MCTF and temporal pooling, and simulates human visual perception through MFL. Experiments on a publicly available video quality database show that, compared with several state-of-the-art VQA methods, the proposed method achieves better consistency with subjective video quality and predicts video quality more accurately.
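The temporal decomposition can be illustrated with a Haar-style temporal lifting, leaving out the motion compensation (MCTF proper would warp frames along motion trajectories before filtering); the mean pooling and the toy GoF scores are also simplifying placeholders.

```python
# Minimal sketch of a motion-free Haar temporal split plus simple pooling.
import numpy as np

def temporal_decompose(gof):
    """gof: (n_frames, H, W) with even n_frames -> (LPC, HPC) half-rate bands."""
    even, odd = gof[0::2], gof[1::2]
    lpc = (even + odd) / 2.0               # temporal low-pass component
    hpc = (even - odd) / 2.0               # temporal high-pass component
    return lpc, hpc

def pool_video_quality(gof_qualities):
    """Mean pooling of per-GoF scores into an overall video quality."""
    return float(np.mean(gof_qualities))

gof = np.random.rand(8, 32, 32)
lpc, hpc = temporal_decompose(gof)
print(lpc.shape, hpc.shape, pool_video_quality([0.8, 0.75, 0.82]))
```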


Subjects
Image Enhancement/methods, Video Recording, Algorithms, Factual Databases, Humans, Motion (Physics), Visual Perception