Results 1 - 10 of 10

1.
Sensors (Basel) ; 24(12), 2024 Jun 09.
Article in English | MEDLINE | ID: mdl-38931530

ABSTRACT

In this paper, we propose a lightweight neural network model with a U-Net architecture, based on the Dark Channel Prior (DCP), for efficient haze (fog) removal from a single input image. The original DCP involves computationally complex operations that are difficult to accelerate, and the problem is exacerbated for high-resolution images and videos, making it very difficult to apply in general-purpose applications. Our proposed model addresses this issue with a two-stage neural network structure that replaces the computationally complex operations of the conventional DCP with easily accelerated convolution operations, achieving high-quality fog removal. Furthermore, the model has an intuitive structure with a relatively small number of parameters (2M), using resources efficiently. These features make the proposed model both effective and efficient for fog removal. The experimental results show that the proposed neural network model achieves an average Peak Signal-to-Noise Ratio (PSNR) of 26.65 dB and a Structural Similarity Index Measure (SSIM) of 0.88, an improvement of 11.5 dB in average PSNR and 0.22 in SSIM over the conventional DCP. This shows that the proposed network achieves results comparable to CNN-based networks with SOTA-class performance, despite its intuitive structure and relatively small parameter count.
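
For reference, the DCP stage that the paper's convolutional layers replace can be sketched in a few lines of NumPy/SciPy. This is the classical single-image pipeline (dark channel, transmission estimate, radiance recovery), not the paper's network; the patch size, omega, and function names are illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import minimum_filter

def dark_channel(img, patch=15):
    # Per-pixel minimum across RGB, then a local minimum filter:
    # J_dark(x) = min_{y in Omega(x)} min_c J_c(y)
    return minimum_filter(img.min(axis=2), size=patch)

def estimate_transmission(hazy, atmos, omega=0.95, patch=15):
    # t(x) = 1 - omega * dark_channel(I(x) / A)
    normalized = hazy / atmos[None, None, :]
    return 1.0 - omega * dark_channel(normalized, patch)

def dehaze(hazy, omega=0.95, patch=15, t0=0.1):
    # Atmospheric light A: mean colour of the brightest dark-channel pixels.
    dc = dark_channel(hazy, patch)
    flat = hazy.reshape(-1, 3)
    idx = np.argsort(dc.ravel())[-max(1, dc.size // 1000):]
    atmos = flat[idx].mean(axis=0)
    t = np.clip(estimate_transmission(hazy, atmos, omega, patch), t0, 1.0)
    # Scene radiance recovery: J = (I - A) / t + A
    return np.clip((hazy - atmos) / t[..., None] + atmos, 0.0, 1.0)

hazy = np.random.rand(120, 160, 3)   # stand-in for a hazy RGB image in [0, 1]
clear = dehaze(hazy)
```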

2.
Article in English | MEDLINE | ID: mdl-37368809

ABSTRACT

Although research on arbitrary style transfer (AST) has made great progress in recent years, few studies pay special attention to the perceptual evaluation of AST images, which is usually influenced by complicated factors such as structure preservation, style similarity, and overall vision (OV). Existing methods rely on elaborately designed hand-crafted features to obtain quality factors and apply a rough pooling strategy to evaluate the final quality. However, simple quality pooling misrepresents the importance weights between the factors and the final quality, leading to unsatisfactory performance. In this article, we propose a learnable network, named the collaborative learning and style-adaptive pooling network (CLSAP-Net), to better address this issue. CLSAP-Net contains three parts: a content preservation estimation network (CPE-Net), a style resemblance estimation network (SRE-Net), and an OV target network (OVT-Net). Specifically, CPE-Net and SRE-Net use a self-attention mechanism and a joint regression strategy to generate reliable quality factors for fusion, together with weighting vectors for manipulating the importance weights. Then, grounded in the observation that the style type influences how humans judge the importance of the different factors, OVT-Net employs a novel style-adaptive pooling strategy that guides the importance weights of the factors, collaboratively learning the final quality on top of the trained CPE-Net and SRE-Net parameters. In our model, quality pooling is self-adaptive because the weights are generated after the style type is understood. The effectiveness and robustness of CLSAP-Net are validated by extensive experiments on existing AST image quality assessment (IQA) databases. Our code will be released at https://github.com/Hangwei-Chen/CLSAP-Net.
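
As an illustration of the style-adaptive pooling idea, the sketch below computes a final score as a factor-weighted sum whose weights are predicted from a style representation. All names, dimensions, and the linear weight predictor are our assumptions; CLSAP-Net learns these components jointly rather than with a fixed linear map.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def style_adaptive_pool(factor_scores, style_embedding, W, b):
    # Weights are predicted from a style representation, so the pooling
    # adapts to the style type instead of using fixed importance weights.
    weights = softmax(W @ style_embedding + b)   # one weight per factor
    return float(weights @ factor_scores)

# Hypothetical usage: two factors (content preservation, style resemblance).
rng = np.random.default_rng(0)
W, b = rng.normal(size=(2, 8)), np.zeros(2)
style_emb = rng.normal(size=8)                   # stand-in style embedding
print(style_adaptive_pool(np.array([0.71, 0.64]), style_emb, W, b))
```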

3.
IEEE Trans Vis Comput Graph ; 29(10): 4183-4197, 2023 Oct.
Article in English | MEDLINE | ID: mdl-35714091

ABSTRACT

Light field (LF) imaging extends traditional imaging by simultaneously capturing the intensity and direction information of light rays, enabling many visual applications. However, owing to the inherent trade-off between the spatial and angular dimensions, LF images acquired by LF cameras usually suffer from low spatial resolution. Many current approaches increase the spatial resolution by exploiting the four-dimensional (4D) structure of LF images, but they have difficulty recovering fine textures at large upscaling factors. To address this challenge, this paper proposes a new deep learning-based LF spatial super-resolution method using heterogeneous imaging (LFSSR-HI). The designed heterogeneous imaging system uses an extra high-resolution (HR) traditional camera alongside the LF camera to capture abundant spatial information, and the auxiliary information from the HR camera is used to super-resolve the LF image. Specifically, an LF feature alignment module learns the correspondence between the 4D LF image and the 2D HR image to align the information. Subsequently, a multi-level spatial-angular feature enhancement module gradually embeds the aligned HR information into the coarse LF features. Finally, the enhanced LF features are reconstructed into a super-resolved LF image by a simple feature decoder. To improve the flexibility of the method, a pyramid reconstruction strategy generates multi-scale super-resolution results in a single forward pass. The experimental results show that LFSSR-HI achieves significant advantages over state-of-the-art methods in both qualitative and quantitative comparisons, while preserving more accurate angular consistency.
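
To make the spatial-angular trade-off concrete, the toy sketch below shows the two standard views of a 4D LF tensor: the stack of sub-aperture images and the lenslet-style macro-pixel layout whose fixed sensor budget forces the trade-off. It illustrates the data structure only, not the LFSSR-HI network; shapes and names are ours.

```python
import numpy as np

def to_subaperture_views(lf):
    # lf: (U, V, X, Y) light field; each (u, v) slice is one sub-aperture image.
    U, V, X, Y = lf.shape
    return lf.reshape(U * V, X, Y)

def to_macro_pixel_image(lf):
    # Interleave the angular samples into each spatial position; this is the
    # lenslet-style layout whose fixed pixel budget causes the spatial-angular
    # resolution trade-off described above.
    U, V, X, Y = lf.shape
    return lf.transpose(2, 0, 3, 1).reshape(X * U, Y * V)

lf = np.random.rand(5, 5, 32, 32)         # toy 5x5 angular, 32x32 spatial
print(to_subaperture_views(lf).shape)     # (25, 32, 32)
print(to_macro_pixel_image(lf).shape)     # (160, 160)
```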

4.
IEEE Trans Image Process ; 30: 402-417, 2021.
Article in English | MEDLINE | ID: mdl-33186113

ABSTRACT

Mismatches between the precisions with which 3D video systems represent disparity, depth values, and rendering positions cause redundancies in depth map representations. In this paper, we propose a highly efficient multiview depth coding scheme based on Depth Histogram Projection (DHP) and Allowable Depth Distortion (ADD) in view synthesis. First, DHP exploits the sparse representation of depth maps generated by stereo matching to reduce the residual error of INTER and INTRA prediction in depth coding. We provide a mathematical foundation for DHP-based lossless depth coding by theoretically analyzing its rate-distortion cost. Second, because of the precision mismatch between depth values and rendering positions, there is a many-to-one mapping between them in view synthesis, which gives rise to the ADD model. Based on the ADD model and DHP, we propose depth coding with lossless view synthesis quality to further improve compression performance while keeping the synthesized video quality unchanged. Experimental results reveal that the proposed DHP-based depth coding achieves average bit-rate savings of 20.66% to 19.52% for lossless coding on Multiview High Efficiency Video Coding (MV-HEVC) with different group-of-pictures settings. In addition, our depth coding based on DHP and ADD achieves average depth bit-rate reductions of 46.69%, 34.12%, and 28.68% for lossless view synthesis quality when the rendering precision varies from integer to half to quarter pixels, respectively. We obtain similar gains for lossless depth coding on the 3D-HEVC, HEVC Intra coding, and JPEG 2000 platforms.
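
The core DHP idea, projecting the sparsely populated depth histogram onto consecutive indices so that prediction residuals shrink, can be sketched as follows. This shows the reversible mapping only; the codec integration, side-information coding, and the ADD model are not reproduced, and the function names are ours.

```python
import numpy as np

def dhp_forward(depth):
    # Depth maps from stereo matching occupy only a sparse subset of the
    # 0..255 levels; projecting them onto consecutive indices shrinks the
    # INTER/INTRA prediction residuals. (Sketch of the idea, not the exact
    # codec integration.)
    levels = np.unique(depth)                   # histogram support
    lut = np.zeros(256, dtype=depth.dtype)
    lut[levels] = np.arange(len(levels), dtype=depth.dtype)
    return lut[depth], levels                   # projected map + side info

def dhp_inverse(projected, levels):
    return levels[projected]                    # exact (lossless) recovery

depth = np.random.choice([0, 32, 33, 96, 200], size=(4, 4)).astype(np.uint8)
proj, levels = dhp_forward(depth)
assert np.array_equal(dhp_inverse(proj, levels), depth)
```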

5.
IEEE Trans Image Process ; 30: 641-656, 2021.
Article in English | MEDLINE | ID: mdl-33186115

ABSTRACT

High Efficiency Video Coding (HEVC) significantly improves compression efficiency over the preceding H.264/Advanced Video Coding (AVC), but at the cost of extremely high computational complexity. This makes it challenging to realize live video applications on low-delay, power-constrained devices such as smart mobile devices. In this article, we propose an online learning-based multi-stage complexity control method for live video coding. The proposed method consists of three stages: multi-accuracy Coding Unit (CU) decision, multi-stage complexity allocation, and Coding Tree Unit (CTU)-level complexity control. The encoding complexity can thus be accurately controlled to match the computing capability of the device by replacing the traditional brute-force search with the proposed algorithm, which properly determines the optimal CU size. Specifically, the multi-accuracy CU decision model is obtained by online learning to accommodate the different characteristics of input videos. In addition, multi-stage complexity allocation reasonably distributes the complexity budget across the coding levels. To achieve a good trade-off between complexity control and rate-distortion (RD) performance, CTU-level complexity control selects the optimal accuracy of the CU decision model. The experimental results show that the proposed algorithm can accurately control the coding complexity from 100% down to 40%, and that it outperforms state-of-the-art algorithms in both complexity-control accuracy and RD performance.
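
A greatly simplified sketch of CTU-level complexity control is shown below: for each CTU, it greedily picks the most accurate (and costliest) CU-decision level that still fits the remaining complexity budget. The cost tables and the greedy policy are illustrative assumptions; the paper's method learns its decision models online rather than using fixed costs.

```python
def control_complexity(ctu_costs, budget_ratio):
    # ctu_costs[i] maps accuracy level -> relative encoding cost, where a
    # higher level means a more exhaustive (costlier) CU search.
    total_full = sum(max(c.values()) for c in ctu_costs)
    remaining = budget_ratio * total_full
    choices = []
    for i, costs in enumerate(ctu_costs):
        ctus_left = len(ctu_costs) - i
        per_ctu_budget = remaining / ctus_left
        # Most accurate level that fits this CTU's share of the budget;
        # fall back to the cheapest level if nothing fits.
        level = max((lv for lv, c in costs.items() if c <= per_ctu_budget),
                    default=min(costs, key=costs.get))
        remaining -= costs[level]
        choices.append(level)
    return choices

# Hypothetical costs for 4 CTUs at accuracy levels 0 (fast) .. 2 (full search).
costs = [{0: 0.4, 1: 0.7, 2: 1.0}] * 4
print(control_complexity(costs, budget_ratio=0.6))   # e.g. [0, 0, 1, 1]
```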

6.
Article in English | MEDLINE | ID: mdl-31369374

ABSTRACT

Temporal flicker distortion is one of the most annoying artifacts in synthesized virtual-view videos rendered from compressed multi-view video plus depth in a three-dimensional (3D) video system. To assess synthesized view video quality and further optimize the compression techniques in 3D video systems, an objective video quality assessment method that can accurately measure flicker distortion is highly needed. In this paper, we propose a full-reference video quality assessment method for synthesized 3D videos based on sparse representation. First, a synthesized video, treated as a 3D volume with spatial (X-Y) and temporal (T) dimensions, is decomposed into a number of spatially neighboring temporal layers, i.e., X-T or Y-T planes. Gradient features in the temporal layers of the synthesized video and strong edges of the depth maps serve as key features for locating flicker distortions. Second, dictionary learning and sparse representation over the temporal layers are derived and applied to effectively represent the temporal flicker distortion. Third, a rank pooling method aggregates the temporal-layer scores into a single flicker distortion score. Finally, the temporal flicker distortion measurement is combined with a conventional spatial distortion measurement to assess the quality of synthesized 3D videos. Experimental results on a synthesized video quality database demonstrate that our method is significantly superior to other state-of-the-art methods, especially for view synthesis distortions induced by depth videos.
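
As a sketch of the temporal-layer decomposition, the code below slices a grayscale video volume into X-T planes and computes gradient energy along the time axis, the kind of feature the method uses to locate flicker. The dictionary learning, sparse coding, and rank pooling stages are not reproduced; the names and toy input are ours.

```python
import numpy as np

def temporal_layers(video):
    # video: (T, Y, X) grayscale volume; each row y yields one X-T plane.
    return video.transpose(1, 0, 2)            # (Y, T, X) stack of X-T planes

def flicker_gradient_energy(video):
    # Flicker shows up as strong gradients along the T axis of the X-T
    # planes; this is the key feature, not the full sparse-coding model.
    planes = temporal_layers(video).astype(np.float64)
    gt = np.abs(np.diff(planes, axis=1))       # gradient along time
    return gt.mean(axis=(1, 2))                # per-row flicker energy

video = np.random.rand(30, 64, 64)             # toy 30-frame 64x64 video
print(flicker_gradient_energy(video).shape)    # (64,)
```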

7.
Opt Express ; 27(9): 13357-13371, 2019 Apr 29.
Article in English | MEDLINE | ID: mdl-31052861

ABSTRACT

For temporal phase unwrapping in phase measuring profilometry, it has recently been reported that two phases with co-prime frequencies can be absolutely unwrapped using a look-up table; however, frequency selection and table construction have been performed manually, without optimization. In this paper, a universal phase unwrapping method is proposed that unwraps phase flexibly and automatically using geometric analysis, so that a one-dimensional or two-dimensional look-up table can be built programmatically for any two co-prime frequencies to correctly unwrap phases in real time. Moreover, a phase error model accounting for the defocus effect is derived to determine an optimal reference frequency co-prime to the principal frequency. Experimental results verify the correctness and computational efficiency of the proposed method.
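
A minimal sketch of the two-dimensional look-up-table idea follows: because the wrapped-phase pair of two co-prime frequencies is unique over the measurement range, a table indexed by the quantized pair returns the fringe order directly. The grid density, bin count, and lack of noise handling are simplifying assumptions (a practical table would also fill or dilate empty cells).

```python
import numpy as np

def build_lut(f1, f2, bins=256):
    # For co-prime fringe counts f1, f2, the wrapped-phase pair (phi1, phi2)
    # is unique over the unit range, so a 2-D table indexed by the quantized
    # pair can return the fringe order k1 of the principal frequency.
    x = np.linspace(0.0, 1.0, 100000, endpoint=False)
    p1 = (f1 * x) % 1.0                        # wrapped phase / (2*pi)
    p2 = (f2 * x) % 1.0
    lut = np.full((bins, bins), -1, dtype=np.int32)   # -1 marks empty cells
    lut[(p1 * bins).astype(int), (p2 * bins).astype(int)] = (f1 * x).astype(int)
    return lut

def unwrap(phi1, phi2, f1, lut, bins=256):
    k1 = lut[int(phi1 / (2 * np.pi) * bins) % bins,
             int(phi2 / (2 * np.pi) * bins) % bins]
    return phi1 + 2 * np.pi * k1               # absolute phase at frequency f1

lut = build_lut(f1=8, f2=5)
x_true = 0.437
phi1 = 2 * np.pi * ((8 * x_true) % 1.0)
phi2 = 2 * np.pi * ((5 * x_true) % 1.0)
print(unwrap(phi1, phi2, 8, lut) / (2 * np.pi * 8))   # ~0.437
```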

8.
IEEE Trans Image Process ; 28(2): 561-576, 2019 Feb.
Article in English | MEDLINE | ID: mdl-30136946

ABSTRACT

In this paper, we propose Weighted Local sparse-representation-based Depth Image Super-Resolution (WLDISR) schemes aimed at improving the Virtual View Image (VVI) quality of 3D video systems. Unlike color images, depth images mainly provide geometric information for synthesizing VVIs. Because textural structures and smooth regions of depth images behave differently in view synthesis, we divide depth images into edge and smooth patches and learn two local dictionaries, respectively. Meanwhile, a weight term is derived and incorporated explicitly into the cost function to capture the different importance of edge structures and smooth regions to VVI quality. Local sparse representation and weighted sparse representation are then used jointly in both the dictionary learning and reconstruction phases of depth image super-resolution. Based on different optimizations of the learning and reconstruction modules, three WLDISR schemes are proposed: WLDISR-D, WLDISR-R, and WLDISR-ALL. Experimental results on 3D sequences demonstrate that the proposed WLDISR-D, WLDISR-R, and WLDISR-ALL schemes achieve average gains of more than 1.9, 2.03, and 2.16 dB, respectively, in VVI quality compared with state-of-the-art schemes. The visual quality of the VVIs is also improved.
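
The patch classification step that precedes the two-dictionary learning can be sketched as below: patches are labeled edge or smooth by mean gradient magnitude, since depth edges matter most to synthesized-view geometry. The patch size and threshold are illustrative assumptions, and the sparse dictionary learning itself is omitted.

```python
import numpy as np

def classify_patches(depth, patch=8, edge_thresh=4.0):
    # Split a depth image into edge and smooth patches; WLDISR learns a
    # separate dictionary for each class. The threshold is illustrative.
    gy, gx = np.gradient(depth.astype(np.float64))
    grad = np.hypot(gx, gy)
    edge_patches, smooth_patches = [], []
    H, W = depth.shape
    for y in range(0, H - patch + 1, patch):
        for x in range(0, W - patch + 1, patch):
            block = depth[y:y + patch, x:x + patch]
            if grad[y:y + patch, x:x + patch].mean() > edge_thresh:
                edge_patches.append(block.ravel())
            else:
                smooth_patches.append(block.ravel())
    return np.array(edge_patches), np.array(smooth_patches)

depth = np.zeros((64, 64), dtype=np.uint8)
depth[:, 32:] = 200                            # one sharp depth edge
edges, smooth = classify_patches(depth)
print(len(edges), len(smooth))                 # patches on the edge vs. flat
```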

9.
IEEE Trans Image Process ; 28(4): 1866-1881, 2019 Apr.
Article in English | MEDLINE | ID: mdl-30452360

ABSTRACT

A challenging problem in the no-reference quality assessment of multiply distorted stereoscopic images (MDSIs) is to simulate monocular and binocular visual properties under a mixture of distortion types. Owing to the joint effects of multiple distortions in MDSIs, the underlying monocular and binocular visual mechanisms manifest differently from those of singly distorted stereoscopic images (SDSIs). This paper presents a unified no-reference quality evaluator for SDSIs and MDSIs that learns monocular and binocular local visual primitives (MB-LVPs). The main idea is to learn MB-LVPs that characterize the local receptive field properties of the visual cortex in response to SDSIs and MDSIs. We further argue that the primitives should be learned in a task-driven manner: two penalty terms, reconstruction error and quality inconsistency, are jointly minimized within a supervised dictionary learning framework, generating a set of quality-oriented MB-LVPs for each single- and multiple-distortion modality. Given an input stereoscopic image, features are encoded using the learned MB-LVPs as codebooks, yielding the corresponding monocular and binocular responses. Finally, the responses across all modalities are fused with probabilistic weights determined by the modality-specific sparse reconstruction errors, yielding the final monocular and binocular features for quality regression. The superiority of our method has been verified on several SDSI and MDSI databases.
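
The modality-fusion step can be sketched as follows: a feature vector is encoded against each modality's dictionary, and the modality responses are weighted by a softmax over negative reconstruction errors. Ordinary least squares stands in for the paper's sparse coding, and beta, the dimensions, and the names are our assumptions.

```python
import numpy as np

def fuse_by_reconstruction_error(x, dictionaries, beta=10.0):
    # Encode x against each modality's learned primitives and weight the
    # modality responses by how well that dictionary reconstructs x.
    codes, errors = [], []
    for D in dictionaries:                     # D: (dim, atoms)
        a, *_ = np.linalg.lstsq(D, x, rcond=None)
        codes.append(a)
        errors.append(np.linalg.norm(x - D @ a))
    w = np.exp(-beta * np.array(errors))
    w /= w.sum()                               # probabilistic weights
    return w, codes

rng = np.random.default_rng(1)
dicts = [rng.normal(size=(16, 8)) for _ in range(3)]   # 3 distortion modalities
x = dicts[1] @ rng.normal(size=8)                      # lives in modality 1
w, _ = fuse_by_reconstruction_error(x, dicts)
print(np.argmax(w))                                    # -> 1
```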

10.
ScientificWorldJournal ; 2014: 136854, 2014.
Article in English | MEDLINE | ID: mdl-24737956

ABSTRACT

To compress stereo video effectively, this paper proposes a novel macroblock (MB)-level rate control method based on binocular perception. A binocular just-noticeable difference (BJND) model based on parallax matching is first used to describe binocular perception. The proposed rate control is then performed in stereo video coding at four levels: view level, group-of-pictures (GOP) level, frame level, and MB level. At the view level, different proportions of the bitrate are allocated to the left and right views of the stereo video according to a rate-allocation proportion obtained from prior statistics. At the GOP level, the total bit budget of each GOP is computed and the initial quantization parameter of each GOP is set. At the frame level, the target bits allocated to each frame are computed. At the MB level, a visual perception factor, measured by the BJND value of the MB, adjusts the MB-level bit allocation so that the rate control results are consistent with human visual characteristics. Experimental results show that the proposed method controls the bitrate more accurately and obtains better subjective stereo video quality than other methods.
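
A toy sketch of BJND-guided MB-level bit allocation is given below: macroblocks with a higher BJND (more distortion tolerated before a binocular difference is noticed) receive fewer bits, and perceptually sensitive MBs receive more. The inverse weighting is an illustrative stand-in for the paper's adjustment rule, and the BJND values are hypothetical.

```python
import numpy as np

def allocate_mb_bits(frame_bits, mb_bjnd):
    # Inverse-BJND weighting: MBs that tolerate more distortion get fewer
    # bits. The epsilon guards against division by zero.
    w = 1.0 / (np.asarray(mb_bjnd, dtype=np.float64) + 1e-6)
    return frame_bits * w / w.sum()            # per-MB bits, sums to frame_bits

bjnd = np.array([2.0, 8.0, 4.0, 1.0])          # hypothetical per-MB BJND values
print(allocate_mb_bits(1200, bjnd))
```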


Subject(s)
Algorithms , Biomimetics/methods , Data Compression/methods , Imaging, Three-Dimensional/methods , Signal Processing, Computer-Assisted , Video Recording/methods , Visual Perception , Humans