Results 1 - 20 of 23
1.
Article in English | MEDLINE | ID: mdl-38625773

ABSTRACT

Blind video quality assessment (BVQA) plays an indispensable role in monitoring and improving the end-users' viewing experience in various real-world video-enabled media applications. As an experimental field, the improvements of BVQA models have been measured primarily on a few human-rated VQA datasets. Thus, it is crucial to gain a better understanding of existing VQA datasets in order to properly evaluate the current progress in BVQA. Towards this goal, we conduct a first-of-its-kind computational analysis of VQA datasets by designing minimalistic BVQA models. By minimalistic, we mean that our family of BVQA models is built only upon basic blocks: a video preprocessor (for aggressive spatiotemporal downsampling), a spatial quality analyzer, an optional temporal quality analyzer, and a quality regressor, all with the simplest possible instantiations. By comparing the quality prediction performance of different model variants on eight VQA datasets with realistic distortions, we find that nearly all datasets suffer from the easy dataset problem, to varying degrees of severity, and some even admit blind image quality assessment (BIQA) solutions. We further justify our claims by comparing the generalization capabilities of our models across these VQA datasets, and by ablating a dizzying set of BVQA design choices related to the basic building blocks. Our results cast doubt on the current progress in BVQA and, meanwhile, shed light on good practices for constructing next-generation VQA datasets and models.
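
A minimal sketch of how such a model family could be assembled, assuming a frozen ResNet-50 as the spatial quality analyzer and omitting the optional temporal analyzer; all module choices below are illustrative assumptions, not the authors' released code:

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

class MinimalisticBVQA(nn.Module):
    """Basic blocks only: preprocessor, spatial analyzer, quality regressor."""
    def __init__(self, spatial_stride=2, temporal_stride=8):
        super().__init__()
        self.spatial_stride = spatial_stride      # aggressive spatial downsampling
        self.temporal_stride = temporal_stride    # keep every k-th frame
        backbone = resnet50(weights="DEFAULT")    # spatial quality analyzer
        self.features = nn.Sequential(*list(backbone.children())[:-1])
        self.regressor = nn.Linear(2048, 1)       # simplest possible regressor

    def forward(self, video):                     # video: (T, 3, H, W) float tensor
        clip = video[::self.temporal_stride]      # temporal downsampling
        clip = clip[..., ::self.spatial_stride, ::self.spatial_stride]
        feats = self.features(clip).flatten(1)    # (T', 2048) per-frame features
        return self.regressor(feats.mean(dim=0))  # temporal pooling + regression
```

Comparing variants of this skeleton (with and without a temporal analyzer, at different downsampling rates) is the kind of ablation that exposes the easy dataset problem the abstract reports.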

2.
IEEE Trans Image Process ; 33: 1965-1976, 2024.
Article in English | MEDLINE | ID: mdl-38451766

ABSTRACT

The Geometry-based Point Cloud Compression (G-PCC) standard has been developed by the Moving Picture Experts Group to compress point clouds efficiently. Nevertheless, in its lossy mode, the point cloud reconstructed by G-PCC often suffers from noticeable distortions due to naïve geometry quantization (i.e., grid downsampling). This paper proposes a hierarchical prior-based super resolution method for point cloud geometry compression. The content-dependent hierarchical prior is constructed at the encoder side, which enables coarse-to-fine super resolution of the point cloud geometry at the decoder side. A more accurate prior generally yields improved reconstruction performance, albeit at the cost of the additional bits required to encode this side information. Our experiments on the MPEG Cat1A dataset demonstrate substantial Bjøntegaard-delta bitrate savings, surpassing the performance of the octree-based and trisoup-based G-PCC v14. We provide our implementations for reproducible research at https://github.com/lidq92/mpeg-pcc-tmc13.

3.
IEEE Trans Image Process ; 33: 1898-1910, 2024.
Article in English | MEDLINE | ID: mdl-38451761

ABSTRACT

In this paper, we present a simple yet effective continual learning method for blind image quality assessment (BIQA) with improved quality prediction accuracy, plasticity-stability trade-off, and task-order/-length robustness. The key step in our approach is to freeze all convolution filters of a pre-trained deep neural network (DNN) for an explicit promise of stability, and learn task-specific normalization parameters for plasticity. We assign each new IQA dataset (i.e., task) a prediction head, and load the corresponding normalization parameters to produce a quality score. The final quality estimate is computed by a weighted summation of predictions from all heads with a lightweight K-means gating mechanism. Extensive experiments on six IQA datasets demonstrate the advantages of the proposed method in comparison to previous training techniques for BIQA.
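
A minimal sketch of the two mechanisms the abstract names, under assumptions about shapes and the gating form (the per-task normalization parameters would be swapped into the frozen backbone before each head's forward pass):

```python
import torch
import torch.nn.functional as F

def freeze_conv_filters(model):
    # Explicit promise of stability: no convolution filter is ever updated.
    for m in model.modules():
        if isinstance(m, torch.nn.Conv2d):
            m.weight.requires_grad_(False)

def gated_quality(feat, heads, centroids, tau=1.0):
    # feat: (D,) image feature; heads: one nn.Linear(D, 1) per task;
    # centroids: (K, D) K-means centroids, one per seen dataset (assumed).
    scores = torch.stack([h(feat) for h in heads]).squeeze(-1)  # (K,) head outputs
    dists = torch.cdist(feat[None], centroids)[0]               # (K,) distances
    weights = F.softmax(-dists / tau, dim=0)                    # closer task, larger weight
    return (weights * scores).sum()                             # final quality estimate
```

The soft-min gating over centroid distances is one plausible reading of the "lightweight K-means gating mechanism"; the paper's exact weighting may differ.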

4.
IEEE Trans Pattern Anal Mach Intell ; 45(8): 10114-10128, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37030806

ABSTRACT

Measuring perceptual color differences (CDs) is of great importance in modern smartphone photography. Despite the long history, most CD measures have been constrained by psychophysical data of homogeneous color patches or a limited number of simplistic natural photographic images. It is thus questionable whether existing CD measures generalize in the age of smartphone photography, characterized by greater content complexities and learning-based image signal processors. In this article, we put together the largest image dataset for perceptual CD assessment to date, in which the photographic images are 1) captured by six flagship smartphones, 2) altered by Photoshop, 3) post-processed by built-in filters of the smartphones, and 4) reproduced with incorrect color profiles. We then conduct a large-scale psychophysical experiment to gather perceptual CDs of 30,000 image pairs in a carefully controlled laboratory environment. Based on the newly established dataset, we make one of the first attempts to construct an end-to-end learnable CD formula based on a lightweight neural network, as a generalization of several previous metrics. Extensive experiments demonstrate that the optimized formula outperforms 33 existing CD measures by a large margin, offers reasonable local CD maps without the use of dense supervision, generalizes well to homogeneous color patch data, and empirically behaves as a proper metric in the mathematical sense. Our dataset and code are publicly available at https://github.com/hellooks/CDNet.


Subjects
Algorithms; Smartphone; Photography/methods; Neural Networks, Computer; Learning; Color
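
To make the "end-to-end learnable CD formula" concrete, here is a minimal sketch (an assumption-laden stand-in, not a reimplementation of the released CDNet): embed both photographs with a lightweight network and read the color difference off as a feature-space distance, which also yields a local CD map for free.

```python
import torch
import torch.nn as nn

class LearnableCD(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Sequential(              # lightweight feature extractor
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
        )

    def forward(self, img_a, img_b):             # images: (N, 3, H, W)
        fa, fb = self.embed(img_a), self.embed(img_b)
        cd_map = (fa - fb).pow(2).sum(dim=1).sqrt()   # local CD map, (N, H, W)
        return cd_map.mean(dim=(1, 2)), cd_map        # global CD and its map
```

Training would regress the global CD against human ratings with, e.g., an MSE loss; the local map falls out without any dense supervision, matching the behavior the abstract reports.
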
5.
IEEE Trans Pattern Anal Mach Intell ; 45(3): 2864-2878, 2023 Mar.
Article in English | MEDLINE | ID: mdl-35635807

ABSTRACT

The explosive growth of image data facilitates the fast development of image processing and computer vision methods for emerging visual applications, while also introducing novel distortions to the processed images. This poses a grand challenge to existing blind image quality assessment (BIQA) models, which are weak at adapting to subpopulation shift. Recent work suggests training BIQA methods on the combination of all available human-rated IQA datasets. However, this approach does not scale to a large number of datasets, and makes it cumbersome to incorporate newly created datasets. In this paper, we formulate continual learning for BIQA, where a model learns continually from a stream of IQA datasets, building on what was learned from previously seen data. We first identify five desiderata in the continual setting, with three criteria to quantify prediction accuracy, plasticity, and stability, respectively. We then propose a simple yet effective continual learning method for BIQA. Specifically, based on a shared backbone network, we add a prediction head for each new dataset and enforce a regularizer that allows all prediction heads to evolve with new data while resisting catastrophic forgetting of old data. We compute the overall quality score by a weighted summation of predictions from all heads. Extensive experiments demonstrate the promise of the proposed continual learning method in comparison to standard training techniques for BIQA, with and without experience replay. The code is publicly available at https://github.com/zwx8981/BIQA_CL.
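
A minimal sketch of one training step, assuming an output-distillation regularizer in the spirit of learning-without-forgetting (the abstract names a regularizer but not its exact form):

```python
import torch

def continual_step(backbone, heads, old_outputs, batch, new_task_loss, lam=1.0):
    # heads[-1] is the newly added head; old_outputs holds each old head's
    # predictions on this batch, recorded before the update (detached).
    feats = backbone(batch)                          # shared representation
    loss = new_task_loss(heads[-1](feats))           # learn the new dataset
    for head, old in zip(heads[:-1], old_outputs):   # resist forgetting:
        loss = loss + lam * (head(feats) - old).pow(2).mean()
    return loss
```

At test time the overall score is again a weighted summation over all heads, as in the companion paper above.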

6.
IEEE Trans Vis Comput Graph ; 28(8): 3022-3034, 2022 08.
Article in English | MEDLINE | ID: mdl-33434131

ABSTRACT

Omnidirectional images (also referred to as static 360° panoramas) impose viewing conditions much different from those of regular 2D images. How humans perceive image distortions in immersive virtual reality (VR) environments is an important problem that has received little attention. We argue that, apart from the distorted panorama itself, two types of VR viewing conditions are crucial in determining the viewing behaviors of users and the perceived quality of the panorama: the starting point and the exploration time. We first carry out a psychophysical experiment to investigate the interplay among the VR viewing conditions, the user viewing behaviors, and the perceived quality of 360° images. Then, we provide a thorough analysis of the collected human data, leading to several interesting findings. Moreover, we propose a computational framework for objective quality assessment of 360° images, embodying viewing conditions and behaviors in a delightful way. Specifically, we first transform an omnidirectional image into several video representations using different user viewing behaviors under different viewing conditions. We then leverage advanced 2D full-reference video quality models to compute the perceived quality. We construct a set of specific quality measures within the proposed framework, and demonstrate their promise on three VR quality databases.


Subjects
Computer Graphics; Virtual Reality; Attention; Databases, Factual; Humans
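
A minimal sketch of the framework's first stage under strong simplifications: sample viewports along a user scanpath (driven by the starting point and exploration time, the two viewing conditions studied) and stack them into a video for a 2D full-reference VQA model. A real implementation would use a gnomonic sphere-to-plane projection; the flat wrap-around crop below only keeps the sketch short.

```python
import numpy as np

def panorama_to_video(pano, scanpath, view=(256, 256)):
    # pano: (H, W, 3) equirectangular panorama; scanpath: (row, col) viewport
    # origins over time, from a viewing behavior model (assumed given).
    h, w = view
    frames = []
    for r, c in scanpath:
        rows = np.arange(r, r + h) % pano.shape[0]   # wrap at image borders
        cols = np.arange(c, c + w) % pano.shape[1]
        frames.append(pano[np.ix_(rows, cols)])
    return np.stack(frames)                          # (T, h, w, 3) video

# quality = fr_vqa(panorama_to_video(dist, path), panorama_to_video(ref, path)),
# where fr_vqa is any 2D full-reference video quality model (hypothetical name).
```
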
7.
IEEE Trans Pattern Anal Mach Intell ; 44(5): 2567-2581, 2022 05.
Article in English | MEDLINE | ID: mdl-33338012

ABSTRACT

Objective measures of image quality generally operate by comparing pixels of a "degraded" image to those of the original. Relative to human observers, these measures are overly sensitive to resampling of texture regions (e.g., replacing one patch of grass with another). Here, we develop the first full-reference image quality model with explicit tolerance to texture resampling. Using a convolutional neural network, we construct an injective and differentiable function that transforms images into multi-scale overcomplete representations. We demonstrate empirically that the spatial averages of the feature maps in this representation capture texture appearance, in that they provide a set of statistical constraints sufficient to synthesize a wide variety of texture patterns. We then describe an image quality method that combines correlations of these spatial averages ("texture similarity") with correlations of the feature maps ("structure similarity"). The parameters of the proposed measure are jointly optimized to match human ratings of image quality, while minimizing the reported distances between subimages cropped from the same texture images. Experiments show that the optimized method explains human perceptual scores on conventional image quality databases as well as on texture databases. The measure also offers competitive performance on related tasks such as texture classification and retrieval. Finally, we show that our method is relatively insensitive to geometric transformations (e.g., translation and dilation), without the use of any specialized training or data augmentation. Code is available at https://github.com/dingkeyan93/DISTS.


Subjects
Algorithms; Neural Networks, Computer; Humans
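
A minimal sketch of the two similarity terms at a single scale, in simplified SSIM-like form (the released DISTS code operates on multi-scale VGG features and learns the pooling weights; both are omitted here, and the equal weighting is an assumption):

```python
import torch

def texture_structure_similarity(fx, fy, eps=1e-6):
    # fx, fy: (C, H, W) feature maps of reference and distorted images.
    mx, my = fx.mean(dim=(1, 2)), fy.mean(dim=(1, 2))   # spatial averages
    texture = (2 * mx * my + eps) / (mx ** 2 + my ** 2 + eps)
    cov = ((fx - mx[:, None, None]) * (fy - my[:, None, None])).mean(dim=(1, 2))
    vx = fx.var(dim=(1, 2), unbiased=False)
    vy = fy.var(dim=(1, 2), unbiased=False)
    structure = (2 * cov + eps) / (vx + vy + eps)
    return 0.5 * (texture.mean() + structure.mean())    # equal weights, assumed
```
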
8.
IEEE Trans Pattern Anal Mach Intell ; 44(9): 4577-4590, 2022 09.
Article in English | MEDLINE | ID: mdl-33830918

ABSTRACT

The research in image quality assessment (IQA) has a long history, and significant progress has been made by leveraging recent advances in deep neural networks (DNNs). Despite high correlation numbers on existing IQA datasets, DNN-based models may be easily falsified in the group maximum differentiation (gMAD) competition. Here we show that gMAD examples can be used to improve blind IQA (BIQA) methods. Specifically, we first pre-train a DNN-based BIQA model using multiple noisy annotators, and fine-tune it on multiple synthetically distorted images, resulting in a "top-performing" baseline model. We then seek pairs of images by comparing the baseline model with a set of full-reference IQA methods in gMAD. The spotted gMAD examples are most likely to reveal the weaknesses of the baseline, and suggest potential ways for refinement. We query human quality annotations for the selected images in a well-controlled laboratory environment, and further fine-tune the baseline on the combination of human-rated images from gMAD and existing databases. This process may be iterated, enabling active fine-tuning from gMAD examples for BIQA. We demonstrate the feasibility of our active learning scheme on a large-scale unlabeled image set, and show that the fine-tuned quality model achieves improved generalizability in gMAD, without destroying performance on previously seen databases.


Subjects
Algorithms; Neural Networks, Computer; Databases, Factual; Humans
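
The active loop itself is simple enough to sketch; the three hooks below (gMAD pair spotting, the lab study, and fine-tuning) are caller-supplied placeholders for the stages the abstract describes:

```python
def active_finetune(baseline, fr_models, image_pool, spot_pairs, rate_pairs,
                    finetune, rounds=3):
    # spot_pairs(baseline, fr, pool) -> image pairs where the two models
    # disagree most; rate_pairs -> human annotations; finetune -> update step.
    for _ in range(rounds):
        pairs = [p for fr in fr_models for p in spot_pairs(baseline, fr, image_pool)]
        labeled = rate_pairs(pairs)              # controlled lab annotation
        baseline = finetune(baseline, labeled)   # gMAD examples + old databases
    return baseline
```
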
9.
IEEE Trans Image Process ; 30: 3474-3486, 2021.
Article in English | MEDLINE | ID: mdl-33661733

ABSTRACT

Performance of blind image quality assessment (BIQA) models has been significantly boosted by end-to-end optimization of feature engineering and quality regression. Nevertheless, due to the distributional shift between images simulated in the laboratory and those captured in the wild, models trained on databases with synthetic distortions remain particularly weak at handling realistic distortions (and vice versa). To confront the cross-distortion-scenario challenge, we develop a unified BIQA model and an approach to training it for both synthetic and realistic distortions. We first sample pairs of images from individual IQA databases, and compute the probability that the first image of each pair is of higher quality. We then employ the fidelity loss to optimize a deep neural network for BIQA over a large number of such image pairs. We also explicitly enforce a hinge constraint to regularize uncertainty estimation during optimization. Extensive experiments on six IQA databases show the promise of the learned method in blindly assessing image quality in both the laboratory and the wild. In addition, we demonstrate the universality of the proposed training strategy by using it to improve existing BIQA models.


Subjects
Image Processing, Computer-Assisted/methods; Neural Networks, Computer; Databases, Factual; Humans; Laboratories
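
A minimal sketch of the pairwise learning signal, assuming a Thurstone-style Gaussian link from predicted quality means and uncertainties to a preference probability (consistent with, but not guaranteed to match, the paper's exact parameterization):

```python
import torch

def pair_probability(q1, s1, q2, s2, eps=1e-8):
    # Probability that image 1 beats image 2 under a Gaussian quality model.
    normal = torch.distributions.Normal(0.0, 1.0)
    return normal.cdf((q1 - q2) / torch.sqrt(s1 ** 2 + s2 ** 2 + eps))

def fidelity_loss(p, p_hat, eps=1e-8):
    # p: empirical preference probability for the pair; p_hat: model's estimate.
    return 1 - torch.sqrt(p * p_hat + eps) - torch.sqrt((1 - p) * (1 - p_hat) + eps)
```

The hinge constraint on the uncertainty estimates s1, s2 would enter as an extra penalty term during optimization.
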
10.
Int J Comput Vis ; 129(4): 1258-1281, 2021.
Article in English | MEDLINE | ID: mdl-33495671

ABSTRACT

The performance of objective image quality assessment (IQA) models has been evaluated primarily by comparing model predictions to human quality judgments. Perceptual datasets gathered for this purpose have provided useful benchmarks for improving IQA methods, but their heavy use creates a risk of overfitting. Here, we perform a large-scale comparison of IQA models in terms of their use as objectives for the optimization of image processing algorithms. Specifically, we use eleven full-reference IQA models to train deep neural networks for four low-level vision tasks: denoising, deblurring, super-resolution, and compression. Subjective testing on the optimized images allows us to rank the competing models in terms of their perceptual performance, elucidate their relative advantages and disadvantages in these tasks, and propose a set of desirable properties for incorporation into future IQA models.
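
The experimental protocol reduces to one training loop: swap the usual pixel loss for an IQA model. A minimal sketch, where iqa_model is a placeholder for any of the eleven differentiable full-reference models, assumed to return higher-is-better scores:

```python
import torch

def perceptual_step(net, iqa_model, degraded, reference, optimizer):
    restored = net(degraded)
    loss = (1 - iqa_model(restored, reference)).mean()  # maximize predicted quality
    optimizer.zero_grad()
    loss.backward()                                     # needs a differentiable IQA model
    optimizer.step()
    return loss.item()
```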

11.
Article in English | MEDLINE | ID: mdl-32356747

ABSTRACT

Rate-distortion (RD) theory is at the heart of lossy data compression. Here we aim to model the generalized RD (GRD) trade-off between the visual quality of a compressed video and its encoding profiles (e.g., bitrate and spatial resolution). We first define the theoretical functional space W of the GRD function by analyzing its mathematical properties. We show that W is a convex set in a Hilbert space, inspiring a computational model of the GRD function, and a method of estimating model parameters from sparse measurements. To demonstrate the feasibility of our idea, we collect a large-scale database of real-world GRD functions, which turn out to live in a low-dimensional subspace of W. Combining the GRD reconstruction framework and the learned low-dimensional space, we create a low-parameter eigen GRD method to accurately estimate the GRD function of a source video content from only a few queries. Experimental results on the database show that the learned GRD method significantly outperforms state-of-the-art empirical RD estimation methods both in accuracy and efficiency. Last, we demonstrate the promise of the proposed model in video codec comparison.
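
A minimal sketch of the reconstruction step, assuming the learned low-dimensional structure: a GRD function sampled on a grid of encoding profiles is written as a mean plus a few eigenfunctions, and the coefficients are fit to the handful of queried encodes by least squares (the paper's estimator may add further constraints).

```python
import numpy as np

def estimate_grd(mean, basis, query_idx, query_quality):
    # mean: (M,) average GRD over the profile grid; basis: (M, K) top-K
    # eigenfunctions learned from the database; query_idx, query_quality:
    # the few profiles actually encoded and their measured qualities.
    A = basis[query_idx]                         # (Q, K) design matrix
    b = query_quality - mean[query_idx]          # (Q,) residual observations
    coef, *_ = np.linalg.lstsq(A, b, rcond=None)
    return mean + basis @ coef                   # full GRD surface estimate
```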

12.
Article in English | MEDLINE | ID: mdl-32310768

ABSTRACT

Exposure bracketing is crucial to high dynamic range imaging, but it is prone to halos for static scenes and ghosting artifacts for dynamic scenes. The recently proposed structural patch decomposition for multi-exposure fusion (SPD-MEF) has achieved reliable performance in deghosting, but suffers from visible halo artifacts and is computationally expensive. In addition, its relationship to other MEF methods is unclear. We show that, without explicitly performing structural patch decomposition, we arrive at an unnormalized version of SPD-MEF, which enjoys a speed-up of roughly 30× and is closely related to pixel-level MEF methods as well as to the standard two-layer decomposition method for MEF. Moreover, we develop a fast multi-scale SPD-MEF method, which can effectively reduce halo artifacts. Experimental results demonstrate the effectiveness of the proposed MEF method in terms of both speed and quality.

13.
Article in English | MEDLINE | ID: mdl-32305914

ABSTRACT

Precise estimation of the probabilistic structure of natural images plays an essential role in image compression. Despite the recent remarkable success of end-to-end optimized image compression, the latent codes are usually assumed to be fully statistically factorized in order to simplify entropy modeling. However, this assumption generally does not hold true and may hinder compression performance. Here we present context-based convolutional networks (CCNs) for efficient and effective entropy modeling. In particular, a 3D zigzag scanning order and a 3D code dividing technique are introduced to define proper coding contexts for parallel entropy decoding, both of which boil down to place translation-invariant binary masks on convolution filters of CCNs. We demonstrate the promise of CCNs for entropy modeling in both lossless and lossy image compression. For the former, we directly apply a CCN to the binarized representation of an image to compute the Bernoulli distribution of each code for entropy estimation. For the latter, the categorical distribution of each code is represented by a discretized mixture of Gaussian distributions, whose parameters are estimated by three CCNs. We then jointly optimize the CCN-based entropy model along with analysis and synthesis transforms for rate-distortion performance. Experiments on the Kodak and Tecnick datasets show that our methods powered by the proposed CCNs generally achieve comparable compression performance to the state-of-the-art while being much faster.
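
The "translation-invariant binary masks on convolution filters" admit a compact sketch. Below is a 2D raster-scan variant (a simplification of the paper's 3D zigzag order): the mask zeroes the current and all not-yet-decoded positions, so the network predicts each code only from already-decoded context.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedConv2d(nn.Conv2d):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        kh, kw = self.kernel_size
        mask = torch.ones(kh, kw)
        mask[kh // 2, kw // 2:] = 0    # current position and rest of its row
        mask[kh // 2 + 1:, :] = 0      # all rows below (future codes)
        self.register_buffer("mask", mask)

    def forward(self, x):
        return F.conv2d(x, self.weight * self.mask, self.bias,
                        self.stride, self.padding, self.dilation, self.groups)
```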

14.
IEEE Trans Pattern Anal Mach Intell ; 42(4): 851-864, 2020 04.
Article in English | MEDLINE | ID: mdl-30596570

ABSTRACT

In many science and engineering fields that require computational models to predict certain physical quantities, we are often faced with the selection of the best model under the constraint that only a small sample set can be physically measured. One such example is the prediction of human perception of visual quality, where sample images live in a high-dimensional space with enormous content variations. We propose a new methodology for model comparison named group maximum differentiation (gMAD) competition. Given multiple computational models, gMAD maximizes the chances of falsifying a "defender" model using the remaining models as "attackers". It exploits the sample space to find sample pairs that maximally differentiate the attackers while holding the defender fixed. Based on the results of the attacking-defending game, we introduce two measures, aggressiveness and resistance, to summarize the performance of each model at attacking other models and at defending against attacks from other models, respectively. We demonstrate the gMAD competition using three examples: image quality, image aesthetics, and streaming video quality-of-experience. Although these examples focus on visually discriminable quantities, the gMAD methodology can be extended to many other fields, and is especially useful when the sample space is large, physical measurement is expensive, and the cost of computational prediction is low.
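
A minimal sketch of one attack on a finite sample set: bin the samples by the defender's predictions (so the defender judges members of a bin to be of equal quality) and, within each bin, pick the pair the attacker rates most differently. The binning granularity is an assumption.

```python
import numpy as np

def gmad_pairs(defender_scores, attacker_scores, n_levels=5):
    levels = np.quantile(defender_scores, np.linspace(0, 1, n_levels + 1))
    pairs = []
    for lo, hi in zip(levels[:-1], levels[1:]):
        idx = np.where((defender_scores >= lo) & (defender_scores <= hi))[0]
        if len(idx) < 2:
            continue                     # defender's level set too small
        best = idx[attacker_scores[idx].argmax()]
        worst = idx[attacker_scores[idx].argmin()]
        pairs.append((best, worst))      # maximal attacker disagreement
    return pairs
```

Aggressiveness and resistance then summarize, over many such rounds, how often a model's spotted pairs are confirmed by human observers and how well it survives being attacked.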

15.
Article in English | MEDLINE | ID: mdl-31751238

ABSTRACT

We propose a fast multi-exposure image fusion (MEF) method, namely MEF-Net, for static image sequences of arbitrary spatial resolution and exposure number. We first feed a low-resolution version of the input sequence to a fully convolutional network for weight map prediction. We then jointly upsample the weight maps using a guided filter. The final image is computed by a weighted fusion. Unlike conventional MEF methods, MEF-Net is trained end-to-end by optimizing the perceptually calibrated MEF structural similarity (MEF-SSIM) index over a database of training sequences at full resolution. Across an independent set of test sequences, we find that the optimized MEF-Net achieves consistent improvement in visual quality for most sequences, and runs 10 to 1000 times faster than state-of-the-art methods. The code is made publicly available.
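
A minimal sketch of the MEF-Net pipeline under assumptions: weight_net stands in for the fully convolutional network, and plain bilinear upsampling replaces the guided filter to keep the sketch self-contained.

```python
import torch
import torch.nn.functional as F

def mef_net_fuse(weight_net, seq, low_size=(128, 128)):
    # seq: (K, 3, H, W) static exposure bracket of arbitrary size and length K.
    lo = F.interpolate(seq, size=low_size, mode="bilinear", align_corners=False)
    w = weight_net(lo)                                 # (K, 1, h, w) weight maps
    w = F.interpolate(w, size=seq.shape[-2:], mode="bilinear", align_corners=False)
    w = torch.softmax(w, dim=0)                        # weights sum to 1 per pixel
    return (w * seq).sum(dim=0)                        # (3, H, W) fused image
```

Because the fusion is differentiable end to end, the MEF-SSIM index can be optimized directly at full resolution, as the abstract describes.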

16.
Article in English | MEDLINE | ID: mdl-31535996

ABSTRACT

A common approach to high dynamic range (HDR) imaging is to capture multiple images at different exposures, followed by multi-exposure image fusion (MEF) in either the radiance or intensity domain. A predominant problem of this approach is the introduction of ghosting artifacts in dynamic scenes with camera and object motion. While many MEF methods (often referred to as deghosting algorithms) have been proposed to reduce ghosting artifacts and improve visual quality, little work has been dedicated to the perceptual evaluation of their deghosting results. Here we first construct a database that contains 20 multi-exposure sequences of dynamic scenes and their corresponding fused images produced by nine MEF algorithms. We then carry out a subjective experiment to evaluate fused image quality, and find that none of the existing objective quality models for MEF provides accurate quality predictions. Motivated by this, we develop an objective quality model for MEF of dynamic scenes. Specifically, we divide the test image into static and dynamic regions, measure structural similarity between the image and the corresponding sequence in the two regions separately, and combine the quality measurements of the two regions into an overall quality score. Experimental results show that the proposed method significantly outperforms the state-of-the-art. In addition, we demonstrate the promise of the proposed model in parameter tuning of MEF methods.
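
A minimal sketch of the model's structure; the region split and the per-region structural-similarity routine are placeholders (the hypothetical region_ssim) standing in for the paper's exact choices, and the equal pooling weight is an assumption.

```python
def mef_quality(fused, sequence, dynamic_mask, region_ssim, alpha=0.5):
    # dynamic_mask: boolean (H, W) map of moving content; region_ssim measures
    # structural similarity between the fused image and the exposure sequence
    # restricted to a mask (both are assumed to be supplied by the caller).
    q_static = region_ssim(fused, sequence, ~dynamic_mask)
    q_dynamic = region_ssim(fused, sequence, dynamic_mask)
    return alpha * q_static + (1 - alpha) * q_dynamic   # overall quality score
```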

17.
Article in English | MEDLINE | ID: mdl-29994659

ABSTRACT

The human visual system excels at detecting local blur of visual images, but the underlying mechanism is not well understood. Traditional views of blur, such as reduction in energy at high frequencies and loss of phase coherence at localized features, have fundamental limitations. For example, they cannot reliably discriminate flat regions from blurred ones. Here we propose that high-level semantic information is critical in successfully identifying local blur. Therefore, we resort to deep neural networks that are proficient at learning high-level features, and propose the first end-to-end local blur mapping algorithm based on a fully convolutional network. By analyzing various architectures with different depths and design philosophies, we empirically show that high-level features of deeper layers play a more important role than low-level features of shallower layers in resolving challenging ambiguities for this task. We test the proposed method on a standard blur detection benchmark and demonstrate that it significantly advances the state-of-the-art (ODS F-score of 0.853). Furthermore, we explore the use of the generated blur maps in three applications, including blur region segmentation, blur degree estimation, and blur magnification.

18.
Article in English | MEDLINE | ID: mdl-30010561

ABSTRACT

Dynamic adaptive streaming over HTTP (DASH) provides an inter-operable solution to overcome volatile network conditions, but how human visual quality-of-experience (QoE) changes with time-varying video quality is not well understood. Here, we build a large-scale video database of time-varying quality and design a series of subjective experiments to investigate how humans respond to compression-level, spatial-resolution, and temporal-resolution adaptations. Our path-analytic results show that quality adaptations influence the QoE by modifying the perceived quality of subsequent video segments. Specifically, the quality deviation introduced by quality adaptations is asymmetric with respect to the adaptation direction, and is further influenced by other factors such as compression level and content. Furthermore, we propose an objective QoE model by integrating the empirical findings from our subjective experiments with expectation confirmation theory (ECT). Experimental results show that the proposed ECT-QoE model is in close agreement with subjective opinions and significantly outperforms existing QoE models. The video database and code are available online at https://ece.uwaterloo.ca/~zduanmu/tip2018ectqoe/.

19.
IEEE Trans Image Process ; 27(3): 1202-1213, 2018 Mar.
Article in English | MEDLINE | ID: mdl-29220321

ABSTRACT

We propose a multi-task end-to-end optimized deep neural network (MEON) for blind image quality assessment (BIQA). MEON consists of two sub-networks, a distortion identification network and a quality prediction network, which share their early layers. Unlike traditional methods used for training multi-task networks, our training process is performed in two steps. In the first step, we train a distortion type identification sub-network, for which large-scale training samples are readily available. In the second step, starting from the pre-trained early layers and the outputs of the first sub-network, we train a quality prediction sub-network using a variant of the stochastic gradient descent method. Different from most deep neural networks, we choose the biologically inspired generalized divisive normalization (GDN) instead of the rectified linear unit as the activation function. We empirically demonstrate that GDN is effective at reducing model parameters/layers while achieving similar quality prediction performance. With modest model complexity, the proposed MEON index achieves state-of-the-art performance on four publicly available benchmarks. Moreover, we demonstrate the strong competitiveness of MEON against state-of-the-art BIQA models using the group maximum differentiation competition methodology.
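
A minimal sketch of generalized divisive normalization (GDN) as used in place of ReLU: each channel is divided by a learned combination of all squared channel responses. Real implementations constrain beta and gamma to stay nonnegative; that reparameterization is omitted here for brevity.

```python
import torch
import torch.nn as nn

class GDN(nn.Module):
    # y_i = x_i / sqrt(beta_i + sum_j gamma_ij * x_j^2)
    def __init__(self, channels, eps=1e-6):
        super().__init__()
        self.beta = nn.Parameter(torch.ones(channels))
        self.gamma = nn.Parameter(0.1 * torch.eye(channels))
        self.eps = eps

    def forward(self, x):                  # x: (N, C, H, W)
        norm = torch.einsum("ij,njhw->nihw", self.gamma, x ** 2)
        return x / torch.sqrt(self.beta[None, :, None, None] + norm + self.eps)
```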

20.
IEEE Trans Image Process ; 26(3): 1202-1215, 2017 Mar.
Article in English | MEDLINE | ID: mdl-28026766

ABSTRACT

Subjective and objective measurement of the perceptual quality of depth information in symmetrically and asymmetrically distorted stereoscopic images is a fundamentally important issue in stereoscopic 3D imaging that has not been deeply investigated. Here, we first carry out a subjective test following the traditional absolute category rating protocol widely used in general image quality assessment research. We find this approach problematic, because monocular cues and the spatial quality of the images have a strong impact on the depth quality scores given by subjects, making it difficult to single out the actual contributions of stereoscopic cues to depth perception. To overcome this problem, we carry out a novel subjective study where the depth effect is synthesized at different depth levels before various types and levels of symmetric and asymmetric distortions are applied. Instead of following the traditional approach, we ask subjects to identify and label depth polarizations, and a depth perception difficulty index (DPDI) is developed based on the percentage of correct and incorrect subject judgments. We find this approach highly effective at quantifying depth perception induced by stereoscopic cues, and observe a number of interesting effects regarding image-content dependency, distortion-type dependency, and the impact of symmetric versus asymmetric distortions. Furthermore, we propose a novel computational model for DPDI prediction. Our results show that the proposed model, without explicitly identifying image distortion types, leads to highly promising DPDI prediction performance. We believe that these are useful steps toward building a comprehensive understanding of the 3D quality-of-experience of stereoscopic images.
