1.
IEEE Trans Cybern ; PP, 2023 Dec 25.
Article in English | MEDLINE | ID: mdl-38145521

ABSTRACT

The quality of videos is the primary concern of video service providers. Built upon deep neural networks, video quality assessment (VQA) has progressed rapidly. Although existing works have introduced knowledge of the human visual system (HVS) into VQA, some limitations still hinder the full exploitation of the HVS, including incomplete modeling with few HVS characteristics and insufficient connections among these characteristics. In this article, we present a novel spatial-temporal VQA method termed HVS-5M, wherein we design five modules to simulate five characteristics of the HVS and create a bioinspired connection among these modules in a cooperative manner. Specifically, in the spatial domain, the visual saliency module first extracts a saliency map. Then, the content-dependency and edge masking modules extract the content and edge features, respectively, which are both weighted by the saliency map to highlight the regions that human beings may be interested in. In the temporal domain, the motion perception module extracts the dynamic temporal features. In addition, the temporal hysteresis module simulates the memory mechanism of human beings and comprehensively evaluates the video quality according to the fused features from the spatial and temporal domains. Extensive experiments show that our HVS-5M outperforms state-of-the-art VQA methods. Ablation studies are further conducted to verify the effectiveness of each module in the proposed method. The source code is available at https://github.com/GZHU-DVL/HVS-5M.
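A minimal numpy sketch of the saliency-weighting step described above, assuming (C, H, W) feature maps and an (H, W) saliency map; the shapes, names, and pooling choice are illustrative assumptions, not the authors' released HVS-5M code (see the linked repository for the actual implementation).

```python
import numpy as np

def saliency_weighted_pool(features: np.ndarray, saliency: np.ndarray) -> np.ndarray:
    """Weight (C, H, W) feature maps by an (H, W) saliency map and pool to a (C,) vector."""
    w = saliency / (saliency.sum() + 1e-8)               # normalise saliency to a weight map
    return (features * w[None, :, :]).sum(axis=(1, 2))   # saliency-weighted spatial average

rng = np.random.default_rng(0)
content_feat = rng.random((64, 32, 32))   # stand-in for content features
edge_feat = rng.random((64, 32, 32))      # stand-in for edge-masked features
saliency = rng.random((32, 32))           # stand-in for the saliency map

spatial_vec = np.concatenate([saliency_weighted_pool(content_feat, saliency),
                              saliency_weighted_pool(edge_feat, saliency)])
print(spatial_vec.shape)  # (128,)
```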

2.
IEEE Trans Pattern Anal Mach Intell ; 45(8): 9325-9338, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37027639

ABSTRACT

Both network pruning and neural architecture search (NAS) can be interpreted as techniques to automate the design and optimization of artificial neural networks. In this paper, we challenge the conventional wisdom of training before pruning by proposing a joint search-and-training approach to learn a compact network directly from scratch. Using pruning as a search strategy, we advocate three new insights for network engineering: 1) to formulate adaptive search as a cold start strategy to find a compact subnetwork on the coarse scale; 2) to automatically learn the threshold for network pruning; and 3) to offer flexibility to choose between efficiency and robustness. More specifically, we propose an adaptive search algorithm in the cold start by exploiting the randomness and flexibility of filter pruning. The weights associated with the network filters are then updated by ThreshNet, a flexible coarse-to-fine pruning method inspired by reinforcement learning. In addition, we introduce a robust pruning strategy leveraging the technique of knowledge distillation through a teacher-student network. Extensive experiments on ResNet and VGGNet show that our proposed method achieves a better balance between efficiency and accuracy and offers notable advantages over current state-of-the-art pruning methods on several popular datasets, including CIFAR10, CIFAR100, and ImageNet. The code associated with this paper is available at: https://see.xidian.edu.cn/faculty/wsdong/Projects/AST-NP.htm.
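As a rough illustration of the thresholded filter pruning this abstract builds on, the sketch below zeroes out convolutional filters whose L1 norm falls under a given threshold. It is a generic magnitude-based pruning step under assumed shapes, not the paper's ThreshNet or its adaptive search.

```python
import torch
import torch.nn as nn

def prune_conv_filters(conv: nn.Conv2d, threshold: float) -> torch.Tensor:
    """Zero out output filters whose L1 norm is below `threshold`; return the kept mask."""
    with torch.no_grad():
        norms = conv.weight.abs().sum(dim=(1, 2, 3))   # L1 norm per output filter
        keep = norms >= threshold
        conv.weight[~keep] = 0.0                       # prune filters below the threshold
        if conv.bias is not None:
            conv.bias[~keep] = 0.0
    return keep

conv = nn.Conv2d(16, 32, kernel_size=3)
kept = prune_conv_filters(conv, threshold=1.0)
print(f"kept {int(kept.sum())}/32 filters")
```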


Subject(s)
Algorithms , Learning , Humans , Neural Networks, Computer
3.
Sensors (Basel) ; 23(5)2023 Feb 24.
Article in English | MEDLINE | ID: mdl-36904754

ABSTRACT

Medical images are used as an important basis for diagnosing diseases, among which CT images are seen as an important tool for diagnosing lung lesions. However, manual segmentation of infected areas in CT images is time-consuming and laborious. With their excellent feature extraction capabilities, deep learning-based methods have been widely used for automatic lesion segmentation of COVID-19 CT images. However, the segmentation accuracy of these methods is still limited. To effectively quantify the severity of lung infections, we propose SMA-Net, a COVID-19 lesion segmentation network that combines the Sobel operator with multi-attention mechanisms. In our SMA-Net method, an edge feature fusion module uses the Sobel operator to add edge detail information to the input image. To guide the network to focus on key regions, SMA-Net introduces a self-attentive channel attention mechanism and a spatial linear attention mechanism. In addition, the Tversky loss function is adopted in the segmentation network to better handle small lesions. Comparative experiments on COVID-19 public datasets show that the average Dice similarity coefficient (DSC) and intersection over union (IoU) of the proposed SMA-Net model are 86.1% and 77.8%, respectively, which are better than those of most existing segmentation networks.
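A hedged PyTorch sketch of two ingredients named above: a Sobel edge channel fused with the input image and the Tversky loss. The loss weights (alpha, beta) are common defaults I am assuming, and SMA-Net's attention modules and exact fusion are not reproduced here.

```python
import torch
import torch.nn.functional as F

def sobel_edges(img: torch.Tensor) -> torch.Tensor:
    """img: (N, 1, H, W) grayscale; returns the per-pixel gradient magnitude."""
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]).view(1, 1, 3, 3)
    ky = kx.transpose(2, 3)
    gx = F.conv2d(img, kx, padding=1)
    gy = F.conv2d(img, ky, padding=1)
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-8)

def tversky_loss(pred: torch.Tensor, target: torch.Tensor,
                 alpha: float = 0.3, beta: float = 0.7, eps: float = 1e-6) -> torch.Tensor:
    """pred: sigmoid probabilities, target: binary mask; beta > alpha penalises false
    negatives more, the usual motivation for Tversky loss on small lesions."""
    tp = (pred * target).sum()
    fp = (pred * (1 - target)).sum()
    fn = ((1 - pred) * target).sum()
    return 1 - (tp + eps) / (tp + alpha * fp + beta * fn + eps)

img = torch.rand(2, 1, 64, 64)
edge_channel = sobel_edges(img)                    # extra edge-detail information
net_input = torch.cat([img, edge_channel], dim=1)  # fuse edge info with the input image
print(net_input.shape)                             # torch.Size([2, 2, 64, 64])
```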


Subject(s)
COVID-19 , Labor, Obstetric , Pregnancy , Female , Humans , Image Processing, Computer-Assisted
4.
IEEE Trans Image Process ; 31: 5189-5202, 2022.
Article in English | MEDLINE | ID: mdl-35914042

ABSTRACT

Visual Emotion Analysis (VEA), which aims to predict people's emotions towards different visual stimuli, has become an attractive research topic recently. Rather than a single-label classification task, it is more rational to regard VEA as a Label Distribution Learning (LDL) problem by voting from different individuals. Existing methods often predict the visual emotion distribution in a unified network, neglecting the inherent subjectivity in its crowd voting process. In psychology, the Object-Appraisal-Emotion model has demonstrated that each individual's emotion is affected by his/her subjective appraisal, which is further formed by the affective memory. Inspired by this, we propose a novel Subjectivity Appraise-and-Match Network (SAMNet) to investigate the subjectivity in visual emotion distribution. To depict the diversity in the crowd voting process, we first propose Subjectivity Appraising with multiple branches, where each branch simulates the emotion evocation process of a specific individual. Specifically, we construct the affective memory with an attention-based mechanism to preserve each individual's unique emotional experience. A subjectivity loss is further proposed to guarantee the divergence between different individuals. Moreover, we propose Subjectivity Matching with a matching loss, aiming at assigning unordered emotion labels to ordered individual predictions in a one-to-one correspondence with the Hungarian algorithm. Extensive experiments and comparisons are conducted on public visual emotion distribution datasets, and the results demonstrate that the proposed SAMNet consistently outperforms state-of-the-art methods. An ablation study verifies the effectiveness of our method, and visualizations demonstrate its interpretability.
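The Hungarian matching step mentioned above can be sketched with scipy's linear_sum_assignment; the toy per-branch distributions and vote labels below are fabricated for illustration, and only the assignment mechanics match what the abstract describes.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

branch_preds = np.array([[0.7, 0.2, 0.1],   # branch 0: predicted emotion distribution
                         [0.1, 0.8, 0.1],   # branch 1
                         [0.2, 0.3, 0.5]])  # branch 2
votes = np.array([1, 2, 0])                 # unordered individual votes (class indices)

# Cost = negative predicted probability of the vote each branch would be assigned.
cost = -branch_preds[:, votes]              # cost[i, j]: pair branch i with vote j
rows, cols = linear_sum_assignment(cost)
print(list(zip(rows, votes[cols])))         # one-to-one branch -> label assignment
```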


Subject(s)
Algorithms , Emotions , Female , Humans , Male
5.
IEEE Trans Image Process ; 31: 4937-4951, 2022.
Article in English | MEDLINE | ID: mdl-35853054

ABSTRACT

Due to the rapid increase in video traffic and relatively limited delivery infrastructure, end users often experience dynamically varying quality over time when viewing streaming videos. The user quality-of-experience (QoE) must be continuously monitored to deliver an optimized service. However, modern approaches for continuous-time video QoE estimation require densely annotating the continuous-time QoE labels, which is labor-intensive and time-consuming. To cope with such limitations, we propose a novel weakly-supervised domain adaptation approach for continuous-time QoE evaluation, by making use of a small amount of continuously labeled data in the source domain and abundant weakly-labeled data (only containing the retrospective QoE labels) in the target domain. Specifically, given a pair of videos from source and target domains, effective spatiotemporal segment-level feature representation is first learned by a combination of 2D and 3D convolutional networks. Then, a multi-task prediction framework is developed to simultaneously achieve continuous-time and retrospective QoE predictions, where a quality attentive adaptation approach is investigated to effectively alleviate the domain discrepancy without hampering the prediction performance. This approach is enabled by explicitly attending to the video-level discrimination and segment-level transferability in terms of the domain discrepancy. Experiments on benchmark databases demonstrate that the proposed method significantly improves the prediction performance under the cross-domain setting.

6.
IEEE Trans Image Process ; 31: 3578-3590, 2022.
Article in English | MEDLINE | ID: mdl-35511851

ABSTRACT

Blind image quality assessment (BIQA), which is capable of precisely and automatically estimating human-perceived image quality with no pristine image for comparison, has attracted extensive attention and has wide applications. Most existing BIQA methods represent image quality with a single quantitative value, which is inconsistent with human cognition. Generally, human beings are good at perceiving image quality in terms of semantic description rather than a quantitative value. Moreover, cognition is a needs-oriented process in which humans extract image content at local-to-global semantic levels as needed. A single quality value represents only coarse, holistic image quality and fails to reflect degradation on hierarchical semantics. In this paper, to comply with human cognition, a novel quality caption model is proposed to measure fine-grained image quality with hierarchical semantics degradation. Research on the human visual system indicates that there are hierarchy and reverse-hierarchy correlations between hierarchical semantics. Meanwhile, empirical evidence shows that there are also bi-directional degradation dependencies between them. Thus, a novel bi-directional relationship-based network (BDRNet) is proposed for semantics degradation description, adaptively exploring those correlations and degradation dependencies in a bi-directional manner. Extensive experiments demonstrate that our method outperforms state-of-the-art methods in terms of both evaluation performance and generalization ability.


Subject(s)
Cognition , Semantics , Humans
7.
IEEE Trans Cybern ; 52(3): 1798-1811, 2022 Mar.
Article in English | MEDLINE | ID: mdl-32525805

ABSTRACT

Typical image aesthetics assessment (IAA) is modeled for the generic aesthetics perceived by an "average" user. However, such generic aesthetics models neglect the fact that users' aesthetic preferences vary significantly from person to person. Therefore, it is essential to address personalized IAA (PIAA). Since PIAA is a typical small sample learning (SSL) problem, existing PIAA models are usually built by fine-tuning well-established generic IAA (GIAA) models, which are regarded as prior knowledge. Nevertheless, this kind of prior knowledge based on "average aesthetics" fails to capture the aesthetic diversity of different people. In order to learn the prior knowledge shared when different people judge aesthetics, that is, to learn how people judge image aesthetics, we propose a PIAA method based on meta-learning with bilevel gradient optimization (BLG-PIAA), which is trained directly on individual aesthetic data and generalizes to unknown users quickly. The proposed approach consists of two phases: 1) meta-training and 2) meta-testing. In meta-training, the aesthetics assessment of each user is regarded as a task, and the training set of each task is divided into two sets: 1) a support set and 2) a query set. Unlike traditional methods that train a GIAA model based on average aesthetics, we train an aesthetic meta-learner model by bilevel gradient updating from the support set to the query set using many users' PIAA tasks. In meta-testing, the aesthetic meta-learner model is fine-tuned using a small amount of aesthetic data from a target user to obtain the PIAA model. The experimental results show that the proposed method outperforms state-of-the-art PIAA metrics, and the learned prior model of BLG-PIAA can be quickly adapted to unseen PIAA tasks.
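The bilevel gradient update from a user's support set to their query set is in the spirit of MAML. The snippet below is a generic MAML-style sketch with a stand-in linear scorer and random per-user support/query sets, not the exact BLG-PIAA training loop.

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)                 # stand-in aesthetic scorer
meta_opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()
inner_lr = 0.01

def inner_adapted_params(xs, ys):
    """One gradient step on the support set, keeping the graph for the outer update."""
    loss = loss_fn(model(xs), ys)
    grads = torch.autograd.grad(loss, list(model.parameters()), create_graph=True)
    return [p - inner_lr * g for p, g in zip(model.parameters(), grads)]

# One meta-iteration over a batch of hypothetical per-user tasks.
tasks = [(torch.randn(8, 10), torch.randn(8, 1),      # support set
          torch.randn(8, 10), torch.randn(8, 1))      # query set
         for _ in range(4)]
meta_opt.zero_grad()
meta_loss = 0.0
for xs, ys, xq, yq in tasks:
    w, b = inner_adapted_params(xs, ys)
    meta_loss = meta_loss + loss_fn(xq @ w.t() + b, yq)  # query loss with adapted weights
(meta_loss / len(tasks)).backward()
meta_opt.step()
```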


Subject(s)
Artificial Intelligence , Esthetics , Esthetics/psychology , Humans , Photography
8.
IEEE Trans Image Process ; 31: 458-471, 2022.
Article in English | MEDLINE | ID: mdl-34874856

ABSTRACT

The video quality assessment (VQA) task remains a small-sample learning problem due to the costly effort required for manual annotation. Since existing VQA datasets are of limited scale, prior research tries to leverage models pre-trained on ImageNet to mitigate this shortage. Nonetheless, these well-trained models, which target the image classification task, can be sub-optimal when applied to VQA data from a significantly different domain. In this paper, we make the first attempt to perform self-supervised pre-training for the VQA task built upon contrastive learning, aiming to exploit the plentiful unlabeled video data to learn feature representations in a simple yet effective way. Specifically, we implement this idea by first generating distorted video samples with diverse distortion characteristics and visual contents based on the proposed distortion augmentation strategy. Afterwards, we conduct contrastive learning to capture quality-aware information by maximizing the agreement between feature representations of future frames and their corresponding predictions in the embedding space. In addition, we introduce a distortion prediction task as an additional learning objective to push the model towards discriminating different distortion categories of the input video. Solving these prediction tasks jointly with the contrastive learning not only provides stronger surrogate supervision signals, but also learns the knowledge shared among the prediction tasks. Extensive experiments demonstrate that our approach sets a new state-of-the-art in self-supervised learning for the VQA task. Our results also underscore that the learned pre-trained model can significantly benefit existing learning-based VQA models. Source code is available at https://github.com/cpf0079/CSPT.
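A minimal InfoNCE-style contrastive loss over predicted versus actual future-frame embeddings, which is the agreement-maximization step the abstract describes; the encoders, the distortion augmentation, and the auxiliary distortion classifier are omitted, and the temperature is an assumed default.

```python
import torch
import torch.nn.functional as F

def info_nce(pred: torch.Tensor, target: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    """pred, target: (N, D) embeddings; positives are matching rows, negatives all others."""
    pred = F.normalize(pred, dim=1)
    target = F.normalize(target, dim=1)
    logits = pred @ target.t() / temperature     # (N, N) similarity matrix
    labels = torch.arange(pred.size(0))          # i-th prediction matches i-th target
    return F.cross_entropy(logits, labels)

pred_feats = torch.randn(16, 128)    # predictions of future-frame representations
future_feats = torch.randn(16, 128)  # encoded actual future frames
print(info_nce(pred_feats, future_feats).item())
```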


Subject(s)
Algorithms , Software
9.
Front Neurosci ; 15: 739138, 2021.
Article in English | MEDLINE | ID: mdl-34744610

ABSTRACT

Image quality assessment (IQA) for authentic distortions in the wild is challenging. Though current IQA metrics have achieved decent performance on synthetic distortions, they still cannot be satisfactorily applied to realistic distortions because of the generalization problem. Improving generalization ability is an urgent task for making IQA algorithms serviceable in real-world applications, yet relevant research is still rare. Fundamentally, image quality is determined by both distortion degree and intelligibility. However, current IQA metrics mostly focus on the distortion aspect and do not fully investigate intelligibility, which is crucial for achieving robust quality estimation. Motivated by this, this paper presents a new framework for building a highly generalizable image quality model by integrating intelligibility. We first analyze the relation between intelligibility and image quality. Then we propose a bilateral network to integrate the above two aspects of image quality. During the fusion process, a feature selection strategy is further devised to avoid negative transfer. The framework not only captures conventional distortion features but also properly integrates intelligibility features, based on which a highly generalizable no-reference image quality model is achieved. Extensive experiments are conducted on five intelligibility tasks, and the results demonstrate that the proposed approach outperforms state-of-the-art metrics, and that the intelligibility task consistently improves metric performance and generalization ability.

10.
IEEE Trans Image Process ; 30: 8686-8701, 2021.
Article in English | MEDLINE | ID: mdl-34665725

ABSTRACT

Visual Emotion Analysis (VEA) aims at finding out how people feel emotionally towards different visual stimuli, and has attracted great attention recently with the prevalence of sharing images on social networks. Since human emotion involves a highly complex and abstract cognitive process, it is difficult to infer visual emotions directly from holistic or regional features in affective images. It has been demonstrated in psychology that visual emotions are evoked by the interactions between objects as well as the interactions between objects and scenes within an image. Inspired by this, we propose a novel Scene-Object interreLated Visual Emotion Reasoning network (SOLVER) to predict emotions from images. To mine the emotional relationships between distinct objects, we first build an Emotion Graph based on semantic concepts and visual features. Then, we conduct reasoning on the Emotion Graph using a Graph Convolutional Network (GCN), yielding emotion-enhanced object features. We also design a Scene-Object Fusion Module to integrate scenes and objects, which exploits scene features to guide the fusion process of object features with the proposed scene-based attention mechanism. Extensive experiments and comparisons are conducted on eight public visual emotion datasets, and the results demonstrate that the proposed SOLVER consistently outperforms state-of-the-art methods by a large margin. Ablation studies verify the effectiveness of our method, and visualizations demonstrate its interpretability, which also brings new insight into VEA. Notably, we further discuss SOLVER on three other potential datasets with extended experiments, where we validate the robustness of our method and note some of its limitations.
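The graph reasoning step can be illustrated with a single symmetric-normalized GCN propagation over object-node features; the random adjacency, feature sizes, and single-layer setup below are assumptions for the sketch, not the paper's Emotion Graph construction.

```python
import torch
import torch.nn as nn

def gcn_layer(x: torch.Tensor, adj: torch.Tensor, weight: nn.Linear) -> torch.Tensor:
    """x: (N, D) node features, adj: (N, N) adjacency; returns updated node features."""
    a_hat = adj + torch.eye(adj.size(0))                          # add self-loops
    deg_inv_sqrt = a_hat.sum(dim=1).pow(-0.5)
    norm = deg_inv_sqrt[:, None] * a_hat * deg_inv_sqrt[None, :]  # D^-1/2 (A+I) D^-1/2
    return torch.relu(norm @ weight(x))

nodes = torch.randn(5, 32)                 # 5 detected objects, 32-d features
adj = (torch.rand(5, 5) > 0.5).float()     # stand-in emotion-graph edges
adj = ((adj + adj.t()) > 0).float()        # make the graph symmetric
layer = nn.Linear(32, 32, bias=False)
print(gcn_layer(nodes, adj, layer).shape)  # torch.Size([5, 32])
```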


Subject(s)
Algorithms , Emotions , Humans , Semantics
11.
IEEE Trans Image Process ; 30: 3650-3663, 2021.
Article in English | MEDLINE | ID: mdl-33705313

ABSTRACT

Blind image quality assessment (BIQA) is a useful but challenging task. It is a promising idea to design BIQA methods by mimicking the working mechanism of human visual system (HVS). The internal generative mechanism (IGM) indicates that the HVS actively infers the primary content (i.e., meaningful information) of an image for better understanding. Inspired by that, this paper presents a novel BIQA metric by mimicking the active inference process of IGM. Firstly, an active inference module based on the generative adversarial network (GAN) is established to predict the primary content, in which the semantic similarity and the structural dissimilarity (i.e., semantic consistency and structural completeness) are both considered during the optimization. Then, the image quality is measured on the basis of its primary content. Generally, the image quality is highly related to three aspects, i.e., the scene information (content-dependency), the distortion type (distortion-dependency), and the content degradation (degradation-dependency). According to the correlation between the distorted image and its primary content, the three aspects are analyzed and calculated respectively with a multi-stream convolutional neural network (CNN) based quality evaluator. As a result, with the help of the primary content obtained from the active inference and the comprehensive quality degradation measurement from the multi-stream CNN, our method achieves competitive performance on five popular IQA databases. Especially in cross-database evaluations, our method achieves significant improvements.


Subject(s)
Image Processing, Computer-Assisted/methods , Neural Networks, Computer , Algorithms , Databases, Factual
12.
IEEE Trans Image Process ; 30: 3279-3292, 2021.
Article in English | MEDLINE | ID: mdl-33625985

ABSTRACT

Quality of experience (QoE), which serves as a direct evaluation of the viewing experience from the end users, is of vital importance for network optimization and should be constantly monitored. Unlike existing video-on-demand streaming services, real-time interactivity is critical to the mobile live broadcasting experience for both broadcasters and their audiences. While existing QoE metrics that are validated on limited video contents and synthetic stall patterns have shown effectiveness on the QoE benchmarks they were trained on, a common caveat is that they often encounter challenges in practical live broadcasting scenarios, where one needs to accurately understand the activity in the video under fluctuating QoE and figure out what is about to happen in order to support real-time feedback to the broadcaster. In this paper, we propose a temporal relational reasoning guided QoE evaluation approach for mobile live video broadcasting, namely TRR-QoE, which explicitly attends to the temporal relationships between consecutive frames to achieve a more comprehensive understanding of distortion-aware variation. In our design, video frames are first processed by a deep neural network (DNN) to extract quality-indicative features. Afterwards, besides explicitly integrating features of individual frames to account for spatial distortion information, multi-scale temporal relational information corresponding to diverse temporal resolutions is fully exploited to capture temporal-distortion-aware variation. As a result, the overall QoE prediction is derived by combining both aspects. The results of experiments conducted on a number of benchmark databases demonstrate the superiority of TRR-QoE over representative state-of-the-art metrics.

13.
Article in English | MEDLINE | ID: mdl-31995495

ABSTRACT

Traditional image aesthetics assessment (IAA) approaches mainly predict the average aesthetic score of an image. However, people tend to have different tastes in image aesthetics, which are mainly determined by their subjective preferences. As an important subjective trait, personality is believed to be a key factor in modeling an individual's subjective preference. In this paper, we present a personality-assisted multi-task deep learning framework for both generic and personalized image aesthetics assessment. The proposed framework comprises two stages. In the first stage, a multi-task learning network with shared weights is proposed to predict the aesthetics distribution of an image and the Big-Five (BF) personality traits of people who like the image. The generic aesthetics score of the image can be generated based on the predicted aesthetics distribution. In order to capture the common representation of generic image aesthetics and people's personality traits, a Siamese network is trained using aesthetics data and personality data jointly. In the second stage, based on the predicted personality traits and generic aesthetics of an image, an inter-task fusion is introduced to generate an individual's personalized aesthetic scores for the image. The performance of the proposed method is evaluated using two public image aesthetics databases. The experimental results demonstrate that the proposed method outperforms state-of-the-art methods in both generic and personalized IAA tasks.

14.
Article in English | MEDLINE | ID: mdl-31613757

ABSTRACT

Free viewpoint video (FVV) has received considerable attention owing to its widespread applications in several areas such as immersive entertainment, remote surveillance and distance education. Since FVV images are synthesized via a depth image-based rendering (DIBR) procedure in a "blind" environment (without reference images), a real-time and reliable blind quality assessment metric is urgently required. However, existing image quality assessment metrics are insensitive to the geometric distortions engendered by DIBR. In this research, a novel blind quality assessment method for DIBR-synthesized images is proposed based on measuring geometric distortion, global sharpness and image complexity. First, a DIBR-synthesized image is decomposed into wavelet subbands using the discrete wavelet transform. Then, the Canny operator is employed to detect the edges of the binarized low-frequency subband and high-frequency subbands. The edge similarities between the binarized low-frequency subband and high-frequency subbands are further computed to quantify geometric distortions in DIBR-synthesized images. Second, the log-energies of the wavelet subbands are calculated to evaluate global sharpness in DIBR-synthesized images. Third, a hybrid filter combining the autoregressive and bilateral filters is adopted to compute image complexity. Finally, the overall quality score is derived by normalizing the geometric distortion and global sharpness by the image complexity. Experiments show that our proposed method is superior to competing reference-free state-of-the-art DIBR-synthesized image quality models.
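A rough sketch of the first stage described above, assuming a one-level Haar DWT, mean-threshold binarization, default Canny thresholds, and a Dice-style edge-overlap measure; these specific choices are mine, and the sharpness and complexity terms of the metric are not shown.

```python
import numpy as np
import cv2
import pywt

img = (np.random.rand(128, 128) * 255).astype(np.float32)  # stand-in synthesized view
ll, (lh, hl, hh) = pywt.dwt2(img, "haar")                   # one-level wavelet decomposition

def binarize(band: np.ndarray) -> np.ndarray:
    return (band > band.mean()).astype(np.uint8) * 255

def canny_edges(band: np.ndarray) -> np.ndarray:
    return cv2.Canny(binarize(band), 50, 150) > 0           # binary edge map

def edge_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Dice-style overlap between two binary edge maps."""
    inter = np.logical_and(a, b).sum()
    return 2.0 * inter / (a.sum() + b.sum() + 1e-8)

low_edges = canny_edges(ll)
geom_scores = [edge_similarity(low_edges, canny_edges(hf)) for hf in (lh, hl, hh)]
print("edge similarities (LL vs LH/HL/HH):", geom_scores)
```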

15.
Article in English | MEDLINE | ID: mdl-31034415

ABSTRACT

View synthesis is a key technique in free-viewpoint video, which renders virtual views based on texture and depth images. The distortions in synthesized views come from two stages, i.e., the stage of acquisition and processing of texture and depth images, and the rendering stage using depth-image-based rendering (DIBR) algorithms. Existing view synthesis quality metrics are designed for the distortions caused by a single stage, and so cannot accurately evaluate the quality of the entire view synthesis process. Considering that the distortions introduced by the two stages both cause edge degradation and texture unnaturalness, and that the Difference-of-Gaussian (DoG) representation is powerful in capturing image edge and texture characteristics by simulating the center-surround receptive fields of retinal ganglion cells of human eyes, this paper presents a no-reference quality index for Synthesized views using DoG-based Edge statistics and Texture naturalness (SET). To mimic the multi-scale property of the Human Visual System (HVS), DoG images are first calculated at multiple scales. Then the orientation-selective statistics features and the texture naturalness features are calculated on the DoG images and the coarsest-scale image, producing two groups of quality-aware features. Finally, the quality model is learnt from these features using random forest regression. Experimental results on two view synthesis image databases demonstrate that the proposed metric is advantageous over relevant state-of-the-art methods in dealing with distortions in the whole view synthesis process.
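The multi-scale DoG representation mentioned above can be built as in the sketch below; the scale set is an assumption, and the subsequent statistics features and random forest regression are not included.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_pyramid(img: np.ndarray, sigmas=(1.0, 2.0, 4.0, 8.0)) -> list:
    """Return DoG images between successive Gaussian scales, plus the coarsest-scale image."""
    blurred = [gaussian_filter(img, s) for s in sigmas]
    dogs = [blurred[i] - blurred[i + 1] for i in range(len(blurred) - 1)]
    return dogs + [blurred[-1]]

img = np.random.rand(128, 128)   # stand-in synthesized view
bands = dog_pyramid(img)
print(len(bands), [b.shape for b in bands])
```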

16.
BMC Med Imaging ; 18(1): 17, 2018 05 16.
Article in English | MEDLINE | ID: mdl-29769079

ABSTRACT

BACKGROUND: Quality assessment of medical images is highly related to quality assurance, image interpretation and decision making. For magnetic resonance (MR) images, the signal-to-noise ratio (SNR) is routinely used as a quality indicator, while little is known about its consistency across different observers. METHODS: In total, 192, 88, 76 and 55 brain images are acquired using T2*, T1, T2 and contrast-enhanced T1 (T1C) weighted MR imaging sequences, respectively. For each imaging protocol, the consistency of SNR measurement is verified between and within two observers, and white matter (WM) and cerebral spinal fluid (CSF) are alternately used as the tissue region of interest (TOI) for SNR measurement. The procedure is repeated on another day within 30 days. First, overlapped voxels in TOIs are quantified with the Dice index. Then, test-retest reliability is assessed in terms of the intra-class correlation coefficient (ICC). After that, four models (BIQI, BLIINDS-II, BRISQUE and NIQE), primarily used for the quality assessment of natural images, are borrowed to predict the quality of MR images. In the end, the correlation between SNR values and predicted results is analyzed. RESULTS: For the same TOI in each MR imaging sequence, less than 6% of voxels are overlapped between manual delineations. In the quality estimation of MR images, statistical analysis indicates no significant difference between observers (Wilcoxon rank sum test, p_w ≥ 0.11; paired-sample t test, p_p ≥ 0.26), and good to very good intra- and inter-observer reliability is found (ICC, p_icc ≥ 0.74). Furthermore, the Pearson correlation coefficient (r_p) suggests that SNR_WM correlates strongly with BIQI, BLIINDS-II and BRISQUE in T2* (r_p ≥ 0.78), BRISQUE and NIQE in T1 (r_p ≥ 0.77), BLIINDS-II in T2 (r_p ≥ 0.68), and BRISQUE and NIQE in T1C (r_p ≥ 0.62) weighted MR images, while SNR_CSF correlates strongly with BLIINDS-II in T2* (r_p ≥ 0.63) and in T2 (r_p ≥ 0.64) weighted MR images. CONCLUSIONS: The consistency of SNR measurement is validated across various observers and MR imaging protocols. When SNR measurement serves as the quality indicator of MR images, BRISQUE and BLIINDS-II can be conditionally used for the automated quality estimation of human brain MR images.
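Two of the basic quantities above, SNR within a tissue region of interest and the Dice overlap between two observers' delineations, can be computed as in the sketch below; the synthetic image, the masks, and the within-TOI SNR convention (mean over standard deviation inside the TOI) are assumptions, not the study's exact protocol.

```python
import numpy as np

def snr(image: np.ndarray, toi_mask: np.ndarray) -> float:
    """Mean signal in the TOI divided by its standard deviation (one common convention)."""
    vals = image[toi_mask]
    return float(vals.mean() / (vals.std() + 1e-8))

def dice(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    """Overlap between two binary delineations of the same TOI."""
    inter = np.logical_and(mask_a, mask_b).sum()
    return 2.0 * inter / (mask_a.sum() + mask_b.sum() + 1e-8)

rng = np.random.default_rng(0)
brain = rng.normal(100, 10, (64, 64))   # stand-in MR slice
rater1 = np.zeros((64, 64), bool)
rater1[20:40, 20:40] = True             # observer 1's WM TOI
rater2 = np.zeros((64, 64), bool)
rater2[22:42, 22:42] = True             # observer 2's WM TOI
print("SNR_WM:", snr(brain, rater1), "Dice:", dice(rater1, rater2))
```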


Subject(s)
Brain/diagnostic imaging , Radiographic Image Enhancement/methods , Contrast Media , Humans , Magnetic Resonance Imaging/methods , Observer Variation , Reproducibility of Results , Signal-To-Noise Ratio
17.
IEEE Trans Image Process ; 27(4): 1600-1610, 2018 Apr.
Article in English | MEDLINE | ID: mdl-29324414

ABSTRACT

In this paper, we propose NRLT, a novel no-reference quality assessment method for screen content images (SCIs) that incorporates statistical luminance and texture features with both local and global feature representation. The proposed method is inspired by the perceptual property of the human visual system (HVS) that it is sensitive to luminance change and texture information in image perception. In the proposed method, we first calculate the luminance map through local normalization, which is further used to extract the statistical luminance features at the global scope. Second, inspired by existing studies from neuroscience showing that high-order derivatives can capture image texture, we adopt four filters with different directions to compute gradient maps from the luminance map. These gradient maps are then used to extract second-order derivatives via the local binary pattern. We further extract the texture feature as the histogram of high-order derivatives at the global scope. Finally, support vector regression is applied to train the mapping function from quality-aware features to subjective ratings. Experimental results on the public large-scale SCI database show that the proposed NRLT achieves better performance in predicting the visual quality of SCIs than relevant existing methods, even including some full-reference visual quality assessment methods.
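As a sketch of the feature side of this pipeline, the snippet below applies a local (MSCN-style) luminance normalization and then a uniform LBP histogram as a texture descriptor; the window size, LBP parameters, and random stand-in image are assumptions, it applies LBP directly to the normalized luminance rather than to directional gradient maps, and the SVR regression stage is omitted.

```python
import numpy as np
from scipy.ndimage import gaussian_filter
from skimage.feature import local_binary_pattern

def local_normalize(img: np.ndarray, sigma: float = 1.5, eps: float = 1.0) -> np.ndarray:
    """Subtract a local mean and divide by a local standard deviation (MSCN-style)."""
    mu = gaussian_filter(img, sigma)
    sigma_map = np.sqrt(np.maximum(gaussian_filter(img ** 2, sigma) - mu ** 2, 0))
    return (img - mu) / (sigma_map + eps)

img = np.random.rand(128, 128) * 255            # stand-in screen content image
lum = local_normalize(img)                      # locally normalized luminance map
lbp = local_binary_pattern(lum, P=8, R=1, method="uniform")
hist, _ = np.histogram(lbp, bins=10, range=(0, 10), density=True)  # global texture feature
print(hist)
```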

18.
PLoS One ; 12(5): e0176632, 2017.
Article in English | MEDLINE | ID: mdl-28459832

ABSTRACT

Blind image quality assessment can be modeled as feature extraction followed by score prediction. It requires considerable expertise and effort to handcraft features for optimal representation of perceptual image quality. This paper addresses blind image sharpness assessment by using a shallow convolutional neural network (CNN). The network uses a single feature layer to unearth intrinsic features for image sharpness representation and utilizes a multilayer perceptron (MLP) to rate image quality. Different from traditional methods, the CNN integrates feature extraction and score prediction into one optimization procedure and retrieves features automatically from raw images. Moreover, its prediction performance can be enhanced by replacing the MLP with a general regression neural network (GRNN) or support vector regression (SVR). Experiments on Gaussian blur images from LIVE-II, CSIQ, TID2008 and TID2013 demonstrate that CNN features with SVR achieve the best overall performance, indicating high correlation with human subjective judgment.
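A hedged sketch of the CNN-features-plus-SVR combination reported as strongest; the tiny single-feature-layer network, the random patches, and the random scores below are placeholders, not the paper's architecture or the LIVE/CSIQ/TID data.

```python
import numpy as np
import torch
import torch.nn as nn
from sklearn.svm import SVR

class ShallowCNN(nn.Module):
    """A single convolutional feature layer followed by global average pooling."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 7, stride=2, padding=3), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1))
    def forward(self, x):
        return self.features(x).flatten(1)   # (N, 32) feature vectors

cnn = ShallowCNN().eval()
patches = torch.rand(100, 1, 64, 64)          # stand-in blurred image patches
with torch.no_grad():
    feats = cnn(patches).numpy()
mos = np.random.rand(100)                     # stand-in subjective sharpness scores
svr = SVR(kernel="rbf").fit(feats, mos)       # regress quality from CNN features
print(svr.predict(feats[:5]))
```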


Subject(s)
Image Processing, Computer-Assisted/methods , Neural Networks, Computer , Algorithms , Color , Humans , Regression Analysis , Software , Support Vector Machine , Time Factors , Visual Perception
19.
IEEE Trans Image Process ; 26(6): 2682-2693, 2017 Jun.
Article in English | MEDLINE | ID: mdl-28333632

ABSTRACT

The just noticeable difference (JND) in an image, which reveals the visibility limitation of the human visual system (HVS), is widely used for visual redundancy estimation in signal processing. In current schemes for determining the JND threshold, the spatial masking effect is estimated as contrast masking, which cannot accurately account for the complicated interaction among visual contents. Research in cognitive science indicates that the HVS is highly adapted to extracting repeated patterns for visual content representation. Inspired by this, we formulate pattern complexity as another factor in determining the total masking effect: the interaction is relatively straightforward, with a limited masking effect, in a regular pattern, and complicated, with a strong masking effect, in an irregular pattern. Following the orientation selectivity mechanism in the primary visual cortex, the response of each local receptive field can be considered a pattern; therefore, in this paper, the orientation that each pixel presents is regarded as the fundamental element of a pattern, and pattern complexity is calculated as the diversity of orientations in a local region. Finally, considering both pattern complexity and luminance contrast, a novel spatial masking estimation function is deduced, and an improved JND estimation model is built. Experimental comparisons with the latest JND models demonstrate the effectiveness of the proposed model, which is highly consistent with human perception. The source code of the proposed model is publicly available at http://web.xidian.edu.cn/wjj/en/index.html.
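One way to picture "pattern complexity as local orientation diversity" is the sketch below: gradient orientations are quantized into a few bins and their local entropy is taken as the complexity map. The number of bins, window size, and entropy measure are my assumptions, not the released model linked above.

```python
import numpy as np
from scipy.ndimage import sobel, generic_filter

def orientation_diversity(img: np.ndarray, bins: int = 4, window: int = 5) -> np.ndarray:
    """Entropy of quantized gradient orientations in a local window (higher = more irregular)."""
    gx, gy = sobel(img, axis=1), sobel(img, axis=0)
    theta = np.mod(np.arctan2(gy, gx), np.pi)              # orientation in [0, pi)
    labels = np.floor(theta / (np.pi / bins)).clip(0, bins - 1)

    def local_entropy(patch):
        p = np.bincount(patch.astype(int), minlength=bins) / patch.size
        p = p[p > 0]
        return -(p * np.log2(p)).sum()

    return generic_filter(labels, local_entropy, size=window)

img = np.random.rand(64, 64)                # stand-in image
complexity = orientation_diversity(img)     # irregular pattern -> stronger masking
print(complexity.min(), complexity.max())
```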


Subject(s)
Algorithms , Image Processing, Computer-Assisted/methods , Models, Neurological , Signal Processing, Computer-Assisted , Animals , Differential Threshold/physiology , Humans , Pattern Recognition, Visual/physiology
20.
IEEE Trans Image Process ; 25(8): 3775-86, 2016 08.
Article in English | MEDLINE | ID: mdl-27295675

ABSTRACT

Distortions cause structural changes in digital images, leading to degraded visual quality. Dictionary-based sparse representation has been widely studied recently due to its ability to extract inherent image structures. Meanwhile, it can extract image features with slightly higher-level semantics. Intuitively, sparse representation can be used for image quality assessment, because visible distortions can cause significant changes to the sparse features. In this paper, a new sparse representation-based image quality assessment model is proposed based on the construction of adaptive sub-dictionaries. An overcomplete dictionary trained from natural images is employed to capture the structural changes between the reference and distorted images by sparse feature extraction via adaptive sub-dictionary selection. Based on the observation that image sparse features are invariant to weak degradations and that perceived image quality is generally influenced by diverse issues, three auxiliary quality features are added, including gradient, color, and luminance information. The proposed method is not sensitive to the training images, so a universal dictionary can be adopted for quality evaluation. Extensive experiments on five public image quality databases demonstrate that the proposed method produces state-of-the-art results and delivers consistently good performance when tested on different image quality databases.
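The sparse coding underlying this comparison can be sketched with scikit-learn's OMP-based sparse_encode; the random dictionary and patches below stand in for a dictionary learned from natural images, and the adaptive sub-dictionary selection and the auxiliary gradient/color/luminance features are omitted.

```python
import numpy as np
from sklearn.decomposition import sparse_encode

rng = np.random.default_rng(0)
dictionary = rng.standard_normal((256, 64))   # 256 atoms over 8x8 patches (overcomplete)
dictionary /= np.linalg.norm(dictionary, axis=1, keepdims=True)

ref_patches = rng.standard_normal((50, 64))                        # stand-in reference patches
dist_patches = ref_patches + 0.3 * rng.standard_normal((50, 64))   # "distorted" patches

codes_ref = sparse_encode(ref_patches, dictionary, algorithm="omp", n_nonzero_coefs=5)
codes_dist = sparse_encode(dist_patches, dictionary, algorithm="omp", n_nonzero_coefs=5)

# A crude structural-change measure: cosine similarity between the sparse codes.
num = (codes_ref * codes_dist).sum(axis=1)
den = np.linalg.norm(codes_ref, axis=1) * np.linalg.norm(codes_dist, axis=1) + 1e-8
print("mean sparse-feature similarity:", (num / den).mean())
```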
