Results 1 - 20 of 21
1.
Sensors (Basel); 23(16), 2023 Aug 17.
Article in English | MEDLINE | ID: mdl-37631763

ABSTRACT

For compressed images and videos, quality enhancement is essential. Although deep learning has achieved remarkable results in this area, deep models are generally too large for real-time tasks. Therefore, a fast multi-frame quality enhancement method for compressed video, named Fast-MFQE, is proposed to meet the requirements of real-time video quality enhancement. The method comprises three main modules. The first is the image pre-processing building module (IPPB), which reduces redundant information in the input images. The second is the spatio-temporal fusion attention (STFA) module, introduced to effectively merge the temporal and spatial information of the input video frames. The third is the feature reconstruction network (FRN), developed to effectively reconstruct and enhance the spatio-temporal information. Experimental results demonstrate that the proposed method outperforms state-of-the-art methods in terms of parameter count, inference speed, and quality enhancement performance. Even at 1080p resolution, Fast-MFQE achieves an inference speed of over 25 frames per second while providing an average PSNR increase of 19.6% at QP = 37.
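
As a rough illustration of the three-module pipeline described above, the following PyTorch sketch wires an IPPB-like stage, an attention-based fusion stage, and a reconstruction stage in sequence. The internals of each stage (plain convolutions, channel attention) are assumptions for illustration only; the paper's actual module designs are not reproduced here.

```python
# Minimal sketch of a Fast-MFQE-style pipeline; layer choices are assumed.
import torch
import torch.nn as nn

class FastMFQESketch(nn.Module):
    def __init__(self, frames=3, channels=32):
        super().__init__()
        # IPPB stand-in: reduce redundancy in the stacked grayscale input frames.
        self.ippb = nn.Conv2d(frames, channels, 3, padding=1)
        # STFA stand-in: fuse spatio-temporal information via channel attention.
        self.stfa = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels, 1),
            nn.Sigmoid(),
        )
        # FRN stand-in: reconstruct an enhancement residual for the centre frame.
        self.frn = nn.Conv2d(channels, 1, 3, padding=1)

    def forward(self, frames):            # frames: (B, T, H, W) consecutive frames
        feats = torch.relu(self.ippb(frames))
        fused = feats * self.stfa(feats)  # attention-weighted fusion
        mid = frames.shape[1] // 2
        centre = frames[:, mid : mid + 1]
        return centre + self.frn(fused)   # enhanced centre frame

out = FastMFQESketch()(torch.rand(1, 3, 64, 64))
```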

2.
IEEE Trans Image Process; 33: 3793-3808, 2024.
Article in English | MEDLINE | ID: mdl-38865219

ABSTRACT

Recent object re-identification (Re-ID) methods gain high efficiency via lightweight student models trained by knowledge distillation (KD). However, the large architectural difference between lightweight students and heavy teachers makes it difficult for students to receive and understand the teachers' knowledge, costing some accuracy. To this end, we propose a refiner-expander-refiner (RER) structure to enlarge a student's representational capacity and then prune the student's complexity. The expander is a multi-branch convolutional layer that expands the student's representational capacity so it can comprehensively understand the teacher's knowledge, without requiring any feature-dimensional adapter that could distort the knowledge. The two refiners are 1×1 convolutional layers that prune the input and output channels of the expander. In addition, to alleviate the competition between accuracy-related and pruning-related gradients, we design a common consensus gradient resetting (CCGR) method, which discards unimportant channels according to the intersection of each sample's unimportant-channel judgments. Finally, the trained RER can be simplified into a slim convolutional layer via re-parameterization to speed up inference. As a result, we propose an expanding and refining hybrid compressing (ERHC) method. Extensive experiments show that ERHC offers superior inference speed and accuracy; e.g., on the VeRi-776 dataset, with ResNet101 as the teacher, ERHC saves 75.33% of model parameters (MP) and 74.29% of floating-point operations (FLOPs) without sacrificing accuracy.
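
A hedged sketch of the CCGR idea as summarized above: a channel's pruning gradient is reset only when every sample in the batch judges it unimportant (the intersection of per-sample judgments). The importance score (mean absolute activation) and the 30% threshold are illustrative assumptions, not the paper's exact criterion.

```python
import torch

def ccgr_mask(activations: torch.Tensor, frac: float = 0.3) -> torch.Tensor:
    """activations: (batch, channels, H, W) -> bool mask of channels to reset."""
    score = activations.abs().mean(dim=(2, 3))              # per-sample channel importance
    k = max(1, int(frac * score.shape[1]))
    thresh = score.kthvalue(k, dim=1, keepdim=True).values  # per-sample k-th smallest
    unimportant = score <= thresh                           # (batch, channels)
    return unimportant.all(dim=0)                           # consensus across the batch

acts = torch.rand(8, 64, 16, 16)
grad = torch.randn(64)          # stand-in for a channel-wise pruning gradient
grad[ccgr_mask(acts)] = 0.0     # discard gradients of consensus-unimportant channels
```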

3.
Article in English | MEDLINE | ID: mdl-38739516

ABSTRACT

In this paper, we study the problem of efficiently and effectively embedding the high-dimensional spatio-spectral information of hyperspectral (HS) images, guided by feature diversity. Specifically, based on the theoretical formulation that feature diversity is correlated with the rank of the unfolded kernel matrix, we rectify 3D convolution by modifying its topology to raise the upper bound on this rank. This modification yields a rank-enhanced spatial-spectral symmetrical convolution set (ReS3-ConvSet), which not only learns diverse and powerful feature representations but also saves network parameters. Additionally, we propose a novel diversity-aware regularization (DA-Reg) term that acts directly on the feature maps to maximize the independence among their elements. To demonstrate the superiority of ReS3-ConvSet and DA-Reg, we apply them to various HS image processing and analysis tasks, including denoising, spatial super-resolution, and classification. Extensive experiments show that the proposed approaches outperform state-of-the-art methods both quantitatively and qualitatively, by a significant margin. The code is publicly available at https://github.com/jinnh/ReSSS-ConvSet.
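
A hedged sketch of a diversity-aware regularizer in the spirit of DA-Reg: penalize correlation between feature channels so their elements stay close to independent. The exact form of the paper's term is not given in the abstract; an off-diagonal penalty on the channel correlation matrix is an assumption for illustration.

```python
import torch

def diversity_reg(feat: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """feat: (batch, channels, H, W) -> scalar regularization term."""
    b, c, h, w = feat.shape
    x = feat.reshape(b, c, h * w)
    x = x - x.mean(dim=2, keepdim=True)
    x = x / (x.norm(dim=2, keepdim=True) + eps)
    corr = x @ x.transpose(1, 2)                       # (batch, c, c) correlations
    off_diag = corr - torch.diag_embed(torch.diagonal(corr, dim1=1, dim2=2))
    return off_diag.pow(2).mean()                      # push off-diagonals toward zero

loss_reg = diversity_reg(torch.rand(2, 16, 8, 8))
```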

4.
Neural Netw; 179: 106576, 2024 Nov.
Article in English | MEDLINE | ID: mdl-39121790

ABSTRACT

Visible-infrared person re-identification (VIPR) plays an important role in intelligent transportation systems. Modal discrepancies between visible and infrared images seriously hamper the discrimination of person appearance; e.g., the similarity between samples of the same class from different modalities can be lower than the similarity between different classes within the same modality. Worse still, modal discrepancies and appearance discrepancies are coupled with each other. The prevailing practice is to disentangle modal and appearance discrepancies, but this usually requires complex decoupling networks. In this paper, rather than disentanglement, we propose to measure and optimize modal discrepancies. We explore a cross-modal group-relation (CMGR) that describes the relationship between the same group of people observed in two different modalities. The CMGR has great potential for modal invariance because it considers stable groups rather than individuals, making it a good measurement of modal discrepancies. Furthermore, we design a group-relation correlation (GRC) loss function based on Pearson correlation to optimize the CMGR, which can be easily integrated with the learning of VIPR's appearance features. Consequently, our CMGR model serves as a pivotal constraint that minimizes modal discrepancies, operating much like a loss function: it is applied only during training and requires no execution at inference. Experimental results on two public datasets (i.e., RegDB and SYSU-MM01) demonstrate that our CMGR method is superior to state-of-the-art approaches. In particular, on the RegDB dataset, CMGR improves the rank-1 identification rate by more than 7% compared with not using it.
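
A hedged sketch of a Pearson-correlation-based group-relation loss of the kind described: maximize the correlation between the group-relation matrices computed in the visible and infrared modalities. Using cosine similarity as the group relation is an assumption for illustration.

```python
import torch
import torch.nn.functional as F

def grc_loss(vis_feats: torch.Tensor, ir_feats: torch.Tensor) -> torch.Tensor:
    """vis_feats, ir_feats: (group_size, dim) features of the same group of people."""
    def relation(f):
        f = F.normalize(f, dim=1)
        return (f @ f.t()).flatten()          # pairwise cosine similarities

    r_vis, r_ir = relation(vis_feats), relation(ir_feats)
    r_vis = r_vis - r_vis.mean()
    r_ir = r_ir - r_ir.mean()
    pearson = (r_vis * r_ir).sum() / (r_vis.norm() * r_ir.norm() + 1e-8)
    return 1.0 - pearson                      # 0 when the relations correlate perfectly

loss = grc_loss(torch.randn(8, 128), torch.randn(8, 128))
```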


Subjects
Infrared Rays; Humans; Neural Networks, Computer; Algorithms; Biometric Identification/methods
5.
Article in English | MEDLINE | ID: mdl-37018244

ABSTRACT

Eliminating the flickers in digital images captured by rolling-shutter cameras is a fundamental and important task in computer vision applications. The flickering effect in a single image stems from the asynchronous exposure mechanism of the rolling shutters employed by cameras equipped with CMOS sensors. In an artificial lighting environment, the light intensity captured at different time intervals varies due to fluctuations of the AC-powered grid, ultimately producing the flickering artifact in the image. To date, there are few studies on single-image deflickering, and it is even more challenging to remove flickers without a priori information, e.g., camera parameters or paired images. To address these challenges, we propose an unsupervised framework termed DeflickerCycleGAN, which is trained on unpaired images for end-to-end single-image deflickering. Besides the cycle-consistency loss that maintains the similarity of image contents, we meticulously design two novel loss functions, i.e., a gradient loss and a flicker loss, to reduce the risk of edge blurring and color distortion. Moreover, we provide a strategy to determine whether an image contains flickers without extra training, which leverages an ensemble based on the outputs of the two previously trained Markovian discriminators. Extensive experiments on both synthetic and real datasets show that the proposed DeflickerCycleGAN not only achieves excellent performance on flicker removal in a single image but also shows high accuracy and competitive generalization on flicker detection, compared with a well-trained ResNet50-based classifier.
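
A hedged sketch of a gradient loss of the kind named above: penalize differences between the spatial gradients of the deflickered output and the input so that edges stay sharp. The paper's exact formulation may differ; L1 on finite differences is an assumption.

```python
import torch

def gradient_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """pred, target: (batch, channels, H, W) images."""
    dx_p, dx_t = pred[..., :, 1:] - pred[..., :, :-1], target[..., :, 1:] - target[..., :, :-1]
    dy_p, dy_t = pred[..., 1:, :] - pred[..., :-1, :], target[..., 1:, :] - target[..., :-1, :]
    return (dx_p - dx_t).abs().mean() + (dy_p - dy_t).abs().mean()

loss = gradient_loss(torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64))
```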

6.
IEEE Trans Image Process; 32: 2493-2507, 2023.
Article in English | MEDLINE | ID: mdl-37099471

ABSTRACT

Self-supervised video-based action recognition is a challenging task that must extract the principal information characterizing the action from content-diversified videos over large unlabeled datasets. However, most existing methods exploit the natural spatio-temporal properties of video to obtain effective action representations from a visual perspective, while ignoring the semantics, which are closer to human cognition. To this end, a self-supervised video-based action recognition method with disturbances, called VARD, is proposed; it extracts the principal information of the action in terms of both visual and semantic attributes. Specifically, according to cognitive neuroscience research, human recognition ability is activated by visual and semantic attributes. Intuitively, minor changes to the actor or scene in a video do not affect a person's recognition of the action. Moreover, different people reach consistent conclusions when they recognize the same action video. In other words, for an action video, the information that remains constant despite disturbances in the visual video or in the semantic encoding process is sufficient to represent the action. Therefore, to learn such information, we construct a positive clip/embedding for each action video. Compared with the original clip/embedding, the positive clip/embedding is disturbed visually/semantically by Video Disturbance and Embedding Disturbance. Our objective is to pull the positive closer to the original clip/embedding in the latent space. In this way, the network is driven to focus on the principal information of the action while the impact of sophisticated details and inconsequential variations is weakened. Notably, the proposed VARD does not require optical flow, negative samples, or pretext tasks. Extensive experiments conducted on the UCF101 and HMDB51 datasets demonstrate that VARD substantially improves a strong baseline and outperforms multiple classical and advanced self-supervised action recognition methods.
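
A hedged sketch of the pulling objective described above: bring the embedding of a visually/semantically disturbed positive clip toward the original clip's embedding, with no negative samples. Cosine distance is an assumed choice of metric.

```python
import torch
import torch.nn.functional as F

def pull_positive_loss(orig_emb: torch.Tensor, pos_emb: torch.Tensor) -> torch.Tensor:
    """orig_emb, pos_emb: (batch, dim) embeddings of original and disturbed clips."""
    return 1.0 - F.cosine_similarity(orig_emb, pos_emb, dim=1).mean()

emb = torch.randn(4, 256)
disturbed = emb + 0.1 * torch.randn_like(emb)   # stand-in for Embedding Disturbance
loss = pull_positive_loss(emb, disturbed)
```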


Subjects
Algorithms; Pattern Recognition, Automated; Humans; Pattern Recognition, Automated/methods; Semantics
7.
Article in English | MEDLINE | ID: mdl-37022078

ABSTRACT

Transformers are increasingly popular in computer vision; they treat an image as a sequence of patches and learn robust global features from that sequence. However, pure transformers are not entirely suitable for vehicle re-identification, which requires both robust global features and discriminative local features. To that end, a graph interactive transformer (GiT) is proposed in this paper. At the macro level, a stack of GiT blocks builds the vehicle re-identification model, in which graphs extract discriminative local features within patches and transformers extract robust global features among patches. At the micro level, graphs and transformers interact, enabling effective cooperation between local and global features. Specifically, the current graph is embedded after the previous level's graph and transformer, while the current transformer is embedded after the current graph and the previous level's transformer. In addition to this interaction, the graph is a newly designed local correction graph that learns discriminative local features within a patch by exploring the relationships among nodes. Extensive experiments on three large-scale vehicle re-identification datasets demonstrate that our GiT method is superior to state-of-the-art vehicle re-identification approaches.
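
A hedged sketch of the macro-level interleaving described above: stacked blocks in which a local graph branch and a global transformer branch feed each other across levels. The branch internals here (a linear layer, a standard transformer encoder layer) are placeholder assumptions, not the paper's local correction graph.

```python
import torch
import torch.nn as nn

class GiTBlockSketch(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.graph = nn.Linear(dim, dim)        # stand-in for the local correction graph
        self.transformer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)

    def forward(self, g_prev, t_prev):
        g = torch.relu(self.graph(g_prev + t_prev))   # graph sees the previous transformer
        t = self.transformer(t_prev + g)              # transformer sees the current graph
        return g, t

g = t = torch.rand(1, 16, 64)                   # 16 patch tokens of dimension 64
for block in [GiTBlockSketch(), GiTBlockSketch()]:
    g, t = block(g, t)
```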

8.
IEEE Trans Pattern Anal Mach Intell; 45(8): 9486-9503, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37022422

ABSTRACT

Existing image-based rendering methods usually adopt a depth-based image warping operation to synthesize novel views. In this paper, we argue that the essential limitations of traditional warping are its restricted neighborhood and its purely distance-based interpolation weights. To address them, we propose content-aware warping, which adaptively learns the interpolation weights for pixels of a relatively large neighborhood from their contextual information via a lightweight neural network. Based on this learnable warping module, we propose a new end-to-end learning-based framework for novel view synthesis from a set of input source views, in which two additional modules, confidence-based blending and feature-assistant spatial refinement, handle the occlusion issue and capture the spatial correlation among pixels of the synthesized view, respectively. We also propose a weight-smoothness loss term to regularize the network. Experimental results on light field datasets with wide baselines and on multi-view datasets show that the proposed method significantly outperforms state-of-the-art methods both quantitatively and visually. The source code is publicly available at https://github.com/MantangGuo/CW4VS.
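
A hedged sketch of content-aware warping as summarized above: predict softmax interpolation weights for a k×k neighborhood with a small network, then take a weighted sum of the neighborhood pixels. Using a single convolution as the weight predictor, and the source image itself as the context, are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContentAwareWarp(nn.Module):
    def __init__(self, in_ch=3, k=5):
        super().__init__()
        self.k = k
        self.weight_net = nn.Conv2d(in_ch, k * k, 3, padding=1)  # context -> weights

    def forward(self, src):                     # src: (B, C, H, W) pre-warped source view
        b, c, h, w = src.shape
        weights = F.softmax(self.weight_net(src), dim=1)          # (B, k*k, H, W)
        patches = F.unfold(src, self.k, padding=self.k // 2)      # (B, C*k*k, H*W)
        patches = patches.view(b, c, self.k * self.k, h, w)
        return (patches * weights.unsqueeze(1)).sum(dim=2)        # (B, C, H, W)

out = ContentAwareWarp()(torch.rand(1, 3, 32, 32))
```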


Subjects
Algorithms; Learning; Neural Networks, Computer; Software
9.
Article in English | MEDLINE | ID: mdl-36083961

ABSTRACT

Consensus clustering can derive a more promising and robust clustering result by strategically integrating multiple partitions. However, existing approaches have several limitations: 1) most methods compute the ensemble-information matrix heuristically and lack sufficient optimization; 2) information from the original dataset is rarely considered; and 3) noise in both the label space and the feature space is ignored. To address these issues, we propose a novel consensus clustering method with co-association matrix optimization (CC-CMO), which aims to improve the co-association matrix by taking abundant information from both the label space and the feature space into consideration. In the label space, CC-CMO derives a weighted partition matrix capturing the inter-cluster correlation and further designs a least squares regression (LSR) model to explore the global structure of the data. In the feature space, CC-CMO minimizes the reconstruction error with doubly stochastic normalization in the projective subspace to eliminate noise features and learn the local affinity of the data. To improve the co-association matrix by jointly considering the subspace representation, global structure, and local affinity of the data, we propose a unified optimization framework and design an alternating optimization algorithm for the optimal co-association matrix. Extensive experiments on a variety of real-world datasets demonstrate the superior performance of CC-CMO over state-of-the-art consensus clustering approaches.
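
A hedged sketch of doubly stochastic normalization, the ingredient named above: alternately normalize rows and columns (Sinkhorn-style) so a nonnegative affinity matrix becomes approximately doubly stochastic. The fixed iteration count is an arbitrary choice; the paper embeds this inside a larger optimization.

```python
import numpy as np

def doubly_stochastic(A: np.ndarray, iters: int = 50) -> np.ndarray:
    """A: nonnegative (n, n) affinity matrix -> approximately doubly stochastic matrix."""
    M = A.copy()
    for _ in range(iters):
        M = M / M.sum(axis=1, keepdims=True)   # rows sum to 1
        M = M / M.sum(axis=0, keepdims=True)   # columns sum to 1
    return M

affinity = np.random.rand(5, 5) + 1e-6
S = doubly_stochastic(affinity)
```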

10.
IEEE Trans Image Process; 31: 3765-3779, 2022.
Article in English | MEDLINE | ID: mdl-35604974

ABSTRACT

This paper proposes a new full-reference image quality assessment (IQA) model, called the spatial and geometry feature-based model (SGFM), for the perceptual quality evaluation of light field (LF) images. Considering that an LF image describes both the spatial and geometry information of the scene, spatial features are extracted from the sub-aperture images (SAIs) using the contourlet transform and exploited to reflect the spatial quality degradation of the LF image, while geometry features are extracted across adjacent SAIs using a 3D-Gabor filter and explored to describe the loss of viewing consistency. These schemes are motivated by the fact that the human eye is more sensitive to scale, direction, and contour from the spatial perspective, and to viewing-angle variations from the geometry perspective. These operations are applied to the reference and distorted LF images independently, and the degree of similarity between the measured quantities is computed to jointly arrive at the final IQA score of the distorted LF image. Experimental results on three commonly used LF IQA datasets show that the proposed SGFM is more consistent with the quality of LF images as perceived by the human visual system (HVS), compared with multiple classical and state-of-the-art IQA models.

11.
IEEE Trans Pattern Anal Mach Intell; 44(4): 1819-1836, 2022 Apr.
Article in English | MEDLINE | ID: mdl-32966211

ABSTRACT

A densely-sampled light field (LF) is highly desirable in various applications, such as 3-D reconstruction, post-capture refocusing, and virtual reality. However, such data are costly to acquire. Although many computational methods have been proposed to reconstruct a densely-sampled LF from a sparsely-sampled one, they still suffer from low reconstruction quality, low computational efficiency, or restrictions on the regularity of the sampling pattern. To this end, we propose a novel learning-based method that accepts sparsely-sampled LFs with irregular structures and produces densely-sampled LFs with arbitrary angular resolution accurately and efficiently. We also propose a simple yet effective method for optimizing the sampling pattern. Our proposed method, an end-to-end trainable network, reconstructs a densely-sampled LF in a coarse-to-fine manner. Specifically, the coarse sub-aperture image (SAI) synthesis module first explores the scene geometry from an unstructured sparsely-sampled LF and leverages it to independently synthesize novel SAIs, in which a confidence-based blending strategy fuses the information from different input SAIs, yielding an intermediate densely-sampled LF. Then, the efficient LF refinement module learns the angular relationships within the intermediate result to recover the LF parallax structure. Comprehensive experimental evaluations demonstrate the superiority of our method over state-of-the-art methods on both real-world and synthetic LF images. In addition, we illustrate the benefits and advantages of the proposed approach when applied to various LF-based applications, including image-based rendering and depth estimation enhancement. The code is available at https://github.com/jingjin25/LFASR-FS-GAF.
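
A hedged sketch of confidence-based blending as named above: fuse candidate SAIs synthesized from different input views using per-pixel confidence maps, here combined with a softmax over views. How the confidences are produced is left out; in the paper they are predicted by the network.

```python
import torch

def confidence_blend(candidates: torch.Tensor, confidences: torch.Tensor) -> torch.Tensor:
    """candidates: (V, C, H, W) novel-view candidates; confidences: (V, 1, H, W) logits."""
    w = torch.softmax(confidences, dim=0)      # normalize across the V views
    return (candidates * w).sum(dim=0)         # (C, H, W) blended SAI

sai = confidence_blend(torch.rand(4, 3, 32, 32), torch.randn(4, 1, 32, 32))
```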

12.
IEEE Trans Image Process; 31: 5720-5732, 2022.
Article in English | MEDLINE | ID: mdl-36040941

ABSTRACT

In this paper, we investigate the problem of hyperspectral (HS) image spatial super-resolution via deep learning, focusing on how to embed the high-dimensional spatial-spectral information of HS images efficiently and effectively. Specifically, in contrast to existing methods that adopt empirically-designed network modules, we formulate HS embedding as an approximation of the posterior distribution of a set of carefully-defined HS embedding events, including layer-wise spatial-spectral feature extraction and network-level feature aggregation. Then, we incorporate the proposed feature embedding scheme into a source-consistent super-resolution framework that is physically interpretable, producing PDE-Net, in which high-resolution (HR) HS images are iteratively refined from the residuals between the input low-resolution (LR) HS images and pseudo-LR-HS images degenerated from the reconstructed HR-HS images via probability-inspired HS embedding. Extensive experiments over three common benchmark datasets demonstrate that PDE-Net achieves superior performance over state-of-the-art methods. Moreover, the probabilistic characteristic of this kind of network provides the epistemic uncertainty of the network outputs, which may bring additional benefits in other HS image-based applications. The code will be publicly available at https://github.com/jinnh/PDE-Net.
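
A hedged sketch of the source-consistent refinement loop described above: at each step, the HR estimate is corrected by the upsampled residual between the true LR input and a pseudo-LR image degraded from the current HR estimate. Bilinear up/downsampling stands in for the learned degradation and embedding modules that the paper actually uses.

```python
import torch
import torch.nn.functional as F

def refine_hr(lr: torch.Tensor, scale: int = 4, steps: int = 5) -> torch.Tensor:
    """lr: (B, C, h, w) LR HS image -> (B, C, h*scale, w*scale) HR estimate."""
    hr = F.interpolate(lr, scale_factor=scale, mode="bilinear", align_corners=False)
    for _ in range(steps):
        pseudo_lr = F.interpolate(hr, size=lr.shape[-2:], mode="bilinear", align_corners=False)
        residual = lr - pseudo_lr                       # source-consistency error
        hr = hr + F.interpolate(residual, size=hr.shape[-2:], mode="bilinear", align_corners=False)
    return hr

hr = refine_hr(torch.rand(1, 31, 16, 16))               # 31 spectral bands, assumed
```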

13.
IEEE Trans Image Process; 31: 6175-6187, 2022.
Article in English | MEDLINE | ID: mdl-36126028

ABSTRACT

In this paper, a full-reference video quality assessment (VQA) model, called the hybrid spatiotemporal feature-based model (HSFM), is designed for the perceptual quality assessment of screen content videos (SCVs). SCVs have a hybrid structure comprising screen and natural scenes, which the human visual system (HVS) perceives with different visual effects. With this in mind, the three-dimensional Laplacian of Gaussian (3D-LOG) filter and three-dimensional natural scene statistics (3D-NSS) are exploited to extract screen and natural spatiotemporal features from the reference and distorted SCV sequences separately. The similarities of these extracted features are then computed independently, generating quality scores for the screen and natural scenes. An adaptive fusion scheme based on local video activity then combines them to arrive at the final VQA score of the distorted SCV under evaluation. Experimental results on the Screen Content Video Database (SCVD) and the Compressed Screen Content Video Quality (CSCVQ) database show that the proposed HSFM is more consistent with the HVS's perceptual quality assessment of SCVs than a variety of classical and state-of-the-art IQA/VQA models.


Subjects
Algorithms; Databases, Factual; Humans; Video Recording/methods
14.
IEEE Trans Image Process; 30: 1423-1438, 2021.
Article in English | MEDLINE | ID: mdl-33332269

ABSTRACT

This paper explores the problem of hyperspectral image (HSI) super-resolution, which merges a low-resolution HSI (LR-HSI) and a high-resolution multispectral image (HR-MSI). The cross-modality distribution of the spatial and spectral information makes the problem challenging. Inspired by classic wavelet decomposition-based image fusion, we propose a novel lightweight deep neural network-based framework, namely the progressive zero-centric residual network (PZRes-Net), to address this problem efficiently and effectively. Specifically, PZRes-Net learns a high-resolution, zero-centric residual image containing the high-frequency spatial details of the scene across all spectral bands from both inputs, in a progressive fashion along the spectral dimension. The resulting residual image is then superimposed onto the up-sampled LR-HSI in a mean-value-invariant manner, leading to a coarse HR-HSI, which is further refined by exploring the coherence across all spectral bands simultaneously. To learn the residual image efficiently and effectively, we employ spectral-spatial separable convolution with dense connections. In addition, we propose a zero-mean normalization applied to the feature maps of each layer to realize the zero-mean characteristic of the residual image. Extensive experiments over both real and synthetic benchmark datasets demonstrate that PZRes-Net outperforms state-of-the-art methods by a significant margin in terms of both four quantitative metrics and visual quality; e.g., PZRes-Net improves the PSNR by more than 3 dB while using 2.3× fewer parameters and 15× fewer FLOPs. The code is publicly available at https://github.com/zbzhzhy/PZRes-Net.
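
A minimal sketch of zero-mean normalization as described above: subtract the spatial mean from every feature map so the learned residual stays zero-centric. The exact placement within each layer is not specified in the abstract and is assumed here.

```python
import torch

def zero_mean(feat: torch.Tensor) -> torch.Tensor:
    """feat: (batch, channels, H, W) -> same shape, each map with zero spatial mean."""
    return feat - feat.mean(dim=(2, 3), keepdim=True)

f = zero_mean(torch.rand(2, 8, 16, 16))
assert torch.allclose(f.mean(dim=(2, 3)), torch.zeros(2, 8), atol=1e-6)
```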

15.
IEEE Trans Neural Netw Learn Syst; 32(8): 3748-3754, 2021 Aug.
Article in English | MEDLINE | ID: mdl-32822306

ABSTRACT

Owing to their effectiveness for anomaly/outlier detection, one-class algorithms have been extensively studied. Representatives include shallow-structure methods and deep networks, such as the one-class support vector machine (OC-SVM), the one-class extreme learning machine (OC-ELM), deep support vector data description (Deep SVDD), and the multilayer OC-ELM (ML-OCELM/MK-OCELM). However, existing algorithms are generally built on the minimum mean-square-error (MSE) criterion, which is robust to Gaussian noise but less effective at handling large outliers. To alleviate this deficiency, a robust maximum correntropy criterion (MCC)-based OC-ELM (MC-OCELM) is first proposed and then extended to a hierarchical network (named HC-OCELM) to enhance its capability of characterizing complex, large-scale data. Gradient derivation combined with a fixed-point iterative update scheme is adopted for the output weight optimization. Experiments on many benchmark datasets validate the effectiveness of the proposed methods, and comparisons with many state-of-the-art approaches demonstrate their superiority.
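
A minimal sketch of the maximum correntropy criterion (MCC) contrasted with MSE above: a Gaussian-kernel similarity between prediction and target is maximized, which makes large outliers contribute almost nothing to the objective. The kernel bandwidth sigma is a tunable assumption.

```python
import numpy as np

def correntropy(err: np.ndarray, sigma: float = 1.0) -> float:
    """Mean Gaussian correntropy of the error vector; maximized during training."""
    return float(np.mean(np.exp(-(err ** 2) / (2.0 * sigma ** 2))))

err = np.array([0.1, -0.2, 0.05, 8.0])      # one large outlier
print(correntropy(err))                     # the outlier is strongly down-weighted
```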

16.
Article in English | MEDLINE | ID: mdl-32149636

ABSTRACT

In this paper, a progressive collaborative representation (PCR) framework is proposed that can incorporate any existing color image demosaicing method and further boost its performance. Our PCR consists of two phases: (i) offline training and (ii) online refinement. In phase (i), multiple training-and-refining stages are performed. In each stage, a new dictionary is established by learning from a large number of feature-patch pairs extracted from the demosaicked images of the current stage and their corresponding original full-color images. After training, a projection matrix is generated and exploited to refine the current demosaicked image. The updated image, with improved quality, is used as the input to the next training-and-refining stage and processed in the same manner. In phase (ii), all the projection matrices generated in phase (i) are exploited to conduct online refinement of the demosaicked test image. Extensive simulations conducted on two commonly used test datasets (i.e., IMAX and Kodak) for evaluating demosaicing algorithms clearly demonstrate that our proposed PCR framework consistently boosts the performance of every image demosaicing method we experimented with, in terms of both objective and subjective evaluations.
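
A hedged sketch of one training-and-refining stage: learn a linear projection that maps demosaicked feature patches toward their ground-truth counterparts, then apply it to refine new patches. Closed-form ridge regression is an assumed stand-in for the paper's dictionary-based collaborative representation.

```python
import numpy as np

def learn_projection(X: np.ndarray, Y: np.ndarray, lam: float = 1e-3) -> np.ndarray:
    """X: (n, d) demosaicked patches; Y: (n, d) original patches -> (d, d) projection."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)

X = np.random.rand(1000, 25)                 # e.g. flattened 5x5 patches
Y = X + 0.05 * np.random.randn(1000, 25)     # stand-in for ground-truth patches
P = learn_projection(X, Y)
refined = X @ P                              # online refinement of patches
```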

17.
Article in English | MEDLINE | ID: mdl-32845839

ABSTRACT

In this paper, we make the first attempt to study subjective and objective quality assessment for screen content videos (SCVs). To that end, we construct the first large-scale video quality assessment (VQA) database specifically for SCVs, called the screen content video database (SCVD). The SCVD provides 16 reference SCVs, 800 distorted SCVs, and their corresponding subjective scores, and it is made publicly available for research use. The distorted SCVs are generated from each reference SCV with 10 distortion types and 5 degradation levels per type, and each distorted SCV is rated by at least 32 subjects in the subjective test. Furthermore, we propose the first full-reference VQA model for SCVs, called the spatiotemporal Gabor feature tensor-based model (SGFTM), to objectively evaluate the perceptual quality of distorted SCVs. It is motivated by the observation that the 3D-Gabor filter can well simulate the visual functions of the human visual system (HVS) in perceiving videos, being more sensitive to the edge and motion information that are frequently encountered in SCVs. Specifically, SGFTM exploits the 3D-Gabor filter to individually extract spatiotemporal Gabor feature tensors from the reference and distorted SCVs, measures their similarities, and combines them through the developed spatiotemporal feature tensor pooling strategy to obtain the final SGFTM score. Experimental results on SCVD show that the proposed SGFTM is highly consistent with the subjective perception of SCV quality and consistently outperforms multiple classical and state-of-the-art image/video quality assessment models.

18.
Article in English | MEDLINE | ID: mdl-31478850

ABSTRACT

3D point clouds associated with attributes are considered a promising paradigm for immersive communication, yet the corresponding compression schemes for this medium are still in their infancy. Moreover, in contrast to conventional image/video compression, compressing 3D point cloud data is more challenging owing to its irregular structure. In this paper, we propose a novel and effective compression scheme for the attributes of voxelized 3D point clouds. In the first stage, an input voxelized 3D point cloud is divided into blocks of equal size. Then, to deal with the irregular structure of 3D point clouds, a geometry-guided sparse representation (GSR) is proposed to eliminate the redundancy within each block, formulated as an ℓ0-norm regularized optimization problem. An inter-block prediction scheme is also applied to remove the redundancy between blocks. Finally, by quantitatively analyzing the characteristics of the transform coefficients produced by the GSR, an effective entropy coding strategy tailored to our GSR is developed to generate the bitstream. Experimental results over various benchmark datasets show that the proposed compression scheme achieves better rate-distortion performance and visual quality than state-of-the-art methods.

19.
IEEE Trans Image Process; 27(9): 4516-4528, 2018 Sep.
Article in English | MEDLINE | ID: mdl-29897876

ABSTRACT

In this paper, an accurate and efficient full-reference image quality assessment (IQA) model using extracted Gabor features, called the Gabor feature-based model (GFM), is proposed for the objective evaluation of screen content images (SCIs). It is well known that Gabor filters are highly consistent with the response of the human visual system (HVS), and that the HVS is highly sensitive to edge information. Based on these facts, the imaginary part of the Gabor filter, which has odd symmetry and yields edge detection, is applied to the luminance of the reference and distorted SCIs to extract their Gabor features. The local similarities of the extracted Gabor features and of two chrominance components, recorded in the LMN color space, are then measured independently. Finally, a Gabor-feature pooling strategy is employed to combine these measurements and generate the final evaluation score. Experimental results obtained on two large SCI databases show that the proposed GFM model not only yields higher consistency with human perception in the assessment of SCIs but also requires lower computational complexity than classical and state-of-the-art IQA models. The source code for the proposed GFM will be available at http://smartviplab.org/pubilcations/GFM.html.
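
A minimal sketch of the edge-sensitive, odd-symmetric (imaginary-part) Gabor kernel mentioned above; the wavelength, orientation, and scale settings are illustrative assumptions rather than the paper's parameters.

```python
import numpy as np

def gabor_imag(size=11, wavelength=4.0, theta=0.0, sigma=2.0) -> np.ndarray:
    """Imaginary part of a 2D Gabor filter: a sine carrier under a Gaussian envelope."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)            # rotated coordinate
    envelope = np.exp(-(x ** 2 + y ** 2) / (2.0 * sigma ** 2))
    return envelope * np.sin(2.0 * np.pi * xr / wavelength)  # odd symmetry -> edges

kernel = gabor_imag()
assert np.allclose(kernel, -kernel[::-1, ::-1])           # odd-symmetric about the centre
```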

20.
IEEE Trans Image Process; 26(10): 4818-4831, 2017 Oct.
Article in English | MEDLINE | ID: mdl-28644808

ABSTRACT

In this paper, an accurate full-reference image quality assessment (IQA) model developed for assessing screen content images (SCIs), called the edge similarity (ESIM) model, is proposed. It is inspired by the fact that the human visual system (HVS) is highly sensitive to edges, which are frequently encountered in SCIs; therefore, essential edge features are extracted and exploited for conducting IQA of the SCIs. The key novelty of the proposed ESIM lies in the extraction and use of three salient edge features: edge contrast, edge width, and edge direction. The first two attributes are simultaneously generated from the input SCI based on a parametric edge model, while the last one is derived directly from the input SCI. These three features are extracted from the reference and the distorted SCI individually. The degree of similarity measured for each edge attribute is then computed independently, and the results are combined using our proposed edge-width pooling strategy to generate the final ESIM score. To evaluate the performance of the proposed ESIM model, a new and, to date, the largest SCI database (denoted SCID) is established in our work and made publicly available for download. Our database contains 1800 distorted SCIs generated from 40 reference SCIs; for each SCI, nine distortion types are investigated, and five degradation levels are produced for each distortion type. Extensive simulation results clearly show that the proposed ESIM model is more consistent with the perception of the HVS in the evaluation of distorted SCIs than multiple state-of-the-art IQA methods.
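
A hedged sketch of a per-pixel similarity measure of the family used by models like ESIM for one pair of edge feature maps; the stabilizing constant c and the mean pooling are illustrative assumptions (the paper's own pooling is edge-width based).

```python
import numpy as np

def feature_similarity(f_ref: np.ndarray, f_dst: np.ndarray, c: float = 1e-3) -> np.ndarray:
    """Per-pixel similarity in [0, 1] for reference/distorted edge feature maps."""
    return (2.0 * f_ref * f_dst + c) / (f_ref ** 2 + f_dst ** 2 + c)

sim_map = feature_similarity(np.random.rand(64, 64), np.random.rand(64, 64))
score = sim_map.mean()   # simple mean pooling as a stand-in for edge-width pooling
```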
