Results 1 - 20 of 87
1.
IEEE Trans Image Process ; 33: 3227-3241, 2024.
Article in English | MEDLINE | ID: mdl-38691435

ABSTRACT

The statistical regularities of natural images, referred to as natural scene statistics, play an important role in no-reference image quality assessment. However, it has been widely acknowledged that screen content images (SCIs), which are typically computer generated, do not obey such statistics. Here we make the first attempt to learn the statistics of SCIs, based upon which the quality of SCIs can be effectively determined. The underlying mechanism of the proposed approach rests on the mild assumption that SCIs, although not physically acquired, still obey certain statistics that can be learned from data. We empirically show that the deviation from the learned statistics can be effectively leveraged for quality assessment, and that the proposed method is superior when evaluated in different settings. Extensive experimental results demonstrate that the Deep Feature Statistics based SCI Quality Assessment (DFSS-IQA) model delivers promising performance compared with existing NR-IQA models and generalizes well in cross-dataset settings. The implementation of our method is publicly available at https://github.com/Baoliang93/DFSS-IQA.
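As a rough illustration of the idea in this abstract, the sketch below scores an image by how far its feature statistics drift from reference statistics learned on pristine content, using a Gaussian KL divergence as the deviation measure. The Gaussian model and the `statistics_deviation` helper are illustrative assumptions, not the published DFSS-IQA pipeline.

```python
import numpy as np

def gaussian_kl(mu1, var1, mu2, var2):
    """KL divergence between univariate Gaussians N(mu1, var1) || N(mu2, var2)."""
    return 0.5 * (np.log(var2 / var1) + (var1 + (mu1 - mu2) ** 2) / var2 - 1.0)

def statistics_deviation(features, ref_mu, ref_var):
    """Score an image by how far its feature statistics deviate from
    reference statistics learned on pristine screen content.
    Larger deviation -> lower predicted quality (hypothetical proxy)."""
    mu, var = features.mean(), features.var()
    return gaussian_kl(mu, var, ref_mu, ref_var)

# Identical statistics give zero deviation; perturbed ones a positive score.
rng = np.random.default_rng(0)
ref = rng.normal(0.0, 1.0, 10000)        # stand-in for pristine deep features
distorted = rng.normal(0.5, 2.0, 10000)  # stand-in for distorted features
d_ref = statistics_deviation(ref, ref.mean(), ref.var())
d_bad = statistics_deviation(distorted, ref.mean(), ref.var())
```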

2.
IEEE Trans Image Process ; 33: 3075-3089, 2024.
Article in English | MEDLINE | ID: mdl-38656839

ABSTRACT

In this paper, we propose a graph-represented image distribution similarity (GRIDS) index for full-reference (FR) image quality assessment (IQA), which can measure the perceptual distance between distorted and reference images by assessing the disparities between their distribution patterns under a graph-based representation. First, we transform the input image into a graph-based representation, which is proven to be a versatile and effective choice for capturing visual perception features. This is achieved through the automatic generation of a vision graph from the given image content, leading to holistic perceptual associations for irregular image regions. Second, to reflect the perceived image distribution, we decompose the undirected graph into cliques and then calculate the product of the potential functions for the cliques to obtain the joint probability distribution of the undirected graph. Finally, we compare the distances between the graph feature distributions of the distorted and reference images at different stages; thus, we combine the distortion distribution measurements derived from different graph model depths to determine the perceived quality of the distorted images. The empirical results obtained from an extensive array of experiments underscore the competitive nature of our proposed method, which achieves performance on par with that of the state-of-the-art methods, demonstrating its exceptional predictive accuracy and ability to maintain consistent and monotonic behaviour in image quality prediction tasks. The source code is publicly available at the following website https://github.com/Land5cape/GRIDS.
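The clique-factorized joint distribution described above can be sketched in a few lines; for brevity this toy restricts cliques to edges (pairwise potentials) and works in log space, which is an assumption for illustration rather than the GRIDS implementation.

```python
import math

def joint_log_prob(edges, potentials):
    """Unnormalized log joint probability of an undirected graph factorized
    over its cliques, p(x) proportional to the product of clique potentials;
    pairwise (edge) cliques are used here for brevity. Summing logs avoids
    underflow when many potentials are multiplied together."""
    return sum(math.log(potentials[e]) for e in edges)

# Product of edge potentials 2.0 * 4.0 = 8.0, computed in log space.
log_p = joint_log_prob([("a", "b"), ("b", "c")],
                       {("a", "b"): 2.0, ("b", "c"): 4.0})
```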

3.
IEEE Trans Cybern ; PP, 2024 Mar 18.
Article in English | MEDLINE | ID: mdl-38498757

ABSTRACT

The development of data sensing technology has generated a vast amount of high-dimensional data, posing great challenges for machine learning models. Over the past decades, despite demonstrating its effectiveness in data classification, genetic programming (GP) has encountered three major challenges when dealing with high-dimensional data: 1) solution diversity; 2) multiclass imbalance; and 3) a large feature space. In this article, we develop a problem-specific multiobjective GP framework (PS-MOGP) for handling classification tasks with high-dimensional data. To reduce the large solution space caused by high dimensionality, we incorporate a recursive feature elimination strategy based on mining the archive of evolved GP solutions. A progressive domination Pareto archive evolution strategy (PD-PAES), which optimizes the objectives in a specific order, is proposed to evaluate the GP individuals and maintain better solution diversity. In addition, to address the severe class imbalance caused by traditional one-versus-rest (OVR) binary decomposition (BD) for multiclass classification problems, we design a method named BD with a similar positive and negative class size (BD-SPNCS) to generate a set of auxiliary classifiers. Experimental results on benchmark and real-world datasets demonstrate that the proposed PS-MOGP outperforms state-of-the-art traditional and evolutionary classification methods in the context of high-dimensional data classification.
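A minimal sketch of the balanced one-vs-rest decomposition idea (BD-SPNCS) might look as follows; the function name and the plain random subsampling rule are illustrative assumptions, not the authors' exact procedure.

```python
import random

def bd_spncs_splits(labels, seed=0):
    """For each class, build a one-vs-rest binary task whose negative set is
    subsampled to the positive class size, avoiding the severe imbalance of
    plain OVR decomposition. Returns {class: (pos_indices, neg_indices)}."""
    rng = random.Random(seed)
    splits = {}
    for c in sorted(set(labels)):
        pos = [i for i, y in enumerate(labels) if y == c]
        neg = [i for i, y in enumerate(labels) if y != c]
        splits[c] = (pos, rng.sample(neg, min(len(pos), len(neg))))
    return splits

# Class 0 is rare (2 samples); its auxiliary task gets only 2 negatives.
splits = bd_spncs_splits([0, 0, 1, 1, 1, 1, 2, 2, 2, 2])
```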

4.
IEEE Trans Image Process ; 33: 2502-2513, 2024.
Article in English | MEDLINE | ID: mdl-38526904

ABSTRACT

Residual coding has gained prevalence in lossless compression, where a lossy layer is initially employed and the reconstruction errors (i.e., residues) are then losslessly compressed. The underlying principle of the residual coding revolves around the exploration of priors based on context modeling. Herein, we propose a residual coding framework for 3D medical images, involving the off-the-shelf video codec as the lossy layer and a Bilateral Context Modeling based Network (BCM-Net) as the residual layer. The BCM-Net is proposed to achieve efficient lossless compression of residues through exploring intra-slice and inter-slice bilateral contexts. In particular, a symmetry-based intra-slice context extraction (SICE) module is proposed to mine bilateral intra-slice correlations rooted in the inherent anatomical symmetry of 3D medical images. Moreover, a bi-directional inter-slice context extraction (BICE) module is designed to explore bilateral inter-slice correlations from bi-directional references, thereby yielding representative inter-slice context. Experiments on popular 3D medical image datasets demonstrate that the proposed method can outperform existing state-of-the-art methods owing to efficient redundancy reduction. Our code will be available on GitHub for future research.
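The residual-coding principle described above (a lossy layer plus losslessly compressed residues) can be sketched with a toy quantizer standing in for the video codec and `zlib` standing in for the learned BCM-Net residual coder; both substitutions are assumptions for illustration.

```python
import zlib
import numpy as np

def lossy_layer(x, step=8):
    """Stand-in lossy codec: coarse quantization (the paper uses a video codec)."""
    return (np.round(x / step) * step).astype(np.int16)

def encode(x):
    recon = lossy_layer(x)
    residue = (x - recon).astype(np.int16)  # small-magnitude, highly compressible
    return recon, zlib.compress(residue.tobytes())

def decode(recon, residue_bytes):
    residue = np.frombuffer(zlib.decompress(residue_bytes), dtype=np.int16)
    return recon + residue.reshape(recon.shape)

rng = np.random.default_rng(1)
x = rng.integers(0, 255, (4, 16, 16)).astype(np.int16)  # toy 3-D volume
recon, payload = encode(x)
x_hat = decode(recon, payload)  # lossless round trip
```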


Subjects
Data Compression, Data Compression/methods, Imaging, Three-Dimensional/methods
5.
IEEE Trans Cybern ; PP, 2023 Dec 25.
Article in English | MEDLINE | ID: mdl-38145521

ABSTRACT

The quality of videos is the primary concern of video service providers. Video quality assessment (VQA) built upon deep neural networks has progressed rapidly. Although existing works have introduced knowledge of the human visual system (HVS) into VQA, limitations remain that hinder its full exploitation, including incomplete modeling with few HVS characteristics and insufficient connection among these characteristics. In this article, we present a novel spatial-temporal VQA method termed HVS-5M, wherein we design five modules to simulate five characteristics of the HVS and create a bioinspired connection among these modules in a cooperative manner. Specifically, on the spatial side, the visual saliency module first extracts a saliency map. Then, the content-dependency and edge masking modules extract the content and edge features, respectively, which are both weighted by the saliency map to highlight the regions that human beings are likely to attend to. On the temporal side, the motion perception module extracts dynamic temporal features. In addition, the temporal hysteresis module simulates the memory mechanism of human beings and comprehensively evaluates the video quality according to the fused features from the spatial and temporal domains. Extensive experiments show that our HVS-5M outperforms state-of-the-art VQA methods. Ablation studies are further conducted to verify the effectiveness of each module in the proposed method. The source code is available at https://github.com/GZHU-DVL/HVS-5M.
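Saliency-weighted feature pooling, as used for the spatial branch above, can be sketched as follows; the exact pooling form is an illustrative assumption, not the HVS-5M code.

```python
import numpy as np

def saliency_weight(features, saliency):
    """Weight spatial feature maps (C, H, W) by a saliency map (H, W),
    normalized to sum to one, so regions likely to attract attention
    dominate the pooled per-channel descriptor."""
    w = saliency / (saliency.sum() + 1e-8)
    return (features * w).sum(axis=(-2, -1))  # saliency-weighted pooling

# With uniform saliency, the pooled descriptor reduces to the plain mean.
feats = np.arange(32, dtype=float).reshape(2, 4, 4)
pooled = saliency_weight(feats, np.ones((4, 4)))
```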

6.
IEEE Trans Cybern ; PP, 2023 Nov 09.
Article in English | MEDLINE | ID: mdl-37943655

ABSTRACT

Salient instance segmentation (SIS) is an emerging field that evolves from salient object detection (SOD), aiming at identifying individual salient instances using segmentation maps. Inspired by the success of dynamic convolutions in segmentation tasks, this article introduces a keypoints-based SIS network (KepSalinst). It employs multiple keypoints, that is, the center and several peripheral points of an instance, as effective geometrical guidance for dynamic convolutions. The features at peripheral points can help roughly delineate the spatial extent of the instance and complement the information inside the central features. To fully exploit the complementary components within these features, we design a differentiated patterns fusion (DPF) module. This ensures that the resulting dynamic convolutional filters formed by these features are sufficiently comprehensive for precise segmentation. Furthermore, we introduce a high-level semantic guided saliency (HSGS) module. This module enhances the perception of saliency by predicting a map for the input image to estimate a saliency score for each segmented instance. On four SIS datasets (ILSO, SOC, SIS10K, and COME15K), our KepSalinst outperforms all previous models qualitatively and quantitatively.

7.
IEEE Trans Image Process ; 32: 4472-4485, 2023.
Article in English | MEDLINE | ID: mdl-37335801

ABSTRACT

Due to light absorption and scattering induced by the water medium, underwater images usually suffer from degradation problems such as low contrast, color distortion, and blurred details, which complicate downstream underwater scene understanding tasks. Obtaining clear and visually pleasing images has therefore become a common concern, giving rise to the task of underwater image enhancement (UIE). Among existing UIE methods, Generative Adversarial Network (GAN)-based methods perform well in visual aesthetics, while physical model-based methods have better scene adaptability. Inheriting the advantages of these two types of models, we propose a physical model-guided GAN for UIE in this paper, referred to as PUGAN. The entire network is under the GAN architecture. On the one hand, we design a Parameters Estimation subnetwork (Par-subnet) to learn the parameters for physical model inversion and use the generated color-enhanced image as auxiliary information for the Two-Stream Interaction Enhancement subnetwork (TSIE-subnet). Meanwhile, we design a Degradation Quantization (DQ) module in the TSIE-subnet to quantize scene degradation, thereby achieving reinforced enhancement of key regions. On the other hand, we design dual discriminators for the style-content adversarial constraint, promoting the authenticity and visual aesthetics of the results. Extensive experiments on three benchmark datasets demonstrate that our PUGAN outperforms state-of-the-art methods in both qualitative and quantitative metrics. The code and results are available at https://rmcong.github.io/proj_PUGAN.html.
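The physical-model inversion that the Par-subnet's parameters feed into can be illustrated with the widely used simplified underwater image formation model I = J*t + A*(1 - t), where J is scene radiance, A the background light, and t the transmission map. The function below is a hedged sketch of that inversion, not PUGAN's actual implementation.

```python
import numpy as np

def invert_degradation(I, A, t, t_min=0.1):
    """Invert the simplified formation model I = J*t + A*(1 - t) to recover
    scene radiance J from estimates of background light A and transmission t."""
    t = np.maximum(t, t_min)  # clamp t to avoid amplifying noise where it is tiny
    return (I - A) / t + A

# Round trip: degrade a known scene, then invert with the true parameters.
J = np.array([[0.8, 0.2], [0.5, 0.9]])
A, t = 0.1, np.full_like(J, 0.6)
I = J * t + A * (1 - t)
J_hat = invert_degradation(I, A, t)
```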

8.
IEEE Trans Image Process ; 32: 2827-2842, 2023.
Article in English | MEDLINE | ID: mdl-37186533

ABSTRACT

Convolutional Neural Networks (CNNs) dominate image processing but suffer from a local inductive bias, which the transformer framework addresses through its inherent ability to capture global context via self-attention. However, how to inherit and integrate their advantages to improve compressed sensing is still an open issue. This paper proposes CSformer, a hybrid framework that explores the representation capacity of local and global features. The proposed approach is designed for end-to-end compressive image sensing, composed of adaptive sampling and recovery. In the sampling module, images are measured block-by-block by the learned sampling matrix. In the reconstruction stage, the measurements are projected into an initialization stem, a CNN stem, and a transformer stem. The initialization stem mimics the traditional reconstruction of compressive sensing but generates the initial reconstruction in a learnable and efficient manner. The CNN stem and transformer stem are concurrent, simultaneously calculating fine-grained and long-range features and efficiently aggregating them. Furthermore, we explore a progressive strategy and window-based transformer block to reduce the parameters and computational complexity. The experimental results demonstrate the effectiveness of the dedicated transformer-based architecture for compressive sensing, which achieves superior performance compared to state-of-the-art methods on different datasets. Our code is available at https://github.com/Lineves7/CSformer.
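The block-by-block sampling stage described above can be sketched as follows; here `phi` is a random stand-in for the learned sampling matrix, and the block size is an arbitrary choice for illustration.

```python
import numpy as np

def block_sample(image, phi, block=4):
    """Compressive sampling: each non-overlapping block is flattened and
    measured by the sampling matrix phi of shape (m, block*block)."""
    h, w = image.shape
    meas = []
    for i in range(0, h, block):
        for j in range(0, w, block):
            x = image[i:i + block, j:j + block].reshape(-1)
            meas.append(phi @ x)  # m measurements per block
    return np.stack(meas)

rng = np.random.default_rng(2)
img = rng.standard_normal((8, 8))
phi = rng.standard_normal((6, 16))  # m=6 of n=16: sampling ratio 0.375
y = block_sample(img, phi)          # one row of measurements per block
```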

9.
Article in English | MEDLINE | ID: mdl-37018573

ABSTRACT

Salient object detection (SOD) aims to determine the most visually attractive objects in an image. With the development of virtual reality (VR) technology, 360° omnidirectional images have been widely used, but the SOD task for 360° omnidirectional images is seldom studied due to their severe distortions and complex scenes. In this article, we propose a multi-projection fusion and refinement network (MPFR-Net) to detect the salient objects in 360° omnidirectional images. Different from existing methods, the equirectangular projection (EP) image and four corresponding cube-unfolding (CU) images are fed into the network simultaneously as inputs, where the CU images not only provide supplementary information for the EP image but also ensure the object integrity of the cube-map projection. To make full use of these two projection modes, a dynamic weighting fusion (DWF) module is designed to adaptively integrate the features of different projections in a complementary and dynamic manner from both inter- and intra-feature perspectives. Furthermore, to fully explore the interaction between encoder and decoder features, a filtration and refinement (FR) module is designed to suppress redundant information within and between features. Experimental results on two omnidirectional datasets demonstrate that the proposed approach outperforms the state-of-the-art methods both qualitatively and quantitatively. The code and results are available at https://rmcong.github.io/proj_MPFRNet.html.

10.
IEEE Trans Cybern ; 53(11): 7162-7173, 2023 Nov.
Article in English | MEDLINE | ID: mdl-36264736

ABSTRACT

So far, researchers have proposed many forensics tools to protect the authenticity and integrity of digital information. However, with the explosive development of machine learning, existing forensics tools may be compromised by new attacks at any time. Hence, it is always necessary to investigate anti-forensics to expose the vulnerabilities of forensics tools, which in turn helps forensics researchers develop new countermeasures. To date, one of the potential threats is generative adversarial networks (GANs), which could be employed to fabricate or forge falsified data to attack forensics detectors. In this article, we investigate the anti-forensics performance of GANs by proposing a novel model, ExS-GAN, which features an extra supervision system. After training, the proposed model can launch anti-forensics attacks on various manipulated images. Experiments show that the proposed method achieves high anti-forensics performance while preserving satisfactory image quality. We also justify the proposed extra supervision via an ablation study.

11.
Int J Mach Learn Cybern ; 14(5): 1725-1738, 2023.
Article in English | MEDLINE | ID: mdl-36474954

ABSTRACT

COVID-19 has had a significant impact on individual lives, bringing a unique challenge for face retrieval under occlusion. In this paper, an occluded face retrieval method consisting of a generator, a discriminator, and a deep hashing retrieval network is proposed for face retrieval in a large-scale face image dataset under a variety of occlusion conditions. In the proposed method, occluded face images are first reconstructed using a face inpainting model, in which the adversarial loss, reconstruction loss, and hash-bits loss are combined for training. With the trained model, the hash codes of real face images and the corresponding reconstructed face images are encouraged to be as similar as possible. Then, a deep hashing retrieval network is used to generate compact similarity-preserving hash codes from the reconstructed face images for better retrieval performance. Experimental results show that the proposed method can successfully generate reconstructed face images under occlusion. Meanwhile, the proposed deep hashing retrieval network achieves better retrieval performance for occluded face retrieval than existing state-of-the-art deep hashing retrieval methods.
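Similarity-preserving hash codes are compared by Hamming distance at retrieval time; a minimal sketch (with made-up database entries and code lengths) might look like this.

```python
def hamming(a, b):
    """Hamming distance between two equal-length binary hash codes."""
    return sum(x != y for x, y in zip(a, b))

def retrieve(query_code, database, top_k=3):
    """Rank database items by Hamming distance to the query's hash code
    and return the top_k closest names."""
    ranked = sorted(database.items(), key=lambda kv: hamming(query_code, kv[1]))
    return [name for name, _ in ranked[:top_k]]

# Toy database of 4-bit codes; a real system would use 32-128 bit codes.
db = {"face_a": [0, 1, 1, 0], "face_b": [1, 1, 1, 1], "face_c": [0, 1, 0, 0]}
hits = retrieve([0, 1, 1, 0], db, top_k=2)
```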

12.
IEEE Trans Cybern ; 53(3): 1920-1931, 2023 Mar.
Article in English | MEDLINE | ID: mdl-35867373

ABSTRACT

The goal of co-salient object detection (CoSOD) is to discover salient objects that commonly appear in a query group containing two or more relevant images. Therefore, how to effectively extract interimage correspondence is crucial for the CoSOD task. In this article, we propose a global-and-local collaborative learning (GLNet) architecture, which includes a global correspondence modeling (GCM) and a local correspondence modeling (LCM) to capture the comprehensive interimage corresponding relationship among different images from the global and local perspectives. First, we treat different images as different time slices and use 3-D convolution to integrate all intrafeatures intuitively, which can more fully extract the global group semantics. Second, we design a pairwise correlation transformation (PCT) to explore similarity correspondence between pairwise images and combine the multiple local pairwise correspondences to generate the local interimage relationship. Third, the interimage relationships of the GCM and LCM are integrated through a global-and-local correspondence aggregation (GLA) module to explore more comprehensive interimage collaboration cues. Finally, the intra and inter features are adaptively integrated by an intra-and-inter weighting fusion (AEWF) module to learn co-saliency features and predict the co-saliency map. The proposed GLNet is evaluated on three prevailing CoSOD benchmark datasets, demonstrating that our model trained on a small dataset (about 3k images) still outperforms 11 state-of-the-art competitors trained on some large datasets (about 8k-200k images).

13.
IEEE Trans Cybern ; 53(3): 1460-1474, 2023 Mar.
Article in English | MEDLINE | ID: mdl-34516383

ABSTRACT

The job-shop scheduling problem (JSSP) is a challenging scheduling and optimization problem in industry and engineering, which relates to the work efficiency and operational costs of factories. The completion time of all jobs is the most commonly considered optimization objective in existing work. However, factories focus on both time and cost objectives, including completion time, total tardiness, advance time, production cost, and machine loss. Therefore, this article proposes, for the first time, a many-objective JSSP that considers all five of these objectives, making the model more practical in reflecting the various demands of factories. To optimize these five objectives simultaneously, a novel multiple populations for multiple objectives (MPMO) framework-based genetic algorithm (GA) approach, called MPMOGA, is proposed. First, MPMOGA employs five populations to optimize the five objectives, respectively. Second, to avoid each population focusing only on its corresponding single objective, an archive sharing technique (AST) is proposed to store the elite solutions collected from the five populations, so that the populations can obtain optimization information about the other objectives from the archive. This way, MPMOGA can approximate different parts of the entire Pareto front (PF). Third, an archive update strategy (AUS) is proposed to further improve the quality of the solutions in the archive. Test instances from widely used test sets are adopted to evaluate the performance of MPMOGA. The experimental results show that MPMOGA outperforms the compared state-of-the-art algorithms on most of the test instances.
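The archive in MPMOGA keeps mutually non-dominated solutions; a minimal Pareto-dominance and archive-update sketch (all objectives minimized, greatly simplified relative to the proposed AST/AUS) is:

```python
def dominates(a, b):
    """True if solution a Pareto-dominates b (all objectives minimized):
    a is no worse in every objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def update_archive(archive, candidate):
    """Archive update sketch: reject a dominated candidate, otherwise add it
    and drop any archive members it dominates."""
    if any(dominates(m, candidate) for m in archive):
        return archive
    return [m for m in archive if not dominates(candidate, m)] + [candidate]

# (1, 1) dominates both earlier two-objective solutions and replaces them.
arch = update_archive([], (3, 2))
arch = update_archive(arch, (2, 3))
arch = update_archive(arch, (1, 1))
```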

14.
IEEE Trans Neural Netw Learn Syst ; 34(5): 2338-2352, 2023 May.
Article in English | MEDLINE | ID: mdl-34543206

ABSTRACT

The performance of a convolutional neural network (CNN) heavily depends on its hyperparameters. However, finding a suitable hyperparameter configuration is difficult and computationally expensive due to three issues: 1) the mixed-variable problem of different types of hyperparameters; 2) the large-scale search space; and 3) the expensive computational cost of evaluating candidate configurations. Therefore, this article focuses on these three issues and proposes a novel estimation of distribution algorithm (EDA) for efficient hyperparameter optimization, with three major contributions in the algorithm design. First, a hybrid-model EDA is proposed to efficiently deal with the mixed-variable difficulty. The proposed algorithm uses a mixed-variable encoding scheme to encode the mixed-variable hyperparameters and adopts an adaptive hybrid-model learning (AHL) strategy to efficiently optimize the mixed variables. Second, an orthogonal initialization (OI) strategy is proposed to efficiently deal with the challenge of the large-scale search space. Third, a surrogate-assisted multi-level evaluation (SME) method is proposed to reduce the expensive computational cost. Based on the above, the proposed algorithm is named surrogate-assisted hybrid-model EDA (SHEDA). For experimental studies, the proposed SHEDA is verified on widely used classification benchmark problems and compared with various state-of-the-art methods. Moreover, a case study on aortic dissection (AD) diagnosis is carried out to evaluate its performance. Experimental results show that the proposed SHEDA is very effective and efficient for hyperparameter optimization, finding a satisfactory configuration for CIFAR10, CIFAR100, and AD diagnosis with only 0.58, 0.97, and 1.18 GPU days, respectively.

15.
Sensors (Basel) ; 22(16)2022 Aug 18.
Article in English | MEDLINE | ID: mdl-36015952

ABSTRACT

Deep learning techniques have shown their capabilities to discover knowledge from massive unstructured data, providing data-driven solutions for representation and decision making [...].


Subjects
Deep Learning, Diagnostic Imaging
16.
Article in English | MEDLINE | ID: mdl-35802547

ABSTRACT

Traditional neural network compression (NNC) methods decrease the model size and floating-point operations (FLOPs) by screening out unimportant weight parameters; however, the intrinsic sparsity characteristics have not been fully exploited. In this article, from the perspective of signal processing and analysis of network parameters, we propose a compressive sensing (CS)-based method, namely NNCS, for performance improvement. Our proposed NNCS is inspired by the discovery that sparsity levels of weight parameters in the transform domain are greater than those in the original domain. First, to achieve sparse representations for parameters in the transform domain during training, we incorporate a constrained CS model into the loss function. Second, the proposed training process consists of two steps: the first step trains raw weight parameters and induces and reconstructs their sparse representations, and the second step trains transform coefficients to improve network performance. Finally, we transform the entire neural network into another new domain-based representation, obtaining a sparser parameter distribution that facilitates inference acceleration. Experimental results demonstrate that NNCS significantly outperforms existing state-of-the-art methods in terms of parameter reduction and FLOPs. With VGGNet on CIFAR-10, we reduce parameters by 94.8% and FLOPs by 76.8%, with a 0.13% drop in Top-1 accuracy. With ResNet-50 on ImageNet, we reduce parameters by 75.6% and FLOPs by 78.9%, with a 1.24% drop in Top-1 accuracy.
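The observation motivating NNCS, that weights can be sparser in a transform domain, can be illustrated with an orthonormal DCT: a smoothly varying "weight" vector concentrates its energy in a few transform coefficients. The DCT choice and the toy signal are assumptions for illustration, not the transform used in the paper.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis as an (n, n) matrix; rows are basis vectors."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.cos(np.pi * k * (2 * i + 1) / (2 * n)) * np.sqrt(2.0 / n)
    m[0] *= np.sqrt(0.5)  # DC row rescaled for orthonormality
    return m

def energy_fraction(x, top=4):
    """Fraction of total energy captured by the `top` largest coefficients."""
    e = np.sort(x ** 2)[::-1]
    return e[:top].sum() / e.sum()

# A smoothly varying "weight" vector is far sparser in the DCT domain.
w = np.cos(np.linspace(0, np.pi, 64)) + 0.01
c = dct_matrix(64) @ w  # transform-domain representation
```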

17.
Neural Netw ; 153: 142-151, 2022 Sep.
Article in English | MEDLINE | ID: mdl-35728336

ABSTRACT

This paper presents a collaborative neurodynamic approach to Boolean matrix factorization. Based on a binary optimization formulation to minimize the Hamming distance between a given data matrix and its low-rank reconstruction, the proposed approach employs a population of Boltzmann machines operating concurrently for scatter search of factorization solutions. In addition, a particle swarm optimization rule is used to re-initialize the neuronal states of Boltzmann machines upon their local convergence to escape from local minima toward global solutions. Experimental results demonstrate the superior convergence and performance of the proposed approach against six baseline methods on ten benchmark datasets.


Subjects
Algorithms, Benchmarking, Computer Simulation
18.
Article in English | MEDLINE | ID: mdl-35657839

ABSTRACT

Underwater images typically suffer from color deviations and low visibility due to wavelength-dependent light absorption and scattering. To deal with these degradation issues, we propose an efficient and robust underwater image enhancement method, called MLLE. Specifically, we first locally adjust the color and details of an input image according to a minimum color loss principle and a maximum attenuation map-guided fusion strategy. Afterward, we employ integral and squared integral maps to compute the mean and variance of local image blocks, which are used to adaptively adjust the contrast of the input image. Meanwhile, a color balance strategy is introduced to balance the color differences between channel a and channel b in the CIELAB color space. Our enhanced results are characterized by vivid color, improved contrast, and enhanced details. Extensive experiments on three underwater image enhancement datasets demonstrate that our method outperforms the state-of-the-art methods. Our method is also fast, processing a 1024×1024×3 image within 1 s on a single CPU. Experiments further suggest that our method can effectively improve the performance of underwater image segmentation, keypoint detection, and saliency detection. The project page is available at https://li-chongyi.github.io/proj_MMLE.html.
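The integral-map trick mentioned above gives constant-time local means and variances per pixel; the following is a compact (unoptimized) sketch under that idea, not the authors' implementation.

```python
import numpy as np

def local_mean_var(img, r=1):
    """Local mean and variance over (2r+1)x(2r+1) windows via integral and
    squared integral maps, clipped at the image borders."""
    pad = np.pad(img.astype(float), ((1, 0), (1, 0)))  # zero row/col for prefix sums
    s = pad.cumsum(0).cumsum(1)            # integral image
    s2 = (pad ** 2).cumsum(0).cumsum(1)    # squared integral image
    h, w = img.shape

    def box(t, i, j):
        i0, i1 = max(i - r, 0), min(i + r + 1, h)
        j0, j1 = max(j - r, 0), min(j + r + 1, w)
        area = (i1 - i0) * (j1 - j0)
        # four-corner lookup sums img[i0:i1, j0:j1] in O(1)
        return (t[i1, j1] - t[i0, j1] - t[i1, j0] + t[i0, j0]) / area

    mean = np.array([[box(s, i, j) for j in range(w)] for i in range(h)])
    var = np.array([[box(s2, i, j) for j in range(w)] for i in range(h)]) - mean ** 2
    return mean, var
```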

19.
Front Neurosci ; 16: 869522, 2022.
Article in English | MEDLINE | ID: mdl-35573313

ABSTRACT

The mental workload (MWL) of workers in different occupational groups is a main and direct factor in unsafe behavior, which may cause serious accidents. One new and useful technology for estimating MWL is the brain-computer interface (BCI) based on EEG signals, which are regarded as the gold standard of cognitive status. However, estimation systems involving handcrafted EEG features are time-consuming and unsuitable for real-time application. The purpose of this study was to propose an end-to-end BCI framework for MWL estimation. First, a new automated data preprocessing method was proposed to remove artifacts without human interference. Then a new neural network structure named EEG-TNet was designed to extract both temporal and frequency information from the original EEG. Furthermore, two types of experiments and ablation studies were performed to prove the effectiveness of this model. In the subject-dependent experiment, the estimation accuracy of dual-task estimation (No task vs. TASK) and triple-task estimation (Lo vs. Mi vs. Hi) reached 99.82% and 99.21%, respectively. In contrast, the accuracies on the same tasks reached 82.78% and 66.83% in subject-independent experiments. Additionally, the ablation studies proved that the preprocessing method and network structure contributed significantly to MWL estimation. The proposed method is convenient, requires no human intervention, and outperforms other related studies, making it an effective way to reduce human-factor risks.

20.
IEEE Trans Image Process ; 31: 1613-1627, 2022.
Article in English | MEDLINE | ID: mdl-35081029

ABSTRACT

Guided by the free-energy principle, generative adversarial network (GAN)-based no-reference image quality assessment (NR-IQA) methods have improved image quality prediction accuracy. However, the GAN cannot handle the restoration task well for free-energy principle-guided NR-IQA methods, especially for severely destroyed images, so the quality reconstruction relationship between a distorted image and its restored image cannot be accurately established. To address this problem, a visual compensation restoration network (VCRNet)-based NR-IQA method is proposed, which uses a non-adversarial model to efficiently handle the distorted image restoration task. The proposed VCRNet consists of a visual restoration network and a quality estimation network. To accurately establish the quality reconstruction relationship between the distorted image and its restored image, a visual compensation module, an optimized asymmetric residual block, and an error map-based mixed loss function are proposed to increase the restoration capability of the visual restoration network. To further address the NR-IQA problem for severely destroyed images, the multi-level restoration features obtained from the visual restoration network are used for image quality estimation. To prove the effectiveness of the proposed VCRNet, seven representative IQA databases are used, and experimental results show that the proposed VCRNet achieves state-of-the-art image quality prediction accuracy. The implementation of the proposed VCRNet has been released at https://github.com/NUIST-Videocoding/VCRNet.
