Results 1 - 20 of 60
1.
Sensors (Basel) ; 23(9)2023 Apr 30.
Article in English | MEDLINE | ID: mdl-37177632

ABSTRACT

Stochastic resonance (SR), a noise-assisted signal processing method, has been widely applied to weak signal detection and the diagnosis of weak mechanical faults. To further improve the weak signal detection performance of SR-based approaches and realize high-performance weak fault diagnosis, a global parameter optimization (GPO) model of a cascaded SR system is proposed in this work. The cascaded SR systems, which comprise multiple multi-parameter-adjusting SR systems with both bistable and tri-stable potential functions, are first introduced. The fixed-parameter optimization (FPO) and GPO models of the cascaded systems for achieving optimal SR outputs are then formulated using the particle swarm optimization (PSO) algorithm. Simulation results show that the GPO model achieves a better SR output than the FPO model, with good robustness and stability in detecting weak signals with low signal-to-noise ratios (SNRs), and that the tri-stable cascaded SR system detects weak signals better than the bistable cascaded system. Furthermore, a weak fault diagnosis approach based on the GPO model of the tri-stable cascaded system is proposed, and two rolling bearing weak fault diagnosis experiments verify the effectiveness of the proposed approach for high-performance adaptive weak fault diagnosis.
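The paper's own implementation is not given here; as a rough illustration of the kind of bistable SR system whose parameters are being optimized, the following sketch integrates dx/dt = a*x - b*x^3 + s(t) and scores the output by a spectral SNR. The parameter names a and b, the grid search standing in for PSO, and the SNR definition are all assumptions for illustration only.

```python
import numpy as np

def bistable_sr(signal, fs, a, b):
    """Euler integration of the bistable SR system dx/dt = a*x - b*x^3 + s(t)."""
    dt = 1.0 / fs
    x, out = 0.0, np.empty_like(signal)
    for i, s in enumerate(signal):
        x += dt * (a * x - b * x ** 3 + s)
        out[i] = x
    return out

def output_snr(x, fs, f0):
    """SNR (dB) of the spectral line nearest f0 against the mean background power."""
    spec = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), 1.0 / fs)
    k = np.argmin(np.abs(freqs - f0))
    noise = (spec.sum() - spec[k]) / (len(spec) - 1)
    return 10 * np.log10(spec[k] / noise)

# weak periodic signal buried in strong noise
rng = np.random.default_rng(0)
fs, f0, n = 1000, 5.0, 4000
t = np.arange(n) / fs
s = 0.2 * np.sin(2 * np.pi * f0 * t) + rng.normal(0.0, 1.0, n)

# small grid search over (a, b) as a simple stand-in for PSO
best = max(((a, b) for a in (0.5, 1.0, 2.0) for b in (0.5, 1.0, 2.0)),
           key=lambda p: output_snr(bistable_sr(s, fs, *p), fs, f0))
```

A real GPO model would optimize the parameters of every stage of the cascade jointly, which is the point of the paper; this sketch covers only a single stage.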

2.
IEEE Trans Image Process ; 32: 1966-1977, 2023.
Article in English | MEDLINE | ID: mdl-37030695

ABSTRACT

Most facial landmark detection methods predict landmarks by mapping input facial appearance features to landmark heatmaps, and they have achieved promising results. However, when a face image suffers from large poses, heavy occlusions, or complicated illumination, these methods can learn neither discriminative feature representations nor effective facial shape constraints, and they cannot accurately predict the value of each element in the landmark heatmap, which limits their detection accuracy. To address this problem, we propose a novel Reference Heatmap Transformer (RHT) that introduces reference heatmap information for more precise facial landmark detection. The proposed RHT consists of a Soft Transformation Module (STM) and a Hard Transformation Module (HTM), which cooperate to encourage the accurate transformation of the reference heatmap information and facial shape constraints. A Multi-Scale Feature Fusion Module (MSFFM) is then proposed to fuse the transformed heatmap features with the semantic features learned from the original face images, enhancing the feature representations used to produce more accurate target heatmaps. To the best of our knowledge, this is the first study to explore how to enhance facial landmark detection by transforming reference heatmap information. Experimental results on challenging benchmark datasets demonstrate that the proposed method outperforms state-of-the-art methods in the literature.
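The landmark heatmaps this abstract refers to are typically rendered as small Gaussians centered on each landmark; the modules above then transform and fuse such maps. A minimal sketch of that standard encoding (the function name and sigma value are illustrative assumptions, not the paper's code):

```python
import numpy as np

def landmark_heatmap(h, w, cx, cy, sigma=2.0):
    """Render one landmark (cx, cy) as a 2D Gaussian heatmap with peak value 1."""
    ys, xs = np.mgrid[0:h, 0:w]
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))

hm = landmark_heatmap(64, 64, 20, 30)
peak = np.unravel_index(np.argmax(hm), hm.shape)  # decoding: argmax gives (row, col)
```

Decoding a predicted heatmap back to coordinates is the inverse step: take the argmax (or a soft-argmax) of the map.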

3.
IEEE Trans Image Process ; 32: 2017-2032, 2023.
Article in English | MEDLINE | ID: mdl-37018080

ABSTRACT

As a branch of transfer learning, domain adaptation leverages useful knowledge from a source domain for solving tasks in a target domain. Most existing domain adaptation methods focus on diminishing the conditional distribution shift and learning invariant features between different domains. However, two important factors are overlooked by most existing methods: 1) the transferred features should be not only domain-invariant but also discriminative and correlated, and 2) negative transfer should be avoided as much as possible for the target tasks. To fully account for these factors, we propose a guided discrimination and correlation subspace learning (GDCSL) method for cross-domain image classification. GDCSL considers domain-invariant, category-discriminative, and correlation learning of the data. Specifically, GDCSL introduces discriminative information associated with the source and target data by minimizing the intraclass scatter and maximizing the interclass distance. By designing a new correlation term, GDCSL extracts the most correlated features from the source and target domains for image classification. The global structure of the data is preserved in GDCSL because the target samples are represented by the source samples. To avoid negative transfer, we use a sample reweighting method to detect target samples with different confidence levels. A semi-supervised extension of GDCSL (Semi-GDCSL) is also proposed, together with a novel label selection scheme that ensures the correctness of the target pseudo-labels. Comprehensive and extensive experiments on several cross-domain benchmarks verify the effectiveness of the proposed methods over state-of-the-art domain adaptation methods.
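The two discriminative quantities the abstract mentions, intraclass scatter (to be minimized) and interclass distance (to be maximized), can be computed as the classic LDA-style scatter terms. A self-contained sketch (the function name and the trace-form scalars are assumptions; GDCSL optimizes these inside a learned subspace, which is omitted here):

```python
import numpy as np

def scatter_terms(X, y):
    """Return (within-class scatter, between-class distance) as trace scalars.
    The first is minimized and the second maximized in a discriminative subspace."""
    mean = X.mean(axis=0)
    sw, sb = 0.0, 0.0
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        sw += ((Xc - mc) ** 2).sum()               # intraclass scatter
        sb += len(Xc) * ((mc - mean) ** 2).sum()   # interclass distance
    return sw, sb

X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
y = np.array([0, 0, 1, 1])
sw, sb = scatter_terms(X, y)   # tight classes far apart: sw small, sb large
```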

4.
Article in English | MEDLINE | ID: mdl-37027557

ABSTRACT

Graph-based clustering approaches, especially the family of spectral clustering methods, have been widely used in machine learning. These methods usually rely on a similarity matrix that is either constructed in advance or learned from a probabilistic perspective. However, an unreasonable similarity matrix inevitably degrades performance, and sum-to-one probability constraints can make the approaches sensitive to noise. To address these issues, this study presents the notion of typicality-aware adaptive similarity matrix learning. The typicality (possibility), rather than the probability, of each sample being a neighbor of other samples is measured and adaptively learned. By introducing a robust balance term, the similarity between any pair of samples depends only on the distance between them and is not affected by other samples. Therefore, the impact of noisy data or outliers is alleviated, and the neighborhood structures are well captured according to the joint distance between samples and their spectral embeddings. Moreover, the generated similarity matrix has block-diagonal properties that benefit correct clustering. Interestingly, the results optimized by typicality-aware adaptive similarity matrix learning share a common essence with the Gaussian kernel function, and the latter can be directly derived from the former. Extensive experiments on synthetic and well-known benchmark datasets demonstrate the superiority of the proposed idea compared with state-of-the-art methods.
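Since the abstract notes that the learned similarities reduce to the Gaussian kernel form, where s_ij depends only on the distance between x_i and x_j, the end result can be illustrated with a plain Gaussian affinity matrix (the function name and the fixed sigma are assumptions; the paper learns this adaptively):

```python
import numpy as np

def gaussian_similarity(X, sigma=1.0):
    """Pairwise similarity where s_ij depends only on ||x_i - x_j||,
    not on the other samples, matching the Gaussian kernel form."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    S = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(S, 0.0)   # no self-loops in the affinity graph
    return S

X = np.array([[0.0], [0.1], [5.0]])   # two close samples and one far outlier
S = gaussian_similarity(X)
```

Because each entry is a function of one pairwise distance only, a single outlier inflates only its own row and column, which is the robustness property the abstract emphasizes.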

5.
IEEE Trans Cybern ; 53(10): 6700-6713, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37018685

ABSTRACT

High-dimensional, small-sample-size data, which can lead to singularity in computation, are becoming increasingly common in pattern recognition. Moreover, it remains an open problem how to extract the most suitable low-dimensional features for the support vector machine (SVM) while avoiding singularity, so as to enhance the SVM's performance. To address these problems, this article designs a novel framework that integrates discriminative feature extraction and sparse feature selection into the support vector framework, making full use of the classifier's characteristics to find the optimal/maximal classification margin. As a result, the low-dimensional features extracted from high-dimensional data are better suited to the SVM. A novel algorithm, called the maximal margin SVM (MSVM), is proposed to achieve this goal. An alternating iterative learning strategy is adopted in MSVM to learn the optimal discriminative sparse subspace and the corresponding support vectors. The mechanism and essence of the designed MSVM are revealed, and its computational complexity and convergence are analyzed and validated. Experimental results on well-known databases (including breastmnist, pneumoniamnist, colon-cancer, etc.) show the great potential of MSVM compared with classical discriminant analysis methods and SVM-related methods; the code is available at https://www.scholat.com/laizhihui.

6.
Sensors (Basel) ; 23(8)2023 Apr 10.
Article in English | MEDLINE | ID: mdl-37112201

ABSTRACT

Although stochastic resonance (SR) has been widely used to enhance weak fault signatures in machinery and has achieved remarkable results in engineering applications, parameter optimization in existing SR-based methods requires quantification indicators that depend on prior knowledge of the defects to be detected; for example, the widely used signal-to-noise ratio easily produces a false SR and further degrades detection performance. Such prior-knowledge-dependent indicators are unsuitable for real-world fault diagnosis of machinery whose structural parameters are unknown or unobtainable. It is therefore necessary to design an SR method with parameter estimation, one that estimates the SR parameters adaptively from the signals to be processed rather than from prior knowledge of the machinery. In this method, the triggered SR condition in second-order nonlinear systems and the synergistic relationship among weak periodic signals, background noise, and nonlinear systems guide the parameter estimation used to enhance unknown weak fault characteristics. Bearing fault experiments were performed to demonstrate the feasibility of the proposed method. The results indicate that the method can enhance weak fault characteristics and diagnose weak compound bearing faults at an early stage without prior knowledge or quantification indicators, matching the detection performance of SR methods that rely on prior knowledge. Furthermore, the proposed method is simpler and less time-consuming than other prior-knowledge-based SR methods, which must optimize a large number of parameters, and it outperforms the fast kurtogram method for early bearing fault detection.

7.
Sensors (Basel) ; 23(5)2023 Mar 01.
Article in English | MEDLINE | ID: mdl-36904918

ABSTRACT

In the field of muscle-computer interfaces, the most challenging task is extracting patterns from complex surface electromyography (sEMG) signals to improve the performance of myoelectric pattern recognition. To address this problem, a two-stage architecture consisting of a Gramian angular field (GAF)-based 2D representation and a convolutional neural network (CNN)-based classification (GAF-CNN) is proposed. To explore discriminative channel features in sEMG signals, an sEMG-GAF transformation is proposed for time-series representation and feature modeling, in which the instantaneous values of multichannel sEMG signals are encoded in image form. A deep CNN model is introduced to extract the high-level semantic features of these image-form time-series signals for classification. An insight analysis explains the rationale behind the advantages of the proposed method. Extensive experiments on publicly available benchmark sEMG datasets, i.e., NinaPro and CapgMyo, validate that the proposed GAF-CNN method is comparable to the state-of-the-art methods reported in previous work incorporating CNN models.


Subject(s)
Muscles , Neural Networks, Computer , Electromyography/methods , Benchmarking , Algorithms
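The GAF encoding used above has a compact standard form: rescale the series to [-1, 1], map each value to an angle, and take the pairwise cosine of angle sums. A minimal sketch of the summation variant (GASF); the function name is an assumption, and real sEMG pipelines apply this per channel and per window:

```python
import numpy as np

def gasf(x):
    """Gramian angular summation field of a 1D series:
    rescale to [-1, 1], phi = arccos(x), G_ij = cos(phi_i + phi_j)."""
    x = np.asarray(x, float)
    x = 2 * (x - x.min()) / (x.max() - x.min()) - 1   # min-max rescale to [-1, 1]
    phi = np.arccos(np.clip(x, -1.0, 1.0))            # polar-coordinate angles
    return np.cos(phi[:, None] + phi[None, :])        # Gramian of angle sums

img = gasf(np.sin(np.linspace(0, 2 * np.pi, 32)))     # 32-sample window -> 32x32 image
```

The resulting matrix is symmetric and bounded in [-1, 1], so it can be fed to a CNN like a single-channel grayscale image.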
8.
Neural Netw ; 161: 39-54, 2023 Apr.
Article in English | MEDLINE | ID: mdl-36735999

ABSTRACT

The spatial boundary effect can significantly reduce the performance of a learned discriminative correlation filter (DCF) model. A common way to relieve this effect is to extract appearance features from a wider region around the target. However, this introduces unexpected features from background pixels and noise, which decreases the filter's discriminative power. To address this shortcoming, this paper proposes enhanced robust spatial feature selection and correlation filter learning (EFSCF), which performs jointly sparse feature learning to handle boundary effects while suppressing the influence of background pixels and noise. Unlike ℓ2-norm-based tracking approaches, which are prone to non-Gaussian noise, the proposed method imposes the ℓ2,1-norm on the loss term to enhance robustness against training outliers. To further enhance discrimination, a jointly sparse feature selection scheme based on the ℓ2,1-norm is designed to regularize the filter in rows and columns simultaneously. To the best of the authors' knowledge, this is the first work to explore structural sparsity in the rows and columns of a learned filter simultaneously. The proposed model can be solved efficiently by an alternating direction method of multipliers. EFSCF is verified by experiments on four challenging unmanned aerial vehicle datasets under severe noise and appearance changes, and the results show that it achieves better tracking performance than state-of-the-art trackers.


Subject(s)
Knowledge , Learning
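The ℓ2,1-norm that the EFSCF abstract leans on is simply the sum of the ℓ2 norms of a matrix's rows, which drives whole rows to zero (joint sparsity). A minimal sketch (the function name is an assumption; applying it to both W and W.T gives the row-and-column regularization the paper describes):

```python
import numpy as np

def l21_norm(W):
    """l2,1-norm: sum of the l2 norms of the rows.
    Penalizing it pushes entire rows to zero, i.e. jointly sparse selection."""
    return np.sqrt((W ** 2).sum(axis=1)).sum()

W = np.array([[3.0, 4.0],    # row norm 5
              [0.0, 0.0],    # zeroed-out row contributes nothing
              [0.0, 5.0]])   # row norm 5
row_term = l21_norm(W)        # regularize rows
col_term = l21_norm(W.T)      # regularize columns, as in EFSCF
```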
9.
IEEE Trans Cybern ; 53(6): 3546-3560, 2023 Jun.
Article in English | MEDLINE | ID: mdl-34910655

ABSTRACT

Current fully supervised facial landmark detection methods have progressed rapidly and achieved remarkable performance. However, they still struggle with faces under large poses and heavy occlusions, owing to inaccurate facial shape constraints and insufficient labeled training samples. In this article, we propose a semisupervised framework, a self-calibrated pose attention network (SCPAN), for more robust and precise facial landmark detection in challenging scenarios. Specifically, a boundary-aware landmark intensity (BALI) field is proposed to model more effective facial shape constraints by fusing boundary and landmark intensity field information. Moreover, a self-calibrated pose attention (SCPA) model is designed to provide a self-learned objective function that enforces intermediate supervision without label information, by introducing a self-calibrated mechanism and a pose attention mask. We show that by integrating the BALI fields and the SCPA model into the novel SCPAN, more facial prior knowledge can be learned, and the detection accuracy and robustness for faces with large poses and heavy occlusions are improved. Experimental results on challenging benchmark datasets demonstrate that our approach outperforms state-of-the-art methods in the literature.


Subject(s)
Algorithms , Biometric Identification , Biometric Identification/methods
10.
IEEE Trans Cybern ; 53(8): 5135-5150, 2023 Aug.
Article in English | MEDLINE | ID: mdl-35666785

ABSTRACT

The support vector machine (SVM), as a supervised learning method, has many variants with strong performance. In recent years, research has increasingly focused on nonparallel SVMs, of which the twin SVM (TWSVM) is the typical example. To reduce the influence of outliers, more robust distance measurements have been considered in these methods, but the discriminability of the models has been neglected. In this article, we propose the robust manifold twin bounded SVM (RMTBSVM), which considers both robustness and discriminability. Specifically, a novel norm, the capped L1-norm, is used as the distance metric for robustness, and a robust manifold regularization is added to further improve robustness and classification performance. We also use the kernel method to extend the proposed RMTBSVM to nonlinear classification. We introduce the optimization problems of the proposed model, then propose effective algorithms for both the linear and nonlinear cases and prove their convergence. Experiments verify the effectiveness of our model: compared with other methods under the SVM framework, the proposed RMTBSVM shows better classification accuracy and robustness.
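The capped L1-norm mentioned above is robust because each coordinate's contribution to a distance is truncated at a cap, so one wild coordinate cannot dominate. A minimal sketch (the function name and the cap value eps are illustrative assumptions):

```python
import numpy as np

def capped_l1(u, v, eps=1.0):
    """Capped L1 distance: per-coordinate |u - v| truncated at eps,
    so a single outlying coordinate contributes at most eps."""
    return np.minimum(np.abs(u - v), eps).sum()

a = np.array([0.0, 0.0, 0.0])
b = np.array([0.2, 0.3, 100.0])   # last coordinate is a gross outlier
d_capped = capped_l1(a, b)        # outlier clipped to eps = 1.0
d_plain = np.abs(a - b).sum()     # plain L1 is dominated by the outlier
```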

11.
IEEE Trans Image Process ; 31: 7048-7062, 2022.
Article in English | MEDLINE | ID: mdl-36346858

ABSTRACT

As a multivariate data analysis tool, canonical correlation analysis (CCA) has been widely used in computer vision and pattern recognition. However, CCA uses the Euclidean distance as its metric, which is sensitive to noise and outliers in the data. Furthermore, CCA demands that the two training sets have the same number of training samples, which limits the performance of CCA-based methods. To overcome these limitations, two novel canonical correlation learning methods based on low-rank learning are proposed in this paper for image representation: robust canonical correlation analysis (robust-CCA) and low-rank representation canonical correlation analysis (LRR-CCA). By introducing two regular matrices, the numbers of training samples in the two datasets can be set to any values without limitation. Specifically, robust-CCA uses low-rank learning to remove noise from the data and extracts maximally correlated features from the two learned clean data matrices; the nuclear norm and L1-norm are used as constraints for the learned clean matrices and noise matrices, respectively. LRR-CCA introduces low-rank representation into CCA to ensure that the correlated features can be obtained in a low-rank representation. Extensive experiments on five public image databases demonstrate that the proposed methods outperform state-of-the-art CCA-based and low-rank learning methods.
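For reference, the classical CCA baseline that both proposed methods extend can be computed from the whitened cross-covariance: the singular values of Sxx^{-1/2} Sxy Syy^{-1/2} are the canonical correlations. A minimal sketch (the function name and the small ridge term reg are assumptions added for numerical stability):

```python
import numpy as np

def cca_first_corr(X, Y, reg=1e-6):
    """First canonical correlation between X (n x p) and Y (n x q),
    via the SVD of the whitened cross-covariance matrix."""
    X = X - X.mean(0)
    Y = Y - Y.mean(0)
    n = len(X)
    Sxx = X.T @ X / n + reg * np.eye(X.shape[1])
    Syy = Y.T @ Y / n + reg * np.eye(Y.shape[1])
    Sxy = X.T @ Y / n

    def inv_sqrt(S):
        w, V = np.linalg.eigh(S)           # symmetric PD -> eigendecomposition
        return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

    M = inv_sqrt(Sxx) @ Sxy @ inv_sqrt(Syy)
    return np.linalg.svd(M, compute_uv=False)[0]

# two views sharing one latent variable z
rng = np.random.default_rng(0)
z = rng.normal(size=(200, 1))
X = np.hstack([z, rng.normal(size=(200, 1))])
Y = np.hstack([z + 0.1 * rng.normal(size=(200, 1)), rng.normal(size=(200, 1))])
r = cca_first_corr(X, Y)   # close to 1: the shared component is recovered
```

Note this baseline already requires X and Y to have the same number of rows n, which is exactly the restriction the paper's regular matrices remove.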

12.
Sensors (Basel) ; 22(22)2022 Nov 11.
Article in English | MEDLINE | ID: mdl-36433327

ABSTRACT

As a powerful feature extraction tool, a convolutional neural network (CNN) has strong adaptability for big-data applications such as bearing fault diagnosis, but its classification performance is limited when the quality of the raw signals is poor. In this paper, stochastic resonance (SR), which provides an advanced feature enhancement approach for weak signals under strong background noise, is introduced as a data pre-processing method for the CNN to improve its classification performance. First, a multiparameter-adjusting bistable Duffing system that can achieve SR under large-parameter weak signals is introduced. A hybrid optimization algorithm (HOA) combining a genetic algorithm (GA) and simulated annealing (SA) is proposed to adaptively obtain the optimized parameters and output signal-to-noise ratio (SNR) of the Duffing system, realizing data optimization based on the multiparameter-adjusting SR of the Duffing system. An SR-based mapping method is further proposed to convert the outputs of the Duffing system into grey images, which can then be processed by a standard CNN with batch normalization (BN) and dropout layers. After verifying the feasibility of the HOA for multiparameter optimization of the Duffing system, the bearing fault dataset from the CWRU bearing data center was processed by the proposed fault enhancement classification and identification method. The research showed that the weak features of the bearing signals could be enhanced significantly through the adaptive multiparameter optimization of SR; classification accuracy reached 100% for 10 categories of bearing signals and more than 96.9% for 20 categories, outperforming other methods. The influence of the population number on classification accuracy and computation time was further studied, and feature maps and network visualizations are presented. The results demonstrate that the proposed method can realize high-performance fault enhancement, classification, and identification.


Subject(s)
Algorithms , Neural Networks, Computer , Vibration
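The signal-to-grey-image step described above, converting a 1D system output into an image a CNN can consume, is commonly done by min-max scaling a fixed-length segment to [0, 255] and reshaping it into a square. A minimal sketch (the function name, image size, and row-major reshaping are assumptions; the paper's exact mapping may differ):

```python
import numpy as np

def signal_to_grey(signal, size=28):
    """Map the first size*size samples of a 1D signal to an 8-bit grey image
    by min-max scaling to [0, 255] and row-major reshaping."""
    seg = np.asarray(signal[: size * size], float)
    lo, hi = seg.min(), seg.max()
    img = (seg - lo) / (hi - lo) * 255.0
    return np.round(img).astype(np.uint8).reshape(size, size)

img = signal_to_grey(np.sin(np.linspace(0, 20 * np.pi, 28 * 28)))
```

Stacking one such image per signal segment yields a dataset in the same shape as MNIST-style inputs, which is why a standard CNN with BN and dropout layers can be applied directly.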
13.
IEEE Trans Image Process ; 31: 5303-5316, 2022.
Article in English | MEDLINE | ID: mdl-35914043

ABSTRACT

Domain adaptation leverages rich knowledge from a related source domain to perform tasks in a target domain. Because more knowledge can be obtained under such relaxed conditions, domain adaptation methods have been widely used in pattern recognition and image classification. However, most existing domain adaptation methods only consider how to minimize the distribution differences between the source and target domains; they neglect what should be transferred for a specific task and suffer from negative transfer caused by distribution outliers. To address these problems, in this paper we propose a novel domain adaptation method called weighted correlation embedding learning (WCEL) for image classification. WCEL seamlessly integrates correlation learning, graph embedding, and sample reweighting into a unified learning model. Specifically, we extract the maximally correlated features from the source and target domains for image classification tasks. In addition, two graphs are designed to preserve the discriminant information between interclass samples and the neighborhood relations within intraclass samples. Furthermore, to prevent negative transfer, we develop an efficient sample reweighting strategy to predict the target labels with different confidence levels. Extensive experiments on several benchmark databases verify the superiority of WCEL over other state-of-the-art domain adaptation algorithms.

14.
Methods ; 202: 70-77, 2022 06.
Article in English | MEDLINE | ID: mdl-33992772

ABSTRACT

With the advance of deep learning, convolutional neural networks (CNNs) have been widely used and have achieved state-of-the-art performance in medical image classification. However, most existing medical image classification methods conduct their experiments on only one public dataset: when a well-trained model is applied to a dataset drawn from a different source, it usually shows large performance degradation and must be fine-tuned before it can be used on the new dataset. The goal of this work is to solve the cross-domain image classification problem without using data from the target domain. We designed a self-supervised, plug-and-play feature-standardization-block (FSB), consisting of image normalization (INB), contrast enhancement (CEB), and boundary detection (BDB) blocks, to extract cross-domain robust feature maps for deep learning frameworks, and applied the network to chest X-ray-based lung disease classification. Three classic deep networks (VGG, Xception, and DenseNet) and four chest X-ray lung disease datasets were employed for evaluation. The experiments showed that all three networks adapted better across domains when the feature-standardization-block was employed. The image normalization, contrast enhancement, and boundary detection blocks achieved average accuracy improvements of 2%, 2%, and 5%, respectively; combining all three blocks, the feature-standardization-block achieved an average 6% accuracy improvement.


Subject(s)
Deep Learning , Lung Diseases , Humans , Lung , Lung Diseases/diagnostic imaging , Neural Networks, Computer , Reference Standards
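The first two FSB components named above, image normalization and contrast enhancement, have standard realizations: statistical normalization and histogram equalization. A hedged sketch of both (function names and bin count are assumptions; the paper's exact blocks are learned/configured differently):

```python
import numpy as np

def normalize(img):
    """Image normalization: zero mean, unit variance, then rescale to [0, 1]."""
    z = (img - img.mean()) / (img.std() + 1e-8)
    return (z - z.min()) / (z.max() - z.min() + 1e-8)

def equalize(img, bins=256):
    """Contrast enhancement via histogram equalization on a [0, 1] image:
    remap each pixel through the cumulative distribution of intensities."""
    hist, edges = np.histogram(img, bins=bins, range=(0.0, 1.0))
    cdf = hist.cumsum() / img.size
    return np.interp(img.ravel(), edges[:-1], cdf).reshape(img.shape)

rng = np.random.default_rng(0)
x = rng.normal(0.5, 0.05, size=(32, 32))   # low-contrast synthetic image
y = equalize(normalize(x))                 # intensities spread across [0, 1]
```

Running inputs through such deterministic standardization before the CNN is what makes the block "plug-and-play": it needs no target-domain data.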
15.
IEEE Trans Neural Netw Learn Syst ; 33(1): 185-199, 2022 Jan.
Article in English | MEDLINE | ID: mdl-33147149

ABSTRACT

Sparse discriminative projection learning has attracted much attention due to its good performance in recognition tasks. In this article, a framework called generalized embedding regression (GER) is proposed, which can simultaneously perform low-dimensional embedding and sparse projection learning in a joint objective function with a generalized orthogonal constraint. Moreover, label information is integrated into the model to preserve the global structure of the data, and a rank constraint is imposed on the regression matrix to explore the underlying correlation structure of the classes. Theoretical analysis shows that GER can obtain the same or an approximate solution as some related methods under special settings. Using this framework as a general platform, we design a novel supervised feature extraction approach called jointly sparse embedding regression (JSER). In JSER, we construct an intrinsic graph to characterize the intraclass similarity and a penalty graph to indicate the interclass separability; the penalty graph Laplacian is then used as the constraint matrix in the generalized orthogonal constraint to deal with interclass marginal points. Moreover, the L2,1-norm is imposed on the regression terms for robustness to outliers and data variations, and on the regularization term for jointly sparse projection learning, leading to interesting semantic interpretability. An effective iterative algorithm is designed to solve the optimization problem of JSER. Theoretically, we prove that the subproblem of JSER is essentially an unbalanced Procrustes problem that can be solved iteratively, and we prove the convergence of the designed algorithm. Experimental results on six well-known datasets indicate the competitive performance and latent properties of JSER.

16.
IEEE Trans Neural Netw Learn Syst ; 33(5): 2181-2194, 2022 05.
Article in English | MEDLINE | ID: mdl-33417567

ABSTRACT

Recently, heatmap regression has been widely explored for facial landmark detection and has obtained remarkable performance. However, most existing heatmap regression-based methods neglect high-order feature correlations, which are important for learning more representative features and enhancing shape constraints. Moreover, no explicit global shape constraints are applied to the final predicted landmarks, which reduces accuracy. To address these issues, in this article we propose a multiorder multiconstraint deep network (MMDN) for more powerful feature correlation and shape constraint learning. In particular, an implicit multiorder correlating geometry-aware (IMCG) model is proposed to introduce multiorder spatial and channel correlations for more discriminative representations. Furthermore, an explicit probability-based boundary-adaptive regression (EPBR) method is developed to enhance global shape constraints and to search for semantically consistent landmarks within the predicted boundary, yielding robust facial landmark detection. The proposed MMDN generates more accurate boundary-adaptive landmark heatmaps and effectively enforces shape constraints on the predicted landmarks for faces with large pose variations and heavy occlusions. Experimental results on challenging benchmark datasets demonstrate the superiority of our MMDN over state-of-the-art facial landmark detection methods.


Subject(s)
Face , Neural Networks, Computer , Benchmarking , Learning , Regression Analysis
17.
Neural Netw ; 145: 209-220, 2022 Jan.
Article in English | MEDLINE | ID: mdl-34768091

ABSTRACT

Although significant progress has been made in synthesizing high-quality, visually realistic face images with unconditional generative adversarial networks (GANs), the generation process still lacks the control needed for semantic face editing. In this paper, we propose a novel learning framework, called GuidedStyle, to achieve semantic face editing on a pretrained StyleGAN by guiding the image generation process with a knowledge network. Furthermore, we allow an attention mechanism in the StyleGAN generator to adaptively select a single layer for style manipulation. As a result, our method can perform disentangled and controllable edits along various attributes, including smiling, eyeglasses, gender, mustache, hair color, and attractiveness. Both qualitative and quantitative results demonstrate the superiority of our method over competing methods for semantic face editing. Moreover, we show that our model can also be applied to different types of real and artistic face editing, demonstrating strong generalization ability.


Subject(s)
Neural Networks, Computer , Semantics , Image Processing, Computer-Assisted
18.
IEEE Trans Image Process ; 30: 7776-7789, 2021.
Article in English | MEDLINE | ID: mdl-34495830

ABSTRACT

Person re-identification (ReID) aims to retrieve pedestrians with the same identity across different views. Existing studies mainly focus on improving accuracy while ignoring efficiency. Recently, several hash-based methods have been proposed; despite their improved efficiency, an unacceptable accuracy gap remains between these methods and real-valued ones. Besides, few attempts have been made to simultaneously and explicitly reduce the redundancy and improve the discrimination of hash codes, especially short ones. Integrating mutual learning may be a possible route to this goal, but it fails to utilize the complementary effect of teacher and student models, and it degrades the performance of the teacher model by treating the two models equally. To address these issues, we propose salience-guided iterative asymmetric mutual hashing (SIAMH) to achieve high-quality hash code generation and fast feature extraction. Specifically, a salience-guided self-distillation branch (SSB) is proposed to enable SIAMH to generate hash codes based on salience regions, thus explicitly reducing redundancy between codes. Moreover, a novel iterative asymmetric mutual training strategy (IAMT) is proposed to alleviate the drawbacks of common mutual learning; it continuously refines the discriminative regions for the SSB and extracts regularized dark knowledge for the two models. Extensive experimental results on five widely used datasets demonstrate the superiority of the proposed method in efficiency and accuracy compared with existing state-of-the-art hashing and real-valued approaches. The code is released at https://github.com/Vill-Lab/SIAMH.


Subject(s)
Algorithms , Pedestrians , Humans
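The efficiency argument behind hash-based ReID is that real-valued features are binarized into short codes, and retrieval then uses the Hamming distance, which is far cheaper than Euclidean comparisons. A minimal sketch of that retrieval core (function names and the toy 4-bit codes are illustrative assumptions):

```python
import numpy as np

def to_hash(features):
    """Binarize real-valued features into {0, 1} hash codes by sign."""
    return (features > 0).astype(np.uint8)

def hamming(a, b):
    """Hamming distance between two binary codes: count of differing bits."""
    return int(np.count_nonzero(a != b))

q  = to_hash(np.array([ 0.7, -0.2,  0.9,  0.1]))   # query pedestrian
g1 = to_hash(np.array([ 0.6, -0.1,  0.8,  0.2]))   # gallery: same identity
g2 = to_hash(np.array([-0.5,  0.4, -0.7, -0.3]))   # gallery: different identity
ranking = sorted([("g1", hamming(q, g1)), ("g2", hamming(q, g2))],
                 key=lambda t: t[1])               # nearest code first
```

Methods like SIAMH aim to make these sign patterns both non-redundant and discriminative, so that the Hamming ranking matches the real-valued one.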
19.
IEEE Trans Image Process ; 30: 7143-7155, 2021.
Article in English | MEDLINE | ID: mdl-34370664

ABSTRACT

Facial action unit (AU) analysis plays an important role in facial expression recognition (FER). Existing deep spectral convolutional networks (DSCNs) have achieved encouraging performance for FER based on a set of facial local regions and a predefined graph structure. However, these regions are not closely related to AUs, and DSCNs cannot model the dynamic spatial dependencies among the regions when estimating different facial expressions. To tackle these issues, we propose a novel double dynamic relationships graph convolutional network (DDRGCN) that learns the strength of the edges in the facial graph via a trainable weighted adjacency matrix. We construct the facial graph data from 20 regions of interest (ROIs) guided by different facial AUs. Furthermore, we devise an efficient graph convolutional network in which the inherent dependencies of the vertices in the facial graph are learned automatically during training. Notably, the proposed model has only 110K parameters and a 0.48 MB model size, significantly less than most existing methods. Experiments on four widely used FER datasets demonstrate that the proposed dynamic relationships graph network achieves superior results compared to existing lightweight networks, not only in accuracy but also in model size and speed.
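The core operation implied above, graph convolution over facial ROIs with a weighted adjacency matrix, can be sketched with the standard normalized propagation rule; here the adjacency weights are random stand-ins for the trainable matrix the paper learns, and the function name and dimensions are assumptions:

```python
import numpy as np

def gcn_layer(X, A_weighted, W):
    """One graph-convolution step with a weighted adjacency:
    H = ReLU(D^{-1/2} (A + I) D^{-1/2} X W)."""
    A = A_weighted + np.eye(len(A_weighted))   # add self-loops
    d = A.sum(axis=1)
    A_hat = A / np.sqrt(np.outer(d, d))        # symmetric degree normalization
    return np.maximum(A_hat @ X @ W, 0.0)      # propagate, project, ReLU

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 8))                        # 20 facial ROIs, 8-d features
A = np.abs(rng.normal(size=(20, 20)))
A = (A + A.T) / 2                                   # stand-in for trainable edge weights
W = rng.normal(size=(8, 4))                         # layer weight matrix
H = gcn_layer(X, A, W)                              # updated 4-d ROI features
```

In DDRGCN the entries of A would be parameters updated by backpropagation, which is what lets the edge strengths adapt per expression.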

20.
IEEE Trans Image Process ; 30: 7074-7089, 2021.
Article in English | MEDLINE | ID: mdl-34351858

ABSTRACT

Though widely used in image classification, convolutional neural networks (CNNs) are prone to noise interruptions: the CNN output can be drastically changed by small image noise. To improve noise robustness, we integrate CNNs with wavelets by replacing the common down-sampling operations (max-pooling, strided convolution, and average pooling) with the discrete wavelet transform (DWT). We first propose general DWT and inverse DWT (IDWT) layers applicable to various orthogonal and biorthogonal discrete wavelets, such as Haar, Daubechies, and Cohen wavelets, and then design wavelet-integrated CNNs (WaveCNets) by integrating DWT into commonly used CNNs (VGG, ResNets, and DenseNet). During down-sampling, WaveCNets apply DWT to decompose the feature maps into low-frequency and high-frequency components. The low-frequency component, which contains the main information including the basic object structures, is transmitted into the following layers to generate robust high-level features; the high-frequency components are dropped to remove most of the data noise. Experimental results show that WaveCNets achieve higher accuracy on ImageNet than various vanilla CNNs. We also tested WaveCNets on ImageNet-C, the noisy version of ImageNet, and under six adversarial attacks; the results suggest that the proposed DWT/IDWT layers provide better noise robustness and adversarial robustness. When WaveCNets are applied as backbones, the performance of object detectors (faster R-CNN and RetinaNet) on the COCO detection dataset is consistently improved. We believe that suppression of the aliasing effect, i.e., the separation of low-frequency and high-frequency information, is the main advantage of our approach. The code for our DWT/IDWT layers and the various WaveCNets is available at https://github.com/CVI-SZU/WaveCNet.
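The down-sampling step the abstract describes can be illustrated with the simplest case, a one-level 2D Haar DWT: the low-low component halves the spatial size and is kept, while the three high-frequency components are dropped. A hedged sketch (the function name and the 2x2-block formulation are assumptions; the paper's layers support many other wavelets):

```python
import numpy as np

def haar_dwt2(x):
    """One-level 2D Haar DWT of an even-sized map. Returns the low-low
    component (kept by WaveCNet-style down-sampling) and the three
    high-frequency components (dropped to suppress noise)."""
    a = x[0::2, 0::2]; b = x[0::2, 1::2]   # the four samples of each 2x2 block
    c = x[1::2, 0::2]; d = x[1::2, 1::2]
    ll = (a + b + c + d) / 2               # low-low: local average (structure)
    lh = (a + b - c - d) / 2               # horizontal detail
    hl = (a - b + c - d) / 2               # vertical detail
    hh = (a - b - c + d) / 2               # diagonal detail
    return ll, (lh, hl, hh)

x = np.arange(16.0).reshape(4, 4)          # toy 4x4 feature map
ll, highs = haar_dwt2(x)                   # ll is 2x2: spatial size halved
```

On a constant region all three detail components vanish, which is why discarding them removes high-frequency noise while the ll path preserves the basic structure.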


Subject(s)
Image Processing, Computer-Assisted/methods , Neural Networks, Computer , Wavelet Analysis , Algorithms