Results 1 - 20 of 34
1.
J Acoust Soc Am ; 155(1): 78-93, 2024 Jan 01.
Article in English | MEDLINE | ID: mdl-38174966

ABSTRACT

The identification of nonlinear chirp signals has attracted notable attention in the recent literature, with estimators such as variational mode decomposition and the nonlinear chirp mode estimator. However, most existing methods fail to process signals with closely spaced frequencies or depend on user-determined parameters that are often non-trivial to select optimally. In this work, we propose a fully adaptive method, termed adaptive nonlinear chirp mode estimation. The method decomposes a combined nonlinear chirp signal into its principal modes while simultaneously providing an accurate time-frequency representation of each mode. Exploiting the sparsity of the instantaneous amplitudes, the proposed method produces estimates that are smooth in the sense of being piecewise linear. Furthermore, we analyze the decomposition problem from a Bayesian perspective, using hierarchical Laplace priors to form an efficient implementation that allows fully automatic parameter selection. Numerical simulations and experimental data analysis show the effectiveness and advantages of the proposed method. Notably, the algorithm yields reliable estimates even for signals with crossing modes. The method's practical potential is illustrated on a whale whistle signal.

2.
J Acoust Soc Am ; 152(4): 2187, 2022 Oct.
Article in English | MEDLINE | ID: mdl-36319234

ABSTRACT

Nonlinear group delay signals with frequency-varying characteristics are common in a wide variety of fields, for instance, structural health monitoring and fault diagnosis. In such applications, the signal is composed of multiple modes that may overlap in the frequency domain. Decomposing these signals and forming time-frequency representations of the nonlinear group delay modes is therefore a challenging task. In this study, the nonlinear group delay signal is modelled in the frequency domain. Exploiting the sparsity of the signal, we present the nonlinear group delay mode estimation technique, which forms the demodulation dictionary from the group delay. The method can handle crossing modes and transient impulse signals. Furthermore, an augmented alternating direction method of multipliers is introduced to form an efficient implementation. Numerical simulations and experimental data analysis show the effectiveness and advantages of the proposed method. In addition, analyses of Lamb waves and of a bearing signal show the method's potential for structural health monitoring and fault diagnosis.

3.
NMR Biomed ; 32(5): e4067, 2019 05.
Article in English | MEDLINE | ID: mdl-30811722

ABSTRACT

Quantitative susceptibility mapping (QSM) is a valuable MRI technique owing to its direct relation to actual physical tissue magnetic properties. QSM reconstruction is usually decomposed into three sub-problems that are solved independently. However, this decomposition does not reflect the underlying physics of the problem and may cause parameter discontinuities and error accumulation. In this paper, we propose a fast reconstruction method, fast TFI, based on total field inversion. It accelerates total field inversion by using a specially selected preconditioner and an efficient solution of the weighted L0 regularization. Owing to its effective model, the proposed method can efficiently reconstruct QSM for brains with lesions, where other methods may fail. Experimental results from simulation and in vivo data verify that the new method offers better reconstruction accuracy, faster convergence and excellent robustness, which may promote the clinical application of QSM.


Subject(s)
Algorithms , Magnetic Resonance Imaging , Brain/diagnostic imaging , Brain/pathology , Gadolinium/chemistry , Humans , Image Processing, Computer-Assisted , Linear Models , Phantoms, Imaging
4.
Magn Reson Med ; 80(5): 2202-2214, 2018 11.
Article in English | MEDLINE | ID: mdl-29687915

ABSTRACT

PURPOSE: An end-to-end deep convolutional neural network (CNN) based on a deep residual network (ResNet) is proposed to efficiently reconstruct reliable T2 mapping from single-shot overlapping-echo detachment (OLED) planar imaging. METHODS: The training dataset was obtained from simulations carried out with SPROM (Simulation with PRoduct Operator Matrix), software developed by our group. The relationship between the original OLED image containing two echo signals and the corresponding T2 mapping was learned by ResNet training. The trained ResNet was then applied to reconstruct T2 mappings from simulated and in vivo human brain data. RESULTS: Although the ResNet was trained entirely on simulated data, the trained network generalized well to real human brain data. The results from simulation and in vivo human brain experiments show that the proposed method significantly outperforms the echo-detachment-based method. Reliable T2 mapping with higher accuracy is achieved within 30 ms once the network is trained, whereas the echo-detachment-based OLED reconstruction method takes approximately 2 min. CONCLUSION: The proposed method will facilitate real-time dynamic and quantitative MR imaging via the OLED sequence, and deep convolutional neural networks have the potential to reconstruct maps from complex MRI sequences efficiently.


Subject(s)
Deep Learning , Echo-Planar Imaging/methods , Image Processing, Computer-Assisted/methods , Adult , Algorithms , Brain/diagnostic imaging , Computer Simulation , Humans , Phantoms, Imaging
5.
IEEE Trans Image Process ; 32: 2493-2507, 2023.
Article in English | MEDLINE | ID: mdl-37099471

ABSTRACT

Self-supervised video-based action recognition is a challenging task that must extract the principal information characterizing an action from content-diversified videos over large unlabeled datasets. However, most existing methods exploit only the natural spatio-temporal properties of video to obtain effective action representations from a visual perspective, ignoring semantic information that is closer to human cognition. To this end, we propose VARD, a self-supervised Video-based Action Recognition method with Disturbances that extracts the principal information of an action in both visual and semantic terms. Specifically, according to cognitive neuroscience research, human recognition ability is activated by visual and semantic attributes. Intuitively, minor changes to the actor or scene in a video do not affect a person's recognition of the action, and different people form consistent opinions when recognizing the same action video. In other words, for an action video, the information that remains constant despite disturbances to the visual video or the semantic encoding process is sufficient to represent the action. To learn such information, we construct a positive clip/embedding for each action video. Compared to the original clip/embedding, the positive clip/embedding is disturbed visually/semantically by Video Disturbance and Embedding Disturbance. Our objective is to pull the positive closer to the original clip/embedding in the latent space. In this way, the network is driven to focus on the principal information of the action, while the impact of sophisticated details and inconsequential variations is weakened. Notably, the proposed VARD requires no optical flow, negative samples, or pretext tasks. Extensive experiments on the UCF101 and HMDB51 datasets demonstrate that VARD effectively improves the strong baseline and outperforms multiple classical and advanced self-supervised action recognition methods.
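The pull objective can be sketched as a negative cosine similarity between the original and disturbed embeddings; the embedding dimension and disturbance scale below are arbitrary illustrative choices, not VARD's actual encoder or disturbance modules.

```python
import numpy as np

def pull_loss(z, z_pos):
    """Negative cosine similarity: minimizing it pulls the disturbed
    (positive) embedding toward the original in latent space."""
    z = z / np.linalg.norm(z)
    z_pos = z_pos / np.linalg.norm(z_pos)
    return -float(np.dot(z, z_pos))

rng = np.random.default_rng(0)
z = rng.normal(size=128)                 # embedding of the original clip
z_pos = z + 0.1 * rng.normal(size=128)   # mildly "disturbed" embedding

# A small disturbance should already give high similarity (loss near -1)
print(pull_loss(z, z_pos) < -0.95)  # -> True
```

Note there is no negative term: as the abstract states, only the original/positive pair is used, with no negative samples or pretext tasks.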


Subject(s)
Algorithms , Pattern Recognition, Automated , Humans , Pattern Recognition, Automated/methods , Semantics
6.
IEEE Trans Neural Netw Learn Syst ; 34(11): 9029-9039, 2023 Nov.
Article in English | MEDLINE | ID: mdl-35286266

ABSTRACT

Optimization algorithms are of great importance for training deep neural networks efficiently and effectively. However, existing optimization algorithms often show unsatisfactory convergence behavior, either converging slowly or failing to escape bad local optima. Learning rate dropout (LRD) is a new gradient descent technique that encourages faster convergence and better generalization. LRD helps the optimizer actively explore the parameter space by randomly dropping some learning rates (to 0); at each iteration, only parameters whose learning rate is not 0 are updated. Since LRD reduces the number of parameters updated per iteration, convergence becomes easier. For parameters that are not updated, their gradients are still accumulated (e.g., as momentum) by the optimizer for the next update. Accumulating multiple gradients at fixed parameter positions gives the optimizer more energy to escape saddle points and bad local optima. Experiments show that LRD is surprisingly effective at accelerating training while preventing overfitting.
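A minimal sketch of the idea, assuming plain SGD with momentum as the base optimizer; the toy quadratic objective and all hyperparameters below are illustrative, not the paper's settings. The key detail is that momentum accumulates for every parameter, while the update is applied only where the random learning-rate mask is nonzero.

```python
import numpy as np

def lrd_momentum_step(params, grads, velocity,
                      lr=0.1, beta=0.5, keep_prob=0.5, rng=None):
    """One SGD-with-momentum step with learning rate dropout (LRD).

    Momentum is accumulated for *all* parameters, but the update is
    applied only where the sampled mask is 1, so gradients at frozen
    positions build up energy for later steps."""
    rng = rng or np.random.default_rng()
    velocity = beta * velocity + grads            # accumulate for all params
    mask = rng.random(params.shape) < keep_prob   # drop some learning rates to 0
    params = params - lr * mask * velocity        # update only unmasked params
    return params, velocity

# Toy problem: minimize 0.5 * ||w||^2, whose gradient is simply w.
rng = np.random.default_rng(0)
w = np.ones(4)
v = np.zeros(4)
for _ in range(200):
    w, v = lrd_momentum_step(w, w.copy(), v, rng=rng)
print(np.abs(w).max() < 1e-2)  # -> True (converges despite dropped updates)
```

Resampling the mask each iteration is what makes the exploration "active": every parameter is eventually updated, just not on every step.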

7.
Article in English | MEDLINE | ID: mdl-37027757

ABSTRACT

Faithful measurement of perceptual quality is of significant importance to various multimedia applications. By fully utilizing reference images, full-reference image quality assessment (FR-IQA) methods usually achieve better prediction performance. No-reference image quality assessment (NR-IQA), also known as blind image quality assessment (BIQA), has no access to the reference image, which makes it a challenging but important task. Previous NR-IQA methods have focused on spatial measures at the expense of information in the available frequency bands. In this paper, we present a multiscale deep blind image quality assessment method (BIQA, M.D.) with spatial optimal-scale filtering analysis. Motivated by the multi-channel behavior of the human visual system and the contrast sensitivity function, we decompose an image into a number of spatial frequency bands by multiscale filtering and extract features that map the image to its subjective quality score with a convolutional neural network. Experimental results show that BIQA, M.D. compares well with existing NR-IQA methods and generalizes well across datasets.
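The band-decomposition step can be sketched with ideal radial band-pass filters in the Fourier domain; the paper's optimal-scale filters differ, so treat the uniform band edges below as an assumption made for illustration.

```python
import numpy as np

def spatial_frequency_bands(img, n_bands=4):
    """Split an image into radial spatial-frequency bands via the FFT,
    a rough analogue of the multi-channel decomposition applied before
    feature extraction (band edges here are uniform, unlike the paper)."""
    h, w = img.shape
    F = np.fft.fftshift(np.fft.fft2(img))
    fy = np.fft.fftshift(np.fft.fftfreq(h))
    fx = np.fft.fftshift(np.fft.fftfreq(w))
    r = np.hypot(*np.meshgrid(fy, fx, indexing="ij"))  # radial frequency
    edges = np.linspace(0, r.max() + 1e-9, n_bands + 1)
    bands = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (r >= lo) & (r < hi)                    # ideal band-pass
        bands.append(np.real(np.fft.ifft2(np.fft.ifftshift(F * mask))))
    return bands

rng = np.random.default_rng(0)
img = rng.normal(size=(32, 32))
bands = spatial_frequency_bands(img)
# The half-open bins tile the frequency plane, so the bands sum back exactly
print(np.allclose(sum(bands), img))  # -> True
```

Each band would then be fed to the CNN as a separate input channel, mimicking the multi-channel behavior of the human visual system described above.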

8.
IEEE Trans Med Imaging ; PP, 2023 Nov 28.
Article in English | MEDLINE | ID: mdl-38015692

ABSTRACT

The generation of synthetic data using physics-based modeling provides a solution to limited or missing real-world training samples for deep learning methods in rapid quantitative magnetic resonance imaging (qMRI). However, the synthetic data distribution differs from real-world data, especially under complex imaging conditions, resulting in a gap between domains and limited generalization in real scenarios. Recently, a single-shot qMRI method, multiple overlapping-echo detachment imaging (MOLED), was proposed that quantifies tissue transverse relaxation time (T2) on the order of milliseconds with the help of a trained network. Previous works leveraged a Bloch-based simulator to generate synthetic data for network training, which leaves a domain gap between synthetic and real-world scenarios and results in limited generalization. In this study, we propose a T2 mapping method via MOLED from the perspective of domain adaptation, which achieves accurate mapping without training on real labels and at the same time reduces the cost of sequence research. Experiments demonstrate that our method excels in the restoration of MR anatomical structures.

9.
IEEE Trans Pattern Anal Mach Intell ; 45(3): 3677-3694, 2023 Mar.
Article in English | MEDLINE | ID: mdl-35648876

ABSTRACT

Domain Adaptive Object Detection (DAOD) focuses on improving the generalization ability of object detectors via knowledge transfer. Recent advances in DAOD strive to shift the emphasis of the adaptation process from global to local by virtue of fine-grained feature alignment methods. However, both global and local alignment approaches fail to capture the topological relations among different foreground objects, as the explicit dependencies and interactions between and within domains are neglected. In this case, seeking only one-vs-one alignment does not necessarily ensure precise knowledge transfer. Moreover, conventional alignment-based approaches may be vulnerable to catastrophic overfitting on less transferable regions (e.g., backgrounds) due to the accumulation of inaccurate localization results in the target domain. To remedy these issues, we first formulate DAOD as an open-set domain adaptation problem, in which the foregrounds and backgrounds are seen as the "known classes" and the "unknown class", respectively. Accordingly, we propose a new and general framework for DAOD, named Foreground-aware Graph-based Relational Reasoning (FGRR), which incorporates graph structures into the detection pipeline to explicitly model the intra- and inter-domain foreground object relations in both pixel and semantic spaces, thereby endowing the DAOD model with relational reasoning capability beyond the popular alignment-based paradigm. FGRR first identifies foreground pixels and regions by searching for reliable correspondence and by cross-domain similarity regularization, respectively. The inter-domain visual and semantic correlations are hierarchically modeled via bipartite graph structures, and the intra-domain relations are encoded via graph attention mechanisms. Through message passing, each node aggregates semantic and contextual information from the same and the opposite domain to substantially enhance its expressive power.
Empirical results demonstrate that the proposed FGRR exceeds the state-of-the-art performance on four DAOD benchmarks.

10.
IEEE Trans Neural Netw Learn Syst ; 34(3): 1513-1523, 2023 Mar.
Article in English | MEDLINE | ID: mdl-34460396

ABSTRACT

The goal of hyperspectral image fusion (HIF) is to reconstruct high spatial resolution hyperspectral images (HR-HSI) by fusing low spatial resolution hyperspectral images (LR-HSI) and high spatial resolution multispectral images (HR-MSI) without loss of spatial or spectral information. Most existing HIF methods assume that the observation models are known, which is unrealistic in many scenarios. To address this blind HIF problem, we propose a deep learning-based method that optimizes the observation model and the fusion process iteratively and alternately during reconstruction to enforce bidirectional data consistency, leading to better spatial and spectral accuracy. However, general deep neural networks inherently suffer from information loss, which prevents this bidirectional data consistency. To solve this, we make part of the deep neural network invertible by applying a slightly modified spectral normalization to its weights. Furthermore, to reduce spatial distortion and feature redundancy, we introduce a Content-Aware ReAssembly of FEatures module and an SE-ResBlock module into our network. The former boosts fusion performance, while the latter makes our model more compact. Experiments demonstrate that our model performs favorably against compared methods in terms of both non-blind and semi-blind HIF.
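The spectral normalization ingredient can be sketched with the standard power-iteration estimate of a weight matrix's largest singular value; this is the common unmodified form, not the paper's slightly modified variant. Dividing the weights by this value bounds the layer's Lipschitz constant, the property that supports invertibility.

```python
import numpy as np

def spectral_norm(W, n_iter=200, rng=None):
    """Estimate the largest singular value of W by power iteration,
    as done in spectral normalization (a sketch of the standard form)."""
    rng = rng or np.random.default_rng(0)
    u = rng.normal(size=W.shape[0])
    for _ in range(n_iter):
        v = W.T @ u
        v /= np.linalg.norm(v)
        u = W @ v
        u /= np.linalg.norm(u)
    return float(u @ W @ v)       # Rayleigh-quotient estimate of sigma_max

rng = np.random.default_rng(1)
W = rng.normal(size=(8, 5))
sigma = spectral_norm(W)

# Agrees with the exact top singular value from the SVD
print(np.isclose(sigma, np.linalg.svd(W, compute_uv=False)[0], rtol=1e-3))
```

In practice the normalized layer would use `W / sigma`, which makes its spectral norm at most 1.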

11.
Neural Netw ; 163: 354-366, 2023 Jun.
Article in English | MEDLINE | ID: mdl-37099898

ABSTRACT

Federated learning (FL) learns a global model across decentralized data held by different clients. However, it is susceptible to the statistical heterogeneity of client-specific data: clients optimize for their individual target distributions, which causes the global model to diverge under inconsistent data distributions. Moreover, federated learning approaches that jointly learn representations and classifiers further exacerbate this inconsistency, resulting in imbalanced features and biased classifiers. Hence, in this paper, we propose an independent two-stage personalized FL framework, Fed-RepPer, that separates representation learning from classification in federated learning. First, client-side feature representation models are learned using a supervised contrastive loss, which keeps the local objectives consistent, i.e., learning robust representations on distinct data distributions. The local representation models are aggregated into a common global representation model. In the second stage, personalization is achieved by learning a separate classifier for each client on top of the global representation model. The proposed two-stage learning scheme is examined in lightweight edge computing involving devices with constrained computation resources. Experiments on various datasets (CIFAR-10/100, CINIC-10) and heterogeneous data setups show that Fed-RepPer outperforms alternatives by exploiting flexibility and personalization on non-IID data.
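Stage one's server-side aggregation can be sketched as dataset-size-weighted averaging (FedAvg-style) of the clients' representation parameters; the dict-of-arrays weight structure and client sizes below are illustrative assumptions, and the per-client classifiers of stage two would never be averaged.

```python
import numpy as np

def aggregate_representations(client_weights, client_sizes):
    """Weighted average of client representation models, proportional to
    each client's dataset size. Classifiers stay local in stage two."""
    total = sum(client_sizes)
    keys = client_weights[0].keys()
    return {
        k: sum(n / total * w[k] for w, n in zip(client_weights, client_sizes))
        for k in keys
    }

# Two mock clients whose "conv" parameters differ.
clients = [
    {"conv": np.full((2, 2), 1.0)},
    {"conv": np.full((2, 2), 3.0)},
]
global_rep = aggregate_representations(clients, client_sizes=[100, 300])
print(global_rep["conv"][0, 0])  # -> 2.5  ((100*1 + 300*3) / 400)
```

The resulting `global_rep` would be broadcast back to all clients as the shared feature extractor for the personalization stage.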

12.
IEEE Trans Neural Netw Learn Syst ; 34(1): 134-143, 2023 Jan.
Article in English | MEDLINE | ID: mdl-34197327

ABSTRACT

Referring expression comprehension (REC) is an emerging research topic in computer vision, which refers to detecting a target region in an image given a text description. Most existing REC methods follow a multistage pipeline, which is computationally expensive and greatly limits the applications of REC. In this article, we propose a one-stage model toward real-time REC, termed the real-time global inference network (RealGIN). RealGIN addresses the issues of expression diversity and complexity in REC with two innovative designs: adaptive feature selection (AFS) and Global Attentive ReAsoNing (GARAN). Expression diversity concerns varying expression content, which includes information such as colors, attributes, locations, and fine-grained categories. To address this issue, AFS adaptively fuses features at different semantic levels to handle changes in expression content. Expression complexity, in contrast, concerns the complex relational conditions in expressions that are used to identify the referent. To this end, GARAN uses the textual feature as a pivot to collect expression-aware visual information from all regions and then diffuses this information back to each region, providing sufficient context for modeling the relational conditions in expressions. On five benchmark datasets, i.e., RefCOCO, RefCOCO+, RefCOCOg, ReferIT, and Flickr30k, the proposed RealGIN outperforms most existing methods and achieves very competitive performance against the most advanced one, MAttNet. More importantly, on the same hardware, RealGIN can boost processing speed by 10-20 times over existing methods.
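AFS-style fusion can be sketched as a softmax-gated weighted sum over feature maps from different semantic levels; in RealGIN the gates are predicted by the network, whereas here they are supplied directly as an illustrative stand-in.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def adaptive_feature_selection(features, gate_logits):
    """Fuse multi-level feature maps with softmax gates, so the model can
    emphasize whichever semantic level an expression needs (the logits
    would come from the network; here they are given directly)."""
    w = softmax(np.asarray(gate_logits, dtype=float))
    return sum(wi * f for wi, f in zip(w, features))

# Mock low/mid/high-level feature maps with constant values 1, 2, 3.
feats = [np.full((4, 4), v) for v in (1.0, 2.0, 3.0)]

# Equal logits -> uniform gates -> plain mean of the levels
fused = adaptive_feature_selection(feats, gate_logits=[0.0, 0.0, 0.0])
print(fused[0, 0])  # -> 2.0
```

Skewing the logits toward one level (e.g. `[10, 0, 0]`) drives the fused map toward that level's features, which is the adaptivity the design relies on.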

13.
IEEE Trans Neural Netw Learn Syst ; 33(11): 6802-6816, 2022 Nov.
Article in English | MEDLINE | ID: mdl-34081590

ABSTRACT

Deep learning-based methods have achieved notable progress in removing blocking artifacts caused by lossy JPEG compression. However, most deep learning-based methods handle this task by designing black-box network architectures that directly learn the relationship between compressed images and their clean versions. Such architectures often lack sufficient interpretability, which limits further improvements in deblocking performance. To address this issue, in this article we propose a model-driven deep unfolding method for JPEG artifacts removal with interpretable network structures. First, we build a maximum a posteriori (MAP) model for deblocking using convolutional dictionary learning and design an iterative optimization algorithm using proximal operators. Second, we unfold this iterative algorithm into a learnable deep network structure, where each module corresponds to a specific operation of the iterative algorithm. In this way, our network inherits both the powerful modeling capacity of data-driven deep learning and the interpretability of traditional model-driven methods. By training the proposed network end-to-end, all learnable modules can be automatically explored to well characterize the representations of both JPEG artifacts and image content. Experiments on synthetic and real-world datasets show that our method generates competitive or even better deblocking results compared with state-of-the-art methods, both quantitatively and qualitatively.
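The kind of proximal iteration being unfolded can be sketched with plain ISTA for a sparse coding problem; each iteration's gradient step plus soft-thresholding (the L1 proximal operator) is the template a network module learns a generalization of. The dictionary and sparsity level below are synthetic, not the paper's convolutional dictionary.

```python
import numpy as np

def soft_threshold(x, tau):
    """Proximal operator of the L1 norm: the per-stage nonlinearity that
    an unfolded network replaces with a learnable module."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def ista(D, y, lam=0.1, n_iter=200):
    """Plain ISTA for  min_z 0.5 ||y - D z||^2 + lam ||z||_1.
    Unfolding turns each of these iterations into one network layer."""
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the gradient
    z = np.zeros(D.shape[1])
    for _ in range(n_iter):
        z = soft_threshold(z + D.T @ (y - D @ z) / L, lam / L)
    return z

# Synthetic test: recover a 2-sparse code from noiseless measurements.
rng = np.random.default_rng(0)
D = rng.normal(size=(20, 10))
z_true = np.zeros(10)
z_true[[2, 7]] = [1.5, -2.0]
z_hat = ista(D, D @ z_true)
print(sorted(np.argsort(-np.abs(z_hat))[:2].tolist()))  # support recovered
```

In the unfolded network, `D`, the step size, and the threshold all become learned parameters, which is what gives the method its interpretability.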

14.
IEEE Trans Med Imaging ; 41(9): 2457-2468, 2022 09.
Article in English | MEDLINE | ID: mdl-35363612

ABSTRACT

Synthesizing a subject-specific pathology-free image from a pathological image is valuable for algorithm development and clinical practice. In recent years, several approaches based on the generative adversarial network (GAN) have achieved promising results in pseudo-healthy synthesis. However, the discriminator (i.e., a classifier) in the GAN cannot accurately identify lesions, which hampers the generation of high-quality pseudo-healthy images. To address this problem, we present a new type of discriminator, the segmentor, to accurately locate lesions and improve the visual quality of pseudo-healthy images. We then apply the generated images to medical image enhancement and use the enhanced results to address the low-contrast problem in medical image segmentation. Furthermore, a reliable metric based on two attributes of label noise is proposed to measure the healthiness of synthetic images. Comprehensive experiments on the T2 modality of BraTS demonstrate that the proposed method substantially outperforms state-of-the-art methods, achieving better performance than existing methods with only 30% of the training data. The effectiveness of the proposed method is also demonstrated on LiTS and the T1 modality of BraTS. The code and the pre-trained model of this study are publicly available at https://github.com/Au3C2/Generator-Versus-Segmentor.


Subject(s)
Image Processing, Computer-Assisted , Neural Networks, Computer , Algorithms , Image Processing, Computer-Assisted/methods
15.
Comput Biol Med ; 141: 105144, 2022 02.
Article in English | MEDLINE | ID: mdl-34971982

ABSTRACT

Medical imaging datasets usually exhibit domain shift due to variations in scanner vendors, imaging protocols, etc. This raises concerns about the generalization capacity of machine learning models. Domain generalization (DG), which aims to learn a model from multiple source domains that generalizes directly to unseen test domains, is particularly promising for the medical imaging community. To address DG, model-agnostic meta-learning (MAML) has recently been introduced, transferring knowledge from previous training tasks to facilitate the learning of novel testing tasks. However, in clinical practice there are usually only a few annotated source domains available, which limits training-task generation and thus increases the risk of overfitting to training tasks in this paradigm. In this paper, we propose a novel DG scheme of episodic training with task augmentation for medical image classification. Based on meta-learning, we develop an episodic training paradigm to transfer knowledge from simulated training tasks to the real testing task of DG. Motivated by the limited number of source domains in real-world medical deployment, we identify a unique task-level overfitting and propose task augmentation to increase the variety of generated training tasks and thereby alleviate it. Within the established learning framework, we further exploit a novel meta-objective to regularize the deep embedding of the training domains. To validate the effectiveness of the proposed method, we perform experiments on histopathological images and abdominal CT images.


Subject(s)
Diagnostic Imaging , Machine Learning , Computer Simulation , Radiography
16.
Med Image Anal ; 81: 102528, 2022 10.
Article in English | MEDLINE | ID: mdl-35834896

ABSTRACT

Accurate computing, analysis and modeling of the ventricles and myocardium from medical images are important, especially in the diagnosis and treatment management of patients suffering from myocardial infarction (MI). Late gadolinium enhancement (LGE) cardiac magnetic resonance (CMR) provides an important protocol to visualize MI. However, compared with other sequences, LGE CMR images with gold-standard labels are particularly limited. This paper presents selected results from the Multi-Sequence Cardiac MR (MS-CMR) Segmentation challenge, held in conjunction with MICCAI 2019. The challenge offered a dataset of paired MS-CMR images, including auxiliary CMR sequences as well as LGE CMR, from 45 patients with cardiomyopathy. It aimed to develop new algorithms, as well as benchmark existing ones, for LGE CMR segmentation focusing on the myocardial wall of the left ventricle and the blood cavities of the two ventricles. In addition, the paired MS-CMR images enable algorithms to combine complementary information from the other sequences for the ventricle segmentation of LGE CMR. Nine representative works were selected for evaluation and comparison; three are unsupervised domain adaptation (UDA) methods and the other six are supervised. The results showed that the average performance of the nine methods was comparable to the inter-observer variation. In particular, the top-ranking algorithms from both the supervised and UDA methods could generate reliable and robust segmentation results. Their success was mainly attributed to the inclusion of the auxiliary sequences from the MS-CMR images, which provide important label information for training deep neural networks. The challenge continues as an ongoing resource; the gold-standard segmentations and the MS-CMR images of both the training and test data are available upon registration via its homepage (www.sdspeople.fudan.edu.cn/zhuangxiahai/0/mscmrseg/).


Subject(s)
Gadolinium , Myocardial Infarction , Benchmarking , Contrast Media , Heart , Humans , Magnetic Resonance Imaging/methods , Myocardial Infarction/diagnostic imaging , Myocardium/pathology
17.
IEEE Trans Neural Netw Learn Syst ; 32(5): 2090-2104, 2021 May.
Article in English | MEDLINE | ID: mdl-32484781

ABSTRACT

We introduce a new deep detail network architecture with grouped multiscale dilated convolutions to sharpen images containing multiband spectral information. Specifically, our end-to-end network directly fuses low-resolution multispectral and panchromatic inputs to produce high-resolution multispectral results, which is the goal of pansharpening in remote sensing. The proposed network architecture is designed by utilizing our domain knowledge and considering the two aims of pansharpening: spectral and spatial preservation. For spectral preservation, the up-sampled multispectral images are directly added to the output for lossless spectral information propagation. For spatial preservation, we train the proposed network in the high-frequency domain instead of the commonly used image domain. Unlike conventional network structures, we remove pooling and batch normalization layers to preserve spatial information and improve generalization to new satellites, respectively. To effectively and efficiently obtain multiscale contextual features at a fine-grained level, we propose a grouped multiscale dilated network structure that enlarges the receptive field of each network layer. This structure allows the network to capture multiscale representations without increasing the parameter burden or network complexity. These representations are finally used to reconstruct the residual images, which contain the spatial details of the panchromatic (PAN) input. Our trained network generalizes to different satellite images without parameter tuning. Moreover, our model is a general framework that can be directly used for other kinds of multiband spectral image sharpening, e.g., hyperspectral image sharpening. Experiments show that our model performs favorably against compared methods in both qualitative and quantitative terms.

18.
Comput Biol Med ; 140: 105067, 2021 Nov 27.
Article in English | MEDLINE | ID: mdl-34920364

ABSTRACT

Despite impressive developments in deep convolutional neural networks for medical imaging, the supervised learning paradigm requires numerous annotations during training to avoid overfitting. In clinical cases, massive semantic annotations are difficult to acquire because biomedical expert knowledge is required, and it is common that only a few annotated classes are available. In this study, we propose a new approach to few-shot medical image segmentation that enables a segmentation model to quickly generalize to an unseen class with few training images. We construct a few-shot image segmentation mechanism using a deep convolutional network trained episodically. Motivated by the spatial consistency and regularity in medical images, we develop an efficient global correlation module to model the correlation between a support and a query image and incorporate it into the deep network. We further enhance the discrimination ability of the deep embedding to encourage clustering of feature domains belonging to the same class while keeping feature domains of different organs far apart. We experimented on anatomical abdomen images from both CT and MRI modalities.
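The global correlation module can be sketched as a dense cosine-similarity map between all support and query feature locations; the channel-last feature shape is an assumption for illustration, and in the real method this map is computed inside the trained network rather than on raw features.

```python
import numpy as np

def global_correlation(support_feat, query_feat):
    """Pairwise cosine correlation between every support and query spatial
    location, the kind of global correlation map fed back into the
    segmentation network (features assumed channel-last, H x W x C)."""
    s = support_feat.reshape(-1, support_feat.shape[-1])
    q = query_feat.reshape(-1, query_feat.shape[-1])
    s = s / np.linalg.norm(s, axis=1, keepdims=True)
    q = q / np.linalg.norm(q, axis=1, keepdims=True)
    return q @ s.T   # (Hq*Wq, Hs*Ws) correlation matrix

rng = np.random.default_rng(0)
support = rng.normal(size=(4, 4, 8))   # mock support feature map
query = rng.normal(size=(4, 4, 8))     # mock query feature map
corr = global_correlation(support, query)
print(corr.shape)  # -> (16, 16)
```

High rows of `corr` indicate query locations that strongly resemble some support location, which is the spatial-consistency cue the module exploits.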

19.
IEEE J Biomed Health Inform ; 25(4): 1163-1172, 2021 04.
Article in English | MEDLINE | ID: mdl-32881698

ABSTRACT

In recent years, deep learning methods have received more attention in epithelial-stroma (ES) classification tasks. Traditional deep learning methods assume that the training and test data have the same distribution, an assumption that is seldom satisfied in complex imaging procedures. Unsupervised domain adaptation (UDA) transfers knowledge from a labelled source domain to a completely unlabeled target domain, and is more suitable for ES classification tasks to avoid tedious annotation. However, existing UDA methods for this task ignore the semantic alignment across domains. In this paper, we propose a Curriculum Feature Alignment Network (CFAN) to gradually align discriminative features across domains through selecting effective samples from the target domain and minimizing intra-class differences. Specifically, we developed the Curriculum Transfer Strategy (CTS) and Adaptive Centroid Alignment (ACA) steps to train our model iteratively. We validated the method using three independent public ES datasets, and experimental results demonstrate that our method achieves better performance in ES classification compared with commonly used deep learning methods and existing deep domain adaptation methods.


Subject(s)
Connective Tissue , Semantics , Curriculum , Epithelium , Humans
20.
IEEE Trans Neural Netw Learn Syst ; 31(6): 1794-1807, 2020 Jun.
Article in English | MEDLINE | ID: mdl-31329133

ABSTRACT

Existing deep convolutional neural networks (CNNs) have found major success in image deraining, but at the expense of an enormous number of parameters. This limits their potential applications, e.g., on mobile devices. In this paper, we propose a lightweight pyramid network (LPNet) for single-image deraining. Instead of designing a complex network structure, we use domain-specific knowledge to simplify the learning process. In particular, we find that by introducing the mature Gaussian-Laplacian image pyramid decomposition technique into the neural network, the learning problem at each pyramid level is greatly simplified and can be handled by a relatively shallow network with few parameters. We adopt recursive and residual network structures to build the proposed LPNet, which has fewer than 8K parameters while still achieving state-of-the-art performance on rain removal. We also discuss the potential value of LPNet for other low- and high-level vision tasks.
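The Gaussian-Laplacian decomposition the network builds on can be sketched as follows; for simplicity this version keeps every level at full resolution (classic pyramids also downsample between levels) and uses a binomial filter as the Gaussian stand-in.

```python
import numpy as np

def blur(img):
    """Separable 1-4-6-4-1 binomial blur, a standard Gaussian stand-in."""
    k = np.array([1, 4, 6, 4, 1]) / 16.0
    pad = np.pad(img, 2, mode="reflect")
    tmp = np.apply_along_axis(lambda r: np.convolve(r, k, "valid"), 1, pad)
    return np.apply_along_axis(lambda c: np.convolve(c, k, "valid"), 0, tmp)

def laplacian_pyramid(img, levels=3):
    """Gaussian-Laplacian decomposition: each level holds band-limited
    detail, so a shallow sub-network per level suffices (the idea LPNet
    builds on; resolutions are kept fixed here for simplicity)."""
    pyr, cur = [], img
    for _ in range(levels):
        low = blur(cur)
        pyr.append(cur - low)   # band-pass detail at this level
        cur = low
    pyr.append(cur)             # final low-pass residual
    return pyr

img = np.arange(64, dtype=float).reshape(8, 8)
pyr = laplacian_pyramid(img)
# The levels telescope, so summing them reconstructs the input exactly
print(np.allclose(sum(pyr), img))  # -> True
```

Rain streaks concentrate in the high-frequency levels, so each shallow sub-network only has to clean one narrow band before the exact reconstruction step above reassembles the image.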
