Search | VHL Regional Portal

1.

Designing Universally-Approximating Deep Neural Networks: A First-Order Optimization Approach.

Wu, Zhoutong; Xiao, Mingqing; Fang, Cong; Lin, Zhouchen.

IEEE Trans Pattern Anal Mach Intell ; PP2024 Mar 25.

Article in English | MEDLINE | ID: mdl-38526901

ABSTRACT

Universal approximation capability, also referred to as universality, is an important property of deep neural networks, endowing them with the potency to accurately represent the underlying target function in learning tasks. In practice, the architecture of deep neural networks largely influences the performance of the models. However, most existing methodologies for designing neural architectures, such as the heuristic manual design or neural architecture search, ignore the universal approximation property, thus losing a potential safeguard about the performance. In this paper, we propose a unified framework to design the architectures of deep neural networks with a universality guarantee based on first-order optimization algorithms, where the forward pass is interpreted as the updates of an optimization algorithm. The (explicit or implicit) network is designed by replacing each gradient term in the algorithm with a learnable module similar to a two-layer network or its derivatives Specifically, we explore the realm of width-bounded neural networks, a common practical scenario, showcasing their universality. Moreover, adding operations of normalization, downsampling, and upsampling does not hurt the universality. To the best of our knowledge, this is the first work that width-bounded networks with universal approximation guarantee can be designed in a principled way. Our framework can inspire a variety of neural architectures including some renowned structures such as ResNet and DenseNet, as well as novel innovations. The experimental results on image classification problems demonstrate that the newly inspired networks are competitive and surpass the baselines of ResNet, DenseNet, as well as the advanced ConvNeXt and ViT, testifying to the effectiveness of our framework.

2.

Towards Understanding Convergence and Generalization of AdamW.

Zhou, Pan; Xie, Xingyu; Lin, Zhouchen; Yan, Shuicheng.

IEEE Trans Pattern Anal Mach Intell ; PP2024 Mar 27.

Article in English | MEDLINE | ID: mdl-38536692

ABSTRACT

AdamW modifies Adam by adding a decoupled weight decay to decay network weights per training iteration. For adaptive algorithms, this decoupled weight decay does not affect specific optimization steps, and differs from the widely used l2-regularizer which changes optimization steps via changing the first- and second-order gradient moments. Despite its great practical success, for AdamW, its convergence behavior and generalization improvement over Adam and l2-regularized Adam ( l2-Adam) remain absent yet. To solve this issue, we prove the convergence of AdamW and justify its generalization advantages over Adam and l2-Adam. Specifically, AdamW provably converges but minimizes a dynamically regularized loss that combines vanilla loss and a dynamical regularization induced by decoupled weight decay, thus yielding different behaviors with Adam and l2-Adam. Moreover, on both general nonconvex problems and PL-conditioned problems, we establish stochastic gradient complexity of AdamW to find a stationary point. Such complexity is also applicable to Adam and l2-Adam, and improves their previously known complexity, especially for over-parametrized networks. Besides, we prove that AdamW enjoys smaller generalization errors than Adam and l2-Adam from the Bayesian posterior aspect. This result, for the first time, explicitly reveals the benefits of decoupled weight decay in AdamW. Experimental results validate our theory.

3.

Efficient learning of Scale-Adaptive Nearly Affine Invariant Networks.

Shen, Zhengyang; Qiu, Yeqing; Liu, Jialun; He, Lingshen; Lin, Zhouchen.

Neural Netw ; 174: 106229, 2024 Jun.

Article in English | MEDLINE | ID: mdl-38490114

ABSTRACT

Recent research has demonstrated the significance of incorporating invariance into neural networks. However, existing methods require direct sampling over the entire transformation set, notably computationally taxing for large groups like the affine group. In this study, we propose a more efficient approach by addressing the invariances of the subgroups within a larger group. For tackling affine invariance, we split it into the Euclidean group E(n) and uni-axial scaling group US(n), handling invariance individually. We employ an E(n)-invariant model for E(n)-invariance and average model outputs over data augmented from a US(n) distribution for US(n)-invariance. Our method maintains a favorable computational complexity of O(N2) in 2D and O(N4) in 3D scenarios, in contrast to the O(N6) (2D) and O(N12) (3D) complexities of averaged models. Crucially, the scale range for augmentation adapts during training to avoid excessive scale invariance. This is the first time nearly exact affine invariance is incorporated into neural networks without directly sampling the entire group. Extensive experiments unequivocally confirm its superiority, achieving new state-of-the-art results in affNIST and SIM2MNIST classifications while consuming less than 15% of inference time and fewer computational resources and model parameters compared to averaged models.

Subject(s)

Learning , Neural Networks, Computer

4.

Sampling complex topology structures for spiking neural networks.

Yan, Shen; Meng, Qingyan; Xiao, Mingqing; Wang, Yisen; Lin, Zhouchen.

Neural Netw ; 172: 106121, 2024 Apr.

Article in English | MEDLINE | ID: mdl-38244355

ABSTRACT

Spiking Neural Networks (SNNs) have been considered a potential competitor to Artificial Neural Networks (ANNs) due to their high biological plausibility and energy efficiency. However, the architecture design of SNN has not been well studied. Previous studies either use ANN architectures or directly search for SNN architectures under a highly constrained search space. In this paper, we aim to introduce much more complex connection topologies to SNNs to further exploit the potential of SNN architectures. To this end, we propose the topology-aware search space, which is the first search space that enables a more diverse and flexible design for both the spatial and temporal topology of the SNN architecture. Then, to efficiently obtain architecture from our search space, we propose the spatio-temporal topology sampling (STTS) algorithm. By leveraging the benefits of random sampling, STTS can yield powerful architecture without the need for an exhaustive search process, making it significantly more efficient than alternative search strategies. Extensive experiments on CIFAR-10, CIFAR-100, and ImageNet demonstrate the effectiveness of our method. Notably, we obtain 70.79% top-1 accuracy on ImageNet with only 4 time steps, 1.79% higher than the second best model. Our code is available under https://github.com/stiger1000/Random-Sampling-SNN.

Subject(s)

Algorithms , Neural Networks, Computer

5.

Equilibrium Image Denoising with Implicit Differentiation.

Chen, Qi; Wang, Yifei; Geng, Zhengyang; Wang, Yisen; Yang, Jiansheng; Lin, Zhouchen.

IEEE Trans Image Process ; PP2023 Mar 14.

Article in English | MEDLINE | ID: mdl-37028348

ABSTRACT

Recent efforts on learning-based image denoising approaches use unrolled architectures with a fixed number of repeatedly stacked blocks. However, due to difficulties in training networks corresponding to deeper layers, simply stacking blocks may cause performance degradation, and the number of unrolled blocks needs to be manually tuned to find an appropriate value. To circumvent these problems, this paper describes an alternative approach with implicit models. To our best knowledge, our approach is the first attempt to model iterative image denoising through an implicit scheme. The model employs implicit differentiation to calculate gradients in the backward pass, thus avoiding the training difficulties of explicit models and elaborate selection of the iteration number. Our model is parameter-efficient and has only one implicit layer, which is a fixed-point equation that casts the desired noise feature as its solution. By simulating infinite iterations of the model, the final denoising result is given by the equilibrium that is achieved through accelerated black-box solvers. The implicit layer not only captures the non-local self-similarity prior for image denoising, but also facilitates training stability and thereby boosts the denoising performance. Extensive experiments show that our model leads to better performances than state-of-the-art explicit denoisers with enhanced qualitative and quantitative results.

6.

SPIDE: A purely spike-based method for training feedback spiking neural networks.

Xiao, Mingqing; Meng, Qingyan; Zhang, Zongpeng; Wang, Yisen; Lin, Zhouchen.

Neural Netw ; 161: 9-24, 2023 Apr.

Article in English | MEDLINE | ID: mdl-36736003

ABSTRACT

Spiking neural networks (SNNs) with event-based computation are promising brain-inspired models for energy-efficient applications on neuromorphic hardware. However, most supervised SNN training methods, such as conversion from artificial neural networks or direct training with surrogate gradients, require complex computation rather than spike-based operations of spiking neurons during training. In this paper, we study spike-based implicit differentiation on the equilibrium state (SPIDE) that extends the recently proposed training method, implicit differentiation on the equilibrium state (IDE), for supervised learning with purely spike-based computation, which demonstrates the potential for energy-efficient training of SNNs. Specifically, we introduce ternary spiking neuron couples and prove that implicit differentiation can be solved by spikes based on this design, so the whole training procedure, including both forward and backward passes, is made as event-driven spike computation, and weights are updated locally with two-stage average firing rates. Then we propose to modify the reset membrane potential to reduce the approximation error of spikes. With these key components, we can train SNNs with flexible structures in a small number of time steps and with firing sparsity during training, and the theoretical estimation of energy costs demonstrates the potential for high efficiency. Meanwhile, experiments show that even with these constraints, our trained models can still achieve competitive results on MNIST, CIFAR-10, CIFAR-100, and CIFAR10-DVS.

Subject(s)

Computers , Neural Networks, Computer , Feedback , Action Potentials/physiology , Membrane Potentials

7.

Structured Sparsity Optimization With Non-Convex Surrogates of l_2,0-Norm: A Unified Algorithmic Framework.

Zhang, Xiaoqin; Zheng, Jingjing; Wang, Di; Tang, Guiying; Zhou, Zhengyuan; Lin, Zhouchen.

IEEE Trans Pattern Anal Mach Intell ; 45(5): 6386-6402, 2023 May.

Article in English | MEDLINE | ID: mdl-36219668

ABSTRACT

In this article, we present a general optimization framework that leverages structured sparsity to achieve superior recovery results. The traditional method for solving the structured sparse objectives based on l2,0-norm is to use the l2,1-norm as a convex surrogate. However, such an approximation often yields a large performance gap. To tackle this issue, we first provide a framework that allows for a wide range of surrogate functions (including non-convex surrogates), which exhibits better performance in harnessing structured sparsity. Moreover, we develop a fixed point algorithm that solves a key underlying non-convex structured sparse recovery optimization problem to global optimality with a guaranteed super-linear convergence rate. Building on this, we consider three specific applications, i.e., outlier pursuit, supervised feature selection, and structured dictionary learning, which can benefit from the proposed structured sparsity optimization framework. In each application, how the optimization problem can be formulated and thus be relaxed under a generic surrogate function is explained in detail. We conduct extensive experiments on both synthetic and real-world data and demonstrate the effectiveness and efficiency of the proposed framework.

8.

Optimization Induced Equilibrium Networks: An Explicit Optimization Perspective for Understanding Equilibrium Models.

Xie, Xingyu; Wang, Qiuhao; Ling, Zenan; Li, Xia; Liu, Guangcan; Lin, Zhouchen.

IEEE Trans Pattern Anal Mach Intell ; 45(3): 3604-3616, 2023 Mar.

Article in English | MEDLINE | ID: mdl-35687620

ABSTRACT

To reveal the mystery behind deep neural networks (DNNs), optimization may offer a good perspective. There are already some clues showing the strong connection between DNNs and optimization problems, e.g., under a mild condition, DNN's activation function is indeed a proximal operator. In this paper, we are committed to providing a unified optimization induced interpretability for a special class of networks-equilibrium models, i.e., neural networks defined by fixed point equations, which have become increasingly attractive recently. To this end, we first decompose DNNs into a new class of unit layer that is the proximal operator of an implicit convex function while keeping its output unchanged. Then, the equilibrium model of the unit layer can be derived, we name it Optimization Induced Equilibrium Networks (OptEq). The equilibrium point of OptEq can be theoretically connected to the solution of a convex optimization problem with explicit objectives. Based on this, we can flexibly introduce prior properties to the equilibrium points: 1) modifying the underlying convex problems explicitly so as to change the architectures of OptEq; and 2) merging the information into the fixed point iteration, which guarantees to choose the desired equilibrium point when the fixed point set is non-singleton. We show that OptEq outperforms previous implicit models even with fewer parameters.

9.

Efficient and generalizable cross-patient epileptic seizure detection through a spiking neural network.

Zhang, Zongpeng; Xiao, Mingqing; Ji, Taoyun; Jiang, Yuwu; Lin, Tong; Zhou, Xiaohua; Lin, Zhouchen.

Front Neurosci ; 17: 1303564, 2023.

Article in English | MEDLINE | ID: mdl-38268711

ABSTRACT

Introduction: Epilepsy is a global chronic disease that brings pain and inconvenience to patients, and an electroencephalogram (EEG) is the main analytical tool. For clinical aid that can be applied to any patient, an automatic cross-patient epilepsy seizure detection algorithm is of great significance. Spiking neural networks (SNNs) are modeled on biological neurons and are energy-efficient on neuromorphic hardware, which can be expected to better handle brain signals and benefit real-world, low-power applications. However, automatic epilepsy seizure detection rarely considers SNNs. Methods: In this article, we have explored SNNs for cross-patient seizure detection and discovered that SNNs can achieve comparable state-of-the-art performance or a performance that is even better than artificial neural networks (ANNs). We propose an EEG-based spiking neural network (EESNN) with a recurrent spiking convolution structure, which may better take advantage of temporal and biological characteristics in EEG signals. Results: We extensively evaluate the performance of different SNN structures, training methods, and time settings, which builds a solid basis for understanding and evaluation of SNNs in seizure detection. Moreover, we show that our EESNN model can achieve energy reduction by several orders of magnitude compared with ANNs according to the theoretical estimation. Discussion: These results show the potential for building high-performance, low-power neuromorphic systems for seizure detection and also broaden real-world application scenarios of SNNs.

10.

Training much deeper spiking neural networks with a small number of time-steps.

Meng, Qingyan; Yan, Shen; Xiao, Mingqing; Wang, Yisen; Lin, Zhouchen; Luo, Zhi-Quan.

Neural Netw ; 153: 254-268, 2022 Sep.

Article in English | MEDLINE | ID: mdl-35759953

ABSTRACT

Spiking Neural Network (SNN) is a promising energy-efficient neural architecture when implemented on neuromorphic hardware. The Artificial Neural Network (ANN) to SNN conversion method, which is the most effective SNN training method, has successfully converted moderately deep ANNs to SNNs with satisfactory performance. However, this method requires a large number of time-steps, which hurts the energy efficiency of SNNs. How to effectively covert a very deep ANN (e.g., more than 100 layers) to an SNN with a small number of time-steps remains a difficult task. To tackle this challenge, this paper makes the first attempt to propose a novel error analysis framework that takes both the "quantization error" and the "deviation error" into account, which comes from the discretization of SNN dynamicsthe neuron's coding scheme and the inconstant input currents at intermediate layers, respectively. Particularly, our theories reveal that the "deviation error" depends on both the spike threshold and the input variance. Based on our theoretical analysis, we further propose the Threshold Tuning and Residual Block Restructuring (TTRBR) method that can convert very deep ANNs (>100 layers) to SNNs with negligible accuracy degradation while requiring only a small number of time-steps. With very deep networks, our TTRBR method achieves state-of-the-art (SOTA) performance on the CIFAR-10, CIFAR-100, and ImageNet classification tasks.

Subject(s)

Computers , Neural Networks, Computer

11.

Tensor Recovery With Weighted Tensor Average Rank.

Zhang, Xiaoqin; Zheng, Jingjing; Zhao, Li; Zhou, Zhengyuan; Lin, Zhouchen.

IEEE Trans Neural Netw Learn Syst ; PP2022 Jun 22.

Article in English | MEDLINE | ID: mdl-35731769

ABSTRACT

In this article, a curious phenomenon in the tensor recovery algorithm is considered: can the same recovered results be obtained when the observation tensors in the algorithm are transposed in different ways? If not, it is reasonable to imagine that some information within the data will be lost for the case of observation tensors under certain transpose operators. To solve this problem, a new tensor rank called weighted tensor average rank (WTAR) is proposed to learn the relationship between different resulting tensors by performing a series of transpose operators on an observation tensor. WTAR is applied to three-order tensor robust principal component analysis (TRPCA) to investigate its effectiveness. Meanwhile, to balance the effectiveness and solvability of the resulting model, a generalized model that involves the convex surrogate and a series of nonconvex surrogates are studied, and the corresponding worst case error bounds of the recovered tensor is given. Besides, a generalized tensor singular value thresholding (GTSVT) method and a generalized optimization algorithm based on GTSVT are proposed to solve the generalized model effectively. The experimental results indicate that the proposed method is effective.

12.

Learning Deep Sparse Regularizers With Applications to Multi-View Clustering and Semi-Supervised Classification.

Wang, Shiping; Chen, Zhaoliang; Du, Shide; Lin, Zhouchen.

IEEE Trans Pattern Anal Mach Intell ; 44(9): 5042-5055, 2022 Sep.

Article in English | MEDLINE | ID: mdl-34018930

ABSTRACT

Sparsity-constrained optimization problems are common in machine learning, such as sparse coding, low-rank minimization and compressive sensing. However, most of previous studies focused on constructing various hand-crafted sparse regularizers, while little work was devoted to learning adaptive sparse regularizers from given input data for specific tasks. In this paper, we propose a deep sparse regularizer learning model that learns data-driven sparse regularizers adaptively. Via the proximal gradient algorithm, we find that the sparse regularizer learning is equivalent to learning a parameterized activation function. This encourages us to learn sparse regularizers in the deep learning framework. Therefore, we build a neural network composed of multiple blocks, each being differentiable and reusable. All blocks contain learnable piecewise linear activation functions which correspond to the sparse regularizer to be learned. Furthermore, the proposed model is trained with back propagation, and all parameters in this model are learned end-to-end. We apply our framework to multi-view clustering and semi-supervised classification tasks to learn a latent compact representation. Experimental results demonstrate the superiority of the proposed framework over state-of-the-art multi-view learning models.

13.

Training Neural Networks by Lifted Proximal Operator Machines.

Li, Jia; Xiao, Mingqing; Fang, Cong; Dai, Yue; Xu, Chao; Lin, Zhouchen.

IEEE Trans Pattern Anal Mach Intell ; 44(6): 3334-3348, 2022 06.

Article in English | MEDLINE | ID: mdl-33382647

ABSTRACT

We present the lifted proximal operator machine (LPOM) to train fully-connected feed-forward neural networks. LPOM represents the activation function as an equivalent proximal operator and adds the proximal operators to the objective function of a network as penalties. LPOM is block multi-convex in all layer-wise weights and activations. This allows us to develop a new block coordinate descent (BCD) method with convergence guarantee to solve it. Due to the novel formulation and solving method, LPOM only uses the activation function itself and does not require any gradient steps. Thus it avoids the gradient vanishing or exploding issues, which are often blamed in gradient-based methods. Also, it can handle various non-decreasing Lipschitz continuous activation functions. Additionally, LPOM is almost as memory-efficient as stochastic gradient descent and its parameter tuning is relatively easy. We further implement and analyze the parallel solution of LPOM. We first propose a general asynchronous-parallel BCD method with convergence guarantee. Then we use it to solve LPOM, resulting in asynchronous-parallel LPOM. For faster speed, we develop the synchronous-parallel LPOM. We validate the advantages of LPOM on various network architectures and datasets. We also apply synchronous-parallel LPOM to autoencoder training and demonstrate its fast convergence and superior performance.

Subject(s)

Algorithms , Neural Networks, Computer

14.

Investigating Bi-Level Optimization for Learning and Vision From a Unified Perspective: A Survey and Beyond.

Liu, Risheng; Gao, Jiaxin; Zhang, Jin; Meng, Deyu; Lin, Zhouchen.

IEEE Trans Pattern Anal Mach Intell ; 44(12): 10045-10067, 2022 Dec.

Article in English | MEDLINE | ID: mdl-34871167

ABSTRACT

Bi-Level Optimization (BLO) is originated from the area of economic game theory and then introduced into the optimization community. BLO is able to handle problems with a hierarchical structure, involving two levels of optimization tasks, where one task is nested inside the other. In machine learning and computer vision fields, despite the different motivations and mechanisms, a lot of complex problems, such as hyper-parameter optimization, multi-task and meta learning, neural architecture search, adversarial learning and deep reinforcement learning, actually all contain a series of closely related subproblms. In this paper, we first uniformly express these complex learning and vision problems from the perspective of BLO. Then we construct a best-response-based single-level reformulation and establish a unified algorithmic framework to understand and formulate mainstream gradient-based BLO methodologies, covering aspects ranging from fundamental automatic differentiation schemes to various accelerations, simplifications, extensions and their convergence and complexity properties. Last but not least, we discuss the potentials of our unified BLO framework for designing new algorithms and point out some promising directions for future research. A list of important papers discussed in this survey, corresponding codes, and additional resources on BLOs are publicly available at: https://github.com/vis-opt-group/BLO.

15.

Towards Efficient Scene Understanding via Squeeze Reasoning.

Li, Xiangtai; Li, Xia; You, Ansheng; Zhang, Li; Cheng, Guangliang; Yang, Kuiyuan; Tong, Yunhai; Lin, Zhouchen.

IEEE Trans Image Process ; 30: 7050-7063, 2021.

Article in English | MEDLINE | ID: mdl-34329163

ABSTRACT

Graph-based convolutional model such as non-local block has shown to be effective for strengthening the context modeling ability in convolutional neural networks (CNNs). However, its pixel-wise computational overhead is prohibitive which renders it unsuitable for high resolution imagery. In this paper, we explore the efficiency of context graph reasoning and propose a novel framework called Squeeze Reasoning. Instead of propagating information on the spatial map, we first learn to squeeze the input feature into a channel-wise global vector and perform reasoning within the single vector where the computation cost can be significantly reduced. Specifically, we build the node graph in the vector where each node represents an abstract semantic concept. The refined feature within the same semantic category results to be consistent, which is thus beneficial for downstream tasks. We show that our approach can be modularized as an end-to-end trained block and can be easily plugged into existing networks. Despite its simplicity and being lightweight, the proposed strategy allows us to establish the considerable results on different semantic segmentation datasets and shows significant improvements with respect to strong baselines on various other scene understanding tasks including object detection, instance segmentation and panoptic segmentation. Code is available at https://github.com/lxtGH/SFSegNets.

16.

A Hybrid Stochastic-Deterministic Minibatch Proximal Gradient Method for Efficient Optimization and Generalization.

Zhou, Pan; Yuan, Xiaotong; Lin, Zhouchen; Hoi, Steven.

IEEE Trans Pattern Anal Mach Intell ; PP2021 Jun 08.

Article in English | MEDLINE | ID: mdl-34101583

ABSTRACT

Despite the success of stochastic variance-reduced gradient (SVRG) algorithms in solving large-scale problems, their stochastic gradient complexity often scales linearly with data size and is expensive for huge data. Accordingly, we propose a hybrid stochastic-deterministic minibatch proximal gradient~(HSDMPG) algorithm for strongly convex problems with linear prediction structure, e.g.~least squares and logistic/softmax regression. HSDMPG~enjoys improved computational complexity that is data-size-independent for large-scale problems. It iteratively samples an evolving~minibatch of individual losses to estimate the original problem, and efficiently minimizes the sampled smaller-sized subproblems. For strongly convex loss of n components, HSDMPG~attains an Ïµ-optimization-error within [Formula: see text] stochastic gradient evaluations, where κ is condition number, Î¶ = 1 for quadratic loss and Î¶ = 2 for generic loss. For large-scale problems, our complexity outperforms those of SVRG-type algorithms with/without dependence on data size. Particularly, when Ïµ = O(1/ân) which matches the intrinsic excess error of a learning model and is sufficient for generalization, our complexity for quadratic and generic losses is respectively O (n0.5log2(n)) and O (n0.5log3(n)), which for the first time achieves optimal generalization in less than a single pass over data. Besides, we extend HSDMPG~to online strongly convex problems and prove its higher efficiency over the prior algorithms. Numerical results demonstrate the computational advantages of~HSDMPG.

17.

Tensor Low-Rank Representation for Data Recovery and Clustering.

Zhou, Pan; Lu, Canyi; Feng, Jiashi; Lin, Zhouchen; Yan, Shuicheng.

IEEE Trans Pattern Anal Mach Intell ; 43(5): 1718-1732, 2021 May.

Article in English | MEDLINE | ID: mdl-31751228

ABSTRACT

Multi-way or tensor data analysis has attracted increasing attention recently, with many important applications in practice. This article develops a tensor low-rank representation (TLRR) method, which is the first approach that can exactly recover the clean data of intrinsic low-rank structure and accurately cluster them as well, with provable performance guarantees. In particular, for tensor data with arbitrary sparse corruptions, TLRR can exactly recover the clean data under mild conditions; meanwhile TLRR can exactly verify their true origin tensor subspaces and hence cluster them accurately. TLRR objective function can be optimized via efficient convex programing with convergence guarantees. Besides, we provide two simple yet effective dictionary construction methods, the simple TLRR (S-TLRR) and robust TLRR (R-TLRR), to handle slightly and severely corrupted data respectively. Experimental results on two computer vision data analysis tasks, image/video recovery and face clustering, clearly demonstrate the superior performance, efficiency and robustness of our developed method over state-of-the-arts including the popular LRR and SSC methods.

18.

Accelerated Variance Reduction Stochastic ADMM for Large-Scale Machine Learning.

Liu, Yuanyuan; Shang, Fanhua; Liu, Hongying; Kong, Lin; Jiao, Licheng; Lin, Zhouchen.

IEEE Trans Pattern Anal Mach Intell ; 43(12): 4242-4255, 2021 Dec.

Article in English | MEDLINE | ID: mdl-32750780

ABSTRACT

Recently, many stochastic variance reduced alternating direction methods of multipliers (ADMMs) (e.g., SAG-ADMM and SVRG-ADMM) have made exciting progress such as linear convergence rate for strongly convex (SC) problems. However, their best-known convergence rate for non-strongly convex (non-SC) problems is O(1/T) as opposed to O(1/T2) of accelerated deterministic algorithms, where T is the number of iterations. Thus, there remains a gap in the convergence rates of existing stochastic ADMM and deterministic algorithms. To bridge this gap, we introduce a new momentum acceleration trick into stochastic variance reduced ADMM, and propose a novel accelerated SVRG-ADMM method (called ASVRG-ADMM) for the machine learning problems with the constraint Ax + By = c. Then we design a linearized proximal update rule and a simple proximal one for the two classes of ADMM-style problems with B = τI and B ≠ τI, respectively, where I is an identity matrix and τ is an arbitrary bounded constant. Note that our linearized proximal update rule can avoid solving sub-problems iteratively. Moreover, we prove that ASVRG-ADMM converges linearly for SC problems. In particular, ASVRG-ADMM improves the convergence rate from O(1/T) to O(1/T2) for non-SC problems. Finally, we apply ASVRG-ADMM to various machine learning problems, e.g., graph-guided fused Lasso, graph-guided logistic regression, graph-guided SVM, generalized graph-guided fused Lasso and multi-task learning, and show that ASVRG-ADMM consistently converges faster than the state-of-the-art methods.

19.

DATA: Differentiable ArchiTecture Approximation With Distribution Guided Sampling.

Zhang, Xinbang; Chang, Jianlong; Guo, Yiwen; Meng, Gaofeng; Xiang, Shiming; Lin, Zhouchen; Pan, Chunhong.

IEEE Trans Pattern Anal Mach Intell ; 43(9): 2905-2920, 2021 09.

Article in English | MEDLINE | ID: mdl-32866094

ABSTRACT

Neural architecture search (NAS) is inherently subject to the gap of architectures during searching and validating. To bridge this gap effectively, we develop Differentiable ArchiTecture Approximation (DATA) with Ensemble Gumbel-Softmax (EGS) estimator and Architecture Distribution Constraint (ADC) to automatically approximate architectures during searching and validating in a differentiable manner. Technically, the EGS estimator consists of a group of Gumbel-Softmax estimators, which is capable of converting probability vectors to binary codes and passing gradients reversely, reducing the estimation bias in a differentiable way. To narrow the distribution gap between sampled architectures and supernet, further, the ADC is introduced to reduce the variance of sampling during searching. Benefiting from such modeling, architecture probabilities and network weights in the NAS model can be jointly optimized with the standard back-propagation, yielding an end-to-end learning mechanism for searching deep neural architectures in an extended search space. Conclusively, in the validating process, a high-performance architecture that approaches to the learned one during searching is readily built. Extensive experiments on various tasks including image classification, few-shot learning, unsupervised clustering, semantic segmentation and language modeling strongly demonstrate that DATA is capable of discovering high-performance architectures while guaranteeing the required efficiency. Code is available at https://github.com/XinbangZhang/DATA-NAS.

20.

On the Convergence of Learning-Based Iterative Methods for Nonconvex Inverse Problems.

Liu, Risheng; Cheng, Shichao; He, Yi; Fan, Xin; Lin, Zhouchen; Luo, Zhongxuan.

IEEE Trans Pattern Anal Mach Intell ; 42(12): 3027-3039, 2020 Dec.

Article in English | MEDLINE | ID: mdl-31170064

ABSTRACT

Numerous tasks at the core of statistics, learning and vision areas are specific cases of ill-posed inverse problems. Recently, learning-based (e.g., deep) iterative methods have been empirically shown to be useful for these problems. Nevertheless, integrating learnable structures into iterations is still a laborious process, which can only be guided by intuitions or empirical insights. Moreover, there is a lack of rigorous analysis about the convergence behaviors of these reimplemented iterations, and thus the significance of such methods is a little bit vague. This paper moves beyond these limits and proposes Flexible Iterative Modularization Algorithm (FIMA), a generic and provable paradigm for nonconvex inverse problems. Our theoretical analysis reveals that FIMA allows us to generate globally convergent trajectories for learning-based iterative methods. Meanwhile, the devised scheduling policies on flexible modules should also be beneficial for classical numerical methods in the nonconvex scenario. Extensive experiments on real applications verify the superiority of FIMA.

ABSTRACT

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

ABSTRACT

ABSTRACT

ABSTRACT

ABSTRACT

ABSTRACT

SEND TO:

SELECTION OF CITATIONS

SEARCH DETAIL