1.
Bioinformatics ; 35(10): 1729-1736, 2019 05 15.
Article in English | MEDLINE | ID: mdl-30307540

ABSTRACT

MOTIVATION: A large number of recent genome-wide association studies (GWASs) for complex phenotypes confirm the early conjecture of polygenicity, suggesting the presence of a large number of variants with only tiny or moderate effects. However, due to the limited sample size of a single GWAS, many associated genetic variants are too weak to reach genome-wide significance. These undiscovered variants further limit the prediction capability of GWAS. Restricted access to individual-level data and the increasing availability of published GWAS results motivate the development of methods that integrate both individual-level and summary-level data. How the connection between individual-level and summary-level data is modeled determines how efficiently the abundant existing summary-level resources can be used together with limited individual-level data, and this issue calls for further effort in the area. RESULTS: In this study, we propose a novel statistical approach, LEP, which provides a new way of modeling the connection between individual-level data and summary-level data. LEP integrates both types of data by LEveraging Pleiotropy to increase the statistical power of risk-variant identification and the accuracy of risk prediction. The algorithm for parameter estimation is developed to handle genome-wide-scale data. Through comprehensive simulation studies, we demonstrate the advantages of LEP over existing methods. We further applied LEP to an integrative analysis of Crohn's disease from WTCCC and summary statistics from GWASs of other diseases, such as type 1 diabetes, ulcerative colitis and primary biliary cirrhosis. LEP significantly increased the statistical power of identifying risk variants and improved the risk prediction accuracy from 63.39% (±0.58%) to 68.33% (±0.32%) using about 195,000 variants. AVAILABILITY AND IMPLEMENTATION: The LEP software is available at https://github.com/daviddaigithub/LEP. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
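The reported gain is in genotype-based risk prediction accuracy. As a generic illustration of how such a predictor is typically formed, the sketch below builds a simple polygenic risk score from per-variant effect estimates and scores it with AUC on simulated data; it is not the LEP model, and all sizes and variable names are toy assumptions.

```python
# A generic polygenic risk-score (PRS) sketch, NOT the LEP model itself: risk prediction
# from per-variant effect estimates, evaluated here with AUC as one common scoring choice.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n_samples, n_variants = 1000, 5000                    # toy sizes, far below the ~195,000 variants in the paper
G = rng.binomial(2, 0.3, size=(n_samples, n_variants)).astype(float)  # genotypes coded 0/1/2
beta = np.zeros(n_variants)
beta[:50] = rng.normal(0, 0.1, 50)                    # a few truly associated variants

liability = G @ beta + rng.normal(0, 1, n_samples)
y = (liability > np.quantile(liability, 0.7)).astype(int)   # case/control labels

beta_hat = rng.normal(beta, 0.05)                     # stand-in for effect sizes estimated by any method
prs = G @ beta_hat                                    # weighted sum of risk alleles
print("prediction AUC:", roc_auc_score(y, prs))
```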


Subject(s)
Genome-Wide Association Study , Algorithms , Phenotype , Polymorphism, Single Nucleotide , Software
2.
Bioinformatics ; 33(18): 2882-2889, 2017 Sep 15.
Article in English | MEDLINE | ID: mdl-28498950

ABSTRACT

MOTIVATION: Results from genome-wide association studies (GWAS) suggest that a complex phenotype is often affected by many variants with small effects, known as 'polygenicity'. Tens of thousands of samples are often required to ensure the statistical power to identify these variants with small effects. However, it is often the case that a research group can only obtain approval for access to individual-level genotype data with a limited sample size (e.g. a few hundred or a few thousand). Meanwhile, summary statistics generated using single-variant-based analysis are becoming publicly available, and the sample sizes associated with these summary-statistics datasets are usually quite large. How to make the most efficient use of existing abundant data resources largely remains an open question. RESULTS: In this study, we propose a statistical approach, IGESS, to increase the statistical power of identifying risk variants and improve the accuracy of risk prediction by integrating individual-level genotype data and summary statistics. An efficient algorithm based on variational inference is developed to handle the genome-wide analysis. Through comprehensive simulation studies, we demonstrate the advantages of IGESS over methods that take either individual-level data or summary-statistics data alone as input. We applied IGESS to an integrative analysis of Crohn's disease from WTCCC and summary statistics from other studies. IGESS significantly increased the statistical power of identifying risk variants and improved the risk prediction accuracy from 63.2% (±0.4%) to 69.4% (±0.1%) using about 240,000 variants. AVAILABILITY AND IMPLEMENTATION: The IGESS software is available at https://github.com/daviddaigithub/IGESS. CONTACT: zbxu@xjtu.edu.cn or xwan@comp.hkbu.edu.hk or eeyang@hkbu.edu.hk. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
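IGESS couples individual-level genotypes with published summary statistics through a variational Bayesian model. As a much simpler illustration of why pooling the two data sources raises power, the sketch below combines a per-SNP effect estimated from a small individual-level cohort with a published summary estimate by fixed-effects (inverse-variance) meta-analysis; this is a swapped-in textbook technique, not the IGESS algorithm, and the numbers are made up.

```python
# Simplified illustration only: fixed-effects (inverse-variance) combination of an effect
# estimate from small individual-level data with a published summary estimate. Pooling both
# sources shrinks the standard error and raises power; this is NOT the IGESS variational model.
import numpy as np
from scipy import stats

def combine(beta_ind, se_ind, beta_sum, se_sum):
    w1, w2 = 1.0 / se_ind**2, 1.0 / se_sum**2          # inverse-variance weights
    beta = (w1 * beta_ind + w2 * beta_sum) / (w1 + w2)
    se = np.sqrt(1.0 / (w1 + w2))
    z = beta / se
    p = 2 * stats.norm.sf(abs(z))                      # two-sided p-value
    return beta, se, p

# per-SNP estimate from a small individual-level cohort vs. a large published GWAS
print(combine(beta_ind=0.12, se_ind=0.06, beta_sum=0.10, se_sum=0.02))
```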


Subject(s)
Genome-Wide Association Study/methods , Models, Statistical , Software , Algorithms , Humans , Sample Size
3.
Bioinformatics ; 32(22): 3480-3488, 2016 11 15.
Article in English | MEDLINE | ID: mdl-27466625

ABSTRACT

MOTIVATION: Imaging genetics combines brain imaging and genetic information to identify relationships between genetic variants and brain activity. When the data samples belong to different classes (e.g. disease status), the relationships may exhibit class-specific patterns that can be used to facilitate the understanding of a disease. Conventional approaches often perform a separate analysis on each class and report the differences, but ignore important shared patterns. RESULTS: In this paper, we develop a multivariate method to analyze the differential dependency across multiple classes. We propose a joint sparse canonical correlation analysis (CCA) method, which uses a generalized fused lasso penalty to jointly estimate multiple pairs of canonical vectors with both shared and class-specific patterns. Using a data fusion approach, the method is able to detect differentially correlated modules effectively and efficiently. Results from simulation studies demonstrate its higher accuracy in discovering both common and differential canonical correlations compared to conventional sparse CCA. Applied to a schizophrenia dataset with 92 cases and 116 controls, including a single nucleotide polymorphism (SNP) array and functional magnetic resonance imaging data, the proposed method reveals a set of distinct SNP-voxel interaction modules for the schizophrenia patients, which are verified to be both statistically and biologically significant. AVAILABILITY AND IMPLEMENTATION: The Matlab code is available at https://sites.google.com/site/jianfang86/JSCCA. CONTACT: wyp@tulane.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
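The proposed JSCCA adds sparsity and a generalized fused lasso penalty across classes to CCA. As a reference point only, the sketch below runs classical (non-sparse, single-class) CCA between a toy SNP matrix and a toy imaging-feature matrix using scikit-learn; the penalty structure that is the paper's contribution is not shown, and all dimensions are assumptions.

```python
# Baseline only: classical canonical correlation analysis between a SNP matrix and an
# imaging-feature matrix. JSCCA additionally imposes sparsity and a generalized fused
# lasso penalty across classes, which this baseline does not include.
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(1)
n, p_snp, p_voxel = 200, 50, 60          # toy dimensions
X = rng.normal(size=(n, p_snp))          # SNP data (e.g. standardized allele counts)
Y = rng.normal(size=(n, p_voxel))        # imaging features (e.g. voxel activations)

cca = CCA(n_components=2)
Xc, Yc = cca.fit_transform(X, Y)
for k in range(2):
    r = np.corrcoef(Xc[:, k], Yc[:, k])[0, 1]
    print(f"canonical correlation {k}: {r:.3f}")
```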


Subject(s)
Brain Mapping , Brain , Magnetic Resonance Imaging , Genetic Variation , Humans , Polymorphism, Single Nucleotide , Schizophrenia
4.
Neural Comput ; 29(7): 1879-1901, 2017 07.
Article in English | MEDLINE | ID: mdl-28410056

ABSTRACT

Recently, a new framework, Fredholm learning, was proposed for semisupervised learning problems based on solving a regularized Fredholm integral equation. It allows a natural way to incorporate unlabeled data into learning algorithms to improve their prediction performance. Despite rapid progress on implementable algorithms with theoretical guarantees, the generalization ability of Fredholm kernel learning has not been studied. In this letter, we focus on investigating the generalization performance of a family of classification algorithms, referred to as Fredholm kernel regularized classifiers. We prove that the corresponding learning rate can achieve [Formula: see text] ([Formula: see text] is the number of labeled samples) in a limiting case. In addition, a representer theorem is provided for the proposed regularized scheme, which underlies its applications.
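For orientation, the sketch below shows an ordinary kernel regularized least-squares classifier written in the representer-theorem form f(x) = sum_i alpha_i K(x, x_i), using only labeled data and a Gaussian kernel; the Fredholm kernel built from unlabeled data, which is what the letter analyzes, is deliberately not reproduced here.

```python
# Reference sketch: a plain kernel regularized least-squares classifier in representer-theorem
# form. The Fredholm kernel constructed from unlabeled data is NOT implemented here.
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma**2))

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 2))
y = np.sign(X[:, 0] + 0.3 * rng.normal(size=100))    # labels in {-1, +1}

lam = 0.1                                            # regularization parameter
K = gaussian_kernel(X, X)
alpha = np.linalg.solve(K + lam * len(X) * np.eye(len(X)), y)   # closed-form solution

X_test = rng.normal(size=(10, 2))
pred = np.sign(gaussian_kernel(X_test, X) @ alpha)
print(pred)
```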

5.
Neural Comput ; 26(10): 2350-78, 2014 Oct.
Article in English | MEDLINE | ID: mdl-25058698

ABSTRACT

Regularization is a well-recognized and powerful strategy for improving the performance of a learning machine, and l(q) regularization schemes with 0 < q < ∞ are in widespread use. It is known that different choices of q lead to different properties of the resulting estimators: for example, l(2) regularization leads to a smooth estimator, while l(1) regularization leads to a sparse estimator. How the generalization capability of l(q) regularization learning varies with q is therefore worth investigating. In this letter, we study this problem in the framework of statistical learning theory. Our main results show that implementing l(q) coefficient regularization schemes in the sample-dependent hypothesis space associated with a gaussian kernel can attain the same almost optimal learning rates for all 0 < q < ∞. That is, the upper and lower bounds of learning rates for l(q) regularization learning are asymptotically identical for all 0 < q < ∞. Our finding tentatively suggests that, in some modeling contexts, the choice of q might not have a strong impact on the generalization capability. From this perspective, q can be arbitrarily specified, or specified merely by other, non-generalization criteria such as smoothness, computational complexity or sparsity.
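To make the objective being studied concrete, the toy sketch below fits a kernel expansion f(x) = sum_i c_i K(x, x_i) with an l(q) penalty on the coefficients for several values of q, using a generic derivative-free optimizer; this only illustrates how q enters the coefficient-regularized objective and has nothing to do with the letter's theoretical learning-rate analysis. Kernel width, lambda and sample sizes are arbitrary assumptions.

```python
# Toy illustration of l_q coefficient regularization in the kernel-based hypothesis space:
# minimize (1/n) * sum (f(x_i) - y_i)^2 + lam * sum |c_i|^q, solved by a generic optimizer
# just to show how q enters. Not the analysis technique of the letter.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
X = rng.uniform(-1, 1, size=(30, 1))
y = np.sin(np.pi * X[:, 0]) + 0.1 * rng.normal(size=30)

K = np.exp(-((X - X.T) ** 2) / (2 * 0.3**2))          # Gaussian kernel matrix

def objective(c, q, lam=1e-3):
    resid = K @ c - y
    return resid @ resid / len(y) + lam * np.sum(np.abs(c) ** q)

for q in (0.5, 1.0, 2.0):
    res = minimize(objective, np.zeros(len(y)), args=(q,), method="Powell")
    print(f"q={q}: training MSE = {np.mean((K @ res.x - y) ** 2):.4f}, "
          f"nonzeros = {np.sum(np.abs(res.x) > 1e-3)}")
```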


Subject(s)
Artificial Intelligence , Learning/physiology , Normal Distribution , Humans , Signal Processing, Computer-Assisted
6.
Natl Sci Rev ; 11(8): nwae277, 2024 Aug.
Article in English | MEDLINE | ID: mdl-39229289

ABSTRACT

This paper introduces a 'simulating learning methodology' (SLeM) approach to determining learning methodologies in general, and to AutoML in particular, and reports the SLeM framework, approaches, algorithms and applications.

7.
IEEE Trans Pattern Anal Mach Intell ; 46(10): 6577-6593, 2024 Oct.
Article in English | MEDLINE | ID: mdl-38557620

ABSTRACT

The deep unfolding approach has attracted significant attention in computer vision tasks, as it connects conventional image-processing modeling with more recent deep learning techniques. Specifically, by establishing a direct correspondence between algorithm operators at each implementation step and network modules within each layer, one can rationally construct an almost "white box" network architecture with high interpretability. In this architecture, only the predefined component of the proximal operator, known as a proximal network, needs manual configuration, enabling the network to automatically extract intrinsic image priors in a data-driven manner. In current deep unfolding methods, such a proximal network is generally designed as a CNN architecture, whose necessity has been proven by a recent theory: the CNN structure delivers the translational-symmetry image prior, the most universally possessed structural prior across various types of images. However, standard CNN-based proximal networks have essential limitations in capturing the rotation-symmetry prior, another universal structural prior underlying general images. This leaves large room for further performance improvement in deep unfolding approaches. To address this issue, this study proposes a high-accuracy rotation-equivariant proximal network that effectively embeds rotation-symmetry priors into the deep unfolding framework. In particular, we derive, for the first time, the theoretical equivariance error for such a proximal network with arbitrary layers under arbitrary rotation degrees. This analysis is, to date, the most refined theoretical conclusion for such error evaluation and is also indispensable for supporting the rationale behind networks with intrinsic interpretability requirements. Through experimental validation on different vision tasks, including blind image super-resolution, medical image reconstruction and image de-raining, the proposed method is shown to be capable of directly replacing the proximal network in current deep unfolding architectures and readily enhancing their state-of-the-art performance. This indicates its potential usability in general vision tasks.
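For readers unfamiliar with the unfolding pattern itself, the sketch below alternates a gradient step on the data-fidelity term with a learned proximal module for a fixed number of stages. The ProxNet here is a plain small CNN placeholder; the paper's rotation-equivariant design and its error analysis are not reproduced, and the operator names and sizes are assumptions.

```python
# Minimal deep-unfolding sketch: K iterations of "gradient step on the data term, then a
# learned proximal module". ProxNet is a plain CNN placeholder, not the paper's
# rotation-equivariant proximal network.
import torch
import torch.nn as nn

class ProxNet(nn.Module):                 # stands in for the learned proximal operator
    def __init__(self, ch=16):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(1, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, 1, 3, padding=1))
    def forward(self, x):
        return x + self.body(x)           # residual refinement

class UnfoldedNet(nn.Module):
    def __init__(self, K=5):
        super().__init__()
        self.stages = nn.ModuleList(ProxNet() for _ in range(K))
        self.step = nn.Parameter(torch.full((K,), 0.5))   # learnable step sizes

    def forward(self, y, A, At):
        # y: degraded observation; A/At: forward operator and its adjoint (callables)
        x = At(y)                                         # simple initialization
        for k, prox in enumerate(self.stages):
            grad = At(A(x) - y)                           # gradient of 0.5*||Ax - y||^2
            x = prox(x - self.step[k] * grad)             # proximal (denoising) step
        return x

# toy usage: identity degradation, so A and At are both the identity map
net = UnfoldedNet()
y = torch.randn(2, 1, 32, 32)
print(net(y, A=lambda v: v, At=lambda v: v).shape)
```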

8.
Article in English | MEDLINE | ID: mdl-39331550

ABSTRACT

Robust loss minimization is an important strategy for handling the problem of robust learning with noisy labels. Current approaches to designing robust losses introduce noise-robust factors, i.e., hyperparameters, to control the trade-off between noise robustness and learnability. However, finding suitable hyperparameters for different datasets with noisy labels is a challenging and time-consuming task. Moreover, existing robust loss methods usually assume that all training samples share common hyperparameters, which are independent of instances. This limits the ability of these methods to distinguish the individual noise properties of different samples and overlooks the varying contributions of diverse training samples to helping models understand underlying patterns. To address the above issues, we propose to assemble robust losses with instance-dependent hyperparameters to improve their noise tolerance, with theoretical guarantees. To set such instance-dependent hyperparameters, we propose a meta-learning method capable of adaptively learning a hyperparameter prediction function, called the noise-aware-robust-loss-adjuster (NARL-Adjuster). Through mutual amelioration between the hyperparameter prediction function and the classifier parameters, both can be simultaneously refined and coordinated to attain solutions with good generalization capability. Four state-of-the-art robust loss functions are integrated with our algorithm, and comprehensive experiments substantiate the general applicability and effectiveness of the proposed method in terms of both noise tolerance and performance. Meanwhile, the explicit parameterized structure makes the meta-learned prediction function readily transferable and plug-and-play for unseen datasets with noisy labels. Specifically, we transfer our meta-learned NARL-Adjuster to unseen tasks, including several real noisy datasets, and achieve better performance than a conventional hyperparameter tuning strategy, even with carefully tuned hyperparameters.
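To make "robust loss with instance-dependent hyperparameters" concrete, the schematic below uses the generalized cross-entropy (GCE) robust loss and lets a tiny network predict its exponent q per sample from the plain cross-entropy loss. GCE is one published robust loss used here purely for illustration; the QAdjuster module and its inputs are hypothetical placeholders, and the paper's actual NARL-Adjuster architecture and bi-level meta-learning loop are omitted.

```python
# Schematic only: a robust loss (generalized cross-entropy, GCE) whose hyperparameter q is
# predicted per instance by a tiny network. Not the actual NARL-Adjuster or its meta-training.
import torch
import torch.nn as nn
import torch.nn.functional as F

class QAdjuster(nn.Module):
    """Maps a per-sample feature (here, the plain CE loss) to q in (0, 1)."""
    def __init__(self, hidden=16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(1, hidden), nn.ReLU(), nn.Linear(hidden, 1))
    def forward(self, ce_loss):
        return torch.sigmoid(self.net(ce_loss.unsqueeze(1))).squeeze(1)

def gce_loss(logits, targets, q):
    # GCE: (1 - p_y^q) / q, close to CE for small q and more MAE-like (robust) for larger q
    p_y = F.softmax(logits, dim=1).gather(1, targets.unsqueeze(1)).squeeze(1)
    return ((1.0 - p_y.clamp_min(1e-6) ** q) / q).mean()

adjuster = QAdjuster()
logits = torch.randn(8, 10)
targets = torch.randint(0, 10, (8,))
ce = F.cross_entropy(logits, targets, reduction="none")
q = adjuster(ce.detach())                 # instance-dependent hyperparameters
print(q, gce_loss(logits, targets, q))
```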

9.
IEEE Trans Med Imaging ; 43(5): 1677-1689, 2024 May.
Article in English | MEDLINE | ID: mdl-38145543

ABSTRACT

Low-dose computed tomography (LDCT) helps to reduce radiation risk in CT scanning while maintaining image quality, which involves a consistent pursuit of lower incident rays and higher reconstruction performance. Although deep learning approaches have achieved encouraging success in LDCT reconstruction, most of them treat the task as a general inverse problem in either the image domain or the dual (sinogram and image) domains. Such frameworks do not consider the original noise generation of the projection data and suffer from limited performance improvement for the LDCT task. In this paper, we propose a novel full-domain reconstruction model based on the noise-generation and imaging mechanism, which fully considers the statistical properties of the intrinsic noise in LDCT and prior information in the sinogram and image domains. To solve the model, we propose an optimization algorithm based on the proximal gradient technique. Specifically, we theoretically derive approximate solutions of the integer programming problem on the projection data. Instead of hand-crafting the sinogram and image regularizers, we propose to unroll the optimization algorithm into a deep network. The network implicitly learns the proximal operators of the sinogram and image regularizers with two deep neural networks, providing a more interpretable and effective reconstruction procedure. Numerical results demonstrate that our proposed method achieves improvements of >2.9 dB in peak signal-to-noise ratio, >1.4% in the structural similarity metric, and >9 HU reductions in root-mean-square error over current state-of-the-art LDCT methods.
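As background on the "noise generation of the projection data" mentioned above, low-dose sinograms are commonly simulated by applying Poisson counting noise (plus optional electronic Gaussian noise) to the transmitted photon counts. The sketch below follows that common convention; it is not claimed to be the exact statistical model adopted in the paper, and the dose level and sinogram are toy values.

```python
# Background sketch: a common way to simulate low-dose projection (sinogram) noise using
# Poisson counting statistics plus optional electronic Gaussian noise. Standard practice
# in LDCT simulation, not necessarily the paper's exact noise model.
import numpy as np

def simulate_low_dose(sinogram, I0=1e4, electronic_sigma=10.0, rng=None):
    """sinogram: line integrals (attenuation); I0: incident photon count per ray."""
    rng = rng or np.random.default_rng()
    expected_counts = I0 * np.exp(-sinogram)                    # Beer-Lambert law
    counts = rng.poisson(expected_counts).astype(float)         # quantum (Poisson) noise
    counts += rng.normal(0.0, electronic_sigma, counts.shape)   # electronic noise
    counts = np.clip(counts, 1.0, None)                         # avoid log of non-positive values
    return -np.log(counts / I0)                                 # back to noisy line integrals

clean = np.abs(np.random.default_rng(4).normal(1.0, 0.3, size=(180, 256)))  # toy sinogram
noisy = simulate_low_dose(clean, I0=5e3)
print(noisy.shape, float(np.mean((noisy - clean) ** 2)))
```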


Subject(s)
Algorithms , Image Processing, Computer-Assisted , Phantoms, Imaging , Tomography, X-Ray Computed , Tomography, X-Ray Computed/methods , Humans , Image Processing, Computer-Assisted/methods , Deep Learning , Radiation Dosage
10.
BMC Bioinformatics ; 14: 198, 2013 Jun 19.
Article in English | MEDLINE | ID: mdl-23777239

ABSTRACT

BACKGROUND: Microarray technology is widely used in cancer diagnosis. Successfully identifying gene biomarkers will significantly help to classify different cancer types and improve prediction accuracy. The regularization approach is one of the effective methods for gene selection in microarray data, which generally contain a large number of genes and a small number of samples. In recent years, various approaches have been developed for gene selection from microarray data. Generally, they fall into three categories: filter, wrapper and embedded methods. Regularization methods are an important embedded technique and perform continuous shrinkage and automatic gene selection simultaneously. Recently, there has been growing interest in applying regularization techniques to gene selection. The most popular regularization technique is the Lasso (L1), and many L1-type regularization terms have been proposed in recent years. Theoretically, Lq-type regularization with a lower value of q leads to sparser solutions. Moreover, L1/2 regularization can be taken as a representative of Lq (0 < q < 1) regularization.
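Because L1/2-penalized solvers are not part of standard toolkits, the sketch below uses scikit-learn's L1-penalized logistic regression as a stand-in to show the embedded gene-selection workflow (fit a sparse model, keep the genes with nonzero coefficients). The L1 penalty is a deliberate substitute for the L1/2 penalty advocated here, and the data are simulated.

```python
# Workflow sketch with an L1 penalty as a stand-in for L1/2 (Lq, 0 < q < 1) regularization,
# illustrating embedded gene selection on "small n, large p" microarray-style data.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(5)
n_samples, n_genes = 100, 2000
X = rng.normal(size=(n_samples, n_genes))       # expression values (assumed standardized)
w_true = np.zeros(n_genes); w_true[:10] = 2.0   # only 10 informative genes
y = (X @ w_true + rng.normal(size=n_samples) > 0).astype(int)

clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
clf.fit(X, y)
selected = np.flatnonzero(clf.coef_[0])         # genes with nonzero coefficients
print("selected genes:", selected[:20], "count:", selected.size)
```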

Subject(s)
Gene Expression Regulation , Logistic Models , Neoplasms/classification , Neoplasms/genetics , Algorithms , Genetic Markers , Humans , Neoplasms/metabolism , Oligonucleotide Array Sequence Analysis/methods
11.
J Opt Soc Am A Opt Image Sci Vis ; 30(10): 1956-66, 2013 Oct 01.
Article in English | MEDLINE | ID: mdl-24322850

ABSTRACT

Multiplicative noise is a common type of noise in imaging science. For coherent image-acquisition systems, such as synthetic aperture radar, the observed images are often contaminated by multiplicative noise. Total variation (TV) regularization has been widely studied for multiplicative noise removal in the literature due to its edge-preserving property. However, TV-based solutions sometimes exhibit an undesirable staircase artifact. In this paper, we propose a model that combines the advantages of the TV norm and a high-order TV norm to balance edge preservation and smoothness in homogeneous regions. In addition, we adopt a spatially adaptive regularization-parameter updating scheme. Numerical results illustrate the efficiency of our method in terms of the signal-to-noise ratio and the structural similarity index.
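A simple baseline for multiplicative (speckle) noise is to log-transform the image, which turns multiplicative noise into approximately additive noise, and then apply plain TV denoising. The sketch below does exactly that with scikit-image's Chambolle TV denoiser; the paper's combined TV + high-order TV model with spatially adaptive parameters is not implemented, and the weight value is an arbitrary assumption.

```python
# Simple baseline only: log-transform (multiplicative -> additive noise) followed by
# Chambolle TV denoising. Not the paper's TV + high-order TV model.
import numpy as np
from skimage import data, util, restoration, metrics

clean = util.img_as_float(data.camera())
noisy = util.random_noise(clean, mode="speckle", var=0.05)    # multiplicative noise

log_noisy = np.log(np.clip(noisy, 1e-3, None))                # additive in the log domain
log_denoised = restoration.denoise_tv_chambolle(log_noisy, weight=0.1)
denoised = np.exp(log_denoised)

print("PSNR noisy   :", metrics.peak_signal_noise_ratio(clean, np.clip(noisy, 0, 1)))
print("PSNR denoised:", metrics.peak_signal_noise_ratio(clean, np.clip(denoised, 0, 1)))
```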

12.
ScientificWorldJournal ; 2013: 475702, 2013.
Article in English | MEDLINE | ID: mdl-24453861

ABSTRACT

A new adaptive L1/2 shooting regularization method for variable selection based on the Cox proportional hazards model is proposed. The adaptive L1/2 shooting algorithm is easily obtained by optimizing a reweighted iterative series of L1 penalties together with a shooting strategy for the L1/2 penalty. Simulation results based on high-dimensional artificial data show that the adaptive L1/2 shooting regularization method can be more accurate for variable selection than the Lasso and adaptive Lasso methods. Results on a real gene expression dataset (DLBCL) also indicate that the L1/2 regularization method performs competitively.
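The "shooting" strategy referred to here is coordinate-wise soft-thresholding. As a minimal sketch, the classical shooting (coordinate descent) algorithm for the Lasso on a squared-error objective is shown below; the paper's adaptive, reweighted L1/2 variant applied to the Cox partial likelihood is not reproduced, and the penalty level is an arbitrary toy choice.

```python
# Sketch of the classical "shooting" (coordinate descent) algorithm for the Lasso on a
# squared-error objective. The paper adapts this idea, with reweighted L1 steps approximating
# an L1/2 penalty, to the Cox partial likelihood; that part is not shown here.
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_shooting(X, y, lam, n_iter=100):
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(p):
            r_j = X[:, j] @ (y - X @ beta + X[:, j] * beta[j])   # partial residual
            beta[j] = soft_threshold(r_j, lam) / col_sq[j]
    return beta

rng = np.random.default_rng(6)
X = rng.normal(size=(100, 20))
beta_true = np.zeros(20); beta_true[:3] = [1.5, -2.0, 1.0]
y = X @ beta_true + 0.3 * rng.normal(size=100)
print(np.round(lasso_shooting(X, y, lam=20.0), 2))
```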


Subject(s)
Gene Expression Regulation , Kaplan-Meier Estimate , Models, Biological , Proportional Hazards Models , Animals
13.
IEEE Trans Med Imaging ; 42(12): 3678-3689, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37540616

ABSTRACT

Accurate segmentation of brain tumors is of critical importance in clinical assessment and treatment planning, and requires multiple MR modalities providing complementary information. However, due to practical limitations, one or more modalities may be missing in real scenarios. To tackle this problem, existing methods need to train multiple networks or a unified but fixed network for the various possible missing-modality cases, which leads to high computational burdens or sub-optimal performance. In this paper, we propose a unified and adaptive multi-modal MR image synthesis method, and further apply it to tumor segmentation with missing modalities. Based on the decomposition of multi-modal MR images into common and modality-specific features, we design a shared hyper-encoder for embedding each available modality into the feature space, a graph-attention-based fusion block to aggregate the features of the available modalities into fused features, and a shared hyper-decoder for image reconstruction. We also propose an adversarial common-feature constraint to enforce that the fused features lie in a common space. For missing-modality segmentation, we first conduct feature-level and image-level completion using our synthesis method and then segment the tumors based on the completed MR images together with the extracted common features. Moreover, we design a hypernet-based modulation module to adaptively utilize the real and synthetic modalities. Experimental results suggest that our method can not only synthesize reasonable multi-modal MR images, but also achieve state-of-the-art performance on brain tumor segmentation with missing modalities.


Subject(s)
Brain Neoplasms , Humans , Brain Neoplasms/diagnostic imaging , Image Processing, Computer-Assisted
14.
IEEE Trans Pattern Anal Mach Intell ; 45(4): 4537-4551, 2023 Apr.
Article in English | MEDLINE | ID: mdl-35930514

ABSTRACT

It has been shown that equivariant convolution is very helpful for many types of computer vision tasks. Recently, the 2D filter parametrization technique has played an important role in designing equivariant convolutions and has achieved success in exploiting the rotation symmetry of images. However, the current filter parametrization strategy still has evident drawbacks, the most critical of which is the accuracy of the filter representation. To address this issue, in this paper we explore an ameliorated Fourier series expansion for 2D filters and propose a new filter parametrization method based on it. The proposed method not only represents 2D filters with zero error when the filter is not rotated (similar to the classical Fourier series expansion), but also substantially alleviates the aliasing-caused quality degradation when the filter is rotated (which usually arises in the classical Fourier series expansion). Accordingly, we construct a new equivariant convolution method based on the proposed filter parametrization, named F-Conv. We prove that the equivariance of the proposed F-Conv is exact in the continuous domain and becomes approximate only after discretization. Moreover, we provide a theoretical error analysis for the case when the equivariance is approximate, showing that the approximation error is related to the mesh size and the filter size. Extensive experiments show the superiority of the proposed method. In particular, we apply rotation equivariant convolution methods to a typical low-level image processing task, image super-resolution. The proposed F-Conv-based method evidently outperforms classical convolution-based methods. Compared with previous filter-parametrization-based methods, F-Conv performs more accurately on this low-level task, reflecting its intrinsic capability of faithfully preserving rotation symmetries in local image features.
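The core idea of filter parametrization is that a discrete filter is obtained by sampling a continuous function defined by basis coefficients, so a rotated copy of the filter is produced simply by rotating the sampling grid. The toy sketch below illustrates this with a small generic Fourier-type basis; it is not the ameliorated expansion proposed in the paper, and the coefficient format is an assumption.

```python
# Toy illustration of 2D filter parametrization: sample a continuous function, defined by a
# few basis coefficients, on a (possibly rotated) grid. Generic Fourier-type basis only,
# not the paper's ameliorated expansion.
import numpy as np

def parametrized_filter(coeffs, size=7, angle=0.0):
    # coeffs: mapping from frequency pairs (kx, ky) to real coefficients (assumed toy format)
    ys, xs = np.meshgrid(np.linspace(-1, 1, size), np.linspace(-1, 1, size), indexing="ij")
    c, s = np.cos(angle), np.sin(angle)
    xr, yr = c * xs + s * ys, -s * xs + c * ys           # rotate the sampling coordinates
    f = np.zeros((size, size))
    for (kx, ky), a in coeffs.items():
        f += a * np.cos(np.pi * (kx * xr + ky * yr))
    return f

coeffs = {(1, 0): 1.0, (0, 1): 0.5, (1, 1): -0.3}
f0 = parametrized_filter(coeffs, angle=0.0)
f45 = parametrized_filter(coeffs, angle=np.pi / 4)       # the same filter, rotated 45 degrees
print(np.round(f0[3], 2), np.round(f45[3], 2))
```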

15.
IEEE Trans Cybern ; 53(9): 5469-5482, 2023 Sep.
Article in English | MEDLINE | ID: mdl-35286274

ABSTRACT

Detecting overlapping communities in an attributed network is a ubiquitous yet very difficult task, which can be modeled as a discrete optimization problem. Besides the topological structure of the network, node attributes and node overlapping significantly aggravate the difficulty of community detection. In this article, we propose a novel continuous encoding method that converts the discrete-natured detection problem into a continuous one by associating each edge and node attribute in the network with a continuous variable. Based on this encoding, we solve the converted continuous problem with a decomposition-based multiobjective evolutionary algorithm (MOEA). To find the overlapping nodes, a heuristic based on double decoding is proposed, which has only linear complexity. Furthermore, a post-processing community-merging method that takes node attributes into account is developed to enhance the homogeneity of nodes in the detected communities. Various synthetic and real-world networks are used to verify the effectiveness of the proposed approach. The experimental results show that the proposed approach performs significantly better than a variety of evolutionary and non-evolutionary methods on most of the benchmark networks.

16.
IEEE Trans Pattern Anal Mach Intell ; 45(10): 12618-12634, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37126627

ABSTRACT

Deep neural networks suffer from catastrophic forgetting when trained on sequential tasks in continual learning. Various methods rely on storing data from previous tasks to mitigate catastrophic forgetting, which is prohibited in real-world applications with privacy and security concerns. In this paper, we consider a realistic setting of continual learning in which training data from previous tasks are unavailable and memory resources are limited. We contribute a novel knowledge-distillation-based method in an information-theoretic framework that maximizes the mutual information between the outputs of the previously learned and current networks. Because computing the mutual information is intractable, we instead maximize its variational lower bound, where the covariance of the variational distribution is modeled by a graph convolutional network. The inaccessibility of data from previous tasks is tackled by Taylor expansion, yielding a novel regularizer in the network training loss for continual learning. The regularizer relies on compressed gradients of the network parameters and avoids storing previous task data or previously learned networks. Additionally, we employ a self-supervised learning technique to learn effective features, which improves continual learning performance. We conduct extensive experiments, including image classification and semantic segmentation, and the results show that our method achieves state-of-the-art performance on continual learning benchmarks.
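For readers new to distillation-based continual learning, the sketch below shows the standard distillation regularizer (KL divergence between softened outputs of the frozen previous network and the current one) added to the task loss. The paper instead maximizes a variational lower bound on mutual information, with a GCN-modeled covariance and a Taylor-expansion-based regularizer, which this sketch does not reproduce; the temperature and weighting are assumptions.

```python
# Minimal sketch of a knowledge-distillation regularizer for continual learning: KL between
# softened outputs of the frozen previous network and the current one, added to the task loss.
# Not the paper's variational mutual-information bound.
import torch
import torch.nn.functional as F

def distillation_loss(new_logits, old_logits, T=2.0):
    p_old = F.softmax(old_logits / T, dim=1)
    log_p_new = F.log_softmax(new_logits / T, dim=1)
    return F.kl_div(log_p_new, p_old, reduction="batchmean") * T * T

# toy usage inside a training step
new_logits = torch.randn(16, 10, requires_grad=True)   # outputs of the network being trained
old_logits = torch.randn(16, 10)                       # outputs of the frozen previous network
targets = torch.randint(0, 10, (16,))

loss = F.cross_entropy(new_logits, targets) + 1.0 * distillation_loss(new_logits, old_logits)
loss.backward()
print(float(loss))
```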

17.
IEEE Trans Pattern Anal Mach Intell ; 45(10): 11521-11539, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37126626

ABSTRACT

Modern deep neural networks can easily overfit to biased training data containing corrupted labels or class imbalance. Sample re-weighting methods are popularly used to alleviate this data bias issue. Most current methods, however, require manually pre-specifying weighting schemes based on the characteristics of the investigated problem and training data, which makes them hard to apply generally in practical scenarios due to the significant complexity and inter-class variation of data bias. To address this issue, we propose a meta-model capable of adaptively learning an explicit weighting scheme directly from data. Specifically, by treating each training class as a separate learning task, our method extracts an explicit weighting function with the sample loss and a task/class feature as input and the sample weight as output, so as to impose adaptively varying weighting schemes on different sample classes based on their own intrinsic bias characteristics. Extensive experiments substantiate the capability of our method to achieve proper weighting schemes in various data bias cases, such as class imbalance, feature-independent and feature-dependent label noise, and more complicated bias scenarios beyond conventional cases. The task-transferability of the learned weighting scheme is also substantiated by readily deploying the weighting function learned on the relatively small-scale CIFAR-10 dataset on the much larger-scale full WebVision dataset. The general applicability of our method to multiple robust deep learning issues, including partial-label learning, semi-supervised learning and selective classification, has also been validated. Code for reproducing our experiments is available at https://github.com/xjtushujun/CMW-Net.
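The plumbing of such a weighting function is simple: a small network maps (sample loss, class-level feature) to a weight in (0, 1), and the weighted per-sample losses are averaged. The sketch below shows only that plumbing with a hypothetical WeightNet and a relative-class-frequency feature; the actual CMW-Net architecture and its bi-level meta-training against a small clean meta set are not reproduced.

```python
# Plumbing sketch only: a tiny network maps (sample loss, class feature) to a sample weight
# used to reweight losses. Not the actual CMW-Net or its meta-training procedure.
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightNet(nn.Module):
    def __init__(self, hidden=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2, hidden), nn.ReLU(), nn.Linear(hidden, 1))
    def forward(self, loss, class_feat):
        x = torch.stack([loss, class_feat], dim=1)
        return torch.sigmoid(self.net(x)).squeeze(1)     # weights in (0, 1)

weight_net = WeightNet()
logits = torch.randn(32, 10, requires_grad=True)
targets = torch.randint(0, 10, (32,))

per_sample_loss = F.cross_entropy(logits, targets, reduction="none")
class_size = torch.bincount(targets, minlength=10).float()
class_feat = class_size[targets] / class_size.sum()      # e.g. relative class frequency
weights = weight_net(per_sample_loss.detach(), class_feat)

weighted_loss = (weights * per_sample_loss).mean()
weighted_loss.backward()
print(weights[:5], float(weighted_loss))
```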

18.
IEEE Trans Pattern Anal Mach Intell ; 45(3): 3505-3521, 2023 Mar.
Article in English | MEDLINE | ID: mdl-35724299

ABSTRACT

The learning rate (LR) is one of the most important hyperparameters of the stochastic gradient descent (SGD) algorithm for training deep neural networks (DNNs). However, current hand-designed LR schedules need to manually pre-specify a fixed form, which limits their ability to adapt to practical non-convex optimization problems due to the significant diversity of training dynamics. Meanwhile, a proper LR schedule usually needs to be searched from scratch for each new task, and tasks often differ substantially in data modality, network architecture, or training data capacity. To address this learning-rate-schedule setting issue, we propose to parameterize LR schedules with an explicit mapping formulation, called MLR-SNet. The learnable parameterized structure gives MLR-SNet more flexibility to learn a proper LR schedule that complies with the training dynamics of a DNN. Image and text classification benchmark experiments substantiate the capability of our method to achieve proper LR schedules. Moreover, the explicit parameterized structure makes the meta-learned LR schedules transferable and plug-and-play, so they can easily be generalized to new heterogeneous tasks. We transfer our meta-learned MLR-SNet to query tasks whose training epochs, network architectures, data modalities and dataset sizes differ from those of the training tasks, and achieve comparable or even better performance than hand-designed LR schedules specifically designed for the query tasks. The robustness of MLR-SNet is also substantiated when the training data are biased with corrupted noise. We further prove the convergence of the SGD algorithm equipped with the LR schedule produced by MLR-SNet, with a convergence rate comparable to the best known for the algorithm on this problem. The source code of our method is released at https://github.com/xjtushujun/MLR-SNet.
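A schematic of the idea: the LR at each step is produced by a small network from current training statistics and written into the optimizer's parameter groups, instead of following a hand-designed formula. The LRNet below is a hypothetical placeholder driven only by the current loss; MLR-SNet itself is an LSTM-style scheduler trained by meta-learning, which is not shown.

```python
# Schematic only: a small network maps the current training loss to a learning rate, which is
# then written into the optimizer's parameter groups. Not MLR-SNet or its meta-training.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LRNet(nn.Module):
    def __init__(self, max_lr=0.1, hidden=16):
        super().__init__()
        self.max_lr = max_lr
        self.net = nn.Sequential(nn.Linear(1, hidden), nn.ReLU(), nn.Linear(hidden, 1))
    def forward(self, loss_value):
        return self.max_lr * torch.sigmoid(self.net(loss_value.view(1, 1))).squeeze()

model = nn.Linear(20, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
lr_net = LRNet()

x, y = torch.randn(64, 20), torch.randint(0, 2, (64,))
for step in range(5):
    loss = F.cross_entropy(model(x), y)
    lr = float(lr_net(loss.detach()))            # schedule predicted from training dynamics
    for group in optimizer.param_groups:
        group["lr"] = lr
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    print(f"step {step}: loss={loss.item():.3f}, lr={lr:.4f}")
```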

19.
IEEE Trans Cybern ; PP, 2022 Aug 22.
Article in English | MEDLINE | ID: mdl-35994533

ABSTRACT

Matrix factorization (MF) methods decompose a data matrix into a product of two factor matrices (denoted as U and V) with low ranks. In this article, we propose a generative latent variable model for the data matrix, in which each entry is assumed to be Gaussian with mean equal to the inner product of the corresponding columns of U and V. The prior on each column of U and V is assumed to be a finite mixture of Gaussians. Further, building on this model for the data matrix, we propose to model the attribute matrix jointly with the data matrix by treating them as conditionally independent given the factor matrix U. Due to the intractability of the proposed models, we employ variational Bayes to infer the posteriors of the factor matrices and the clustering relationships and to optimize the model parameters. In our development, the posteriors and model parameters can be computed in closed form, which is much more computationally efficient than existing sampling-based probabilistic MF models. Comprehensive experimental studies of the proposed methods on collaborative filtering and community detection tasks demonstrate that they achieve state-of-the-art performance against a large number of MF-based and non-MF-based algorithms.
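As a reference point for the Gaussian likelihood term of such a model, the sketch below fits a plain point-estimate matrix factorization by gradient descent on the squared reconstruction error; the mixture-of-Gaussians priors, the attribute matrix, and the closed-form variational Bayes updates that constitute the paper's contribution are deliberately omitted, and the sizes and step size are toy choices.

```python
# Reference point only: a point-estimate matrix factorization fit by gradient descent on
# squared reconstruction error, i.e. the Gaussian likelihood part of the model without the
# priors, the attribute matrix, or the variational Bayes inference.
import numpy as np

rng = np.random.default_rng(7)
n, m, r = 50, 40, 5
R = rng.normal(size=(n, r)) @ rng.normal(size=(r, m)) + 0.1 * rng.normal(size=(n, m))

U = 0.1 * rng.normal(size=(n, r))
V = 0.1 * rng.normal(size=(m, r))
lr, reg = 0.01, 0.01
for it in range(200):
    E = R - U @ V.T                       # residual matrix
    U += lr * (E @ V - reg * U)           # gradient step on 0.5*||R - U V^T||^2 + regularizer
    V += lr * (E.T @ U - reg * V)
print("final RMSE:", float(np.sqrt(np.mean((R - U @ V.T) ** 2))))
```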

20.
IEEE Trans Pattern Anal Mach Intell ; 44(8): 4469-4484, 2022 Aug.
Article in English | MEDLINE | ID: mdl-33621172

ABSTRACT

Stochastic optimization algorithms are popular for training deep neural networks. Recently, a new class of learning-based optimizers has emerged and achieved promising performance for training neural networks. However, these black-box learning-based optimizers do not fully exploit the experience embodied in human-designed optimizers and rely heavily on learning from meta-training tasks, and therefore have limited generalization ability. In this paper, we propose a novel optimizer, dubbed Variational HyperAdam, which is based on a parametric generalized Adam algorithm, i.e., HyperAdam, in a variational framework. With Variational HyperAdam as the optimizer for training a neural network, the parameter update vector of the network at each training step is treated as a random variable, whose approximate posterior distribution, given the training data and the current network parameter vector, is predicted by Variational HyperAdam. The parameter update vector for network training is sampled from this approximate posterior distribution. Specifically, in Variational HyperAdam we design a learnable generalized Adam algorithm for estimating the expectation, paired with a VarBlock for estimating the variance, of the approximate posterior distribution of the parameter update vector. Variational HyperAdam is learned in a meta-learning approach with a meta-training loss derived by variational inference. Experiments verify that the learned Variational HyperAdam achieves state-of-the-art network training performance for various types of networks on different datasets, such as multilayer perceptrons, CNNs, LSTMs and ResNets.
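For reference, the deterministic Adam update rule that HyperAdam parametrically generalizes is sketched below in plain NumPy; Variational HyperAdam instead predicts a distribution over the update vector with learned components, which this sketch does not attempt to reproduce.

```python
# For reference: the deterministic Adam update rule that HyperAdam generalizes.
# Not Variational HyperAdam itself.
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    m = beta1 * m + (1 - beta1) * grad                 # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad**2              # second-moment estimate
    m_hat = m / (1 - beta1**t)                         # bias correction
    v_hat = v / (1 - beta2**t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# toy usage: minimize f(theta) = ||theta||^2
theta = np.array([1.0, -2.0, 3.0])
m = np.zeros_like(theta); v = np.zeros_like(theta)
for t in range(1, 501):
    grad = 2 * theta
    theta, m, v = adam_step(theta, grad, m, v, t, lr=0.05)
print("theta after 500 Adam steps:", np.round(theta, 4))
```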
