Búsqueda | Portal Regional de la BVS

Distributed Newton Methods for Deep Neural Networks.

Wang, Chien-Chih; Tan, Kent Loong; Chen, Chun-Ting; Lin, Yu-Hsiang; Keerthi, S Sathiya; Mahajan, Dhruv; Sundararajan, S; Lin, Chih-Jen.

Neural Comput ; 30(6): 1673-1724, 2018 06.

Artículo en Inglés | MEDLINE | ID: mdl-29652589

RESUMEN

Deep learning involves a difficult nonconvex optimization problem with a large number of weights between any two adjacent layers of a deep structure. To handle large data sets or complicated networks, distributed training is needed, but the calculation of function, gradient, and Hessian is expensive. In particular, the communication and the synchronization cost may become a bottleneck. In this letter, we focus on situations where the model is distributedly stored and propose a novel distributed Newton method for training deep neural networks. By variable and feature-wise data partitions and some careful designs, we are able to explicitly use the Jacobian matrix for matrix-vector products in the Newton method. Some techniques are incorporated to reduce the running time as well as memory consumption. First, to reduce the communication cost, we propose a diagonalization method such that an approximate Newton direction can be obtained without communication between machines. Second, we consider subsampled Gauss-Newton matrices for reducing the running time as well as the communication cost. Third, to reduce the synchronization cost, we terminate the process of finding an approximate Newton direction even though some nodes have not finished their tasks. Details of some implementation issues in distributed environments are thoroughly investigated. Experiments demonstrate that the proposed method is effective for the distributed training of deep neural networks. Compared with stochastic gradient methods, it is more robust and may give better test accuracy.

Support vector ordinal regression.

Chu, Wei; Keerthi, S Sathiya.

Neural Comput ; 19(3): 792-815, 2007 Mar.

Artículo en Inglés | MEDLINE | ID: mdl-17298234

RESUMEN

In this letter, we propose two new support vector approaches for ordinal regression, which optimize multiple thresholds to define parallel discriminant hyperplanes for the ordinal scales. Both approaches guarantee that the thresholds are properly ordered at the optimal solution. The size of these optimization problems is linear in the number of training samples. The sequential minimal optimization algorithm is adapted for the resulting optimization problems; it is extremely easy to implement and scales efficiently as a quadratic function of the number of examples. The results of numerical experiments on some benchmark and real-world data sets, including applications of ordinal regression to information retrieval, verify the usefulness of these approaches.

Asunto(s)

Algoritmos , Inteligencia Artificial , Modelos Logísticos , Reconocimiento de Normas Patrones Automatizadas/métodos , Análisis Discriminante , Humanos , Almacenamiento y Recuperación de la Información , Pesos y Medidas

Fast generalized cross-validation algorithm for sparse model learning.

Sundararajan, S; Shevade, Shirish; Keerthi, S Sathiya.

Neural Comput ; 19(1): 283-301, 2007 Jan.

Artículo en Inglés | MEDLINE | ID: mdl-17134326

RESUMEN

We propose a fast, incremental algorithm for designing linear regression models. The proposed algorithm generates a sparse model by optimizing multiple smoothing parameters using the generalized cross-validation approach. The performances on synthetic and real-world data sets are compared with other incremental algorithms such as Tipping and Faul's fast relevance vector machine, Chen et al.'s orthogonal least squares, and Orr's regularized forward selection. The results demonstrate that the proposed algorithm is competitive.

Asunto(s)

Algoritmos , Inteligencia Artificial , Modelos Lineales , Análisis de los Mínimos Cuadrados

An improved conjugate gradient scheme to the solution of least squares SVM.

Chu, Wei; Ong, Chong Jin; Keerthi, S Sathiya.

IEEE Trans Neural Netw ; 16(2): 498-501, 2005 Mar.

Artículo en Inglés | MEDLINE | ID: mdl-15787157

RESUMEN

The least square support vector machines (LS-SVM) formulation corresponds to the solution of a linear system of equations. Several approaches to its numerical solutions have been proposed in the literature. In this letter, we propose an improved method to the numerical solution of LS-SVM and show that the problem can be solved using one reduced system of linear equations. Compared with the existing algorithm for LS-SVM, the approach used in this letter is about twice as efficient. Numerical results using the proposed method are provided for comparisons with other existing algorithms.

Asunto(s)

Análisis de los Mínimos Cuadrados

Bayesian support vector regression using a unified loss function.

Chu, Wei; Keerthi, S Sathiya; Ong, Chong Jin.

IEEE Trans Neural Netw ; 15(1): 29-44, 2004 Jan.

Artículo en Inglés | MEDLINE | ID: mdl-15387245

RESUMEN

In this paper, we use a unified loss function, called the soft insensitive loss function, for Bayesian support vector regression. We follow standard Gaussian processes for regression to set up the Bayesian framework, in which the unified loss function is used in the likelihood evaluation. Under this framework, the maximum a posteriori estimate of the function values corresponds to the solution of an extended support vector regression problem. The overall approach has the merits of support vector regression such as convex quadratic programming and sparsity in solution representation. It also has the advantages of Bayesian methods for model adaptation and error bars of its predictions. Experimental results on simulated and real-world data sets indicate that the approach works well even on large data sets.

Asunto(s)

Teorema de Bayes , Análisis de Regresión

An efficient method for computing leave-one-out error in support vector machines with Gaussian kernels.

Lee, Martin M S; Keerthi, S Sathiya; Ong, Chong Jin; DeCoste, Dennis.

IEEE Trans Neural Netw ; 15(3): 750-7, 2004 May.

Artículo en Inglés | MEDLINE | ID: mdl-15384561

RESUMEN

In this paper, we give an efficient method for computing the leave-one-out (LOO) error for support vector machines (SVMs) with Gaussian kernels quite accurately. It is particularly suitable for iterative decomposition methods of solving SVMs. The importance of various steps of the method is illustrated in detail by showing the performance on six benchmark datasets. The new method often leads to speedups of 10-50 times compared to standard LOO error computation. It has good promise for use in hyperparameter tuning and model comparison

Asunto(s)

Metodologías Computacionales , Distribución Normal , Proyectos de Investigación/estadística & datos numéricos

Asymptotic behaviors of support vector machines with Gaussian kernel.

Keerthi, S Sathiya; Lin, Chih-Jen.

Neural Comput ; 15(7): 1667-89, 2003 Jul.

Artículo en Inglés | MEDLINE | ID: mdl-12816571

RESUMEN

Support vector machines (SVMs) with the gaussian (RBF) kernel have been popular for practical use. Model selection in this class of SVMs involves two hyperparameters: the penalty parameter C and the kernel width sigma. This letter analyzes the behavior of the SVM classifier when these hyperparameters take very small or very large values. Our results help in understanding the hyperparameter space that leads to an efficient heuristic method of searching for hyperparameter values with small generalization errors. The analysis also indicates that if complete model selection using the gaussian kernel has been conducted, there is no need to consider linear SVM.

Asunto(s)

Modelos Teóricos , Distribución Normal

RESUMEN

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

ENVIAR RESULTADO:

SELECCIÓN DE REFERENCIAS

DETALLE DE LA BÚSQUEDA