XGrad: Boosting Gradient-Based Optimizers With Weight Prediction.
Article in English | MEDLINE | ID: mdl-38602857
ABSTRACT
In this paper, we propose XGrad, a general deep learning training framework that introduces weight prediction into popular gradient-based optimizers to boost their convergence and generalization when training deep neural network (DNN) models. In particular, before each mini-batch training step, the future weights are predicted according to the update rule of the optimizer in use and are then applied to both the forward pass and backward propagation. In this way, throughout the training period, the optimizer always utilizes the gradients w.r.t. the future weights to update the DNN parameters, achieving better convergence and generalization than the original optimizer without weight prediction. XGrad is straightforward to implement yet effective in boosting the convergence of gradient-based optimizers and the accuracy of DNN models. Empirical results with five popular optimizers, namely SGD with momentum, Adam, AdamW, AdaBelief, and AdaM3, demonstrate the effectiveness of our proposal. The experimental results validate that XGrad attains higher model accuracy than the baseline optimizers when training DNN models. The code of XGrad will be available at https://github.com/guanleics/XGrad.
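To make the weight-prediction idea concrete, below is a minimal PyTorch sketch of one training step for the SGD-with-momentum case. It is only an illustration of the abstract's description, not the authors' released implementation: the helper name xgrad_sgdm_step and the specific prediction rule w_hat = w - lr * mu * v (extrapolating one step using the cached momentum buffer, since the next gradient is not yet known) are assumptions made here for clarity.

```python
import torch

def xgrad_sgdm_step(model, optimizer, loss_fn, x, y, lr, momentum):
    """One weight-prediction training step for SGD with momentum.

    A sketch of the idea described in the abstract: gradients are
    evaluated at predicted future weights, then used to update the
    current weights. Names and the exact prediction rule are
    illustrative assumptions, not the paper's official API.
    """
    params = [p for p in model.parameters() if p.requires_grad]

    # 1. Cache current weights, then move each parameter to its
    #    predicted future value: w_hat = w - lr * momentum * v.
    cached = [p.detach().clone() for p in params]
    with torch.no_grad():
        for p in params:
            buf = optimizer.state.get(p, {}).get('momentum_buffer')
            if buf is not None:
                p.sub_(lr * momentum * buf)

    # 2. Run the forward pass and backpropagation at the predicted
    #    weights, so p.grad holds gradients w.r.t. w_hat.
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()

    # 3. Restore the current weights and apply the optimizer update
    #    using the gradients computed at the predicted weights.
    with torch.no_grad():
        for p, w in zip(params, cached):
            p.copy_(w)
    optimizer.step()
    return loss.item()
```

In practice the learning rate and momentum coefficient would be read from optimizer.param_groups rather than passed separately; they are explicit arguments here only to keep the prediction rule visible.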

Full text: 1 Collection: 01-international Database: MEDLINE Language: English Journal: IEEE Trans Pattern Anal Mach Intell Journal subject: MEDICAL INFORMATICS Year: 2024 Document type: Article
