On the different regimes of stochastic gradient descent.

Sclocchi, Antonio; Wyart, Matthieu

Affiliation

Sclocchi A; Institute of Physics, Ecole Polytechnique Fédérale de Lausanne, Lausanne 1015, Switzerland.
Wyart M; Institute of Physics, Ecole Polytechnique Fédérale de Lausanne, Lausanne 1015, Switzerland.

Proc Natl Acad Sci U S A ; 121(9): e2316301121, 2024 Feb 27.

Article in English | MEDLINE | ID: mdl-38377198

ABSTRACT

Modern deep networks are trained with stochastic gradient descent (SGD), whose key hyperparameters are the number of data points considered at each step, or batch size $B$, and the step size, or learning rate $\eta$. For small $B$ and large $\eta$, SGD corresponds to a stochastic evolution of the parameters, whose noise amplitude is governed by the "temperature" $T \equiv \eta/B$. Yet this description is observed to break down for sufficiently large batches $B \geq B^*$, or simplifies to gradient descent (GD) when the temperature is sufficiently small. Understanding where these cross-overs take place remains a central challenge. Here, we resolve these questions for a teacher-student perceptron classification model and show empirically that our key predictions still apply to deep networks. Specifically, we obtain a phase diagram in the $B$-$\eta$ plane that separates three dynamical phases: (i) a noise-dominated SGD governed by temperature, (ii) a large-first-step-dominated SGD, and (iii) GD. These different phases also correspond to different regimes of generalization error. Remarkably, our analysis reveals that the batch size $B^*$ separating regimes (i) and (ii) scales with the size $P$ of the training set, with an exponent that characterizes the hardness of the classification problem.

Keywords

critical batch size; implicit bias; phase diagram; stochastic gradient descent

Full text: 1 Collections: 01-internacional Database: MEDLINE Language: En Year of publication: 2024 Document type: Article