Explaining neural scaling laws.
Bahri, Yasaman; Dyer, Ethan; Kaplan, Jared; Lee, Jaehoon; Sharma, Utkarsh.
Affiliations
  • Bahri Y; Google DeepMind, Mountain View, CA 94043.
  • Dyer E; Google DeepMind, Mountain View, CA 94043.
  • Kaplan J; Department of Physics and Astronomy, Johns Hopkins University, Baltimore, MD 21218.
  • Lee J; Google DeepMind, Mountain View, CA 94043.
  • Sharma U; Department of Physics and Astronomy, Johns Hopkins University, Baltimore, MD 21218.
Proc Natl Acad Sci U S A ; 121(27): e2311878121, 2024 Jul 02.
Article in En | MEDLINE | ID: mdl-38913889
ABSTRACT
The population loss of trained deep neural networks often follows precise power-law scaling relations with either the size of the training dataset or the number of parameters in the network. We propose a theory that explains the origins of and connects these scaling laws. We identify variance-limited and resolution-limited scaling behavior for both dataset and model size, for a total of four scaling regimes. The variance-limited scaling follows simply from the existence of a well-behaved infinite data or infinite width limit, while the resolution-limited regime can be explained by positing that models are effectively resolving a smooth data manifold. In the large width limit, this can be equivalently obtained from the spectrum of certain kernels, and we present evidence that large width and large dataset resolution-limited scaling exponents are related by a duality. We exhibit all four scaling regimes in the controlled setting of large random feature and pretrained models and test the predictions empirically on a range of standard architectures and datasets. We also observe several empirical relationships between datasets and scaling exponents under modifications of task and architecture aspect ratio. Our work provides a taxonomy for classifying different scaling regimes, underscores that there can be different mechanisms driving improvements in loss, and lends insight into the microscopic origin and relationships between scaling exponents.
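The power-law relations the abstract describes can be illustrated with a minimal sketch. Assuming a hypothetical resolution-limited scaling law L(D) = c · D^(−α) in dataset size D (the exponent and prefactor below are made-up values, not figures from the paper), the exponent is recovered from synthetic noisy losses by a linear fit in log-log coordinates:

```python
import numpy as np

# Synthetic illustration (not the paper's data): assume the population loss
# follows a power law L(D) = c * D**(-alpha) in dataset size D.
rng = np.random.default_rng(0)
D = np.logspace(2, 6, 20)                # dataset sizes from 1e2 to 1e6
alpha_true, c = 0.35, 5.0                # hypothetical exponent and prefactor
loss = c * D**(-alpha_true) * np.exp(rng.normal(0, 0.02, D.size))  # noisy losses

# A power law is linear in log-log coordinates: log L = log c - alpha * log D,
# so the scaling exponent is the (negated) slope of a least-squares line fit.
slope, intercept = np.polyfit(np.log(D), np.log(loss), 1)
alpha_est = -slope
print(f"estimated scaling exponent alpha ~ {alpha_est:.3f}")
```

This log-log regression is the standard way such empirical exponents are measured; distinguishing which of the four regimes produced a given exponent is the subject of the paper's theory.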
Full text: 1 Collections: 01-international Database: MEDLINE Language: En Journal: Proc Natl Acad Sci U S A Publication year: 2024 Document type: Article