Predictive Power of a Bayesian Effective Action for Fully Connected One Hidden Layer Neural Networks in the Proportional Limit.

Baglioni, P; Pacelli, R; Aiudi, R; Di Renzo, F; Vezzani, A; Burioni, R; Rotondo, P

Affiliation

Baglioni P; Dipartimento di Scienze Matematiche, Fisiche e Informatiche, <a href="https://ror.org/02k7wn190">Università degli Studi di Parma</a>, Parco Area delle Scienze, 7/A 43124 Parma, Italy.
Pacelli R; <a href="https://ror.org/03xejxm22">INFN</a>, Gruppo Collegato di Parma, Parco Area delle Scienze 7/A, 43124 Parma, Italy.
Aiudi R; Dipartimento di Scienza Applicata e Tecnologia, <a href="https://ror.org/00bgk9508">Politecnico di Torino</a>, 10129 Torino, Italy.
Di Renzo F; Artificial Intelligence Lab, <a href="https://ror.org/05crjpb27">Bocconi University</a>, 20136 Milano, Italy.
Vezzani A; Dipartimento di Scienze Matematiche, Fisiche e Informatiche, <a href="https://ror.org/02k7wn190">Università degli Studi di Parma</a>, Parco Area delle Scienze, 7/A 43124 Parma, Italy.
Burioni R; <a href="https://ror.org/03xejxm22">INFN</a>, Gruppo Collegato di Parma, Parco Area delle Scienze 7/A, 43124 Parma, Italy.
Rotondo P; Dipartimento di Scienze Matematiche, Fisiche e Informatiche, <a href="https://ror.org/02k7wn190">Università degli Studi di Parma</a>, Parco Area delle Scienze, 7/A 43124 Parma, Italy.

Phys Rev Lett; 133(2): 027301, 2024 Jul 12.

Article in English | MEDLINE | ID: mdl-39073956

ABSTRACT

We perform accurate numerical experiments with fully connected one hidden layer neural networks trained with a discretized Langevin dynamics on the MNIST and CIFAR10 datasets. Our goal is to empirically determine the regimes of validity of a recently derived Bayesian effective action for shallow architectures in the proportional limit. We explore the predictive power of the theory as a function of the parameters (the temperature T, the magnitude of the Gaussian priors λ_{1}, λ_{0}, the size of the hidden layer N_{1}, and the size of the training set P) by comparing the experimental and predicted generalization error. The very good agreement between the effective theory and the experiments represents an indication that global rescaling of the infinite-width kernel is a main physical mechanism for kernel renormalization in fully connected Bayesian standard-scaled shallow networks.
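
As an illustration of the training procedure named in the abstract, the following is a minimal sketch of one discretized Langevin step for a one hidden layer network. It is not the authors' code. The abstract does not give the loss, the activation, the scaling, or the update rule, so the following are all assumptions made for this sketch: a squared-error loss, a tanh activation, standard 1/sqrt(N) scaling of both layers, a posterior proportional to exp(-L/T - λ_0/2 ||W||^2 - λ_1/2 ||v||^2), and the function names `forward` and `langevin_step`.

```python
import numpy as np

def forward(W, v, X):
    """One hidden layer network with standard 1/sqrt(N) scaling (assumed)."""
    N0 = X.shape[1]
    N1 = v.shape[0]
    H = np.tanh(X @ W.T / np.sqrt(N0))   # hidden activations, shape (P, N1)
    f = H @ v / np.sqrt(N1)              # scalar output per example, shape (P,)
    return H, f

def langevin_step(W, v, X, y, T, lam0, lam1, eta, rng):
    """One Euler step of Langevin dynamics at temperature T (hypothetical helper).

    Assumed energy: E = 0.5 * sum((f - y)^2) + T * (lam0/2 * |W|^2 + lam1/2 * |v|^2),
    so that the stationary density is proportional to exp(-E / T).
    """
    N0 = X.shape[1]
    N1 = v.shape[0]
    H, f = forward(W, v, X)
    err = f - y                                              # shape (P,)

    # Gradients of the squared-error loss.
    grad_v = H.T @ err / np.sqrt(N1)
    delta = (err[:, None] * v[None, :] / np.sqrt(N1)) * (1.0 - H**2)
    grad_W = delta.T @ X / np.sqrt(N0)

    # Gaussian-prior contribution, scaled by T (see the assumed energy above).
    grad_W = grad_W + T * lam0 * W
    grad_v = grad_v + T * lam1 * v

    # Drift plus Gaussian noise of variance 2 * eta * T per coordinate.
    noise_scale = np.sqrt(2.0 * eta * T)
    W_new = W - eta * grad_W + noise_scale * rng.standard_normal(W.shape)
    v_new = v - eta * grad_v + noise_scale * rng.standard_normal(v.shape)
    return W_new, v_new
```

Under these assumptions, iterating `langevin_step` with a small step size `eta` and averaging the test error over the late part of the trajectory gives an experimental generalization error of the kind the abstract compares against the effective theory. At T = 0 the step reduces to plain gradient descent on the loss.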

Full text: 1 Collection: 01-international Database: MEDLINE Language: English Journal: Phys Rev Lett Year: 2024 Document type: Article Affiliation country: Italy
