Results 1 - 6 of 6
1.
Nat Neurosci; 26(8): 1438-1448, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37474639

ABSTRACT

Memorization and generalization are complementary cognitive processes that jointly promote adaptive behavior. For example, animals should memorize safe routes to specific water sources and generalize from these memories to discover environmental features that predict new ones. These functions depend on systems consolidation mechanisms that construct neocortical memory traces from hippocampal precursors, but why systems consolidation only applies to a subset of hippocampal memories is unclear. Here we introduce a new neural network formalization of systems consolidation that reveals an overlooked tension: unregulated neocortical memory transfer can cause overfitting and harm generalization in an unpredictable world. We resolve this tension by postulating that memories only consolidate when doing so aids generalization. This framework accounts for partial hippocampal-cortical memory transfer and provides a normative principle for reconceptualizing numerous observations in the field. Generalization-optimized systems consolidation thus provides new insight into how adaptive behavior benefits from complementary learning systems specialized for memorization and generalization.


Subjects
Learning; Memory Consolidation; Animals; Generalization, Psychological; Hippocampus
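A minimal sketch of the kind of generalization-gated consolidation rule the abstract describes, in Python with numpy: episodic memories are kept verbatim, and a memory is transferred to a parametric "neocortical" predictor only if doing so lowers error on held-out environment statistics. The linear predictor, the ridge fit, and the explicit validation set are illustrative assumptions of this sketch, not the paper's model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy world: observations x with a predictable component (w_true . x)
# plus unpredictable noise. Episodic memories are stored verbatim; the
# parametric "neocortical" predictor consolidates a memory only if doing
# so lowers held-out error (illustrative criterion, not the paper's model).
d, n_episodes, n_val = 10, 60, 200
w_true = rng.normal(size=d)

def sample(n, noise):
    X = rng.normal(size=(n, d))
    return X, X @ w_true + noise * rng.normal(size=n)

X_ep, y_ep = sample(n_episodes, noise=2.0)   # noisy episodic memories
X_val, y_val = sample(n_val, noise=0.0)      # held-out environment statistics

def fit(X, y):
    # Ridge-regularized linear fit of the consolidated memories.
    return np.linalg.solve(X.T @ X + 1e-2 * np.eye(d), X.T @ y)

def val_err(w):
    return np.mean((X_val @ w - y_val) ** 2)

consolidated = []
for i in range(n_episodes):
    w_without = fit(X_ep[consolidated], y_ep[consolidated]) if consolidated else np.zeros(d)
    w_with = fit(X_ep[consolidated + [i]], y_ep[consolidated + [i]])
    if val_err(w_with) < val_err(w_without):   # consolidate only if it helps generalization
        consolidated.append(i)

print(f"consolidated {len(consolidated)}/{n_episodes} memories")
print("held-out error, consolidate everything:", val_err(fit(X_ep, y_ep)))
print("held-out error, generalization-gated:  ", val_err(fit(X_ep[consolidated], y_ep[consolidated])))
```

The point of the toy is only to show how a consolidation decision can be tied to a generalization criterion, echoing the partial memory transfer discussed in the abstract.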
2.
Phys Rev E; 103(2-1): 022404, 2021 Feb.
Article in English | MEDLINE | ID: mdl-33736047

ABSTRACT

Many sensory pathways in the brain include sparsely active populations of neurons downstream from the input stimuli. The biological purpose of this expanded structure is unclear, but it may be beneficial due to the increased expressive power of the network. In this work, we show that certain ways of expanding a neural network can improve its generalization performance even when the expanded structure is pruned after the learning period. To study this setting, we use a teacher-student framework in which a perceptron teacher network generates labels corrupted with small amounts of noise. We then train a student network that is structurally matched to the teacher and can achieve optimal accuracy if given the teacher's synaptic weights. We find that sparse expansion of the input layer of a student perceptron network both increases its capacity and improves its generalization performance when learning a noisy rule from a teacher perceptron, even when the expansion is pruned after learning. We find similar behavior when the expanded units are stochastic and uncorrelated with the input, and we analyze this network in the mean-field limit. By solving the mean-field equations, we show that the generalization error of the stochastic expanded student network continues to drop as the size of the network increases. This improvement in generalization performance occurs despite the increased complexity of the student network relative to the teacher it is trying to learn. We show that this effect is closely related to the addition of slack variables in artificial neural networks and suggest possible implications for artificial and biological neural networks.


Subjects
Learning; Models, Neurological; Nerve Net/physiology
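A rough numerical sketch of the expand-then-prune experiment described above, in Python with numpy: a perceptron teacher generates labels with a small flip probability, one student is trained on the raw inputs, and another is trained with a fixed sparse random expansion appended to its input and then pruned back to the original inputs at test time. The expansion size, sparsity level, and plain perceptron training rule are illustrative choices; the paper's quantitative results come from a mean-field analysis, not from this finite-size toy.

```python
import numpy as np

rng = np.random.default_rng(1)

# Teacher perceptron over d inputs; training labels carry a small flip noise.
d, d_extra, n_train, n_test = 50, 200, 200, 5000
w_teacher = rng.normal(size=d)

def labels(X, flip=0.0):
    y = np.sign(X @ w_teacher)
    return np.where(rng.random(len(y)) < flip, -y, y)

X_tr = rng.normal(size=(n_train, d))
y_tr = labels(X_tr, flip=0.05)
X_te = rng.normal(size=(n_test, d))
y_te = labels(X_te)                                   # clean test labels

# Fixed sparse random projection used to expand the student's input.
P = rng.normal(size=(d, d_extra)) * (rng.random((d, d_extra)) < 0.1)

def train_perceptron(X, y, epochs=200):
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (xi @ w) <= 0:                    # classic perceptron update
                w += yi * xi
    return w

def test_error(w):
    return np.mean(np.sign(X_te @ w) != y_te)

# Student structurally matched to the teacher (no expansion).
w_plain = train_perceptron(X_tr, y_tr)

# Student trained with the sparse expansion appended, then pruned back
# to the original d inputs before testing.
w_expanded = train_perceptron(np.hstack([X_tr, X_tr @ P]), y_tr)
w_pruned = w_expanded[:d]

print("test error, matched student:      ", test_error(w_plain))
print("test error, expanded then pruned: ", test_error(w_pruned))
```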
3.
Neural Netw; 132: 428-446, 2020 Dec.
Article in English | MEDLINE | ID: mdl-33022471

ABSTRACT

We perform an analysis of the average generalization dynamics of large neural networks trained using gradient descent. We study the practically relevant "high-dimensional" regime where the number of free parameters in the network is of the same order as, or even larger than, the number of examples in the dataset. Using random matrix theory and exact solutions in linear models, we derive the generalization error and training error dynamics of learning and analyze how they depend on the dimensionality of the data and the signal-to-noise ratio of the learning problem. We find that the dynamics of gradient descent learning naturally protect against overtraining and overfitting in large networks. Overtraining is worst at intermediate network sizes, when the effective number of free parameters equals the number of samples, and thus can be reduced by making a network smaller or larger. Additionally, in the high-dimensional regime, low generalization error requires starting with small initial weights. We then turn to non-linear neural networks and show that making networks very large does not harm their generalization performance. On the contrary, it can in fact reduce overtraining, even without early stopping or regularization of any sort. We identify two novel phenomena underlying this behavior in overcomplete models: first, there is a frozen subspace of the weights in which no learning occurs under gradient descent; and second, the statistical properties of the high-dimensional regime yield better-conditioned input correlations, which protect against overtraining. We demonstrate that standard applications of theories such as Rademacher complexity are inaccurate in predicting the generalization performance of deep neural networks, and we derive an alternative bound which incorporates the frozen subspace and conditioning effects and qualitatively matches the behavior observed in simulation.


Subjects
Deep Learning
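The sketch below, in Python with numpy, simulates the linear-model setting the abstract analyzes: gradient descent from small initial weights with more parameters than examples, tracking training and test error over time, and checking numerically that the weights never move in the null space of the data (the "frozen subspace"). Problem sizes, noise level, and step size are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

# High-dimensional linear regression: more parameters (d) than examples (n).
# Gradient descent from small initial weights; track train/test error.
n, d, noise = 100, 200, 0.5
w_star = rng.normal(size=d) / np.sqrt(d)
X = rng.normal(size=(n, d))
y = X @ w_star + noise * rng.normal(size=n)
X_test = rng.normal(size=(2000, d))
y_test = X_test @ w_star

w0 = 1e-3 * rng.normal(size=d)               # small initialization
w = w0.copy()
lr = 1.0 / np.linalg.norm(X, 2) ** 2         # stable step size for this quadratic

for step in range(2001):
    if step % 500 == 0:
        train_err = np.mean((X @ w - y) ** 2)
        test_err = np.mean((X_test @ w - y_test) ** 2)
        print(f"step {step:5d}  train {train_err:.4f}  test {test_err:.4f}")
    w -= lr * X.T @ (X @ w - y)              # full-batch gradient step

# Gradients always lie in the row space of X, so the (d - n)-dimensional
# null space of the data is a frozen subspace: weights there never move.
_, _, Vt = np.linalg.svd(X, full_matrices=True)
frozen = Vt[n:]
print("max weight change in the frozen subspace:", np.abs(frozen @ (w - w0)).max())
```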
4.
J Stat Mech; 2020(12): 124010, 2020 Dec.
Article in English | MEDLINE | ID: mdl-34262607

ABSTRACT

Deep neural networks achieve stellar generalisation even when they have enough parameters to easily fit all their training data. We study this phenomenon by analysing the dynamics and the performance of over-parameterised two-layer neural networks in the teacher-student setup, where one network, the student, is trained on data generated by another network, called the teacher. We show how the dynamics of stochastic gradient descent (SGD) is captured by a set of differential equations and prove that this description is asymptotically exact in the limit of large inputs. Using this framework, we calculate the final generalisation error of student networks that have more parameters than their teachers. We find that the final generalisation error of the student increases with network size when training only the first layer, but stays constant or even decreases with size when training both layers. We show that these different behaviours have their root in the different solutions SGD finds for different activation functions. Our results indicate that achieving good generalisation in neural networks goes beyond the properties of SGD alone and depends on the interplay of at least the algorithm, the model architecture, and the data set.
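A small simulation of the teacher-student setup the abstract studies, in Python with numpy: an over-parameterised two-layer student trained by online SGD (one fresh example per step) on data generated by a smaller two-layer teacher, with both layers trained. The tanh activation, sizes, and learning rate are illustrative assumptions; the paper's exact description comes from the ODE limit rather than a simulation like this.

```python
import numpy as np

rng = np.random.default_rng(3)

# Online SGD for an over-parameterised two-layer student (K hidden units)
# learning from a smaller two-layer teacher (M hidden units). Both student
# layers are trained. Sizes, learning rate and activation are illustrative.
d, M, K, lr = 100, 2, 8, 0.05
g = np.tanh

W_teacher = rng.normal(size=(M, d)) / np.sqrt(d)   # teacher first layer
v_teacher = np.ones(M)                             # teacher second layer

W = 1e-2 * rng.normal(size=(K, d))                 # student first layer
v = 1e-2 * rng.normal(size=K)                      # student second layer

def gen_error(n=5000):
    X = rng.normal(size=(n, d))
    pred = g(X @ W.T) @ v
    target = g(X @ W_teacher.T) @ v_teacher
    return 0.5 * np.mean((pred - target) ** 2)

for step in range(200001):
    if step % 50000 == 0:
        print(f"step {step:6d}  generalisation error {gen_error():.5f}")
    x = rng.normal(size=d)                         # fresh example each step
    h = g(W @ x)
    err = v @ h - v_teacher @ g(W_teacher @ x)
    grad_v = err * h                               # d(loss)/dv
    grad_W = err * np.outer(v * (1 - h ** 2), x)   # d(loss)/dW, using tanh' = 1 - tanh^2
    v -= lr * grad_v
    W -= lr * grad_W
```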

5.
J Stat Mech; 2018, 2018 Mar.
Article in English | MEDLINE | ID: mdl-30636966

ABSTRACT

A central question in ecology is to understand the ecological processes that shape community structure. Niche-based theories have emphasized the important role played by competition in maintaining species diversity. Many of these insights have been derived using MacArthur's consumer resource model (MCRM) or its generalizations. Most theoretical work on the MCRM has focused on small ecosystems with a few species and resources. However, theoretical insights derived from small ecosystems may not scale up to large ecosystems with many resources and species, because large systems with many interacting components often display new emergent behaviors that cannot be understood or deduced from analyzing smaller systems. To address these shortcomings, we develop a statistical-physics-inspired cavity method to analyze the MCRM when both the number of species and the number of resources are large. Unlike previous work in this limit, our theory addresses resource dynamics and resource depletion and demonstrates that species generically and consistently perturb their environments and significantly modify available ecological niches. We show how our cavity approach naturally generalizes niche theory to large ecosystems by accounting for the effect of collective phenomena on species invasion and ecological stability. Our theory suggests that such phenomena are a generic feature of large, natural ecosystems and must be taken into account when analyzing and interpreting community structure. It also highlights the important role that statistical-physics-inspired approaches can play in furthering our understanding of ecology.
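For concreteness, the sketch below integrates one standard form of MacArthur's consumer resource model with many species and resources in Python with numpy, using self-renewing (logistic) resources that are depleted by consumers, and reports surviving species and average resource depletion at long times. The parameter distributions and the Euler integrator are illustrative assumptions; the paper's results come from the cavity method in the large-system limit rather than from direct simulation.

```python
import numpy as np

rng = np.random.default_rng(4)

# Euler integration of MacArthur's consumer resource model (MCRM) with
# self-renewing (logistic) resources, many species and many resources.
# Parameter distributions and the integrator are illustrative.
S, R = 100, 100                          # number of species, resources
c = rng.random((S, R)) / R               # consumption preferences
m = 1.0 + 0.1 * rng.normal(size=S)       # species maintenance costs
K = 5.0 + 1.0 * rng.normal(size=R)       # resource carrying capacities
r = np.ones(R)                           # resource growth rates

N = 0.1 * np.ones(S)                     # species abundances
Rho = K.copy()                           # resource abundances
dt, steps = 0.01, 50000

for _ in range(steps):
    dN = N * (c @ Rho - m)                              # dN_i/dt = N_i (sum_a c_ia R_a - m_i)
    dRho = Rho * r * (K - Rho) / K - Rho * (c.T @ N)    # logistic growth minus consumption
    N = np.clip(N + dt * dN, 0.0, None)
    Rho = np.clip(Rho + dt * dRho, 0.0, None)

print("surviving species:", int(np.sum(N > 1e-6)), "/", S)
print("mean resource depletion (1 - R/K):", float(np.mean(1.0 - Rho / K)))
```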

6.
PLoS One; 12(6): e0180089, 2017.
Article in English | MEDLINE | ID: mdl-28644874

ABSTRACT

This work proposes a new approach to determining the spatial location and orientation of an object using measurements performed on the object itself. The on-board triangulation algorithm we outline could be implemented in lieu of, or in addition to, well-known alternatives such as the Global Positioning System (GPS) or standard triangulation, since both of these correspond to significantly different geometric pictures and necessitate different hardware and algorithms. We motivate the theory by describing situations in which on-board triangulation would be useful and even preferable to standard methods. The algorithm uses dumb beacons that broadcast omnidirectional, single-frequency radio waves, together with smart antenna arrays on the object itself to infer the direction of the beacon signals, which can then be used for on-board calculation of the position and orientation of the object. Numerical examples demonstrate the utility of the method and its noise tolerance.


Subjects
Algorithms; Electromagnetic Radiation; Computer Simulation; Geographic Information Systems; Microcomputers
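A two-dimensional illustration of the on-board idea in Python (numpy and scipy): the object measures the bearing of each known beacon in its own body frame and recovers its position and heading by nonlinear least squares on the wrapped angle residuals. The beacon layout, noise level, and the generic least-squares solver are assumptions of this sketch, not the paper's algorithm.

```python
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(5)

# Known beacon positions and the (unknown to the solver) true pose.
beacons = np.array([[0.0, 0.0], [100.0, 0.0], [0.0, 100.0], [100.0, 100.0]])
true_pos = np.array([37.0, 62.0])
true_heading = np.deg2rad(25.0)

def body_frame_bearings(pos, heading):
    # Bearing of each beacon as seen from the object, in its own body frame.
    delta = beacons - pos
    return np.arctan2(delta[:, 1], delta[:, 0]) - heading

# Simulated noisy on-board measurements.
measured = body_frame_bearings(true_pos, true_heading)
measured += np.deg2rad(0.5) * rng.normal(size=len(beacons))

def residuals(params):
    pos, heading = params[:2], params[2]
    diff = body_frame_bearings(pos, heading) - measured
    # Wrap angle differences to (-pi, pi] so the residuals are well behaved.
    return np.arctan2(np.sin(diff), np.cos(diff))

fit = least_squares(residuals, x0=np.array([50.0, 50.0, 0.0]))
est_pos, est_heading = fit.x[:2], fit.x[2]
print("estimated position:", est_pos, " true:", true_pos)
print("estimated heading (deg):", np.rad2deg(est_heading), " true:", np.rad2deg(true_heading))
```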