ABSTRACT
Extinction learning suppresses conditioned reward responses and is thus fundamental for adapting to changing environmental demands and for controlling excessive reward seeking. The medial prefrontal cortex (mPFC) monitors and controls conditioned reward responses. Abrupt transitions in mPFC activity anticipate changes in conditioned responses to altered contingencies. It remains unknown, however, whether such transitions are driven by the extinction of old behavioral strategies or by the acquisition of new competing ones. Using in vivo multiple single-unit recordings of mPFC in male rats, we studied the relationship between single-unit and population dynamics during extinction learning, using alcohol as a positive reinforcer in an operant conditioning paradigm. To examine the fine temporal relation between neural activity and behavior, we developed a novel behavioral model that allowed us to identify the number, onset, and duration of extinction-learning episodes in the behavior of each animal. We found that single-unit responses to conditioned stimuli changed even under stable experimental conditions and behavior. However, when behavioral responses to task contingencies had to be updated, unit-specific modulations became coordinated across the whole population, pushing the network into a new stable attractor state. Thus, extinction learning is not associated with suppressed mPFC responses to conditioned stimuli, but is anticipated by single-unit coordination into population-wide transitions of the internal state of the animal.

SIGNIFICANCE STATEMENT The ability to suppress conditioned behaviors when no longer beneficial is fundamental for the survival of any organism. While pharmacological and optogenetic interventions have shown a critical involvement of the mPFC in the suppression of conditioned responses, the neural dynamics underlying such a process are still largely unknown. Combining novel analysis tools to describe behavior, single-neuron responses, and population activity, we found that widespread changes in neuronal firing temporally coordinate across the whole mPFC population in anticipation of behavioral extinction. This coordination leads to a global transition in the internal state of the network, driving extinction of conditioned behavior.
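For illustration only, a minimal sketch of how the onset and duration of an extinction episode could be read off per-trial response counts. This is a generic threshold-on-smoothed-rate heuristic, not the behavioral model developed in the paper; all names, data, and parameter values are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(7)

# toy per-trial response counts: stable responding, then extinction
counts = np.concatenate([rng.poisson(5.0, 40), rng.poisson(0.5, 40)])

def extinction_episode(counts, window=5, threshold=0.5):
    """Flag trials on which the smoothed response rate falls below a
    fraction of its initial level; return the first such trial (onset)
    and the number of below-threshold trials from then on (duration)."""
    kernel = np.ones(window) / window
    rate = np.convolve(counts, kernel, mode="same")
    low = rate < threshold * rate[:window].mean()
    onset = int(np.argmax(low))          # index of first below-threshold trial
    duration = int(low[onset:].sum())
    return onset, duration

onset, duration = extinction_episode(counts)
print(f"extinction episode: onset trial {onset}, duration {duration} trials")
```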
Subject(s)
Behavior, Animal/physiology; Extinction, Psychological/physiology; Prefrontal Cortex/physiology; Reward; Animals; Conditioning, Operant; Learning/physiology; Male; Neurons/physiology; Rats; Rats, Wistar
ABSTRACT
A major tenet in theoretical neuroscience is that cognitive and behavioral processes are ultimately implemented in terms of the neural system dynamics. Accordingly, a major aim for the analysis of neurophysiological measurements should lie in the identification of the computational dynamics underlying task processing. Here we advance a state space model (SSM) based on generative piecewise-linear recurrent neural networks (PLRNN) to assess dynamics from neuroimaging data. In contrast to many other nonlinear time series models which have been proposed for reconstructing latent dynamics, our model is easily interpretable in neural terms, amenable to systematic dynamical systems analysis of the resulting set of equations, and can straightforwardly be transformed into an equivalent continuous-time dynamical system. The major contributions of this paper are the introduction of a new observation model suitable for functional magnetic resonance imaging (fMRI) coupled to the latent PLRNN, an efficient stepwise training procedure that forces the latent model to capture the 'true' underlying dynamics rather than just fitting (or predicting) the observations, and an empirical measure based on the Kullback-Leibler divergence to evaluate from empirical time series how well this goal of approximating the underlying dynamics has been achieved. We validate and illustrate the power of our approach on simulated 'ground-truth' dynamical systems as well as on experimental fMRI time series, and demonstrate that the learnt dynamics harbors task-related nonlinear structure that a linear dynamical model fails to capture. Given that fMRI is one of the most common techniques for measuring brain activity non-invasively in human subjects, this approach may provide a novel step toward analyzing aberrant (nonlinear) dynamics for clinical assessment or neuroscientific research.
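As an illustration of the latent model class, a minimal NumPy sketch of the PLRNN iteration z_t = A z_{t-1} + W max(0, z_{t-1}) + h. The paper's actual fMRI observation model (which convolves latent states with a hemodynamic response function) and the stepwise training procedure are omitted; all parameter values and dimensions below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def plrnn_step(z, A, W, h, noise_std=0.0):
    """One latent step of a piecewise-linear RNN:
    z_t = A z_{t-1} + W max(0, z_{t-1}) + h + noise."""
    return A @ z + W @ np.maximum(z, 0.0) + h + noise_std * rng.standard_normal(z.shape)

M = 5                                    # latent dimension
A = np.diag(rng.uniform(0.2, 0.9, M))    # diagonal linear self-connections
W = 0.3 * rng.standard_normal((M, M))    # coupling through the ReLU units
np.fill_diagonal(W, 0.0)
h = 0.1 * rng.standard_normal(M)

z = np.zeros(M)
Z = np.empty((200, M))
for t in range(200):
    z = plrnn_step(z, A, W, h, noise_std=0.01)
    Z[t] = z

# toy linear read-out as a stand-in for the paper's fMRI observation model
B = rng.standard_normal((3, M))
X_obs = Z @ B.T
```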
Subject(s)
Magnetic Resonance Imaging/statistics & numerical data; Models, Neurological; Nerve Net/physiology; Algorithms; Brain/diagnostic imaging; Brain/physiology; Computational Biology; Functional Neuroimaging/statistics & numerical data; Humans; Neural Networks, Computer; Nonlinear Dynamics; Systems Analysis
ABSTRACT
Supplementing a differential equation with delays results in an infinite-dimensional dynamical system. This property provides the basis for a reservoir computing architecture, where the recurrent neural network is replaced by a single nonlinear node, delay-coupled to itself. Instead of the spatial topology of a network, subunits in the delay-coupled reservoir are multiplexed in time along one delay span of the system. The computational power of the reservoir is contingent on this temporal multiplexing. Here, we learn optimal temporal multiplexing by means of a biologically inspired homeostatic plasticity mechanism. Plasticity acts locally and changes the distances between the subunits along the delay, depending on how responsive these subunits are to the input. After analytically deriving the learning mechanism, we illustrate its role in improving the reservoir's computational power. To this end, we investigate, first, the increase of the reservoir's memory capacity. Second, we predict a NARMA-10 time series, showing that plasticity reduces the normalized root-mean-square error by more than 20%. Third, we discuss plasticity's influence on the reservoir's input-information capacity, the coupling strength between subunits, and the distribution of the readout coefficients.
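A deliberately simplified discrete-time sketch of the architecture and benchmark: virtual nodes multiplexed along one delay span drive a ridge-regression readout on the NARMA-10 task. The homeostatic plasticity mechanism that adapts node spacing is omitted, and all constants are illustrative rather than the paper's settings:

```python
import numpy as np

rng = np.random.default_rng(1)

def narma10(u):
    """NARMA-10 benchmark series driven by input u."""
    y = np.zeros(len(u))
    for t in range(9, len(u) - 1):
        y[t + 1] = (0.3 * y[t] + 0.05 * y[t] * np.sum(y[t - 9:t + 1])
                    + 1.5 * u[t - 9] * u[t] + 0.1)
    return y

T, N = 3000, 100                     # time steps, virtual nodes per delay span
u = rng.uniform(0.0, 0.5, T)
y = narma10(u)

mask = rng.uniform(-0.5, 0.5, N)     # input mask multiplexing u along the delay
eta, gamma, kappa = 0.75, 0.5, 0.2   # delayed self-feedback, input gain, neighbor coupling
x = np.zeros(N)
states = np.zeros((T, N))
for t in range(T):
    x_new = np.empty(N)
    prev = x[-1]                     # last virtual node of the previous delay span
    for i in range(N):
        # each node sees its own state one delay earlier plus, through the
        # physical node's inertia, its immediate temporal neighbor
        x_new[i] = np.tanh(eta * x[i] + kappa * prev + gamma * mask[i] * u[t])
        prev = x_new[i]
    x = x_new
    states[t] = x

# linear readout fitted by ridge regression, evaluated by normalized RMSE
washout = 200
X, Y = states[washout:-1], y[washout + 1:]
w = np.linalg.solve(X.T @ X + 1e-6 * np.eye(N), X.T @ Y)
nrmse = np.sqrt(np.mean((X @ w - Y) ** 2) / np.var(Y))
print(f"NARMA-10 NRMSE: {nrmse:.3f}")
```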
ABSTRACT
Neuronal plasticity has long been established as central to generating neural function and computation. Nevertheless, no unifying account exists of how neurons in a recurrent cortical network learn to compute on temporally and spatially extended stimuli, even though such stimuli constitute the norm, rather than the exception, of the brain's input. Here, we introduce a geometric theory of learning spatiotemporal computations through neuronal plasticity. To that end, we rigorously formulate the problem of neural representations as a relation in space between stimulus-induced neural activity and the asymptotic dynamics of excitable cortical networks. Backed up by computer simulations and numerical analysis, we show that two canonical and widespread forms of neuronal plasticity, spike-timing-dependent synaptic plasticity and intrinsic plasticity, are both necessary for creating neural representations such that these computations become realizable. Interestingly, the effects of these forms of plasticity on the emerging neural code relate to properties necessary for both combating and utilizing noise. The neural dynamics also exhibits features of the most likely stimulus in the network's spontaneous activity. Being grounded in biology, these properties of the plasticity-induced spatiotemporal neural code further consolidate the relevance of our findings.
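The two plasticity mechanisms have canonical textbook forms, sketched below. These are generic pair-based STDP and threshold-homeostasis rules under assumed parameters, not the paper's specific network equations:

```python
import numpy as np

rng = np.random.default_rng(2)

TAU = 20.0                 # ms, STDP time constant
A_PLUS, A_MINUS = 0.01, 0.012

def stdp_dw(dt):
    """Weight change for a spike pair separated by dt = t_post - t_pre (ms)."""
    if dt > 0:
        return A_PLUS * np.exp(-dt / TAU)   # pre before post: potentiation
    return -A_MINUS * np.exp(dt / TAU)      # post before pre: depression

def intrinsic_step(threshold, rate, target_rate=5.0, lr=1e-3):
    """Intrinsic plasticity: raise the threshold of overly active neurons,
    lower it for underactive ones, driving activity toward a target rate."""
    return threshold + lr * (rate - target_rate)

# toy demonstration: a weight driven by random pre/post spike pairings
w, theta = 0.5, 1.0
for _ in range(1000):
    dt = rng.normal(2.0, 10.0)              # pairings biased toward causal order
    w = np.clip(w + stdp_dw(dt), 0.0, 1.0)
    theta = intrinsic_step(theta, rate=rng.poisson(6.0))
print(f"final weight {w:.3f}, threshold {theta:.3f}")
```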
Subject(s)
Brain/physiology; Neuronal Plasticity/physiology; Neurons/physiology; Algorithms; Animals; Computer Simulation; Homeostasis; Humans; Learning; Memory; Models, Neurological; Models, Statistical; Nerve Net/physiology; Spatio-Temporal Analysis; Synapses/physiology
ABSTRACT
The ABA renewal effect occurs when behavior is trained in one context (A), extinguished in a second context (B), and the test occurs in the training context (A). Two mechanisms that explain ABA renewal are context summation at the test and contextual modulation of extinction learning, with the former being unlikely if both contexts have a similar associative history. In two experiments, we used within-subjects designs in which participants learned to avoid a loud noise (unconditioned stimulus) signaled by discrete visual stimuli (conditioned stimuli [CSs]) by pressing the space bar on the computer keyboard. The training was conducted in two contexts, with a different pair of CSs (CS+ and CS-) trained in each context. During extinction, CS+ and CS- stimuli were presented in the alternative context from that of training, and participants were allowed to respond freely, but no loud noise was presented. Finally, all CSs were tested in both contexts, resulting in a within-subjects ABA versus ABB comparison. Across experiments, participants increased avoidance responses during training and decreased them during extinction, although Experiment 2 revealed less extinction. During the test, responding was higher when CS+ stimuli were tested in the training context (ABA) than in the extinction context (ABB), revealing the renewal of instrumental avoidance. Experiment 2 also measured expectancy after the avoidance test and revealed a remarkable similarity between avoidance responses and expectancy ratings. This study shows the renewal of instrumental avoidance in humans, and the results suggest a modulatory role for the context in renewal, similar to the occasion setting of extinction learning by the context.
Subject(s)
Avoidance Learning; Conditioning, Operant; Extinction, Psychological; Humans; Male; Extinction, Psychological/physiology; Female; Avoidance Learning/physiology; Young Adult; Adult; Conditioning, Operant/physiology; Adolescent; Conditioning, Classical/physiology
ABSTRACT
Reinforcement learning (RL) is thought to underlie the acquisition of vocal skills like birdsong and speech, where sounding like one's "tutor" is rewarding. However, what RL strategy generates the rich sound inventories for song or speech? We find that the standard actor-critic model of birdsong learning fails to explain juvenile zebra finches' efficient learning of multiple syllables. However, when we replace a single actor with multiple independent actors that jointly maximize a common intrinsic reward, then birds' empirical learning trajectories are accurately reproduced. The influence of each actor (syllable) on the magnitude of global reward is competitively determined by its acoustic similarity to target syllables. This leads to each actor matching the target it is closest to and, occasionally, to the competitive exclusion of an actor from the learning process (i.e., the learned song). We propose that a competitive-cooperative multi-actor RL (MARL) algorithm is key for the efficient learning of the action inventory of a complex skill.
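A toy one-dimensional caricature of the proposed MARL scheme: independent actors perturb their "syllables" and jointly ascend a single shared reward, in which each target is credited to the best-matching actor. The acoustic features, similarity function, and update rule below are placeholders, not the paper's model of zebra finch song:

```python
import numpy as np

rng = np.random.default_rng(3)

targets = np.array([1.0, 4.0, 7.0])    # stand-ins for tutor syllables (1-D features)
actors = rng.uniform(0.0, 8.0, 3)       # one independent actor per produced syllable
sigma, lr = 0.3, 0.2

def similarity(a, t):
    return np.exp(-(a - t) ** 2)

def global_reward(actors):
    # competition: each target is credited to the actor that matches it best
    return sum(max(similarity(a, t) for a in actors) for t in targets)

for _ in range(5000):
    # perturbation-based (REINFORCE-like) ascent on the one shared reward
    noise = sigma * rng.standard_normal(actors.shape)
    delta = global_reward(actors + noise) - global_reward(actors)
    actors += lr * delta * noise         # all actors update jointly
print("learned syllables:", np.sort(np.round(actors, 2)))
```

When two actors start near the same target, one typically wins the credit and the other drifts away or stalls, mirroring the competitive exclusion described above.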
Subject(s)
Finches; Animals; Vocalization, Animal; Learning; Sound; Reward
ABSTRACT
Time series, as is frequently the case in neuroscience, are rarely stationary, but often exhibit abrupt changes due to attractor transitions or bifurcations in the dynamical systems producing them. A plethora of methods for detecting such change points in time series statistics have been developed over the years, in addition to test criteria to evaluate their significance. Issues to consider when developing change point analysis methods include computational demands, difficulties arising from either a limited amount of data or a large number of covariates, and arriving at statistical tests with sufficient power to detect as many changes as are contained in potentially high-dimensional time series. Here, a general method called Paired Adaptive Regressors for Cumulative Sum is developed for detecting multiple change points in the mean of multivariate time series. The method's advantages over alternative approaches are demonstrated through a series of simulation experiments. This is followed by a real data application to neural recordings from rat medial prefrontal cortex during learning. Finally, the method's flexibility to incorporate useful features from state-of-the-art change point detection techniques is discussed, along with potential drawbacks and suggestions to remedy them.
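For orientation, a sketch of the classical single-change CUSUM statistic that methods of this family build on; this is not the paper's method itself, and the example data are synthetic:

```python
import numpy as np

rng = np.random.default_rng(4)

def cusum_changepoint(x):
    """Locate the most likely single shift in the mean of a 1-D series:
    the index maximizing the absolute cumulative-sum statistic."""
    x = np.asarray(x, dtype=float)
    s = np.cumsum(x - x.mean())
    return int(np.argmax(np.abs(s))), s

# toy series with a mean shift at t = 150
x = np.concatenate([rng.normal(0.0, 1.0, 150), rng.normal(1.2, 1.0, 150)])
k, s = cusum_changepoint(x)
print(f"estimated change point: {k}")   # should land near 150
```

Significance of the detected change is typically assessed by comparing the peak statistic against a permutation or bootstrap distribution of the same statistic under shuffled data.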
ABSTRACT
Delays are ubiquitous in biological systems, ranging from genetic regulatory networks and synaptic conductances to predator/prey population interactions. Evidence is mounting not only for the presence of delays as physical constraints on signal propagation speed, but also for their functional role in providing dynamical diversity to the systems that contain them. The latter observation in biological systems inspired the recent development of a computational architecture that harnesses this dynamical diversity by delay-coupling a single nonlinear element to itself. This architecture is a particular realization of Reservoir Computing, where stimuli are injected into the system in time rather than in space, as is the case with classical recurrent neural network realizations. This architecture also exhibits an internal memory which fades in time, an important prerequisite to the functioning of any reservoir computing device. However, fading memory is also a limitation to any computation that requires persistent storage. In order to overcome this limitation, the current work introduces an extended version of the single-node Delay-Coupled Reservoir, based on trained linear feedback. We show by numerical simulations that adding task-specific linear feedback to the single-node Delay-Coupled Reservoir extends the class of solvable tasks to those that require nonfading memory. We demonstrate, through several case studies, the ability of the extended system to carry out complex nonlinear computations that depend on past information, whereas the computational power of the system with fading memory alone quickly deteriorates. Our findings provide the theoretical basis for future physical realizations of a biologically inspired ultrafast computing device with extended functionality.
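A structural sketch of the extension, assuming a simplified virtual-node discretization of the delay-coupled reservoir: the previous read-out value is fed back linearly into every virtual node. In the paper the feedback (and readout) weights are trained for the task at hand; here they are fixed random placeholders:

```python
import numpy as np

rng = np.random.default_rng(5)

N = 50
mask = rng.uniform(-0.5, 0.5, N)      # input mask along the delay span
w_fb = rng.uniform(-0.2, 0.2, N)      # linear feedback weights (trained in the
                                      # paper; fixed random placeholders here)
w_out = 0.1 * rng.standard_normal(N)  # readout, likewise a stand-in for a trained one
eta, gamma = 0.8, 0.5

x, y_fb = np.zeros(N), 0.0
for t in range(200):
    u = rng.uniform(0.0, 0.5)
    # the previous read-out value re-enters every virtual node, closing a
    # loop that can hold information beyond the reservoir's fading memory
    x = np.tanh(eta * x + gamma * mask * u + w_fb * y_fb)
    y_fb = w_out @ x
```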
Subject(s)
Computer Simulation; Models, Theoretical; Mathematical Computing; Nonlinear Dynamics; Normal Distribution
ABSTRACT
The behavior and skills of living systems depend on the distributed control provided by specialized and highly recurrent neural networks. Learning and memory in these systems are mediated by a set of adaptation mechanisms known collectively as neuronal plasticity. Translating principles of recurrent neural control and plasticity to artificial agents has seen major strides, but is usually hampered by the complex interactions between the agent's body and its environment. One important open issue is for the agent to support multiple stable states of behavior, so that its behavioral repertoire matches the requirements imposed by these interactions. The agent must also have the capacity to switch between these states on time scales comparable to those on which sensory stimulation varies. Achieving this requires a mechanism of short-term memory that allows the neurocontroller to keep track of the recent history of its input, which finds its biological counterpart in short-term synaptic plasticity. This issue is approached here by deriving synaptic dynamics in recurrent neural networks. Neurons are introduced as self-regulating units with a rich repertoire of dynamics. They exhibit homeostatic properties for certain parameter domains, which result in a set of stable states and the required short-term memory. They can also operate as oscillators, which allows them to surpass the level of activity imposed by their homeostatic operating conditions. Neural systems endowed with the derived synaptic dynamics can be utilized for the neural behavior control of autonomous mobile agents. The resulting behavior also depends on the underlying network structure, which is either engineered or developed by evolutionary techniques. The effectiveness of these self-regulating units is demonstrated by controlling locomotion of a hexapod with 18 degrees of freedom and obstacle avoidance of a wheel-driven robot.
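A deliberately minimal caricature of one such self-regulating unit: a slow homeostatic bias keeps the mean output near a target, which already yields a short-term memory of recent input statistics. The units derived in the paper are considerably richer (for instance, they can also operate as oscillators) and are embedded in full recurrent controllers; everything below is hypothetical:

```python
import numpy as np

def self_regulating_unit(inputs, target=0.5, lr=0.05, settle=50):
    """Toy self-regulating neuron: an adaptive bias drives the mean output
    toward a homeostatic target, so the unit's state tracks the recent
    history of its input."""
    b = 0.0                                          # adaptive bias
    outputs = []
    for u in inputs:
        for _ in range(settle):
            o = 1.0 / (1.0 + np.exp(-(2.0 * u + b)))  # logistic activation
            b += lr * (target - o)                    # homeostatic regulation
        outputs.append(o)
    return np.array(outputs)

out = self_regulating_unit(np.sin(np.linspace(0.0, 6.0 * np.pi, 60)))
```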