Results 1 - 20 of 21

1.
bioRxiv ; 2024 Mar 29.
Article in English | MEDLINE | ID: mdl-38585868

ABSTRACT

Lack of cognitive flexibility is a hallmark of substance use disorders and has been associated with drug-induced synaptic plasticity in the dorsomedial striatum (DMS). Yet the possible impact of altered plasticity on real-time striatal neural dynamics during decision-making is unclear. Here, we identified persistent impairments in cognitive flexibility and striatal decision signals induced by chronic ethanol (EtOH) exposure. After a substantial withdrawal period from prior EtOH vapor exposure, male, but not female, rats exhibited reduced adaptability and exploratory behavior during a dynamic decision-making task. Reinforcement learning models showed that prior EtOH exposure enhanced learning from rewards over omissions. Notably, neural signals in the DMS related to the decision outcome were enhanced, while those related to choice and choice-outcome conjunction were reduced, in EtOH-treated rats compared to controls. These findings highlight the profound impact of chronic EtOH exposure on adaptive decision-making, pinpointing specific changes in striatal representations of actions and outcomes as underlying mechanisms for cognitive deficits.
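
The asymmetry reported here (learning more from rewards than from omissions) maps onto a familiar reinforcement-learning construction: a Q-learning model with separate learning rates for rewarded and unrewarded trials. The sketch below is a generic illustration of that construction on a two-armed bandit; the parameter values, names, and task structure are assumptions for illustration, not the authors' model or data.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_asymmetric_q(n_trials=500, alpha_reward=0.6, alpha_omission=0.2,
                          beta=3.0, p_reward=(0.7, 0.3)):
    """Two-armed bandit with separate learning rates for rewarded and
    omitted outcomes (illustrating 'learning more from rewards')."""
    q = np.zeros(2)                      # action values
    choices, outcomes = [], []
    for _ in range(n_trials):
        # softmax choice rule
        p = np.exp(beta * q) / np.exp(beta * q).sum()
        a = rng.choice(2, p=p)
        r = rng.random() < p_reward[a]   # reward or omission
        # asymmetric update: larger step after reward than after omission
        alpha = alpha_reward if r else alpha_omission
        q[a] += alpha * (float(r) - q[a])
        choices.append(a)
        outcomes.append(r)
    return np.array(choices), np.array(outcomes)

choices, outcomes = simulate_asymmetric_q()
print("choice rate for the richer option:", (choices == 0).mean())
```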

2.
Addict Neurosci ; 102024 Mar.
Article in English | MEDLINE | ID: mdl-38524664

ABSTRACT

Computational models of addiction often rely on a model-free reinforcement learning (RL) formulation, owing to the close associations between model-free RL, habitual behavior and the dopaminergic system. However, such formulations typically do not capture key recurrent features of addiction phenomena such as craving and relapse. Moreover, they cannot account for goal-directed aspects of addiction that necessitate contrasting, model-based formulations. Here we synthesize a growing body of evidence and propose that a latent-cause framework can help unify our understanding of several recurrent phenomena in addiction, by viewing them as the inferred return of previous, persistent "latent causes". We demonstrate that applying this framework to Pavlovian and instrumental settings can help account for defining features of craving and relapse such as outcome-specificity, generalization, and cyclical dynamics. Finally, we argue that this framework can bridge model-free and model-based formulations, and account for individual variability in phenomenology by accommodating the memories, beliefs, and goals of those living with addiction, motivating a centering of the individual, subjective experience of addiction and recovery.
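
As a toy illustration of the latent-cause idea, the sketch below infers a posterior over two hypothetical causes (an acquisition-like context in which a cue predicts reward, and an extinction-like context in which it does not) from a short run of binary outcomes. The two-cause simplification and all probabilities are assumptions for illustration; the framework proposed in the article is richer (for example, it allows new causes to be inferred).

```python
import numpy as np

def latent_cause_posterior(observations, p_reward_given_cause, prior):
    """Posterior over two hypothetical latent causes given a sequence of
    binary cue -> outcome observations (1 = reward, 0 = omission)."""
    log_post = np.log(np.asarray(prior, dtype=float))
    for r in observations:
        likelihood = np.where(r == 1, p_reward_given_cause,
                              1.0 - p_reward_given_cause)
        log_post = log_post + np.log(likelihood)
    post = np.exp(log_post - log_post.max())
    return post / post.sum()

# cause 0: acquisition-like (cue usually rewarded); cause 1: extinction-like
p_reward_given_cause = np.array([0.9, 0.1])
prior = np.array([0.5, 0.5])

extinction = [0, 0, 0]                     # omissions favour the extinction-like cause
reinstatement = extinction + [1, 1, 1, 1]  # renewed rewards revive the old cause

print("P(cause) after extinction:   ",
      latent_cause_posterior(extinction, p_reward_given_cause, prior).round(3))
print("P(cause) after reinstatement:",
      latent_cause_posterior(reinstatement, p_reward_given_cause, prior).round(3))
```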

3.
bioRxiv ; 2023 Jul 21.
Article in English | MEDLINE | ID: mdl-37781610

ABSTRACT

The orbitofrontal cortex (OFC) and hippocampus (HC) are both implicated in forming the cognitive or task maps that support flexible behavior. Previously, we used dopamine neurons as a sensor or tool to measure the functional effects of OFC lesions (Takahashi et al., 2011). We recorded midbrain dopamine neurons as rats performed an odor-based choice task, in which errors in the prediction of reward were induced by manipulating the number or timing of the expected rewards across blocks of trials. We found that OFC lesions ipsilateral to the recording electrodes caused prediction errors to be degraded in a manner consistent with a loss of resolution of the task states, particularly under conditions where hidden information was critical to sharpening the predictions. Here we have repeated this experiment, along with computational modeling of the results, in rats with ipsilateral HC lesions. The results show that the HC also shapes the map of our task; however, unlike the OFC, which provides information local to the trial, the HC appears to be necessary for estimating upper-level hidden states based on information that is discontinuous or separated by longer timescales. The results contrast the respective roles of the OFC and HC in cognitive mapping and add to evidence that the dopamine neurons access a rich information set from distributed regions regarding the predictive structure of the environment, potentially enabling this powerful teaching signal to support complex learning and behavior.

4.
Nat Neurosci ; 26(5): 830-839, 2023 05.
Article in English | MEDLINE | ID: mdl-37081296

ABSTRACT

Dopamine neuron activity is tied to the prediction error in temporal difference reinforcement learning models. These models make significant simplifying assumptions, particularly with regard to the structure of the predictions fed into the dopamine neurons, which consist of a single chain of timepoint states. Although this predictive structure can explain error signals observed in many studies, it cannot cope with settings where subjects might infer multiple independent events and outcomes. In the present study, we recorded dopamine neurons in the ventral tegmental area in such a setting to test the validity of the single-stream assumption. Rats were trained in an odor-based choice task, in which the timing and identity of one of several rewards delivered in each trial changed across trial blocks. This design revealed an error signaling pattern that requires the dopamine neurons to access and update multiple independent predictive streams reflecting the subject's belief about timing and potentially unique identities of expected rewards.


Subject(s)
Reinforcement, Psychology , Ventral Tegmental Area , Rats , Animals , Ventral Tegmental Area/physiology , Learning/physiology , Reward , Dopaminergic Neurons/physiology , Dopamine/physiology
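
The "single chain of timepoint states" assumption discussed in this article can be written down compactly: each trial is a sequence of timepoint states, state values are learned by temporal-difference (TD) updates, and the dopamine-like signal is the TD error. The sketch below is a generic TD(0) illustration of that assumption under a reward-timing shift, with arbitrary parameters; it is not a model of the recorded data.

```python
import numpy as np

N_T = 10            # timepoint states within a trial
ALPHA, GAMMA = 0.1, 0.98

def td_trial(V, reward_time, learn=True):
    """One pass over a single chain of timepoint states; returns TD errors."""
    deltas = np.zeros(N_T)
    for t in range(N_T):
        r = 1.0 if t == reward_time else 0.0
        v_next = V[t + 1] if t + 1 < N_T else 0.0
        deltas[t] = r + GAMMA * v_next - V[t]    # dopamine-like TD error
        if learn:
            V[t] += ALPHA * deltas[t]
    return deltas

V = np.zeros(N_T)
for _ in range(300):                             # block 1: reward arrives late (t = 8)
    td_trial(V, reward_time=8)

shift_errors = td_trial(V, reward_time=4, learn=False)  # first early-reward trial
print(np.round(shift_errors, 2))                 # positive error at t=4, negative at t=8
```
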
5.
Behav Neurosci ; 136(5): 347-348, 2022 Oct.
Article in English | MEDLINE | ID: mdl-36222636

ABSTRACT

This special issue provides a representative snapshot of cutting-edge behavioral neuroscience research on sense of time, cognitive and behavioral functioning, and neural processes. (PsycInfo Database Record (c) 2022 APA, all rights reserved).


Subject(s)
Cognition , Neurosciences
6.
PLoS Comput Biol ; 18(3): e1009897, 2022 03.
Article in English | MEDLINE | ID: mdl-35333867

ABSTRACT

There is no single way to represent a task. Indeed, despite experiencing the same task events and contingencies, different subjects may form distinct task representations. As experimenters, we often assume that subjects represent the task as we envision it. However, such a representation cannot be taken for granted, especially in animal experiments where we cannot deliver explicit instruction regarding the structure of the task. Here, we tested how rats represent an odor-guided choice task in which two odor cues indicated which of two responses would lead to reward, whereas a third odor indicated free choice between the two responses. A parsimonious task representation would allow animals to learn from the forced trials which option is better to choose on the free-choice trials. However, animals may not necessarily generalize across odors in this way. We fit reinforcement-learning models that use different task representations to trial-by-trial choice behavior of individual rats performing this task, and quantified the degree to which each animal used the more parsimonious representation, generalizing across trial types. Model comparison revealed that most rats did not acquire this representation despite extensive experience. Our results demonstrate the importance of formally testing possible task representations that can afford the observed behavior, rather than assuming that animals' task representations abide by the generative task structure that governs the experimental design.


Subject(s)
Odorants , Reward , Animals , Cues , Generalization, Psychological , Humans , Rats , Reinforcement, Psychology
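
The model-comparison logic described here can be sketched as fitting the same Q-learning rule under two state representations, one that collapses forced and free-choice trials onto a shared state (the parsimonious, generalizing representation) and one that keeps trial types separate, and then comparing fit, for example by BIC. The code below is a schematic with placeholder data and simplified trial coding; it assumes scipy is available and is not the authors' fitting pipeline.

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_lik(params, trials, shared_state):
    """Softmax Q-learning likelihood of observed choices.
    trials: list of (trial_type, choice, reward); trial_type 0/1 = forced,
    2 = free choice. shared_state=True collapses all types onto one state."""
    alpha, beta = params
    n_states = 1 if shared_state else 3
    Q = np.zeros((n_states, 2))
    nll = 0.0
    for trial_type, choice, reward in trials:
        s = 0 if shared_state else trial_type
        p = np.exp(beta * Q[s]) / np.exp(beta * Q[s]).sum()
        nll -= np.log(p[choice] + 1e-12)
        Q[s, choice] += alpha * (reward - Q[s, choice])
    return nll

def fit(trials, shared_state):
    res = minimize(neg_log_lik, x0=[0.3, 2.0], args=(trials, shared_state),
                   bounds=[(0.01, 1.0), (0.1, 20.0)])
    k = 2  # free parameters
    bic = 2 * res.fun + k * np.log(len(trials))
    return res.x, bic

# placeholder data; in the study this would be one rat's trial-by-trial choices
rng = np.random.default_rng(1)
trials = [(int(rng.integers(3)), int(rng.integers(2)), int(rng.integers(2)))
          for _ in range(300)]
print("shared representation BIC:  ", fit(trials, shared_state=True)[1])
print("separate representation BIC:", fit(trials, shared_state=False)[1])
```
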
7.
Neural Netw ; 145: 80-89, 2022 Jan.
Article in English | MEDLINE | ID: mdl-34735893

ABSTRACT

The intersection between neuroscience and artificial intelligence (AI) research has created synergistic effects in both fields. While neuroscientific discoveries have inspired the development of AI architectures, new ideas and algorithms from AI research have produced new ways to study brain mechanisms. A well-known example is the case of reinforcement learning (RL), which has stimulated neuroscience research on how animals learn to adjust their behavior to maximize reward. In this review article, we cover recent collaborative work between the two fields in the context of meta-learning and its extension to social cognition and consciousness. Meta-learning refers to the ability to learn how to learn, such as learning to adjust the hyperparameters of existing learning algorithms and learning how to use existing models and knowledge to efficiently solve new tasks. This capability is important for making existing AI systems more adaptive and flexible. Since this is one of the areas where there is a gap between human performance and current AI systems, successful collaboration should produce new ideas and progress. Starting from the role of RL algorithms in driving neuroscience, we discuss recent developments in deep RL applied to modeling prefrontal cortex functions. From a broader perspective, we discuss the similarities and differences between social cognition and meta-learning, and finally conclude with speculations on the potential links between intelligence as endowed by model-based RL and consciousness. For future work we highlight data efficiency, autonomy and intrinsic motivation as key research areas for advancing both fields.


Subject(s)
Artificial Intelligence , Social Learning , Animals , Brain , Cognition , Consciousness , Humans , Social Cognition
8.
Curr Opin Behav Sci ; 41: 114-121, 2021 Oct.
Article in English | MEDLINE | ID: mdl-36341023

ABSTRACT

Reinforcement learning is a powerful framework for modelling the cognitive and neural substrates of learning and decision making. Contemporary research in cognitive neuroscience and neuroeconomics typically uses value-based reinforcement-learning models, which assume that decision-makers choose by comparing learned values for different actions. However, another possibility is suggested by a simpler family of models, called policy-gradient reinforcement learning. Policy-gradient models learn by optimizing a behavioral policy directly, without the intermediate step of value-learning. Here we review recent behavioral and neural findings that are more parsimoniously explained by policy-gradient models than by value-based models. We conclude that, despite the ubiquity of 'value' in reinforcement-learning models of decision making, policy-gradient models provide a lightweight and compelling alternative model of operant behavior.
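
The contrast between the two model families is easy to make concrete: a value-based learner updates action values and derives its policy from them, whereas a policy-gradient learner adjusts policy parameters directly along the reward-weighted gradient of the log-probability of its choices (REINFORCE). The two-armed-bandit sketch below uses arbitrary parameters and is purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
P_REWARD = np.array([0.8, 0.2])        # true reward probabilities

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# --- value-based: learn Q, choose via softmax(Q) -------------------------
Q, alpha_q, beta = np.zeros(2), 0.1, 3.0
for _ in range(1000):
    a = rng.choice(2, p=softmax(beta * Q))
    r = float(rng.random() < P_REWARD[a])
    Q[a] += alpha_q * (r - Q[a])       # value update via prediction error

# --- policy-gradient: learn policy parameters directly (REINFORCE) -------
theta, alpha_pg, baseline = np.zeros(2), 0.2, 0.0
for _ in range(1000):
    pi = softmax(theta)
    a = rng.choice(2, p=pi)
    r = float(rng.random() < P_REWARD[a])
    grad = -pi
    grad[a] += 1.0                     # d log pi(a) / d theta
    theta += alpha_pg * (r - baseline) * grad
    baseline += 0.01 * (r - baseline)  # slowly adapting reward baseline

print("value-based policy:    ", np.round(softmax(beta * Q), 2))
print("policy-gradient policy:", np.round(softmax(theta), 2))
```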

10.
Trends Cogn Sci ; 24(7): 499-501, 2020 07.
Article in English | MEDLINE | ID: mdl-32423707

ABSTRACT

Dopamine (DA) responses are synonymous with the 'reward prediction error' of reinforcement learning (RL), and are thought to update neural estimates of expected value. A recent study by Dabney et al. enriches this picture, demonstrating that DA neurons track variability in rewards, providing a readout of risk in the brain.


Subject(s)
Dopamine , Reinforcement, Psychology , Brain , Humans , Learning , Reward
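
The Dabney et al. result can be caricatured in a few lines of distributional TD learning: each value predictor uses a different ratio of learning rates for positive versus negative prediction errors and therefore settles on a different expectile of the reward distribution, so the population as a whole carries information about reward variability. The sketch below is an illustrative toy with arbitrary numbers, not the study's analysis.

```python
import numpy as np

rng = np.random.default_rng(0)

def learn_expectiles(rewards, taus, lr=0.02, n_passes=200):
    """Each unit scales positive errors by lr*tau and negative errors by
    lr*(1-tau); the fixed point is the tau-th expectile of the reward
    distribution (the core idea of distributional TD)."""
    V = np.zeros(len(taus))
    for _ in range(n_passes):
        for r in rewards:
            delta = r - V
            step = np.where(delta > 0, lr * taus, lr * (1.0 - taus))
            V += step * delta
    return V

rewards = rng.choice([0.1, 0.3, 1.2, 2.5, 5.0], size=500)  # variable rewards
taus = np.array([0.1, 0.3, 0.5, 0.7, 0.9])                 # optimism levels
print("learned expectile-like values:", np.round(learn_expectiles(rewards, taus), 2))
print("mean reward for comparison:   ", rewards.mean().round(2))
```
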
11.
Behav Processes ; 167: 103891, 2019 Oct.
Article in English | MEDLINE | ID: mdl-31381985

ABSTRACT

We review the abstract concept of a 'state' - an internal representation posited by reinforcement learning theories to be used by an agent, whether animal, human or artificial, to summarize the features of the external and internal environment that are relevant for future behavior on a particular task. Armed with this summary representation, an agent can make decisions and perform actions to interact effectively with the world. Here, we review recent findings from the neurobiological and behavioral literature to ask: 'what is a state?' with respect to the internal representations that organize learning and decision making across a range of tasks. We find that state representations include information beyond a straightforward summary of the immediate cues in the environment, providing timing or contextual information from the recent or more distant past, which allows these additional factors to influence decision making and other goal-directed behaviors in complex and perhaps unexpected ways.


Subject(s)
Decision Making , Learning , Reinforcement, Psychology , Animals , Cues , Humans , Psychological Theory , Reward
12.
Psychopharmacology (Berl) ; 236(8): 2543-2556, 2019 Aug.
Article in English | MEDLINE | ID: mdl-31256220

ABSTRACT

RATIONALE: Pairing rewarding outcomes with audiovisual cues in simulated gambling games increases risky choice in both humans and rats. However, the cognitive mechanism through which this sensory enhancement biases decision-making is unknown. OBJECTIVES: To assess the computational mechanisms that promote risky choice during gambling, we applied a series of reinforcement learning models to a large dataset of choices acquired from rats as they each performed one of two variants of a rat gambling task (rGT), in which rewards on "win" trials were delivered either with or without salient audiovisual cues. METHODS: We used a sampling technique based on Markov chain Monte Carlo to obtain posterior estimates of model parameters for a series of RL models of increasing complexity, in order to assess the relative contribution of learning about positive and negative outcomes to the latent valuation of each choice option on the cued and uncued rGT. RESULTS: Rats that develop a preference for the risky options on the rGT substantially down-weight the equivalent cost of the time-out punishments during these tasks. For each model tested, the reduction in learning from the negative time-outs correlated with the degree of risk preference in individual rats. We found no apparent relationship between risk preference and the parameters that govern learning from the positive rewards. CONCLUSIONS: The emergence of risk-preferring choice on the rGT derives from a relative insensitivity to the cost of the time-out punishments, as opposed to a relative hypersensitivity to rewards. This hyposensitivity to punishment is more likely to be induced in individual rats by the addition of salient audiovisual cues to rewards delivered on win trials.


Subject(s)
Cues , Decision Making/physiology , Gambling/psychology , Punishment/psychology , Reward , Animals , Choice Behavior/physiology , Conditioning, Operant/physiology , Humans , Male , Rats , Rats, Long-Evans , Reinforcement, Psychology , Time Factors
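
A bare-bones version of the posterior-sampling approach described in the methods is sketched below: a random-walk Metropolis sampler over the parameters of a simple RL model with separate learning rates for rewards and time-out punishments. The likelihood, priors, parameter ranges, and data format are illustrative assumptions; the study's models and sampler were more elaborate.

```python
import numpy as np

rng = np.random.default_rng(0)

def neg_log_lik(params, trials, n_options=4):
    """Softmax RL likelihood with separate learning rates for positive and
    negative prediction errors. trials: (choice, reward, punishment) tuples."""
    alpha_gain, alpha_loss, beta = params
    Q, nll = np.zeros(n_options), 0.0
    for choice, reward, punishment in trials:
        p = np.exp(beta * Q) / np.exp(beta * Q).sum()
        nll -= np.log(p[choice] + 1e-12)
        delta = (reward - punishment) - Q[choice]   # net-outcome prediction error
        alpha = alpha_gain if delta > 0 else alpha_loss
        Q[choice] += alpha * delta
    return nll

def metropolis(trials, n_samples=2000, step=0.05):
    """Random-walk Metropolis over (alpha_gain, alpha_loss, beta) with flat
    priors on plausible ranges; out-of-range proposals count as rejections."""
    lo, hi = np.array([0.0, 0.0, 0.1]), np.array([1.0, 1.0, 10.0])
    x = np.array([0.3, 0.3, 2.0])
    logp = -neg_log_lik(x, trials)
    samples = []
    for _ in range(n_samples):
        prop = x + step * (hi - lo) * rng.standard_normal(3)
        if np.all(prop >= lo) and np.all(prop <= hi):
            logp_prop = -neg_log_lik(prop, trials)
            if np.log(rng.random()) < logp_prop - logp:
                x, logp = prop, logp_prop
        samples.append(x.copy())
    return np.array(samples)

# placeholder data; in the study this would be one rat's rGT trial-by-trial choices
trials = [(int(rng.integers(4)), float(rng.random() < 0.5), float(rng.random() < 0.2))
          for _ in range(200)]
samples = metropolis(trials)
print("posterior means (alpha_gain, alpha_loss, beta):", samples.mean(axis=0).round(3))
```
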
13.
J Neurosci ; 39(29): 5740-5749, 2019 07 17.
Article in English | MEDLINE | ID: mdl-31109959

ABSTRACT

Animal studies have shown that the striatal cholinergic system plays a role in behavioral flexibility but, until recently, this system could not be studied in humans due to a lack of appropriate noninvasive techniques. Using proton magnetic resonance spectroscopy, we recently showed that the concentration of dorsal striatal choline (an acetylcholine precursor) changes during reversal learning (a measure of behavioral flexibility) in humans. The aim of the present study was to examine whether regional average striatal choline was associated with reversal learning. A total of 22 participants (mean age = 25.2 years, range = 18-32 years, 13 female) reached learning criterion in a probabilistic learning task with a reversal component. We measured choline at rest in both the dorsal and ventral striatum using magnetic resonance spectroscopy. Task performance was described using a simple reinforcement learning model that dissociates the contributions of positive and negative prediction errors to learning. Average levels of choline in the dorsal striatum were associated with performance during reversal, but not during initial learning. Specifically, lower levels of choline in the dorsal striatum were associated with a lower number of perseverative trials. Moreover, choline levels explained interindividual variance in perseveration over and above that explained by learning from negative prediction errors. These findings suggest that the dorsal striatal cholinergic system plays an important role in behavioral flexibility, in line with evidence from the animal literature and our previous work in humans. Additionally, this work provides further support for the idea of measuring choline with magnetic resonance spectroscopy as a noninvasive way of studying human cholinergic neurochemistry. SIGNIFICANCE STATEMENT: Behavioral flexibility is a crucial component of adaptation and survival. Evidence from the animal literature shows that the striatal cholinergic system is fundamental to reversal learning, a key paradigm for studying behavioral flexibility, but this system remains understudied in humans. Using proton magnetic resonance spectroscopy, we showed that choline levels at rest in the dorsal striatum are associated with performance specifically during reversal learning. These novel findings help to bridge the gap between animal and human studies by demonstrating the importance of cholinergic function in the dorsal striatum in human behavioral flexibility. Importantly, the methods described here can be applied not only to furthering our understanding of healthy human neurochemistry, but also to extending our understanding of cholinergic disorders.


Subject(s)
Corpus Striatum/metabolism , Psychomotor Performance/physiology , Reinforcement, Psychology , Reversal Learning/physiology , Adolescent , Adult , Female , Humans , Magnetic Resonance Imaging/methods , Male , Photic Stimulation/methods , Random Allocation , Young Adult
14.
Curr Opin Neurobiol ; 49: 1-7, 2018 04.
Article in English | MEDLINE | ID: mdl-29096115

ABSTRACT

Phasic dopamine responses are thought to encode a prediction-error signal consistent with model-free reinforcement learning theories. However, a number of recent findings highlight the influence of model-based computations on dopamine responses, and suggest that dopamine prediction errors reflect more dimensions of an expected outcome than scalar reward value. Here, we review a selection of these recent results and discuss the implications and complications of model-based predictions for computational theories of dopamine and learning.


Subject(s)
Computer Simulation , Dopamine/physiology , Learning/physiology , Models, Neurological , Animals , Humans , Reinforcement, Psychology
15.
Neuron ; 94(4): 700-702, 2017 May 17.
Article in English | MEDLINE | ID: mdl-28521123

ABSTRACT

In this issue of Neuron, Murakami et al. (2017) relate neural activity in frontal cortex to stochastic and deterministic components of waiting behavior in rats; they find that mPFC biases waiting time, while M2 is ultimately responsible for trial-to-trial variability in decisions about how long to wait.


Subject(s)
Frontal Lobe , Prefrontal Cortex , Animals , Neurons , Rats
16.
Curr Opin Behav Sci ; 11: 67-73, 2016 Oct.
Article in English | MEDLINE | ID: mdl-27408906

ABSTRACT

To many, the poster child for David Marr's famous three levels of scientific inquiry is reinforcement learning: a computational theory of reward optimization, which readily prescribes algorithmic solutions that bear a striking resemblance to signals found in the brain, suggesting a straightforward neural implementation. Here we review questions that remain open at each level of analysis, concluding that the path forward to their resolution calls for inspiration across levels, rather than a focus on mutual constraints.

17.
Neuron ; 91(1): 182-93, 2016 07 06.
Article in English | MEDLINE | ID: mdl-27292535

ABSTRACT

Dopamine neurons signal reward prediction errors. This requires accurate reward predictions. It has been suggested that the ventral striatum provides these predictions. Here we tested this hypothesis by recording from putative dopamine neurons in the ventral tegmental area (VTA) of rats performing a task in which prediction errors were induced by shifting reward timing or number. In controls, the neurons exhibited error signals in response to both manipulations. However, dopamine neurons in rats with ipsilateral ventral striatal lesions exhibited errors only to changes in number and failed to respond to changes in timing of reward. These results, supported by computational modeling, indicate that predictions about the temporal specificity and the number of expected rewards are dissociable and that dopaminergic prediction-error signals rely on the ventral striatum for the former but not the latter.


Subject(s)
Basal Ganglia/physiology , Dopamine/metabolism , Dopaminergic Neurons/metabolism , Reward , Ventral Striatum/physiology , Ventral Tegmental Area/physiology , Animals , Rats, Long-Evans
18.
Trends Cogn Sci ; 19(1): 4-5, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25282675

ABSTRACT

Apparently, the act of free choice confers value: when selecting between an item that you had previously chosen and an identical item that you had been forced to take, the former is often preferred. What could be the neural underpinnings of this free-choice bias in decision making? An elegant study recently published in Neuron suggests that enhanced reward learning in the basal ganglia may be the culprit.


Subject(s)
Choice Behavior/physiology , Learning/physiology , Reinforcement, Psychology , Humans
19.
Front Neurosci ; 7: 255, 2013.
Article in English | MEDLINE | ID: mdl-24399927

ABSTRACT

Vibrotactile discrimination tasks involve perceptual judgements on stimulus pairs separated by a brief interstimulus interval (ISI). Despite their apparent simplicity, decision making during these tasks is biased by prior experience in a manner that is not well understood. A striking example is when participants perform well on trials where the first stimulus is closer to the mean of the stimulus-set than the second stimulus, and perform comparatively poorly when the first stimulus is further from the stimulus mean. This "time-order effect" suggests that participants implicitly encode the mean of the stimulus-set and use this internal standard to bias decisions on any given trial. For relatively short ISIs, the magnitude of the time-order effect typically increases with the distance of the first stimulus from the global mean. Working from the premise that the time-order effect reflects the loss of precision in working memory representations, we predicted that the influence of the time-order effect, and this superimposed "distance" effect, would monotonically increase for trials with longer ISIs. However, by varying the ISI across four intervals (300, 600, 1200, and 2400 ms) we instead found a complex, non-linear dependence of the time-order effect on both the ISI and the distance, with the time-order effect being paradoxically stronger at short ISIs. We also found that this relationship depended strongly on participants' prior experience of the ISI (from previous task titration). The time-order effect not only depends on participants' expectations concerning the distribution of stimuli, but also on the expected timing of the trials.
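
The working premise of this study, that the remembered first stimulus drifts toward the global stimulus mean (a "contraction bias"), which helps when the first stimulus is near the mean and hurts when it is far from it, can be simulated in a few lines. The shrinkage and noise parameters below are invented purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def p_correct(f1, f2, global_mean, shrink=0.3, noise=1.5, n_sims=20000):
    """Contraction-bias sketch: the remembered first stimulus is pulled
    toward the stimulus-set mean and corrupted by memory noise; the
    decision compares the remembered f1 with the current f2."""
    remembered_f1 = (1 - shrink) * f1 + shrink * global_mean \
                    + noise * rng.standard_normal(n_sims)
    judged_f1_higher = remembered_f1 > f2
    return np.mean(judged_f1_higher == (f1 > f2))

global_mean = 25.0   # hypothetical mean of the vibrotactile stimulus set (Hz)
# f1 near the mean vs. far from it, with the same |f1 - f2| difference
print("f1 near mean:    ", round(p_correct(26.0, 22.0, global_mean), 3))
print("f1 far from mean:", round(p_correct(34.0, 30.0, global_mean), 3))
```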

20.
Phys Rev E Stat Nonlin Soft Matter Phys ; 86(6 Pt 1): 061903, 2012 Dec.
Article in English | MEDLINE | ID: mdl-23367972

ABSTRACT

The minimal integrate-and-fire-or-burst neuron model succinctly describes both tonic firing and postinhibitory rebound bursting of thalamocortical cells in the sensory relay. Networks of integrate-and-fire-or-burst (IFB) neurons with slow inhibitory synaptic interactions have been shown to support stable rhythmic states, including globally synchronous and cluster oscillations, in which network-mediated inhibition cyclically generates bursting in coherent subgroups of neurons. In this paper, we introduce a reduced IFB neuronal population model to study synchronization of inhibition-mediated oscillatory bursting states to periodic excitatory input. Using numerical methods, we demonstrate the existence and stability of 1:1 phase-locked bursting oscillations in the sinusoidally forced IFB neuronal population model. Phase locking is shown to arise when periodic excitation is sufficient to pace the onset of bursting in an IFB cluster without counteracting the inhibitory interactions necessary for burst generation. Phase-locked bursting states are thus found to destabilize when periodic excitation increases in strength or frequency. Further study of the IFB neuronal population model with pulse-like periodic excitatory input illustrates that this synchronization mechanism generalizes to a broad range of n:m phase-locked bursting states across both globally synchronous and clustered oscillatory regimes.


Subject(s)
Biophysics/methods , Neurons/physiology , Thalamus/physiology , Animals , Brain/metabolism , Calcium/metabolism , Computer Simulation , Humans , Models, Neurological , Neurons/metabolism , Oscillometry/methods
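
The IFB model referred to above is compact: a leaky integrate-and-fire membrane plus a low-threshold T-type calcium current gated by a slow inactivation variable h, which de-inactivates during hyperpolarization and can therefore drive post-inhibitory rebound bursts. The single-cell sketch below uses forward-Euler integration and generic parameter values (not the paper's population model) to show the tonic-versus-rebound-burst distinction mentioned in the abstract.

```python
import numpy as np

def simulate_ifb(I_ext, dt=0.05, t_max=1000.0):
    """Forward-Euler simulation of a single integrate-and-fire-or-burst (IFB)
    cell: a leaky integrate-and-fire membrane plus a T-type calcium current
    gated by a slow inactivation variable h. Parameter values are generic
    placeholders. I_ext: function of time (ms) returning current (uA/cm^2)."""
    C, g_L, V_L = 2.0, 0.035, -65.0        # capacitance, leak conductance/reversal
    g_T, V_Ca, V_h = 0.07, 120.0, -60.0    # T current: conductance, reversal, gate
    tau_h_plus, tau_h_minus = 100.0, 20.0  # h de-inactivation / inactivation (ms)
    V_theta, V_reset = -35.0, -50.0        # spike threshold and reset

    V, h = V_L, 0.0
    spikes = []
    for i in range(int(t_max / dt)):
        t = i * dt
        I_T = g_T * h * (V - V_Ca) if V >= V_h else 0.0   # m_inf = step(V - V_h)
        dV = (-g_L * (V - V_L) - I_T + I_ext(t)) / C
        dh = -h / tau_h_minus if V >= V_h else (1.0 - h) / tau_h_plus
        V += dt * dV
        h += dt * dh
        if V >= V_theta:                   # fire and reset
            spikes.append(t)
            V = V_reset
    return np.array(spikes)

# Tonic mode: sustained depolarizing current yields repetitive, roughly regular spiking.
tonic = simulate_ifb(lambda t: 1.5)
# Rebound burst: hyperpolarize for 400 ms (h de-inactivates), then release onto a
# weak depolarization; the T current drives a brief high-rate burst, then silence.
rebound = simulate_ifb(lambda t: -0.5 if t < 400.0 else 0.3)
print("tonic spike times (ms):  ", np.round(tonic, 1))
print("rebound spike times (ms):", np.round(rebound, 1))
```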