Results 1 - 20 of 22
1.
J Neurosci ; 44(36), 2024 Sep 04.
Article in English | MEDLINE | ID: mdl-39122558

ABSTRACT

The orbitofrontal cortex (OFC) is crucial for tracking various aspects of expected outcomes, thereby helping to guide choices and support learning. Our previous study showed that the effects of reward timing and size on the activity of single units in OFC were dissociable when these attributes were manipulated independently (Roesch et al., 2006). However, in real-life decision-making scenarios, outcome features often change simultaneously, so here we investigated how OFC neurons in male rats integrate information about the timing and identity (flavor) of reward and respond to changes in these features, according to whether they were changed simultaneously or separately. We found that a substantial number of OFC neurons fired differentially to immediate versus delayed reward and to the different reward flavors. However, contrary to the previous study, selectivity for timing was strongly correlated with selectivity for identity. Taken together with the previous research, these results suggest that when reward features are correlated, OFC tends to "pack" them into unitary constructs, whereas when they are independent, OFC tends to "crack" them into separate constructs. Furthermore, we found that when both reward timing and flavor were changed, reward-responsive OFC neurons showed unique activity patterns preceding and during the omission of an expected reward. Interestingly, this OFC activity was similar to, and slightly preceded, the ventral tegmental area dopamine (VTA DA) activity observed in a previous study (Takahashi et al., 2023), consistent with the role of OFC in providing predictive information to VTA DA neurons.


Subject(s)
Neurons, Prefrontal Cortex, Reward, Animals, Male, Prefrontal Cortex/physiology, Rats, Neurons/physiology, Long-Evans Rats, Choice Behavior/physiology
2.
PLoS Comput Biol ; 18(3): e1009897, 2022 Mar.
Article in English | MEDLINE | ID: mdl-35333867

ABSTRACT

There is no single way to represent a task. Indeed, despite experiencing the same task events and contingencies, different subjects may form distinct task representations. As experimenters, we often assume that subjects represent the task as we envision it. However, such a representation cannot be taken for granted, especially in animal experiments where we cannot deliver explicit instruction regarding the structure of the task. Here, we tested how rats represent an odor-guided choice task in which two odor cues indicated which of two responses would lead to reward, whereas a third odor indicated free choice between the two responses. A parsimonious task representation would allow animals to learn from the forced trials which option is better on the free-choice trials. However, animals may not necessarily generalize across odors in this way. We fit reinforcement-learning models that use different task representations to the trial-by-trial choice behavior of individual rats performing this task, and quantified the degree to which each animal used the more parsimonious representation, generalizing across trial types. Model comparison revealed that most rats did not acquire this representation despite extensive experience. Our results demonstrate the importance of formally testing possible task representations that can afford the observed behavior, rather than assuming that animals' task representations abide by the generative task structure that governs the experimental design.
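As a concrete illustration of this kind of model comparison, the sketch below contrasts two candidate representations within a simple Q-learning model fit by likelihood: a parsimonious one that shares response values across forced and free trials, and one that keeps separate values per odor. This is an illustrative assumption, not the authors' exact model; the function name, trial encoding, and softmax choice rule are ours.

```python
import numpy as np

def neg_log_likelihood(params, trials, generalize):
    """Negative log-likelihood of one rat's choices under a Q-learning model.

    trials: sequence of (odor, choice, reward) with odor in {0: forced-left,
    1: forced-right, 2: free}, choice in {0: left, 1: right}, reward in {0, 1}.
    generalize=True is the parsimonious representation (one value per response,
    shared across trial types); False keeps separate values for each odor.
    """
    alpha, beta = params          # learning rate, softmax inverse temperature
    q = np.zeros(2) if generalize else np.zeros((3, 2))
    nll = 0.0
    for odor, choice, reward in trials:
        q_state = q if generalize else q[odor]   # values in play on this trial
        p = np.exp(beta * q_state)
        p /= p.sum()
        nll -= np.log(p[choice] + 1e-12)
        q_state[choice] += alpha * (reward - q_state[choice])
    return nll
```

Fitting both variants to each animal and comparing penalized likelihoods (e.g., AIC or BIC) quantifies the degree to which that rat generalizes across trial types.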


Subject(s)
Odorants, Reward, Animals, Cues (Psychology), Psychological Generalization, Humans, Rats, Reinforcement (Psychology)
3.
J Neurosci ; 39(29): 5740-5749, 2019 Jul 17.
Article in English | MEDLINE | ID: mdl-31109959

ABSTRACT

Animal studies have shown that the striatal cholinergic system plays a role in behavioral flexibility but, until recently, this system could not be studied in humans due to a lack of appropriate noninvasive techniques. Using proton magnetic resonance spectroscopy, we recently showed that the concentration of dorsal striatal choline (an acetylcholine precursor) changes during reversal learning (a measure of behavioral flexibility) in humans. The aim of the present study was to examine whether regional average striatal choline was associated with reversal learning. A total of 22 participants (mean age = 25.2 years, range = 18-32 years, 13 female) reached learning criterion in a probabilistic learning task with a reversal component. We measured choline at rest in both the dorsal and ventral striatum using magnetic resonance spectroscopy. Task performance was described using a simple reinforcement learning model that dissociates the contributions of positive and negative prediction errors to learning. Average levels of choline in the dorsal striatum were associated with performance during reversal, but not during initial learning. Specifically, lower levels of choline in the dorsal striatum were associated with a lower number of perseverative trials. Moreover, choline levels explained interindividual variance in perseveration over and above that explained by learning from negative prediction errors. These findings suggest that the dorsal striatal cholinergic system plays an important role in behavioral flexibility, in line with evidence from the animal literature and our previous work in humans. Additionally, this work provides further support for the idea of measuring choline with magnetic resonance spectroscopy as a noninvasive way of studying human cholinergic neurochemistry.

SIGNIFICANCE STATEMENT: Behavioral flexibility is a crucial component of adaptation and survival. Evidence from the animal literature shows that the striatal cholinergic system is fundamental to reversal learning, a key paradigm for studying behavioral flexibility, but this system remains understudied in humans. Using proton magnetic resonance spectroscopy, we showed that choline levels at rest in the dorsal striatum are associated with performance specifically during reversal learning. These novel findings help to bridge the gap between animal and human studies by demonstrating the importance of cholinergic function in the dorsal striatum in human behavioral flexibility. Importantly, the methods described here can be applied not only to furthering our understanding of healthy human neurochemistry, but also to extending our understanding of cholinergic disorders.
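A minimal sketch of the kind of model referred to, with separate learning rates for positive and negative prediction errors; the parameter names and single-value structure are our assumptions, not the authors' exact specification.

```python
def update_value(v, reward, alpha_pos, alpha_neg):
    """One learning step for the value of the chosen option.

    The prediction error (reward - v) is weighted by alpha_pos when positive
    and alpha_neg when negative; a low alpha_neg slows unlearning after a
    reversal, which shows up behaviorally as more perseverative trials.
    """
    delta = reward - v
    return v + (alpha_pos if delta > 0 else alpha_neg) * delta
```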


Subject(s)
Corpus Striatum/metabolism, Psychomotor Performance/physiology, Reinforcement (Psychology), Reversal Learning/physiology, Adolescent, Adult, Female, Humans, Magnetic Resonance Imaging/methods, Male, Photic Stimulation/methods, Random Allocation, Young Adult
5.
Addict Neurosci ; 10, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38524664

ABSTRACT

Computational models of addiction often rely on a model-free reinforcement learning (RL) formulation, owing to the close associations between model-free RL, habitual behavior and the dopaminergic system. However, such formulations typically do not capture key recurrent features of addiction phenomena such as craving and relapse. Moreover, they cannot account for goal-directed aspects of addiction that necessitate contrasting, model-based formulations. Here we synthesize a growing body of evidence and propose that a latent-cause framework can help unify our understanding of several recurrent phenomena in addiction, by viewing them as the inferred return of previous, persistent "latent causes". We demonstrate that applying this framework to Pavlovian and instrumental settings can help account for defining features of craving and relapse such as outcome-specificity, generalization, and cyclical dynamics. Finally, we argue that this framework can bridge model-free and model-based formulations, and account for individual variability in phenomenology by accommodating the memories, beliefs, and goals of those living with addiction, motivating a centering of the individual, subjective experience of addiction and recovery.
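A minimal sketch of latent-cause inference in a Pavlovian setting, under strong simplifying assumptions (two fixed candidate causes and hand-set cue likelihoods; a full model would infer causes nonparametrically). It illustrates the central claim: re-encountering drug-paired cues raises the posterior on an old, persistent cause, reviving its outcome predictions (craving, relapse risk).

```python
import numpy as np

# P(cue configuration | cause): cause 0 is an old "drug context" cause,
# cause 1 an "abstinence" cause. Columns: 0 = drug-paired cues present,
# 1 = absent. All numbers are illustrative.
cue_lik = np.array([[0.9, 0.1],
                    [0.2, 0.8]])
prior = np.array([0.5, 0.5])              # prior over latent causes
expected_use = np.array([1.0, 0.0])       # outcome expectation per cause

def infer(cue_config):
    """Posterior over causes and posterior-weighted outcome expectation."""
    post = prior * cue_lik[:, cue_config]
    post /= post.sum()
    return post, post @ expected_use

post, craving = infer(0)   # drug-paired cues present: the old cause "returns"
```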

6.
bioRxiv ; 2024 Mar 29.
Article in English | MEDLINE | ID: mdl-38585868

ABSTRACT

Lack of cognitive flexibility is a hallmark of substance use disorders and has been associated with drug-induced synaptic plasticity in the dorsomedial striatum (DMS). Yet the possible impact of altered plasticity on real-time striatal neural dynamics during decision-making is unclear. Here, we identified persistent impairments induced by chronic ethanol (EtOH) exposure on cognitive flexibility and striatal decision signals. After a substantial withdrawal period from prior EtOH vapor exposure, male, but not female, rats exhibited reduced adaptability and exploratory behavior during a dynamic decision-making task. Reinforcement learning models showed that prior EtOH exposure enhanced learning from rewards over omissions. Notably, neural signals in the DMS related to the decision outcome were enhanced, while those related to choice and choice-outcome conjunction were reduced, in EtOH-treated rats compared to the controls. These findings highlight the profound impact of chronic EtOH exposure on adaptive decision-making, pinpointing specific changes in striatal representations of actions and outcomes as underlying mechanisms for cognitive deficits.

7.
bioRxiv ; 2023 Jul 21.
Article in English | MEDLINE | ID: mdl-37781610

ABSTRACT

The orbitofrontal cortex (OFC) and hippocampus (HC) are both implicated in forming the cognitive or task maps that support flexible behavior. Previously, we used dopamine neurons as a sensor or tool to measure the functional effects of OFC lesions (Takahashi et al., 2011). We recorded midbrain dopamine neurons as rats performed an odor-based choice task, in which errors in the prediction of reward were induced by manipulating the number or timing of the expected rewards across blocks of trials. We found that OFC lesions ipsilateral to the recording electrodes degraded prediction errors in a manner consistent with a loss of resolution of the task states, particularly under conditions where hidden information was critical to sharpening the predictions. Here we have repeated this experiment, along with computational modeling of the results, in rats with ipsilateral HC lesions. The results show that the HC also shapes the map of this task; however, unlike the OFC, which provides information local to the trial, the HC appears to be necessary for estimating upper-level hidden states based on information that is discontinuous or separated by longer timescales. These results contrast the respective roles of the OFC and HC in cognitive mapping and add to evidence that dopamine neurons access a rich information set from distributed regions regarding the predictive structure of the environment, potentially enabling this powerful teaching signal to support complex learning and behavior.

8.
Nat Neurosci ; 26(5): 830-839, 2023 May.
Article in English | MEDLINE | ID: mdl-37081296

ABSTRACT

Dopamine neuron activity is tied to the prediction error in temporal difference reinforcement learning models. These models make significant simplifying assumptions, particularly with regard to the structure of the predictions fed into the dopamine neurons, which consist of a single chain of timepoint states. Although this predictive structure can explain error signals observed in many studies, it cannot cope with settings where subjects might infer multiple independent events and outcomes. In the present study, we recorded dopamine neurons in the ventral tegmental area in such a setting to test the validity of the single-stream assumption. Rats were trained in an odor-based choice task, in which the timing and identity of one of several rewards delivered in each trial changed across trial blocks. This design revealed an error signaling pattern that requires the dopamine neurons to access and update multiple independent predictive streams reflecting the subject's belief about timing and potentially unique identities of expected rewards.
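For reference, a sketch of the single-stream assumption being tested: a tapped-delay-line TD(0) model in which the prediction between cue and reward is carried by one chain of timepoint states. Parameters and the state encoding are illustrative.

```python
import numpy as np

n_steps, alpha, gamma = 10, 0.1, 0.95
w = np.zeros(n_steps)          # one value weight per timepoint state

def run_trial(reward_time, reward_size):
    """Run one trial; return the TD errors (the dopamine-like signal)."""
    deltas = np.zeros(n_steps)
    for t in range(n_steps):
        r = reward_size if t == reward_time else 0.0
        v_next = w[t + 1] if t + 1 < n_steps else 0.0
        deltas[t] = r + gamma * v_next - w[t]   # TD error at this timepoint
        w[t] += alpha * deltas[t]
    return deltas
```

Because the chain carries a single scalar prediction, it cannot hold independent beliefs about the timing and identity of several expected rewards, which is why the observed error patterns argue for multiple predictive streams.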


Subject(s)
Reinforcement (Psychology), Ventral Tegmental Area, Rats, Animals, Ventral Tegmental Area/physiology, Learning/physiology, Reward, Dopaminergic Neurons/physiology, Dopamine/physiology
9.
Behav Neurosci ; 136(5): 347-348, 2022 Oct.
Article in English | MEDLINE | ID: mdl-36222636

ABSTRACT

This special issue provides a representative snapshot of cutting-edge behavioral neuroscience research on sense of time, cognitive and behavioral functioning, and neural processes.


Subject(s)
Cognition, Neurosciences
10.
Neural Netw ; 145: 80-89, 2022 Jan.
Article in English | MEDLINE | ID: mdl-34735893

ABSTRACT

The intersection between neuroscience and artificial intelligence (AI) research has created synergistic effects in both fields. While neuroscientific discoveries have inspired the development of AI architectures, new ideas and algorithms from AI research have produced new ways to study brain mechanisms. A well-known example is the case of reinforcement learning (RL), which has stimulated neuroscience research on how animals learn to adjust their behavior to maximize reward. In this review article, we cover recent collaborative work between the two fields in the context of meta-learning and its extension to social cognition and consciousness. Meta-learning refers to the ability to learn how to learn, such as learning to adjust the hyperparameters of existing learning algorithms and to use existing models and knowledge to solve new tasks efficiently. This capability is important for making AI systems more adaptive and flexible, and because it is one of the areas where human performance still exceeds current AI systems, successful collaboration here should produce new ideas and progress. Starting from the role of RL algorithms in driving neuroscience, we discuss recent developments in deep RL applied to modeling prefrontal cortex functions. From a broader perspective, we discuss the similarities and differences between social cognition and meta-learning, and conclude with speculations on the potential links between intelligence, as endowed by model-based RL, and consciousness. For future work we highlight data efficiency, autonomy, and intrinsic motivation as key research areas for advancing both fields.


Subject(s)
Artificial Intelligence, Social Learning, Animals, Brain, Cognition, Consciousness, Humans, Social Cognition
11.
Curr Opin Behav Sci ; 41: 114-121, 2021 Oct.
Article in English | MEDLINE | ID: mdl-36341023

ABSTRACT

Reinforcement learning is a powerful framework for modelling the cognitive and neural substrates of learning and decision making. Contemporary research in cognitive neuroscience and neuroeconomics typically uses value-based reinforcement-learning models, which assume that decision-makers choose by comparing learned values for different actions. However, another possibility is suggested by a simpler family of models, called policy-gradient reinforcement learning. Policy-gradient models learn by optimizing a behavioral policy directly, without the intermediate step of value-learning. Here we review recent behavioral and neural findings that are more parsimoniously explained by policy-gradient models than by value-based models. We conclude that, despite the ubiquity of 'value' in reinforcement-learning models of decision making, policy-gradient models provide a lightweight and compelling alternative model of operant behavior.
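The contrast is easy to state in code. Below, a REINFORCE-style policy-gradient learner on a two-armed bandit adjusts action preferences directly, with no value comparison at choice time; the payoff probabilities and parameters are illustrative, not drawn from the review.

```python
import numpy as np

rng = np.random.default_rng(0)
theta = np.zeros(2)            # action preferences (policy parameters)
alpha = 0.1

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

for _ in range(1000):
    p = softmax(theta)
    a = rng.choice(2, p=p)
    r = float(rng.random() < (0.8 if a == 0 else 0.2))  # bandit payoff
    grad = -p                  # gradient of log pi(a): one-hot(a) minus p
    grad[a] += 1.0
    theta += alpha * r * grad  # reinforce the taken action by its reward
```

No learned values are compared at decision time; the policy itself is the learned object, which is what distinguishes this family from value-based accounts.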

12.
Trends Cogn Sci ; 24(7): 499-501, 2020 Jul.
Article in English | MEDLINE | ID: mdl-32423707

ABSTRACT

Dopamine (DA) responses are synonymous with the 'reward prediction error' of reinforcement learning (RL), and are thought to update neural estimates of expected value. A recent study by Dabney et al. enriches this picture, demonstrating that DA neurons track variability in rewards, providing a readout of risk in the brain.
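The mechanism Dabney et al. describe can be sketched in a few lines: a population of value predictors, each weighting positive and negative prediction errors with a different asymmetry, converges to different expectiles of the reward distribution, so the spread across units is a readout of risk. The numbers below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
taus = np.linspace(0.1, 0.9, 9)   # per-unit optimism (error asymmetry)
v = np.zeros_like(taus)           # each unit's learned value
alpha = 0.02

for _ in range(5000):
    r = rng.choice([0.2, 1.8])    # variable reward, mean 1.0
    delta = r - v
    # optimistic units weight positive errors more; pessimistic units,
    # negative errors, so each unit settles at a different expectile
    v += alpha * np.where(delta > 0, taus, 1.0 - taus) * delta

# v now spans the reward distribution: its spread reflects variability (risk)
```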


Subject(s)
Dopamine, Reinforcement (Psychology), Brain, Humans, Learning, Reward
13.
Behav Processes ; 167: 103891, 2019 Oct.
Article in English | MEDLINE | ID: mdl-31381985

ABSTRACT

We review the abstract concept of a 'state' - an internal representation posited by reinforcement learning theories to be used by an agent, whether animal, human or artificial, to summarize the features of the external and internal environment that are relevant for future behavior on a particular task. Armed with this summary representation, an agent can make decisions and perform actions to interact effectively with the world. Here, we review recent findings from the neurobiological and behavioral literature to ask: 'what is a state?' with respect to the internal representations that organize learning and decision making across a range of tasks. We find that state representations include information beyond a straightforward summary of the immediate cues in the environment, providing timing or contextual information from the recent or more distant past, which allows these additional factors to influence decision making and other goal-directed behaviors in complex and perhaps unexpected ways.


Subject(s)
Decision Making, Learning, Reinforcement (Psychology), Animals, Cues (Psychology), Humans, Psychological Theory, Reward
14.
Psychopharmacology (Berl) ; 236(8): 2543-2556, 2019 Aug.
Article in English | MEDLINE | ID: mdl-31256220

ABSTRACT

RATIONALE: Pairing rewarding outcomes with audiovisual cues in simulated gambling games increases risky choice in both humans and rats. However, the cognitive mechanism through which this sensory enhancement biases decision-making is unknown.

OBJECTIVES: To assess the computational mechanisms that promote risky choice during gambling, we applied a series of reinforcement learning models to a large dataset of choices acquired from rats as they each performed one of two variants of a rat gambling task (rGT), in which rewards on "win" trials were delivered either with or without salient audiovisual cues.

METHODS: We used a sampling technique based on Markov chain Monte Carlo to obtain posterior estimates of model parameters for a series of RL models of increasing complexity, in order to assess the relative contribution of learning about positive and negative outcomes to the latent valuation of each choice option on the cued and uncued rGT.

RESULTS: Rats which develop a preference for the risky options on the rGT substantially down-weight the equivalent cost of the time-out punishments during these tasks. For each model tested, the reduction in learning from the negative time-outs correlated with the degree of risk preference in individual rats. We found no apparent relationship between risk preference and the parameters that govern learning from the positive rewards.

CONCLUSIONS: The emergence of risk-preferring choice on the rGT derives from a relative insensitivity to the cost of the time-out punishments, as opposed to a relative hypersensitivity to rewards. This hyposensitivity to punishment is more likely to be induced in individual rats by the addition of salient audiovisual cues to rewards delivered on win trials.
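A sketch of the core of such a model: a Q-learning rule over the rGT options in which the subjective cost of time-out punishments is scaled by a free parameter. The names and exact parameterization are our assumptions, not the fitted models themselves.

```python
def q_update(q, choice, pellets, timeout, alpha, punish_weight):
    """One rGT trial. pellets: sugar pellets won (0 on loss trials);
    timeout: time-out duration in seconds (0 on win trials). A small
    punish_weight makes the modeled rat nearly insensitive to time-outs,
    reproducing risk-preferring choice of high-reward, high-penalty options.
    """
    outcome = pellets - punish_weight * timeout
    q[choice] += alpha * (outcome - q[choice])
    return q
```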


Subject(s)
Cues (Psychology), Decision Making/physiology, Gambling/psychology, Punishment/psychology, Reward, Animals, Choice Behavior/physiology, Operant Conditioning/physiology, Humans, Male, Rats, Long-Evans Rats, Reinforcement (Psychology), Time Factors
15.
Curr Opin Neurobiol ; 49: 1-7, 2018 Apr.
Article in English | MEDLINE | ID: mdl-29096115

ABSTRACT

Phasic dopamine responses are thought to encode a prediction-error signal consistent with model-free reinforcement learning theories. However, a number of recent findings highlight the influence of model-based computations on dopamine responses, and suggest that dopamine prediction errors reflect more dimensions of an expected outcome than scalar reward value. Here, we review a selection of these recent results and discuss the implications and complications of model-based predictions for computational theories of dopamine and learning.
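The model-free baseline against which these findings push is the standard temporal-difference prediction error:

```latex
\delta_t = r_t + \gamma V(s_{t+1}) - V(s_t)
```

The reviewed results suggest that the prediction V feeding this error can reflect model-based knowledge of task structure, and that the error itself may carry more dimensions of the expected outcome than scalar reward value.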


Subject(s)
Computer Simulation, Dopamine/physiology, Learning/physiology, Neurological Models, Animals, Humans, Reinforcement (Psychology)
16.
Neuron ; 94(4): 700-702, 2017 May 17.
Article in English | MEDLINE | ID: mdl-28521123

ABSTRACT

In this issue of Neuron, Murakami et al. (2017) relate neural activity in frontal cortex to stochastic and deterministic components of waiting behavior in rats; they find that mPFC biases waiting time, while M2 is ultimately responsible for trial-to-trial variability in decisions about how long to wait.


Subject(s)
Frontal Lobe, Prefrontal Cortex, Animals, Neurons, Rats
17.
Curr Opin Behav Sci ; 11: 67-73, 2016 Oct.
Article in English | MEDLINE | ID: mdl-27408906

ABSTRACT

To many, the poster child for David Marr's famous three levels of scientific inquiry is reinforcement learning: a computational theory of reward optimization, which readily prescribes algorithmic solutions that bear a striking resemblance to signals found in the brain, suggesting a straightforward neural implementation. Here we review questions that remain open at each level of analysis, concluding that the path forward to their resolution calls for inspiration across levels, rather than a focus on mutual constraints.

18.
Neuron ; 91(1): 182-193, 2016 Jul 06.
Article in English | MEDLINE | ID: mdl-27292535

ABSTRACT

Dopamine neurons signal reward prediction errors. This requires accurate reward predictions. It has been suggested that the ventral striatum provides these predictions. Here we tested this hypothesis by recording from putative dopamine neurons in the ventral tegmental area (VTA) of rats performing a task in which prediction errors were induced by shifting reward timing or number. In controls, the neurons exhibited error signals in response to both manipulations. However, dopamine neurons in rats with ipsilateral ventral striatal lesions exhibited errors only to changes in number and failed to respond to changes in the timing of reward. These results, supported by computational modeling, indicate that predictions about the temporal specificity and the number of expected rewards are dissociable and that dopaminergic prediction-error signals rely on the ventral striatum for the former but not the latter.


Subject(s)
Basal Ganglia/physiology, Dopamine/metabolism, Dopaminergic Neurons/metabolism, Reward, Ventral Striatum/physiology, Ventral Tegmental Area/physiology, Animals, Long-Evans Rats
19.
Trends Cogn Sci ; 19(1): 4-5, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25282675

ABSTRACT

Apparently, the act of free choice confers value: when selecting between an item that you had previously chosen and an identical item that you had been forced to take, the former is often preferred. What could be the neural underpinnings of this free-choice bias in decision making? An elegant study recently published in Neuron suggests that enhanced reward learning in the basal ganglia may be the culprit.


Subject(s)
Choice Behavior/physiology, Learning/physiology, Reinforcement (Psychology), Humans
20.
Front Neurosci ; 7: 255, 2013.
Article in English | MEDLINE | ID: mdl-24399927

ABSTRACT

Vibrotactile discrimination tasks involve perceptual judgements on stimulus pairs separated by a brief interstimulus interval (ISI). Despite their apparent simplicity, decision making during these tasks is biased by prior experience in a manner that is not well understood. A striking example is when participants perform well on trials where the first stimulus is closer to the mean of the stimulus-set than the second stimulus, and perform comparatively poorly when the first stimulus is further from the stimulus mean. This "time-order effect" suggests that participants implicitly encode the mean of the stimulus-set and use this internal standard to bias decisions on any given trial. For relatively short ISIs, the magnitude of the time-order effect typically increases with the distance of the first stimulus from the global mean. Working from the premise that the time-order effect reflects the loss of precision in working memory representations, we predicted that the influence of the time-order effect, and this superimposed "distance" effect, would increase monotonically for trials with longer ISIs. However, by varying the ISI across four intervals (300, 600, 1200, and 2400 ms) we instead found a complex, non-linear dependence of the time-order effect on both the ISI and the distance, with the time-order effect being paradoxically stronger at short ISIs. We also found that this relationship depended strongly on participants' prior experience of the ISI (from previous task titration). The time-order effect thus depends not only on participants' expectations concerning the distribution of stimuli, but also on the expected timing of the trials.
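The working-memory premise being tested has a standard formalization, the "contraction bias": the remembered first stimulus drifts toward the global stimulus mean before the comparison. A minimal sketch, with the reliance parameter and function name as our assumptions:

```python
def judged_difference(f1, f2, mu, lam):
    """Compare two vibrotactile frequencies with a biased memory of the first.

    mu: implicitly learned mean of the stimulus-set; lam in [0, 1]: reliance
    on that internal standard. The tested (and here disconfirmed) prediction
    was that lam grows monotonically with the interstimulus interval.
    """
    f1_remembered = (1.0 - lam) * f1 + lam * mu   # contraction toward the mean
    return f1_remembered - f2    # sign drives the "f1 higher?" judgement
```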
