ABSTRACT
In a volatile environment where rewards are uncertain, successful performance requires a delicate balance between exploitation of the best option and exploration of alternative choices. It has been proposed on theoretical grounds that dopamine contributes to the control of this exploration-exploitation trade-off, specifically that the higher the level of tonic dopamine, the more exploitation is favored. We demonstrate here that there is a formal relationship between the rescaling of dopamine positive reward prediction errors and the exploration-exploitation trade-off in simple non-stationary multi-armed bandit tasks. We further show in rats performing such a task that systemically antagonizing dopamine receptors greatly increases the number of random choices without affecting learning capacities. Simulations and comparison of a set of different computational models (an extended Q-learning model, a directed exploration model, and a meta-learning model) fitted to each individual confirm that, independently of the model, decreasing dopaminergic activity does not affect the learning rate but is equivalent to an increase in the random exploration rate. This study shows that dopamine could adapt the exploration-exploitation trade-off in decision-making when facing changing environmental contingencies.
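The formal relationship mentioned above can be illustrated with a minimal sketch. This is not the authors' derivation or code: it assumes a plain Q-learning rule with softmax action selection, Q-values initialized to zero, and a hypothetical gain `kappa` applied to the reward itself (a simplification of rescaling positive reward prediction errors only). Under those assumptions, learned values scale linearly with `kappa`, so damping the signal gives the same choice probabilities as lowering the softmax inverse temperature `beta`, that is, more random exploration. All function and parameter names are illustrative.

```python
import math
import random


def softmax(q_values, beta):
    """Choice probabilities for a softmax policy with inverse temperature beta."""
    scaled = [beta * q for q in q_values]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def q_update(q_values, action, reward, alpha, kappa=1.0):
    """One Q-learning step; kappa is a hypothetical gain on the reward signal."""
    updated = list(q_values)
    updated[action] += alpha * (kappa * reward - updated[action])
    return updated


def replay(history, n_arms, alpha, kappa=1.0):
    """Learn Q-values from a fixed sequence of (action, reward) pairs."""
    q_values = [0.0] * n_arms  # assumption: values start at zero
    for action, reward in history:
        q_values = q_update(q_values, action, reward, alpha, kappa)
    return q_values


# A fixed, arbitrary choice/reward history over three arms (illustrative only).
rng = random.Random(0)
history = [(rng.randrange(3), float(rng.random() < 0.5)) for _ in range(200)]

alpha, beta, kappa = 0.1, 5.0, 0.4

q_full = replay(history, 3, alpha)            # undamped signal
q_damped = replay(history, 3, alpha, kappa)   # signal scaled by kappa

# Same policy either way: damped values with beta, or full values with kappa * beta.
p_damped_signal = softmax(q_damped, beta)
p_lower_beta = softmax(q_full, kappa * beta)
print(p_damped_signal)
print(p_lower_beta)
```

The two printed probability vectors agree up to floating-point error, because the update rule is linear in the reward when values start at zero. The learning rate `alpha` is untouched by the rescaling in this sketch, which is in line with the dissociation between learning rate and random exploration described in the abstract.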
Subject(s)
Decision Making , Dopamine Antagonists/pharmacology , Dopamine/chemistry , Exploratory Behavior/physiology , Models, Theoretical , Reward , Animals , Dopamine/metabolism , Exploratory Behavior/drug effects , Male , Probability Learning , Rats , Rats, Long-Evans
ABSTRACT
The ability to flexibly use knowledge is a cardinal feature of goal-directed behaviors. We recently showed that thalamocortical and corticothalamic pathways connecting the medial prefrontal cortex and the mediodorsal thalamus (MD) contribute to adaptive decision-making (Alcaraz et al., 2018). In this study, we examined the impact of disconnecting the MD from its other main cortical target, the orbitofrontal cortex (OFC), in a task assessing outcome devaluation after initial instrumental training and after reversal of action-outcome contingencies. Crossed MD and OFC lesions did not impair instrumental performance. Using the same approach, however, we found that disconnecting the OFC from its other main thalamic afferent, the submedius nucleus, produced a specific impairment in adaptive responding following action-outcome reversal. Together, these results suggest that multiple thalamocortical circuits may act synergistically to achieve behaviorally relevant functions.
Subject(s)
Adaptation, Psychological , Neural Pathways/physiology , Prefrontal Cortex/physiology , Thalamus/physiology , Animals , Behavior, Animal , Male , Rats, Long-Evans
ABSTRACT
Highly distributed neural circuits are thought to support adaptive decision-making in volatile and complex environments. Notably, the functional interactions between prefrontal areas and reciprocally connected thalamic nuclei may be important when choices are guided by current goal value or action-outcome contingency. We examined the functional involvement of selected thalamocortical and corticothalamic pathways connecting the dorsomedial prefrontal cortex (dmPFC) and the mediodorsal thalamus (MD) in the behaving rat. Using a chemogenetic approach to inhibit projection-defined dmPFC and MD neurons during an instrumental learning task, we show that thalamocortical and corticothalamic pathways differentially support goal attributes. Both pathways participate in adaptation to the current goal value, but only thalamocortical neurons are required to integrate current causal relationships. These data indicate that the antiparallel flow of information within thalamocortical circuits may convey qualitatively distinct aspects of adaptive decision-making and highlight the importance of the direction of information flow within neural circuits.