ABSTRACT
Decisions made by mammals and birds are often temporally extended. They require planning and sampling of decision-relevant information. Our understanding of such decision-making remains in its infancy compared with simpler, forced-choice paradigms. However, recent advances in algorithms supporting planning and information search provide a lens through which we can explain neural and behavioral data in these tasks. We review these advances to obtain a clearer understanding of why planning and curiosity originated in certain species but not others; how activity in the medial temporal lobe, prefrontal and cingulate cortices may support these behaviors; and how planning and information search may complement each other as means to improve future action selection.
Subjects
Algorithms, Decision Making, Neurosciences, Animals, Humans
ABSTRACT
Theories of reward learning in neuroscience have focused on two families of algorithms thought to capture deliberative versus habitual choice. 'Model-based' algorithms compute the value of candidate actions from scratch, whereas 'model-free' algorithms make choice more efficient but less flexible by storing pre-computed action values. We examine an intermediate algorithmic family, the successor representation, which balances flexibility and efficiency by storing partially computed action values: predictions about future events. These pre-computation strategies differ in how they update their choices following changes in a task. The successor representation's reliance on stored predictions about future states predicts a unique signature of insensitivity to changes in the task's sequence of events, but flexible adjustment following changes to rewards. We provide evidence for such differential sensitivity in two behavioural studies with humans. These results suggest that the successor representation is a computational substrate for semi-flexible choice in humans, introducing a subtler, more cognitive notion of habit.
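The differential sensitivity described above can be illustrated with a minimal sketch. It assumes a toy three-state task and a closed-form successor matrix, M = (I - gamma * T)^-1, computed under a fixed policy; the task layout, the discount factor, and all names (`successor_matrix`, `T_old`, `r_old`, and so on) are illustrative assumptions, not taken from the studies. Values are read out as V = M r, so a change in rewards propagates immediately, whereas a change in transitions does not until M itself is relearned.

```python
import numpy as np

def successor_matrix(T, gamma):
    """Closed-form successor representation M = (I - gamma * T)^-1.

    M[s, s'] is the expected discounted number of future visits to s'
    when starting in s (hypothetical helper for this illustration).
    """
    n = T.shape[0]
    return np.linalg.inv(np.eye(n) - gamma * T)

gamma = 0.9  # assumed discount factor

# Toy task (an assumption): state 0 is the start, states 1 and 2 are
# terminal outcomes with no outgoing transitions.
T_old = np.array([[0.0, 1.0, 0.0],   # start leads to outcome 1
                  [0.0, 0.0, 0.0],
                  [0.0, 0.0, 0.0]])
r_old = np.array([0.0, 1.0, 0.0])    # outcome 1 is rewarded

M_old = successor_matrix(T_old, gamma)
v_initial = M_old @ r_old            # start state is worth gamma * 1

# Reward revaluation: the reward moves to outcome 2. The stored
# predictions M_old are still valid, so the value updates at once.
r_new = np.array([0.0, 0.0, 1.0])
v_reward_sr = M_old @ r_new

# Transition revaluation: the start state now leads to outcome 2.
T_new = np.array([[0.0, 0.0, 1.0],
                  [0.0, 0.0, 0.0],
                  [0.0, 0.0, 0.0]])
v_transition_sr = M_old @ r_old                           # stale predictions
v_transition_mb = successor_matrix(T_new, gamma) @ r_old  # recomputed model

print("start value, initial:            ", v_initial[0])
print("after reward change (SR):        ", v_reward_sr[0])
print("after transition change (SR):    ", v_transition_sr[0])
print("after transition change (model): ", v_transition_mb[0])
```

In this sketch the agent relying on stored predictions revalues the start state correctly after the reward change but keeps its outdated value after the transition change, while recomputing from the updated transition model gives the correct value in both cases.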
ABSTRACT
Human decision-making arises from both reflective and reflexive mechanisms, which underpin goal-directed and habitual behavioural control. Computationally, these two systems of behavioural control have been described by different learning algorithms, model-based and model-free learning, respectively. Here, we investigated the effect of diminished serotonin (5-hydroxytryptamine) neurotransmission using dietary tryptophan depletion (TD) in healthy volunteers on the performance of a two-stage decision-making task, which allows discrimination between model-free and model-based behavioural strategies. A novel version of the task was used, which not only examined choice balance for monetary reward but also for punishment (monetary loss). TD impaired goal-directed (model-based) behaviour in the reward condition, but promoted it under punishment. This effect on appetitive and aversive goal-directed behaviour is likely mediated by alteration of the average reward representation produced by TD, which is consistent with previous studies. Overall, the major implication of this study is that serotonin differentially affects goal-directed learning as a function of affective valence. These findings are relevant for a further understanding of psychiatric disorders associated with breakdown of goal-directed behavioural control such as obsessive-compulsive disorders or addictions.
Subjects
Choice Behavior/physiology, Punishment, Reward, Serotonin/deficiency, Tryptophan/deficiency, Adult, Appetitive Behavior/physiology, Diet, Double-Blind Method, Factor Analysis, Female, Humans, Male, Psychological Models, Neuropsychological Tests
ABSTRACT
Why do we repeat choices that we know are bad for us? Decision making is characterized by the parallel engagement of two distinct systems, goal-directed and habitual, thought to arise from two computational learning mechanisms, model-based and model-free. The habitual system is a candidate source of pathological fixedness. Using a decision task that measures the contribution of each mechanism to learning, we show a bias towards model-free (habit) acquisition in disorders involving both natural (binge eating) and artificial (methamphetamine) rewards, and in obsessive-compulsive disorder. This favoring of model-free learning may underlie the repetitive behaviors that ultimately dominate in these disorders. Further, we show that the habit formation bias is associated with lower gray matter volumes in the caudate and medial orbitofrontal cortex. Our findings suggest that dysfunction in a common neurocomputational mechanism may underlie diverse disorders involving compulsion.
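One common way to quantify the relative contribution of the two mechanisms in a two-stage task is a weighted mixture of model-based and model-free first-stage values. The sketch below assumes that formulation; the abstract does not specify the model that was fitted, and the transition probabilities, the cached values, and the names `model_based_values`, `hybrid_values` and `w` are illustrative assumptions.

```python
import numpy as np

def model_based_values(P, q_stage2):
    """First-stage values computed from the transition model.

    P[a, s] is the probability that first-stage action a leads to
    second-stage state s; q_stage2[s, a2] are second-stage values.
    """
    return P @ q_stage2.max(axis=1)

def hybrid_values(q_mb, q_mf, w):
    """Mix model-based and model-free values; w = 1 is purely model-based."""
    return w * q_mb + (1.0 - w) * q_mf

def softmax(q, beta):
    """Choice probabilities with inverse temperature beta."""
    z = beta * (q - q.max())  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Assumed task structure: each first-stage action has a common (0.7)
# and a rare (0.3) transition to one of two second-stage states.
P = np.array([[0.7, 0.3],
              [0.3, 0.7]])
q_stage2 = np.array([[0.2, 0.8],   # values in second-stage state 0
                     [0.4, 0.1]])  # values in second-stage state 1
q_mf = np.array([0.3, 0.6])        # hypothetical cached first-stage values

q_mb = model_based_values(P, q_stage2)
for w in (0.0, 0.5, 1.0):
    q = hybrid_values(q_mb, q_mf, w)
    print(f"w={w:.1f}  values={q}  choice probs={softmax(q, beta=5.0)}")
```

Under these assumed numbers, a purely model-free agent (w = 0) prefers the second action because of its cached value, while a purely model-based agent (w = 1) prefers the first; a bias towards habit would correspond to a lower fitted w.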