Results 1 - 15 of 15
1.
PLoS Comput Biol ; 17(1): e1008552, 2021 01.
Article in English | MEDLINE | ID: mdl-33411724

ABSTRACT

Dual-reinforcement learning theory proposes that behaviour is under the tutelage of a retrospective, value-caching, model-free (MF) system and a prospective-planning, model-based (MB) system. This architecture raises the question of the degree to which, when devising a plan, a MB controller takes account of influences from its MF counterpart. We present evidence that such a sophisticated, self-reflective MB planner incorporates an anticipation of the influences its own MF proclivities exert on the execution of its planned future actions. Using a novel bandit task, wherein subjects were periodically allowed to design their environment, we show that reward assignments were constructed in a manner consistent with a MB system taking account of its MF propensities. Thus, in the task, participants assigned higher rewards to bandits that were momentarily associated with stronger MF tendencies. Our findings have implications for a range of decision-making domains, including drug abuse, pre-commitment, and the tension between short- and long-term decision horizons in economics.


Subject(s)
Decision Making/physiology , Psychological Models , Reinforcement (Psychology) , Reward , Computational Biology , Humans , Intention
2.
PLoS Comput Biol ; 15(3): e1006827, 2019 03.
Article in English | MEDLINE | ID: mdl-30861001

ABSTRACT

Evaluating the future consequences of actions is achievable by simulating a mental search tree into the future. Expanding deep trees, however, is computationally taxing. Therefore, machines and humans use a plan-until-habit scheme that simulates the environment up to a limited depth and then exploits habitual values as proxies for consequences that may arise further in the future. Two outstanding questions in this scheme are "in which directions should the search tree be expanded?" and "when should the expansion stop?". Here we propose a principled solution to these questions based on a speed/accuracy tradeoff: deeper expansion in the appropriate directions leads to more accurate planning, but at the cost of slower decision-making. Our simulation results show how this algorithm expands the search tree effectively and efficiently in a grid-world environment. We further show that our algorithm can explain several behavioral patterns in animals and humans, namely the effect of time pressure on the depth of planning, the effect of reward magnitude on the direction of planning, and the gradual shift from goal-directed to habitual behavior over the course of training. The algorithm also provides several predictions testable in animal and human experiments.
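The plan-until-habit scheme summarized above can be sketched as a depth-limited tree search that falls back on cached (habitual) values at the frontier. The function, the toy environment, and all names below are illustrative assumptions, not code from the paper:

```python
def plan_until_habit(state, depth, transitions, rewards, cached_value):
    """Best achievable value from `state`, planning `depth` steps ahead and
    falling back on cached habitual values at the search frontier."""
    if depth == 0:
        return cached_value[state]  # habitual proxy for the further future
    values = []
    for action, next_state in transitions[state].items():
        r = rewards.get((state, action), 0.0)
        values.append(r + plan_until_habit(next_state, depth - 1,
                                           transitions, rewards, cached_value))
    return max(values) if values else cached_value[state]

# Tiny deterministic environment: two actions from s0, terminal afterwards.
transitions = {"s0": {"a": "s1", "b": "s2"}, "s1": {}, "s2": {}}
rewards = {("s0", "a"): 1.0, ("s0", "b"): 0.0}
cached_value = {"s0": 0.0, "s1": 0.0, "s2": 5.0}

# Even one step of planning already exploits the cached value of s2.
best = plan_until_habit("s0", 1, transitions, rewards, cached_value)
```

With depth 0 the agent is purely habitual (returns the cached value of the current state); deeper settings trade planning time for accuracy, which is the speed/accuracy tradeoff the abstract describes.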


Subject(s)
Planning Techniques , Algorithms , Animals , Choice Behavior , Humans , Prospective Studies , Reward
3.
PLoS Comput Biol ; 15(2): e1006803, 2019 02.
Article in English | MEDLINE | ID: mdl-30759077

ABSTRACT

A well-established notion in cognitive neuroscience proposes that multiple brain systems contribute to choice behaviour. These include: (1) a model-free system that uses values cached from the outcome history of alternative actions, and (2) a model-based system that considers action outcomes and the transition structure of the environment. The widespread use of this distinction, across a range of applications, renders it important to index their distinct influences with high reliability. Here we consider the two-stage task, widely considered a gold-standard measure of the contribution of model-based and model-free systems to human choice. We tested the internal/temporal stability of measures from this task, including those estimated via an established computational model, as well as an extended model using drift-diffusion. Drift-diffusion modeling suggested that both choice in the first stage and RTs in the second stage are directly affected by a model-based/model-free trade-off parameter. Both parameter recovery and the stability of model-based estimates were poor, but improved substantially when both choice and RT were used (compared to choice only), and when more trials (than conventionally used in research practice) were included in the analysis. The findings have implications for interpretation of past and future studies based on the use of the two-stage task, as well as for characterising the contribution of model-based processes to choice behaviour.
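The trade-off parameter fitted in tasks like this is commonly formalized as a weighted mixture of the two systems' action values passed through a softmax. This is a minimal sketch of that standard hybrid rule; the names `w` and `beta` follow common convention, but everything here is an assumption, not the study's code:

```python
import math

def choice_probabilities(q_mb, q_mf, w, beta):
    """Softmax over Q = w*Q_MB + (1 - w)*Q_MF for each available action.
    `w` is the model-based weight, `beta` the inverse temperature."""
    q = [w * mb + (1 - w) * mf for mb, mf in zip(q_mb, q_mf)]
    exps = [math.exp(beta * v) for v in q]
    total = sum(exps)
    return [e / total for e in exps]

# w = 1: purely model-based choice; w = 0: purely model-free choice.
p_mb = choice_probabilities([1.0, 0.0], [0.0, 1.0], w=1.0, beta=5.0)
p_mf = choice_probabilities([1.0, 0.0], [0.0, 1.0], w=0.0, beta=5.0)
```

Estimating `w` reliably from behaviour is exactly the stability problem the abstract examines; adding RTs via drift-diffusion gives the fit more data per trial to constrain it.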


Subject(s)
Computational Biology/standards , Decision Making/physiology , Psychological Models , Statistical Models , Reaction Time/physiology , Adolescent , Adult , Animals , Computational Biology/methods , Female , Humans , Male , Reproducibility of Results , Young Adult
4.
Nat Commun ; 10(1): 750, 2019 02 14.
Article in English | MEDLINE | ID: mdl-30765718

ABSTRACT

An extensive reinforcement learning literature shows that organisms assign credit efficiently, even under conditions of state uncertainty. However, little is known about credit assignment when state uncertainty is subsequently resolved. Here, we address this problem within the framework of an interaction between model-free (MF) and model-based (MB) control systems. We present and support experimentally a theory of MB retrospective inference. Within this framework, a MB system resolves uncertainty that prevailed when actions were taken, thus guiding MF credit assignment. Using a task in which there was initial uncertainty about the lotteries that were chosen, we found that when participants' momentary uncertainty about which lottery had generated an outcome was resolved by provision of subsequent information, participants preferentially assigned credit within a MF system to the lottery they retrospectively inferred was responsible for this outcome. These findings extend our knowledge about the range of MB functions and the scope of system interactions.

5.
PLoS One ; 13(4): e0195399, 2018.
Article in English | MEDLINE | ID: mdl-29621325

ABSTRACT

Every day we make choices under uncertainty: choosing what route to take to work or which queue in a supermarket to join, for example. It is unclear how outcome variance, e.g. uncertainty about waiting time in a queue, affects decisions and confidence when the outcome is stochastic and continuous. How does one evaluate and choose between an option with unreliable but high expected reward, and an option with more certain but lower expected reward? Here we used an experimental design in which the payoffs of two choices took continuous values, to examine the effect of outcome variance on decision and confidence. We found that our participants' probability of choosing the good (high expected reward) option decreased when the good or the bad option's payoffs were more variable. Their confidence ratings were affected by outcome variability, but only when choosing the good option. Unlike in perceptual detection tasks, confidence ratings correlated only weakly with decision times, but correlated with the consistency of trial-by-trial choices. Inspired by the satisficing heuristic, we propose a "stochastic satisficing" (SSAT) model for evaluating options with continuous uncertain outcomes. In this model, options are evaluated by their probability of exceeding an acceptability threshold, and confidence reports scale with the chosen option's thus-defined satisficing probability. Participants' decisions were best explained by an expected reward model, while the SSAT model provided the best prediction of decision confidence. We further tested and verified the predictions of this model in a second experiment. Our model and experimental results generalize models of metacognition from perceptual detection tasks to continuous-value-based decisions. Finally, we discuss how the stochastic satisficing account of decision confidence serves psychological and social purposes associated with the evaluation, communication and justification of decision-making.
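The stochastic satisficing evaluation described above, scoring an option by its probability of exceeding an acceptability threshold, can be sketched for Gaussian payoffs. The function name and the numerical settings are illustrative assumptions:

```python
import math

def satisficing_probability(mean, sd, threshold):
    """P(outcome > threshold) for a Gaussian payoff: the quantity an
    SSAT-style evaluator would use to score an option and scale confidence."""
    z = (threshold - mean) / sd
    return 0.5 * (1.0 - math.erf(z / math.sqrt(2.0)))

# A reliable modest option can satisfice better than a richer, riskier one.
safe = satisficing_probability(mean=5.0, sd=0.5, threshold=4.0)
risky = satisficing_probability(mean=6.0, sd=4.0, threshold=4.0)
```

Note how the ranking diverges from expected reward: the risky option has the higher mean payoff, yet the safe option has the higher probability of clearing the threshold, which is the pattern the SSAT account uses to explain confidence.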


Subject(s)
Choice Behavior , Decision Making , Uncertainty , Adult , Female , Humans , Male , Reward , Stochastic Processes
6.
Eur J Neurosci ; 47(5): 479-487, 2018 03.
Article in English | MEDLINE | ID: mdl-29381819

ABSTRACT

Goal-directed planning in the behavioural and neural sciences is theorized to involve a prospective mental simulation that, starting from the animal's current state in the environment, expands a decision tree in a forward fashion. Backward planning in the artificial intelligence literature, however, suggests that agents expand a mental tree in a backward fashion, starting from a certain goal state they have in mind. Here, we show that several behavioural patterns observed in animals and humans, namely outcome-specific Pavlovian-to-instrumental transfer and the differential outcome effect, can be parsimoniously explained by backward planning. Our basic assumption is that the presentation of a cue that has been associated with a certain outcome triggers backward planning from that outcome state. On the basis of evidence pointing to forward and backward planning models, we discuss the possibility of the brain using a bidirectional planning mechanism in which forward and backward trees are expanded in parallel to achieve higher efficiency.


Subject(s)
Animal Behavior/physiology , Classical Conditioning , Decision Making/physiology , Motivation/physiology , Animals , Brain/physiology , Environment , Goals , Prospective Studies
7.
Nat Hum Behav ; 1(11): 810-818, 2017 Nov.
Article in English | MEDLINE | ID: mdl-29152591

ABSTRACT

Confidence is the 'feeling of knowing' that accompanies decision making. Bayesian theory proposes that confidence is a function solely of the perceived probability of being correct. Empirical research has suggested, however, that different individuals may perform different computations to estimate confidence from uncertain evidence. To test this hypothesis, we collected confidence reports in a task where subjects made categorical decisions about the mean of a sequence. We found that for most individuals, confidence did indeed reflect the perceived probability of being correct. However, in approximately half of them, confidence also reflected a different probabilistic quantity: the perceived uncertainty in the estimated variable. We found that the contribution of both quantities was stable over weeks. We also observed that the influence of the perceived probability of being correct was stable across two tasks, one perceptual and one cognitive. Overall, our findings provide a computational interpretation of individual differences in human confidence.

8.
PLoS Comput Biol ; 13(9): e1005753, 2017 Sep.
Article in English | MEDLINE | ID: mdl-28957319

ABSTRACT

Decision-making in the real world presents the challenge of requiring flexible yet prompt behavior, a balance that has been characterized in terms of a trade-off between a slower, prospective goal-directed model-based (MB) strategy and a fast, retrospective habitual model-free (MF) strategy. Theory predicts that flexibility to changes in both reward values and transition contingencies can determine the relative influence of the two systems in reinforcement learning, but few studies have manipulated the latter. Therefore, we developed a novel two-level contingency change task in which transition contingencies between states change every few trials; MB and MF control predict different responses following these contingency changes, allowing their relative influence to be inferred. Additionally, we manipulated the rate of contingency changes in order to determine whether contingency change volatility would play a role in shifting subjects between a MB and MF strategy. We found that human subjects employed a hybrid MB/MF strategy on the task, corroborating the parallel contribution of MB and MF systems in reinforcement learning. Further, subjects did not remain at one level of MB/MF behaviour but rather displayed a shift towards more MB behavior over the first two blocks that was not attributable to the rate of contingency changes but rather to the extent of training. We demonstrate that flexibility to contingency changes can distinguish MB and MF strategies, with human subjects utilizing a hybrid strategy that shifts towards more MB behavior over blocks, consequently corresponding to a higher payoff.


Subject(s)
Psychological Extinction/physiology , Habits , Psychophysiological Habituation/physiology , Psychological Models , Reinforcement (Psychology) , Task Performance and Analysis , Adult , Decision Making/physiology , Female , Humans , Male
9.
Curr Opin Neurobiol ; 46: 142-153, 2017 10.
Article in English | MEDLINE | ID: mdl-28892737

ABSTRACT

Drug addiction is a complex behavioral and neurobiological disorder which, in an emergent brain-circuit view, reflects a loss of prefrontal top-down control over subcortical circuits governing drug-seeking and drug-taking. We first review previous computational accounts of addiction, focusing on cocaine addiction and on prevalent dopamine-based positive-reinforcement and negative-reinforcement computational models. Then, we discuss a recent computational proposal that the progression to addiction is unlikely to result from a complete withdrawal of the goal-oriented decision system in favor of the habitual one. Rather, the transition to addiction would arise from a drug-induced alteration in the structure of organismal needs which reorganizes the goal structure, ultimately favoring predominance of drug-oriented goals. Finally, we outline unmet challenges for future computational research on addiction.


Subject(s)
Addictive Behavior/physiopathology , Brain/physiopathology , Computer Simulation , Neurological Models , Substance-Related Disorders/physiopathology , Animals , Humans
10.
Curr Biol ; 27(6): 821-832, 2017 Mar 20.
Article in English | MEDLINE | ID: mdl-28285994

ABSTRACT

Central to the organization of behavior is the ability to predict the values of outcomes to guide choices. The accuracy of such predictions is honed by a teaching signal that indicates how incorrect a prediction was ("reward prediction error," RPE). In several reinforcement learning contexts, such as Pavlovian conditioning and decisions guided by reward history, this RPE signal is provided by midbrain dopamine neurons. In many situations, however, the stimuli predictive of outcomes are perceptually ambiguous. Perceptual uncertainty is known to influence choices, but it has been unclear whether or how dopamine neurons factor it into their teaching signal. To cope with uncertainty, we extended a reinforcement learning model with a belief state about the perceptually ambiguous stimulus; this model generates an estimate of the probability of choice correctness, termed decision confidence. We show that dopamine responses in monkeys performing a perceptually ambiguous decision task comply with the model's predictions. Consequently, dopamine responses did not simply reflect a stimulus' average expected reward value but were predictive of the trial-to-trial fluctuations in perceptual accuracy. These confidence-dependent dopamine responses emerged prior to monkeys' choice initiation, raising the possibility that dopamine impacts impending decisions, in addition to encoding a post-decision teaching signal. Finally, by manipulating reward size, we found that dopamine neurons reflect both the upcoming reward size and the confidence in achieving it. Together, our results show that dopamine responses convey teaching signals that are also appropriate for perceptual decisions.
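The belief-state extension described above makes the reward prediction error depend on decision confidence: the value prediction is an expectation under the agent's belief of being correct, so a reward after a confident choice is less surprising than the same reward after an unsure one. This is a minimal illustrative sketch; the setup and all names are assumptions, not the study's model:

```python
def prediction_error(reward, belief_correct, reward_if_correct):
    """RPE when the value prediction is the expectation under the agent's
    belief of being correct (its decision confidence)."""
    expected = belief_correct * reward_if_correct
    return reward - expected

# When reward arrives, higher confidence implies a smaller positive RPE
# (and, symmetrically, a larger negative RPE when reward is omitted).
rpe_confident = prediction_error(reward=1.0, belief_correct=0.9,
                                 reward_if_correct=1.0)
rpe_unsure = prediction_error(reward=1.0, belief_correct=0.6,
                              reward_if_correct=1.0)
```

This is the qualitative signature the abstract reports in dopamine responses: trial-to-trial fluctuations tracking confidence rather than only the stimulus's average expected value.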


Subject(s)
Choice Behavior , Decision Making , Dopaminergic Neurons/physiology , Macaca/physiology , Mesencephalon/physiology , Perception , Reinforcement (Psychology) , Animals , Dopamine/physiology , Macaca/psychology , Male , Animal Models , Reward
11.
Psychol Rev ; 124(2): 130-153, 2017 03.
Article in English | MEDLINE | ID: mdl-28095003

ABSTRACT

Drug addiction implicates both reward learning and homeostatic regulation mechanisms of the brain. This has stimulated two partially successful theoretical perspectives on addiction. Many important aspects of addiction, however, remain to be explained within a single, unified framework that integrates the two mechanisms. Building upon a recently developed homeostatic reinforcement learning theory, the authors focus on a key transition stage of addiction that is well modeled in animals, escalation of drug use, and propose a computational theory of cocaine addiction in which cocaine reinforces behavior due to its rapid homeostatic corrective effect, whereas its chronic use induces slow and long-lasting changes in the homeostatic setpoint. Simulations show that this new theory accounts for key behavioral and neurobiological features of addiction, most notably escalation of cocaine use, drug-primed craving and relapse, individual differences underlying dose-response curves, and dopamine D2-receptor downregulation in addicts. The theory also generates unique predictions about cocaine self-administration behavior in rats that are confirmed by new experimental results. Viewing addiction as a homeostatic reinforcement learning disorder coherently explains many behavioral and neurobiological aspects of the transition to cocaine addiction, and suggests a new perspective toward understanding addiction.


Subject(s)
Cocaine-Related Disorders/psychology , Learning Disabilities/psychology , Reinforcement (Psychology) , Animals , Cocaine-Related Disorders/metabolism , Craving , Homeostasis , Humans , Psychological Theory , Rats , Dopamine D2 Receptors/metabolism , Recurrence , Self Administration
12.
Proc Natl Acad Sci U S A ; 113(45): 12868-12873, 2016 Nov 08.
Article in English | MEDLINE | ID: mdl-27791110

ABSTRACT

Behavioral and neural evidence reveal a prospective goal-directed decision process that relies on mental simulation of the environment, and a retrospective habitual process that caches returns previously garnered from available choices. Artificial systems combine the two by simulating the environment up to some depth and then exploiting habitual values as proxies for consequences that may arise in the further future. Using a three-step task, we provide evidence that human subjects use such a normative plan-until-habit strategy, implying a spectrum of approaches that interpolates between habitual and goal-directed responding. We found that increasing time pressure led to shallower goal-directed planning, suggesting that a speed-accuracy tradeoff controls the depth of planning with deeper search leading to more accurate evaluation, at the cost of slower decision-making. We conclude that subjects integrate habit-based cached values directly into goal-directed evaluations in a normative manner.

13.
Elife ; 3, 2014 Dec 02.
Article in English | MEDLINE | ID: mdl-25457346

ABSTRACT

Efficient regulation of internal homeostasis and defending it against perturbations requires adaptive behavioral strategies. However, the computational principles mediating the interaction between homeostatic and associative learning processes remain undefined. Here we use a definition of primary rewards, as outcomes fulfilling physiological needs, to build a normative theory showing how learning motivated behaviors may be modulated by internal states. Within this framework, we mathematically prove that seeking rewards is equivalent to the fundamental objective of physiological stability, defining the notion of physiological rationality of behavior. We further suggest a formal basis for temporal discounting of rewards by showing that discounting motivates animals to follow the shortest path in the space of physiological variables toward the desired setpoint. We also explain how animals learn to act predictively to preclude prospective homeostatic challenges, and several other behavioral patterns. Finally, we suggest a computational role for interaction between hypothalamus and the brain reward system.
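The core identity of this homeostatic framework — that the reward of an outcome is the reduction in drive, i.e. in distance from the setpoint, that it produces — can be sketched as follows. The Euclidean drive function and all names are illustrative assumptions for this demo:

```python
import math

def drive(state, setpoint):
    """Drive as the Euclidean distance of the internal state from its setpoint."""
    return math.sqrt(sum((s - sp) ** 2 for s, sp in zip(state, setpoint)))

def reward(state, outcome, setpoint):
    """Reward of an outcome = the drive reduction it produces."""
    new_state = [s + k for s, k in zip(state, outcome)]
    return drive(state, setpoint) - drive(new_state, setpoint)

setpoint = [10.0]                          # desired internal level (arbitrary units)
r_food = reward([4.0], [3.0], setpoint)    # moves the state toward the setpoint
r_over = reward([10.0], [3.0], setpoint)   # overshoots: punishing, not rewarding
```

Under this definition the same outcome is rewarding when the animal is deprived and punishing when it is sated, which is how the framework ties reward-seeking to the objective of physiological stability.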


Subject(s)
Homeostasis/physiology , Learning , Reinforcement (Psychology) , Reward , Animals , Behavior , Computer Simulation , Psychological Conditioning , Psychological Extinction , Neurological Models , Satiety Response
14.
PLoS One ; 8(4): e61489, 2013.
Article in English | MEDLINE | ID: mdl-23637842

ABSTRACT

Despite explicitly wanting to quit, long-term addicts find themselves powerless to resist drugs, even while knowing that drug-taking is a harmful course of action. Such inconsistency between the explicit knowledge of negative consequences and the compulsive behavioral patterns represents a cognitive/behavioral conflict that is a central characteristic of addiction. Neurobiologically, differential cue-induced activity in distinct striatal subregions, as well as the dopamine connectivity spiraling from ventral striatal regions to the dorsal regions, plays a critical role in compulsive drug seeking. However, the functional mechanism that integrates these neuropharmacological observations with the above-mentioned cognitive/behavioral conflict is unknown. Here we provide a formal computational explanation for the drug-induced cognitive inconsistency that is apparent in the addicts' "self-described mistake". We show that addictive drugs gradually produce a motivational bias toward drug-seeking at low-level habitual decision processes, despite the low abstract cognitive valuation of this behavior. This pathology emerges within the hierarchical reinforcement learning framework when chronic exposure to the drug pharmacologically produces pathologically persistent phasic dopamine signals. The drug thereby hijacks the dopaminergic spirals that cascade reinforcement signals down the ventro-dorsal cortico-striatal hierarchy. Neurobiologically, our theory accounts for the rapid development of drug cue-elicited dopamine efflux in the ventral striatum and a delayed response in the dorsal striatum, and shows how this response pattern depends critically on the dopamine spiraling circuitry. Behaviorally, our framework explains the gradual insensitivity of drug-seeking to drug-associated punishments, the blocking phenomenon for drug outcomes, and addicts' persistent preference for drugs over natural rewards. The model suggests testable predictions and, beyond that, sets the stage for a view of addiction as a pathology of hierarchical decision-making processes. This view is complementary to the traditional interpretation of addiction as an interaction between habitual and goal-directed decision systems.


Subject(s)
Addictive Behavior/psychology , Decision Making , Dopamine/metabolism , Psychological Models , Substance-Related Disorders/psychology , Algorithms , Basal Ganglia/drug effects , Basal Ganglia/metabolism , Computer Simulation , Cues (Psychology) , Humans , Motivation , Reinforcement (Psychology) , Reward
15.
PLoS Comput Biol ; 7(5): e1002055, 2011 May.
Article in English | MEDLINE | ID: mdl-21637741

ABSTRACT

Instrumental responses are hypothesized to be of two kinds, habitual and goal-directed, mediated by the sensorimotor and the associative cortico-basal ganglia circuits, respectively. The existence of these two heterogeneous associative learning mechanisms can be hypothesized to arise from the comparative advantages they have at different stages of learning. In this paper, we assume that the goal-directed system is behaviourally flexible but slow in choice selection. The habitual system, in contrast, is fast in responding but inflexible in adapting its behavioural strategy to new conditions. Based on these assumptions and using the computational theory of reinforcement learning, we propose a normative model for arbitration between the two processes that strikes an approximately optimal balance between search time and accuracy in decision making. Behaviourally, the model can explain experimental evidence for behavioural sensitivity to outcome at the early stages of learning, but insensitivity at the later stages. It also explains why, when two choices with equal incentive values are available concurrently, behaviour remains outcome-sensitive even after extensive training. Moreover, the model can explain choice reaction time variations during the course of learning, as well as the experimental observation that reaction time increases as the number of choices increases. Neurobiologically, by assuming that phasic and tonic activities of midbrain dopamine neurons carry the reward prediction error and the average reward signals used by the model, respectively, the model predicts that whereas phasic dopamine indirectly affects behaviour by reinforcing stimulus-response associations, tonic dopamine can directly affect behaviour by manipulating the competition between the habitual and the goal-directed systems and thus affect reaction time.


Subject(s)
Choice Behavior/physiology , Decision Making/physiology , Learning/physiology , Neurological Models , Algorithms , Animals , Animal Behavior , Computer Simulation , Dopamine/physiology , Goals , Humans , Markov Chains , Maze Learning , Neurons/physiology , Rats , Reinforcement (Psychology) , Reproducibility of Results