Placing Approach-Avoidance Conflict Within the Framework of Multi-objective Reinforcement Learning.

Enkhtaivan, Enkhzaya; Nishimura, Joel; Cochran, Amy.

Bull Math Biol ; 85(11): 116, 2023 10 14.

Article in English | MEDLINE | ID: mdl-37837562

ABSTRACT

Many psychiatric disorders are marked by impaired decision-making during an approach-avoidance conflict. Current experiments elicit approach-avoidance conflicts in bandit tasks by pairing an individual's actions with consequences that are simultaneously desirable (reward) and undesirable (harm). We frame approach-avoidance conflict tasks as a multi-objective multi-armed bandit. By defining a general decision-maker as a limiting sequence of actions, we disentangle the decision process from learning. Each decision-maker can then be identified as a multi-dimensional point representing its long-term average expected outcomes, while different decision-making models can be characterized by the geometry of their 'feasible region', the set of all possible long-term performances on a fixed task. We introduce three example decision-makers based on popular reinforcement learning models and characterize their feasible regions, including whether they can be Pareto optimal. From this perspective, we find that existing tasks are unable to distinguish between the three examples of decision-makers. We show how to design new tasks whose geometric structure can be used to better distinguish between decision-makers. These findings are expected to guide the design of approach-avoidance conflict tasks and the modeling of resulting decision-making behavior.
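
To make the geometric language of the abstract concrete, the following is a minimal Python sketch, not the authors' code. It assumes a hypothetical four-arm task in which each arm has a fixed expected (reward, harm) pair, treats higher reward and lower harm as better, and shows two things: which arms are Pareto optimal, and how a stationary mixed policy maps to a single point of long-term average expected outcomes. The arm values and the function names `dominates`, `pareto_optimal`, and `mixed_policy_outcome` are illustrative assumptions.

```python
from typing import List, Sequence, Tuple

# (expected reward, expected harm) for one arm; values below are hypothetical.
Outcome = Tuple[float, float]


def dominates(a: Outcome, b: Outcome) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one.

    Higher reward is better; lower harm is better.
    """
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better


def pareto_optimal(arms: Sequence[Outcome]) -> List[int]:
    """Indices of arms that no other arm dominates."""
    return [
        i
        for i, a in enumerate(arms)
        if not any(dominates(b, a) for j, b in enumerate(arms) if j != i)
    ]


def mixed_policy_outcome(arms: Sequence[Outcome], probs: Sequence[float]) -> Outcome:
    """Expected outcome of choosing arm i with fixed probability probs[i].

    The result is a convex combination of the arms' expected outcomes.
    """
    reward = sum(p * a[0] for p, a in zip(probs, arms))
    harm = sum(p * a[1] for p, a in zip(probs, arms))
    return (reward, harm)


# Hypothetical task: arm 2 is dominated by arm 1 (less reward, more harm).
ARMS: List[Outcome] = [(0.8, 0.6), (0.5, 0.2), (0.4, 0.5), (0.1, 0.0)]

if __name__ == "__main__":
    print(pareto_optimal(ARMS))
    print(mixed_policy_outcome(ARMS, [0.5, 0.5, 0.0, 0.0]))
```

In this sketch, each choice of `probs` gives one point in the (reward, harm) plane; the set of points a given decision-making model can reach is what the abstract calls its feasible region.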

Subject(s)

Decision Making , Mathematical Concepts , Humans , Models, Biological , Learning , Reward

A Competition of Critics in Human Decision-Making.

Enkhtaivan, Enkhzaya; Nishimura, Joel; Ly, Cheng; Cochran, Amy L.

Comput Psychiatr ; 5(1): 81-101, 2021.

Article in English | MEDLINE | ID: mdl-38773993

ABSTRACT

Recent experiments and theories of human decision-making suggest positive and negative errors are processed and encoded differently by serotonin and dopamine, with serotonin possibly serving to oppose dopamine and protect against risky decisions. We introduce a temporal difference (TD) model of human decision-making to account for these features. Our model involves two critics, an optimistic learning system and a pessimistic learning system, whose predictions are integrated in time to control how potential decisions compete to be selected. Our model predicts that human decision-making can be decomposed along two dimensions: the degree to which the individual is sensitive to (1) risk and (2) uncertainty. In addition, we demonstrate that the model can learn about the mean and standard deviation of rewards, and provide information about reaction time despite not modeling these variables directly. Lastly, we simulate a recent experiment to show how updates of the two learning systems could relate to dopamine and serotonin transients, thereby providing a mathematical formalism for serotonin's hypothesized role as an opponent to dopamine. This new model should be useful for future experiments on human decision-making.
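
The idea of an optimistic and a pessimistic critic can be illustrated with a small Python sketch. This is not the authors' model or their equations: it assumes a single-state task, and it assumes that a critic is made optimistic or pessimistic by weighting positive and negative prediction errors differently, which is one common construction. The names `update_critic` and `run_two_critics` and the parameters `alpha` and `asym` are hypothetical.

```python
from typing import Iterable, Tuple


def update_critic(value: float, reward: float, alpha: float,
                  w_pos: float, w_neg: float) -> float:
    """One prediction-error update with asymmetric error weights (assumed form).

    Positive errors are scaled by w_pos, negative errors by w_neg.
    """
    delta = reward - value  # prediction error in a single-state task
    weight = w_pos if delta > 0 else w_neg
    return value + alpha * weight * delta


def run_two_critics(rewards: Iterable[float], alpha: float = 0.1,
                    asym: float = 0.8) -> Tuple[float, float]:
    """Run an optimistic and a pessimistic critic on the same reward sequence.

    The optimistic critic weights positive errors by `asym` and negative
    errors by `1 - asym`; the pessimistic critic uses the reverse weights.
    """
    v_opt = 0.0
    v_pes = 0.0
    for r in rewards:
        v_opt = update_critic(v_opt, r, alpha, asym, 1.0 - asym)
        v_pes = update_critic(v_pes, r, alpha, 1.0 - asym, asym)
    return v_opt, v_pes


if __name__ == "__main__":
    # Variable rewards: the two critics settle on opposite sides of the mean.
    print(run_two_critics([1.0, 0.0] * 200))
    # Constant rewards: both critics approach the same value.
    print(run_two_critics([1.0] * 500))
```

In this sketch the two estimates separate when rewards vary and coincide when rewards are constant, so the gap between them carries information about reward spread. How the published model integrates the two critics' predictions over time to select actions is not shown here.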
