Search | BVS (Virtual Health Library) Infectious and Parasitic Diseases

Novelty is not surprise: Human exploratory and adaptive behavior in sequential decision-making.

Xu, He A; Modirshanechi, Alireza; Lehmann, Marco P; Gerstner, Wulfram; Herzog, Michael H.

PLoS Comput Biol ; 17(6): e1009070, 2021 06.

Article in English | MEDLINE | ID: mdl-34081705

ABSTRACT

Classic reinforcement learning (RL) theories cannot explain human behavior in the absence of external reward or when the environment changes. Here, we employ a deep sequential decision-making paradigm with sparse reward and abrupt environmental changes. To explain the behavior of human participants in these environments, we show that RL theories need to include surprise and novelty, each with a distinct role. While novelty drives exploration before the first encounter of a reward, surprise increases the rate of learning of a world-model as well as of model-free action-values. Even though the world-model is available for model-based RL, we find that human decisions are dominated by model-free action choices. The world-model is only marginally used for planning, but it is important to detect surprising events. Our theory predicts human action choices with high probability and allows us to dissociate surprise, novelty, and reward in EEG signals.
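The distinction drawn in this abstract can be illustrated with a small sketch. The abstract does not give formulas, so everything here is an assumption made for illustration: novelty is taken as the negative log of a smoothed state-visit frequency, surprise as the negative log of a smoothed transition probability under a count-based world-model, and the learning rate as a saturating function of surprise. The class name `NoveltySurpriseTracker` and all parameters are hypothetical.

```python
import math
from collections import defaultdict


class NoveltySurpriseTracker:
    """Illustrative count-based novelty and surprise (assumed definitions)."""

    def __init__(self, n_states, base_rate=0.1, max_rate=0.9):
        self.n_states = n_states
        self.base_rate = base_rate  # learning rate when nothing is surprising
        self.max_rate = max_rate    # learning rate for very surprising events
        self.state_counts = defaultdict(int)
        self.transition_counts = defaultdict(lambda: defaultdict(int))

    def novelty(self, state):
        # Rarely visited states have low smoothed frequency, hence high novelty.
        total = sum(self.state_counts.values())
        p = (self.state_counts[state] + 1) / (total + self.n_states)
        return -math.log(p)

    def surprise(self, state, action, next_state):
        # Transitions the world-model considers unlikely are surprising.
        counts = self.transition_counts[(state, action)]
        total = sum(counts.values())
        p = (counts[next_state] + 1) / (total + self.n_states)
        return -math.log(p)

    def learning_rate(self, surprise):
        # Saturating map: zero surprise gives base_rate, large surprise
        # approaches max_rate.
        gate = 1.0 - math.exp(-surprise)
        return self.base_rate + (self.max_rate - self.base_rate) * gate

    def observe(self, state, action, next_state):
        self.state_counts[next_state] += 1
        self.transition_counts[(state, action)][next_state] += 1


# A familiar state reached through an unexpected transition:
# low novelty, high surprise.
tracker = NoveltySurpriseTracker(n_states=3)
for _ in range(10):
    tracker.observe(0, "a", 1)
    tracker.observe(1, "a", 0)

novelty_familiar = tracker.novelty(0)         # state 0 was visited often
novelty_unseen = tracker.novelty(2)           # state 2 was never visited
surprise_expected = tracker.surprise(0, "a", 1)
surprise_unexpected = tracker.surprise(0, "a", 0)
rate_expected = tracker.learning_rate(surprise_expected)
rate_unexpected = tracker.learning_rate(surprise_unexpected)
```

In this toy setting, state 0 is familiar (low novelty), yet arriving there from state 0 via action "a" contradicts the learned transitions (high surprise), which raises the learning rate. That is one way to see why the two signals can play distinct roles.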

Subjects

Adaptation, Psychological; Exploratory Behavior; Models, Psychological; Reinforcement, Psychology; Algorithms; Choice Behavior/physiology; Computational Biology; Decision Making/physiology; Electroencephalography/statistics & numerical data; Exploratory Behavior/physiology; Humans; Learning/physiology; Models, Neurological; Reward

One-shot learning and behavioral eligibility traces in sequential decision making.

Lehmann, Marco P; Xu, He A; Liakoni, Vasiliki; Herzog, Michael H; Gerstner, Wulfram; Preuschoff, Kerstin.

Elife ; 8, 2019 11 11.

Article in English | MEDLINE | ID: mdl-31709980

ABSTRACT

In many daily tasks, we make multiple decisions before reaching a goal. In order to learn such sequences of decisions, a mechanism to link earlier actions to later reward is necessary. Reinforcement learning (RL) theory suggests two classes of algorithms solving this credit assignment problem: In classic temporal-difference learning, earlier actions receive reward information only after multiple repetitions of the task, whereas models with eligibility traces reinforce entire sequences of actions from a single experience (one-shot). Here, we show one-shot learning of sequences. We developed a novel paradigm to directly observe which actions and states along a multi-step sequence are reinforced after a single reward. By focusing our analysis on those states for which RL with and without eligibility trace make qualitatively distinct predictions, we find direct behavioral (choice probability) and physiological (pupil dilation) signatures of reinforcement learning with eligibility trace across multiple sensory modalities.
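The contrast between the two algorithm classes can be sketched on a toy problem. The example below is an assumption-laden illustration, not the authors' model: it uses state values rather than action values, a hypothetical linear chain of four states ending in a single reward, and accumulating eligibility traces. Setting `lam=0.0` reduces the update to classic one-step temporal-difference learning.

```python
def run_episode(n_states, alpha, gamma, lam):
    """One pass through a chain 0 -> 1 -> ... -> n_states (terminal, reward 1).

    Returns the learned values of the non-terminal states after this
    single episode, starting from all-zero values.
    """
    values = [0.0] * (n_states + 1)  # last entry is the terminal state
    traces = [0.0] * (n_states + 1)  # eligibility trace per state
    for s in range(n_states):
        s_next = s + 1
        reward = 1.0 if s_next == n_states else 0.0
        # One-step temporal-difference error.
        delta = reward + gamma * values[s_next] - values[s]
        # Mark the current state as eligible for credit.
        traces[s] += 1.0
        for i in range(n_states):
            # Every state is updated in proportion to its trace.
            values[i] += alpha * delta * traces[i]
            # Traces decay; with lam = 0 only the current state is updated.
            traces[i] *= gamma * lam
    return values[:n_states]


# Classic TD: only the state right before the reward is reinforced.
td_without_trace = run_episode(4, alpha=0.5, gamma=0.9, lam=0.0)

# With eligibility traces: the whole sequence is reinforced in one shot.
td_with_trace = run_episode(4, alpha=0.5, gamma=0.9, lam=0.8)
```

After a single rewarded episode, the version without traces leaves all but the last state at zero, while the version with traces assigns graded credit to every state along the path. States far from the reward are therefore where the two model classes make qualitatively different predictions.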

Subjects

Cognition/physiology; Decision Making/physiology; Learning/physiology; Memory/physiology; Pupil/physiology; Reinforcement, Psychology; Reward; Algorithms; Humans; Markov Chains; Models, Neurological; Psychomotor Performance/physiology
