Search | VHL Regional Portal

Novelty is not surprise: Human exploratory and adaptive behavior in sequential decision-making.

Xu, He A; Modirshanechi, Alireza; Lehmann, Marco P; Gerstner, Wulfram; Herzog, Michael H.

PLoS Comput Biol ; 17(6): e1009070, 2021 06.

Article in English | MEDLINE | ID: mdl-34081705

ABSTRACT

Classic reinforcement learning (RL) theories cannot explain human behavior in the absence of external reward or when the environment changes. Here, we employ a deep sequential decision-making paradigm with sparse reward and abrupt environmental changes. To explain the behavior of human participants in these environments, we show that RL theories need to include surprise and novelty, each with a distinct role. While novelty drives exploration before the first encounter of a reward, surprise increases the rate of learning of a world-model as well as of model-free action-values. Even though the world-model is available for model-based RL, we find that human decisions are dominated by model-free action choices. The world-model is only marginally used for planning, but it is important to detect surprising events. Our theory predicts human action choices with high probability and allows us to dissociate surprise, novelty, and reward in EEG signals.

Subject(s)

Psychological Adaptation , Exploratory Behavior , Psychological Models , Reinforcement (Psychology) , Algorithms , Choice Behavior/physiology , Computational Biology , Decision Making/physiology , Electroencephalography/statistics & numerical data , Exploratory Behavior/physiology , Humans , Learning/physiology , Neurological Models , Reward
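
The abstract above assigns novelty and surprise distinct roles: novelty drives exploration, while surprise raises the learning rate. The following is a minimal sketch of that distinction, not the authors' model. The function names, the count-based novelty measure, the log-probability surprise measure, and the saturating rate formula are all illustrative assumptions.

```python
import math

def novelty(counts, state):
    """Assumed novelty measure: negative log of a smoothed visit frequency.
    It depends only on how often a state has been seen."""
    total = sum(counts.values())
    n_states = len(counts)
    freq = (counts[state] + 1) / (total + n_states)
    return -math.log(freq)

def surprise(predicted_prob):
    """Assumed surprise measure: negative log of the probability that the
    world-model assigned to the observed transition (must be > 0)."""
    return -math.log(predicted_prob)

def modulated_rate(surprise_value, base_rate=0.1, max_rate=0.9):
    """Assumed rule: the learning rate grows with surprise and saturates
    between base_rate and max_rate."""
    gain = surprise_value / (1.0 + surprise_value)
    return base_rate + (max_rate - base_rate) * gain

# A rarely visited state is more novel than a frequently visited one.
visits = {"A": 10, "B": 0}
print(novelty(visits, "A") < novelty(visits, "B"))

# An unexpected transition yields a higher learning rate than an expected one.
print(modulated_rate(surprise(0.9)) < modulated_rate(surprise(0.05)))
```

In this sketch a frequently visited state can still produce a surprising transition, which is one simple way to see why the two quantities need not coincide.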

One-shot learning and behavioral eligibility traces in sequential decision making.

Lehmann, Marco P; Xu, He A; Liakoni, Vasiliki; Herzog, Michael H; Gerstner, Wulfram; Preuschoff, Kerstin.

Elife ; 8, 2019 11 11.

Article in English | MEDLINE | ID: mdl-31709980

ABSTRACT

In many daily tasks, we make multiple decisions before reaching a goal. In order to learn such sequences of decisions, a mechanism to link earlier actions to later reward is necessary. Reinforcement learning (RL) theory suggests two classes of algorithms solving this credit assignment problem: In classic temporal-difference learning, earlier actions receive reward information only after multiple repetitions of the task, whereas models with eligibility traces reinforce entire sequences of actions from a single experience (one-shot). Here, we show one-shot learning of sequences. We developed a novel paradigm to directly observe which actions and states along a multi-step sequence are reinforced after a single reward. By focusing our analysis on those states for which RL with and without eligibility trace make qualitatively distinct predictions, we find direct behavioral (choice probability) and physiological (pupil dilation) signatures of reinforcement learning with eligibility trace across multiple sensory modalities.

Subject(s)

Cognition/physiology , Decision Making/physiology , Learning/physiology , Memory/physiology , Pupil/physiology , Reinforcement (Psychology) , Reward , Algorithms , Humans , Markov Chains , Neurological Models , Psychomotor Performance/physiology
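
The abstract above contrasts classic temporal-difference learning with learning that uses eligibility traces. The sketch below illustrates that contrast with textbook TD(lambda) on a hypothetical linear chain of states with a single reward at the end. It is not the authors' task or model, and the chain length and parameter values are assumptions chosen for illustration.

```python
def run_episode(n_states=5, alpha=0.5, gamma=0.9, lam=0.0):
    """Run one pass through a chain s0 -> s1 -> ... -> goal and return
    the learned values of the non-terminal states.

    lam=0.0 gives classic TD(0); lam>0 keeps an eligibility trace so that
    earlier states share in the final reward prediction error.
    """
    values = [0.0] * (n_states + 1)  # last entry is the terminal goal state
    traces = [0.0] * (n_states + 1)
    for s in range(n_states):
        s_next = s + 1
        reward = 1.0 if s_next == n_states else 0.0
        # Decay all traces, then mark the current state as eligible.
        traces = [gamma * lam * e for e in traces]
        traces[s] += 1.0
        # Temporal-difference error for this transition.
        delta = reward + gamma * values[s_next] - values[s]
        # Every state is updated in proportion to its trace.
        for i in range(len(values)):
            values[i] += alpha * delta * traces[i]
    return values[:n_states]

td0 = run_episode(lam=0.0)
td_lambda = run_episode(lam=0.9)
print("TD(0):     ", td0)
print("TD(lambda):", td_lambda)
```

After a single rewarded episode, TD(0) changes only the value of the state immediately before the goal, whereas the trace-based variant assigns a positive value to every state along the path, with larger values closer to the reward.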
