A Normative Account of Confirmation Bias During Reinforcement Learning.

Lefebvre, Germain; Summerfield, Christopher; Bogacz, Rafal

Affiliation

Lefebvre G; MRC Brain Network Dynamics Unit, Nuffield Department of Clinical Neurosciences, University of Oxford, Oxford OX3 9DU, U.K. germain.lefebvre@outlook.com.
Summerfield C; Department of Experimental Psychology, University of Oxford, Oxford OX3 9DU, U.K. christopher.summerfield@psy.ox.ac.uk.
Bogacz R; MRC Brain Network Dynamics Unit, Nuffield Department of Clinical Neurosciences, University of Oxford, Oxford OX3 9DU, U.K. rafal.bogacz@ndcn.ox.ac.uk.

Neural Comput; 34(2): 307-337, 2022 01 14.

Article in En | MEDLINE | ID: mdl-34758486

ABSTRACT

Reinforcement learning involves updating estimates of the value of states and actions on the basis of experience. Previous work has shown that in humans, reinforcement learning exhibits a confirmatory bias: when the value of a chosen option is being updated, estimates are revised more radically following positive than negative reward prediction errors, but the converse is observed when updating the unchosen option value estimate. Here, we simulate performance on a multi-arm bandit task to examine the consequences of a confirmatory bias for reward harvesting. We report a paradoxical finding: that confirmatory biases allow the agent to maximize reward relative to an unbiased updating rule. This principle holds over a wide range of experimental settings and is most influential when decisions are corrupted by noise. We show that this occurs because on average, confirmatory biases lead to overestimating the value of more valuable bandits and underestimating the value of less valuable bandits, rendering decisions overall more robust in the face of noise. Our results show how apparently suboptimal learning rules can in fact be reward maximizing if decisions are made with finite computational precision.
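
The update rule described in the abstract can be illustrated with a minimal sketch. This is not the authors' code. It assumes a two-armed bandit with binary rewards, feedback on both the chosen and the unchosen arm on every trial, and a softmax choice rule as the source of decision noise. The names `alpha_conf`, `alpha_disc`, `beta`, and `probs`, and all default values, are hypothetical choices made for illustration only.

```python
import math
import random


def update_values(q, chosen, rewards, alpha_conf, alpha_disc):
    """Apply one confirmation-biased update to the value estimates q (in place).

    Chosen arm: positive prediction errors use alpha_conf, negative use alpha_disc.
    Unchosen arm: the mapping is reversed.
    Setting alpha_conf == alpha_disc gives an unbiased rule.
    """
    for arm in range(len(q)):
        delta = rewards[arm] - q[arm]  # reward prediction error
        if arm == chosen:
            lr = alpha_conf if delta > 0 else alpha_disc
        else:
            lr = alpha_disc if delta > 0 else alpha_conf
        q[arm] += lr * delta
    return q


def softmax_choice(q, beta, rng):
    """Noisy choice between two arms; a lower beta means noisier decisions."""
    p_first = 1.0 / (1.0 + math.exp(-beta * (q[0] - q[1])))
    return 0 if rng.random() < p_first else 1


def simulate(alpha_conf, alpha_disc, probs=(0.6, 0.4), beta=5.0,
             n_trials=1000, seed=0):
    """Return (mean reward per trial, final value estimates).

    probs, beta, n_trials and the initial values of 0.5 are illustrative
    assumptions, not parameters taken from the paper.
    """
    rng = random.Random(seed)
    q = [0.5, 0.5]
    total = 0.0
    for _ in range(n_trials):
        choice = softmax_choice(q, beta, rng)
        # Assumption: outcomes of both arms are observed on every trial.
        rewards = [1.0 if rng.random() < p else 0.0 for p in probs]
        total += rewards[choice]
        update_values(q, choice, rewards, alpha_conf, alpha_disc)
    return total / n_trials, q
```

Calling `simulate(0.3, 0.1)` and `simulate(0.2, 0.2)` over many seeds is one way to compare a biased agent with an unbiased one under these assumptions. Whether the biased agent earns more depends on the chosen parameters, and this sketch does not reproduce the paper's results.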

Subjects

Learning; Reinforcement, Psychology; Bias; Decision Making; Humans; Reward

Full text: 1 | Collections: 01-international | Database: MEDLINE | Main subject: Reinforcement, Psychology / Learning | Type of study: Prognostic_studies | Limit: Humans | Language: En | Journal: Neural Comput | Publication year: 2022 | Document type: Article