Results 1 - 4 of 4
1.
Proc Biol Sci ; 291(2017): 20232011, 2024 Feb 28.
Article in English | MEDLINE | ID: mdl-38412967

ABSTRACT

Polarization raises concerns for democracy and society, concerns that have grown in the internet era, where (mis)information has become ubiquitous, its transmission is faster than ever, and the freedom and means of expressing opinions are expanding. The origin of polarization, however, remains unclear, with multiple social and emotional factors and individual reasoning biases likely to explain its current forms. In the present work, we adopt a principled approach and show that polarization tendencies can take root in biased reward processing of new information in favour of choice-confirmatory evidence. Through agent-based simulations, we show that confirmation bias in individual learning is an independent mechanism and could be sufficient for creating polarization at the group level, independently of any additional assumptions about the opinions themselves, a priori beliefs about them, information transmission mechanisms, or the structure of social relationships between individuals. This generative process can interact with polarization mechanisms described elsewhere, but constitutes an entrenched biological tendency that helps explain the extraordinary resilience of polarization against mitigating efforts, such as dramatic informational change in the environment.


Subjects
Emotions, Learning, Humans, Interpersonal Relations, Problem Solving, Reward
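The mechanism this abstract describes lends itself to a short agent-based sketch. The code below is hypothetical: the learning rates, task parameters, and function names are illustrative assumptions, not taken from the paper. Each agent repeatedly chooses between two equally rewarding options and, under the confirmatory rule, learns faster from prediction errors that confirm its choice:

```python
import random

# Illustrative confirmatory learning rates (assumed, not the paper's)
ALPHA_CONF = 0.3  # learning rate for choice-confirming prediction errors
ALPHA_DISC = 0.1  # learning rate for choice-disconfirming prediction errors

def simulate_agent(biased, n_trials=500, epsilon=0.1, rng=None):
    """One agent choosing between two equally rewarding options (p = 0.5)
    with complete feedback; returns the final value difference q0 - q1."""
    rng = rng or random.Random()
    q = [0.0, 0.0]
    for _ in range(n_trials):
        greedy = 0 if q[0] >= q[1] else 1
        choice = greedy if rng.random() > epsilon else 1 - greedy
        for arm in (0, 1):
            reward = 1.0 if rng.random() < 0.5 else 0.0
            delta = reward - q[arm]
            if biased:
                # a prediction error "confirms" the choice when it is
                # positive for the chosen arm or negative for the other
                confirming = (delta > 0) == (arm == choice)
                alpha = ALPHA_CONF if confirming else ALPHA_DISC
            else:
                alpha = 0.2  # symmetric, unbiased update
            q[arm] += alpha * delta
    return q[0] - q[1]

def mean_entrenchment(biased, n_agents=200, seed=0):
    """Mean absolute preference strength across an unconnected population."""
    rng = random.Random(seed)
    return sum(abs(simulate_agent(biased, rng=rng))
               for _ in range(n_agents)) / n_agents
```

Even though the two options pay off identically and the agents never interact, biased agents develop strong self-confirming preferences while unbiased agents stay much closer to indifference; which option a biased agent entrenches on depends only on its early reward history, so the population splits into two opinion camps with no social assumptions at all.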
2.
Neural Comput ; 34(2): 307-337, 2022 01 14.
Article in English | MEDLINE | ID: mdl-34758486

ABSTRACT

Reinforcement learning involves updating estimates of the value of states and actions on the basis of experience. Previous work has shown that in humans, reinforcement learning exhibits a confirmatory bias: when the value of a chosen option is being updated, estimates are revised more radically following positive than negative reward prediction errors, but the converse is observed when updating the unchosen option value estimate. Here, we simulate performance on a multi-armed bandit task to examine the consequences of a confirmatory bias for reward harvesting. We report a paradoxical finding: confirmatory biases allow the agent to maximize reward relative to an unbiased updating rule. This principle holds over a wide range of experimental settings and is most influential when decisions are corrupted by noise. We show that this occurs because on average, confirmatory biases lead to overestimating the value of more valuable bandits and underestimating the value of less valuable bandits, rendering decisions overall more robust in the face of noise. Our results show how apparently suboptimal learning rules can in fact be reward maximizing if decisions are made with finite computational precision.


Subjects
Learning, Reinforcement (Psychology), Bias, Decision Making, Humans, Reward
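The over- and under-estimation claim can be made concrete with a little fixed-point arithmetic (the learning-rate values below are illustrative, not the paper's). For a Bernoulli arm with reward probability p, updating q by α⁺δ after positive prediction errors and α⁻δ after negative ones settles where the expected update vanishes, α⁺p(1 − q) = α⁻(1 − p)q:

```python
def fixed_point(p, alpha_pos, alpha_neg):
    """Value estimate at which the expected asymmetric update is zero:
    alpha_pos * p * (1 - q) = alpha_neg * (1 - p) * q."""
    return alpha_pos * p / (alpha_pos * p + alpha_neg * (1 - p))

A_CONF, A_DISC = 0.3, 0.1  # illustrative confirmatory learning rates

# Mostly-chosen good arm (p = 0.7): positive errors weighted more -> overestimate
q_good = fixed_point(0.7, A_CONF, A_DISC)  # 0.875 > 0.7

# Mostly-unchosen bad arm (p = 0.3): negative errors weighted more -> underestimate
q_bad = fixed_point(0.3, A_DISC, A_CONF)   # 0.125 < 0.3
```

The biased estimates (0.875 vs 0.125) are pushed further apart than the true values (0.7 vs 0.3), which is why noisy decisions flip to the worse arm less often.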
3.
Proc Natl Acad Sci U S A ; 115(49): E11446-E11454, 2018 12 04.
Article in English | MEDLINE | ID: mdl-30442672

ABSTRACT

Money is a fundamental and ubiquitous institution in modern economies. However, the question of its emergence remains a central one for economists. The monetary search-theoretic approach studies the conditions under which commodity money emerges as a solution to override frictions inherent to interindividual exchanges in a decentralized economy. Although agents' rationality is classically essential among these conditions and a prerequisite to any theoretical monetary equilibrium, human subjects often fail to adopt optimal strategies in tasks implementing a search-theoretic paradigm when those strategies are speculative, i.e., when they involve using a costly medium of exchange to increase the probability of subsequent successful trades. In the present work, we hypothesize that implementing such speculative behaviors relies on reinforcement learning instead of lifetime utility calculations, as supposed by classical economic theory. To test this hypothesis, we operationalized the Kiyotaki and Wright paradigm of money emergence in a multistep exchange task and fitted behavioral data from human subjects performing this task with two reinforcement learning models, each implementing a distinct cognitive hypothesis regarding the weight of future or counterfactual rewards in current decisions. We found that both models outperformed theoretical predictions of subjects' behavior regarding the implementation of speculative strategies, and that speculation depends on the degree to which opportunity costs are taken into account in the learning process. Speculating about the marketability advantage of money thus seems to depend on mental simulations of counterfactual events that agents perform in exchange situations.


Subjects
Choice Behavior, Learning, Psychological Models, Reinforcement (Psychology), Reward, Decision Making, Humans
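As a purely schematic illustration of the opportunity-cost idea (this is not either of the paper's two fitted models; the weighting parameter omega, the payoffs, and the function name are hypothetical), a model-free update can fold the forgone alternative into the learning target:

```python
def update_value(q, reward, forgone, omega=0.5, alpha=0.1):
    """Move q toward the net payoff: the obtained reward minus a
    fraction omega of the forgone alternative (the opportunity cost).
    omega = 0 ignores counterfactuals; omega = 1 weighs them fully."""
    net = reward - omega * forgone
    return q + alpha * (net - q)

# Hypothetical numbers: refusing a costly medium of exchange pays 0 now
# but forgoes a likely future trade worth 1; accepting costs 0.1 in
# storage and forgoes nothing.
q_accept = q_refuse = 0.0
for _ in range(200):
    q_accept = update_value(q_accept, reward=-0.1, forgone=0.0)
    q_refuse = update_value(q_refuse, reward=0.0, forgone=1.0)
# q_accept converges to -0.1, q_refuse to -0.5: an agent that weighs
# opportunity costs learns to prefer the speculative strategy
```

With omega = 0 the refusal would look strictly better (0 versus -0.1), so in this sketch the speculative strategy only emerges when forgone payoffs enter the learning process, mirroring the abstract's conclusion.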
4.
PLoS Comput Biol ; 13(8): e1005684, 2017 Aug.
Article in English | MEDLINE | ID: mdl-28800597

ABSTRACT

Previous studies suggest that factual learning, that is, learning from obtained outcomes, is biased, such that participants preferentially take into account positive, as compared to negative, prediction errors. However, whether or not the prediction error valence also affects counterfactual learning, that is, learning from forgone outcomes, is unknown. To address this question, we analysed the performance of two groups of participants on reinforcement learning tasks using a computational model adapted to test whether prediction error valence influences learning. We carried out two experiments: in the factual learning experiment, participants learned from partial feedback (i.e., the outcome of the chosen option only); in the counterfactual learning experiment, participants learned from complete feedback (i.e., the outcomes of both the chosen and the unchosen option were displayed). In the factual learning experiment, we replicated previous findings of a valence-induced bias, whereby participants learned preferentially from positive, relative to negative, prediction errors. In contrast, for counterfactual learning, we found the opposite valence-induced bias: negative prediction errors were preferentially taken into account, relative to positive ones. When considering valence-induced bias in the context of both factual and counterfactual learning, it appears that people tend to preferentially take into account information that confirms their current choice.


Subjects
Decision Making/physiology, Psychological Feedback/physiology, Learning/physiology, Reinforcement (Psychology), Adult, Computational Biology, Female, Humans, Male, Task Performance and Analysis, Young Adult
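Models of this kind are often written with learning rates that depend on both prediction-error valence and whether the option was chosen; the sketch below is a minimal version of that idea, with parameter values that are illustrative assumptions, not the paper's fitted estimates:

```python
def valence_update(q, delta, is_chosen,
                   a_pos_c=0.3, a_neg_c=0.1,   # factual: positive errors weighted more
                   a_pos_u=0.1, a_neg_u=0.3):  # counterfactual: negative errors weighted more
    """One learning step q <- q + alpha * delta, where alpha depends on
    the sign of the prediction error and on whether the option was chosen."""
    if is_chosen:
        alpha = a_pos_c if delta > 0 else a_neg_c
    else:
        alpha = a_pos_u if delta > 0 else a_neg_u
    return q + alpha * delta
```

With these rates, positive errors dominate factual updating and negative errors dominate counterfactual updating; both asymmetries amount to learning faster from information that confirms the current choice.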