ABSTRACT
Recent advances in the field of machine learning have yielded novel research perspectives in behavioural economics and financial market microstructure studies. In this paper we study the impact of individual trader learning characteristics on markets using a stock market simulator designed with a multi-agent architecture. Each agent, representing an autonomous investor, trades stocks through reinforcement learning, using a centralized double-auction limit order book. This approach allows us to study the impact of individual trader traits on the whole stock market at the mesoscale in a bottom-up approach. We chose to test three trader trait aspects: agent learning rate increases, herding behaviour and random trading. As hypothesized, we find that larger learning rates significantly increase the number of crashes. We also find that herding behaviour undermines market stability, while random trading tends to preserve it.
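The per-agent learning rule described above can be sketched as a minimal epsilon-greedy Q-learner over trading actions. The class name, action set, and parameter values below are illustrative assumptions, not the simulator's actual interface:

```python
import random

class TraderAgent:
    """Minimal sketch of a Q-learning trader with actions buy/sell/hold.

    'alpha' (learning rate) and 'epsilon' (exploration rate) are
    illustrative parameters; the paper's simulator is not reproduced here.
    """
    ACTIONS = ("buy", "sell", "hold")

    def __init__(self, alpha=0.1, epsilon=0.1, seed=None):
        self.alpha = alpha
        self.epsilon = epsilon
        self.q = {a: 0.0 for a in self.ACTIONS}  # action-value estimates
        self.rng = random.Random(seed)

    def choose(self):
        # epsilon-greedy: explore with probability epsilon, else exploit
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.ACTIONS)
        return max(self.q, key=self.q.get)

    def learn(self, action, profit):
        # move the action's value estimate toward the realized profit;
        # a larger alpha makes the agent chase recent outcomes harder,
        # which is the kind of trait manipulation the study describes
        self.q[action] += self.alpha * (profit - self.q[action])
```

Under this sketch, raising `alpha` makes value estimates track recent profits more aggressively, which is one plausible reading of why larger learning rates increase crash frequency.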
Subjects
Health Investments, Economic Models, Machine Learning, Phenotype
ABSTRACT
BACKGROUND: Drugs like opioids are potent reinforcers thought to co-opt value-based decisions by overshadowing other rewarding outcomes, but how this happens at a neurocomputational level remains elusive. Range adaptation is a canonical process of fine-tuning representations of value based on reward context. Here, we tested whether recent opioid exposure impacts range adaptation in opioid use disorder, potentially explaining why shifting decision making away from drug taking during this vulnerable period is so difficult. METHODS: Participants who had recently (<90 days) used opioids (n = 34) or who had abstained from opioid use for ≥90 days (n = 20) and comparison control participants (n = 44) completed a reinforcement learning task designed to induce robust contextual modulation of value. Two models were used to assess the latent process that participants engaged while making their decisions: 1) a Range model that dynamically tracks context and 2) a standard Absolute model that assumes stationary, objective encoding of value. RESULTS: Control participants and ≥90-days-abstinent participants with opioid use disorder exhibited choice patterns consistent with range-adapted valuation. In contrast, participants with recent opioid use were more prone to learn and encode value on an absolute scale. Computational modeling confirmed that the behavior of most control participants and ≥90-days-abstinent participants with opioid use disorder (75%), but only a minority in the recent use group (38%), was better fit by the Range model than by the Absolute model. Furthermore, the degree to which participants relied on range adaptation correlated with duration of continuous abstinence and subjective craving/withdrawal. CONCLUSIONS: Reduced context adaptation to available rewards could explain difficulty deciding about smaller (typically nondrug) rewards in the aftermath of drug exposure.
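The contrast between the two latent-process models can be illustrated with a pair of update rules. Function names and the fixed learning rate are assumptions for illustration, not the paper's exact parameterization:

```python
def absolute_update(q, outcome, alpha=0.3):
    """Absolute model: update the value estimate toward the raw,
    objective outcome."""
    return q + alpha * (outcome - q)

def range_update(q, outcome, r_min, r_max, alpha=0.3):
    """Range model: rescale the outcome to the context's value range
    [r_min, r_max] before updating, so values are encoded relatively."""
    normalized = (outcome - r_min) / (r_max - r_min)
    return q + alpha * (normalized - q)
```

The same objective outcome thus produces different learned values depending on the range of rewards available in the context, which is the signature the task was designed to elicit.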
Subjects
Opioid-Related Disorders, Psychological Reinforcement, Humans, Male, Adult, Female, Reward, Young Adult, Decision Making/drug effects, Decision Making/physiology, Opioid Analgesics/administration & dosage, Opioid Analgesics/pharmacology, Choice Behavior/drug effects, Choice Behavior/physiology, Psychological Adaptation/drug effects, Psychological Adaptation/physiology
ABSTRACT
Reinforcement-based adaptive decision-making is believed to recruit fronto-striatal circuits. A critical node of the fronto-striatal circuit is the thalamus. However, direct evidence of its involvement in human reinforcement learning is lacking. We address this gap by analyzing intra-thalamic electrophysiological recordings from eight participants while they performed a reinforcement learning task. We found that in both the anterior thalamus (ATN) and dorsomedial thalamus (DMTN), low frequency oscillations (LFO, 4-12 Hz) correlated positively with expected value estimated from computational modeling during reward-based learning (after outcome delivery) or punishment-based learning (during the choice process). Furthermore, LFO recorded from ATN/DMTN were also negatively correlated with outcomes, so that both components of reward prediction errors were signaled in the human thalamus. The observed differences in the prediction signals between rewarding and punishing conditions shed light on the neural mechanisms underlying action inhibition in punishment avoidance learning. Our results provide insight into the role of the thalamus in reinforcement-based decision-making in humans.
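The two prediction-error components reported here (a positive correlate of expected value and a negative correlate of outcome) fall out of a standard Rescorla-Wagner learner. This sketch assumes a single option and an illustrative learning rate:

```python
def rescorla_wagner_trace(outcomes, alpha=0.2, q0=0.0):
    """Track the expected value Q_t and prediction error delta_t = r_t - Q_t
    across trials. A neural signal scaling with -delta_t would correlate
    positively with Q_t and negatively with r_t, matching the reported
    LFO pattern."""
    q = q0
    qs, deltas = [], []
    for r in outcomes:
        qs.append(q)        # expectation held before the outcome
        delta = r - q       # reward prediction error
        deltas.append(delta)
        q += alpha * delta  # update the expectation
    return qs, deltas
```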
Subjects
Psychological Reinforcement, Reward, Humans, Avoidance Learning/physiology, Punishment, Thalamus
ABSTRACT
While navigating a fundamentally uncertain world, humans and animals constantly evaluate the probability of their decisions, actions or statements being correct. When explicitly elicited, these confidence estimates typically correlate positively with neural activity in a ventromedial prefrontal (VMPFC) network and negatively with activity in a dorsolateral and dorsomedial prefrontal network. Here, combining fMRI with a reinforcement-learning paradigm, we leverage the fact that humans are more confident in their choices when seeking gains than when avoiding losses to reveal a functional dissociation: whereas the dorsal prefrontal network correlates negatively with a condition-specific confidence signal, the VMPFC network positively encodes a task-wide confidence signal incorporating the valence-induced bias. Challenging dominant neuro-computational models, we found that decision-related VMPFC activity correlates better with confidence than with option values inferred from reinforcement-learning models. Altogether, these results identify the VMPFC as a key node in the neuro-computational architecture that builds global feeling-of-confidence signals from latent decision variables and contextual biases during reinforcement-learning.
Subjects
Learning, Prefrontal Cortex, Animals, Humans, Prefrontal Cortex/diagnostic imaging, Psychological Reinforcement, Magnetic Resonance Imaging/methods, Uncertainty
ABSTRACT
Reinforcement learning research in humans and other species indicates that rewards are represented in a context-dependent manner. More specifically, reward representations seem to be normalized as a function of the value of the alternative options. The dominant view postulates that value context-dependence is achieved via a divisive normalization rule, inspired by perceptual decision-making research. However, behavioral and neural evidence points to another plausible mechanism: range normalization. Critically, previous experimental designs were ill-suited to disentangle the divisive and the range normalization accounts, which generate similar behavioral predictions in many circumstances. To address this question, we designed a new learning task in which we manipulated, across learning contexts, the number of options and the value ranges. Behavioral and computational analyses falsify the divisive normalization account and instead support the range normalization rule. Together, these results shed new light on the computational mechanisms underlying context-dependence in learning and decision-making.
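The two candidate rules the design pits against each other can be written down directly; both functions are illustrative sketches rather than the fitted models:

```python
def divisive_normalization(values):
    """Divisive rule: each option's value is scaled by the summed value
    of all options in the context."""
    total = sum(values)
    return [v / total for v in values]

def range_normalization(values):
    """Range rule: each option's value is expressed relative to the
    context's minimum and maximum."""
    v_min, v_max = min(values), max(values)
    return [(v - v_min) / (v_max - v_min) for v in values]
```

The rules dissociate exactly where the task intervenes: adding a third, intermediate option changes every divisively normalized value (the denominator grows), but leaves the range-normalized values of the original extremes untouched, which is why manipulating both set size and value range can disentangle the two accounts.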
Subjects
Decision Making, Psychological Reinforcement, Humans, Learning, Reward
ABSTRACT
We systematically misjudge our own performance in simple economic tasks. First, we generally overestimate our ability to make correct choices, a bias called overconfidence. Second, we are more confident in our choices when we seek gains than when we try to avoid losses, a bias we refer to as the valence-induced confidence bias. Strikingly, these two biases are also present in reinforcement-learning (RL) contexts, despite the fact that outcomes are provided trial-by-trial and could, in principle, be used to recalibrate confidence judgments online. How confidence biases emerge and are maintained in reinforcement-learning contexts is thus puzzling and still unaccounted for. To explain this paradox, we propose that confidence biases stem from learning biases, and we test this hypothesis using data from multiple experiments in which we concomitantly assessed instrumental choices and confidence judgments during learning and transfer phases. Our results first show that participants' choices in both tasks are best accounted for by a reinforcement-learning model featuring context-dependent learning and confirmatory updating. We then demonstrate that the complex, biased pattern of confidence judgments elicited during both tasks can be explained by an overweighting of the learned value of the chosen option in the computation of confidence judgments. We finally show that, consequently, the individual learning model parameters responsible for the learning biases (confirmatory updating and outcome context-dependency) are predictive of the individual metacognitive biases. We conclude by suggesting that the metacognitive biases originate from fundamentally biased learning computations. (PsycInfo Database Record (c) 2023 APA, all rights reserved).
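One way to sketch the proposed confidence readout (overweighting the learned value of the chosen option) is as a sigmoid of a biased value comparison. The sigmoid mapping, parameter names, and default values are assumptions for illustration, not the paper's exact specification:

```python
import math

def confidence(q_chosen, q_unchosen, weight=2.0, slope=1.0):
    """Confidence as a sigmoid of a value comparison in which the
    chosen option's learned value is overweighted (weight > 1).
    With weight = 1 the readout is balanced; weight > 1 yields
    overconfidence whenever learned values are inflated, e.g. by
    confirmatory updating."""
    evidence = weight * q_chosen - q_unchosen
    return 1.0 / (1.0 + math.exp(-slope * evidence))
```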
Subjects
Learning, Metacognition, Humans, Cognition, Psychological Reinforcement, Bias
ABSTRACT
Recent evidence indicates that reward value encoding in humans is highly context-dependent, leading to suboptimal decisions in some cases. But whether this computational constraint on valuation is a shared feature of human cognition remains unknown. To address this question, we studied the behavior of individuals from across 11 countries of markedly different socioeconomic and cultural makeup using an experimental approach that reliably captures context effects in reinforcement learning. Our findings show that all samples presented evidence of similar sensitivity to context. Crucially, suboptimal decisions generated by context manipulation were not explained by risk aversion, as estimated through a separate description-based choice task (i.e., lotteries) consisting of matched decision offers. Conversely, risk aversion significantly differed across countries. Overall, our findings suggest that context-dependent reward value encoding is a hardcoded feature of human cognition, while description-based decision-making is significantly sensitive to cultural factors.
ABSTRACT
Standard models of decision-making assume each option is associated with subjective value, regardless of whether this value is inferred from experience (experiential) or explicitly instructed probabilistic outcomes (symbolic). In this study, we present results that challenge the assumption of unified representation of experiential and symbolic value. Across nine experiments, we presented participants with hybrid decisions between experiential and symbolic options. Participants' choices exhibited a pattern consistent with a systematic neglect of the experiential values. This normatively irrational decision strategy held after accounting for alternative explanations, and persisted even when it bore an economic cost. Overall, our results demonstrate that experiential and symbolic values are not symmetrically considered in hybrid decisions, suggesting they recruit different representational systems that may be assigned different priority levels in the decision process. These findings challenge the dominant models commonly used in value-based decision-making research.
ABSTRACT
Do we preferentially learn from outcomes that confirm our choices? In recent years, we investigated this question in a series of studies implementing increasingly complex behavioral protocols. The learning rates fitted in experiments featuring partial or complete feedback, as well as free and forced choices, were systematically found to be consistent with a choice-confirmation bias. One of the prominent behavioral consequences of the confirmatory learning rate pattern is choice hysteresis: that is, the tendency to repeat previous choices despite contradictory evidence. However, a choice-confirmatory pattern of learning rates may spuriously arise from not taking into consideration an explicit gradual choice-perseveration term in the model. In the present study, we reanalyze data from four published papers (nine experiments; 363 subjects; 126,192 trials), originally included in the studies demonstrating or criticizing the choice-confirmation bias in human participants. We fitted two models: one featuring valence-specific updates (i.e., different learning rates for confirmatory and disconfirmatory outcomes) and one additionally including gradual perseveration. Our analysis confirms that the inclusion of the gradual perseveration process in the model significantly reduces the estimated choice-confirmation bias. However, in all considered experiments, the choice-confirmation bias remains present at the meta-analytical level, and significantly different from zero in most experiments. Our results demonstrate that the choice-confirmation bias resists the inclusion of a gradual perseveration term, thus proving to be a robust feature of human reinforcement learning. We conclude by pointing to additional computational processes that may play an important role in estimating and interpreting the computational biases under scrutiny. (PsycInfo Database Record (c) 2023 APA, all rights reserved).
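The two fitted models differ in one term each; the following update rules sketch the contrast, with parameter values chosen purely for illustration:

```python
def confirmatory_update(q, chosen, outcome, alpha_conf=0.4, alpha_disc=0.1):
    """Valence-specific update: outcomes that confirm the choice
    (positive prediction error on the chosen option) are learned with
    alpha_conf, disconfirming outcomes with alpha_disc < alpha_conf."""
    delta = outcome - q[chosen]
    alpha = alpha_conf if delta > 0 else alpha_disc
    q[chosen] += alpha * delta
    return q

def perseveration_update(trace, chosen, decay=0.9):
    """Gradual perseveration: a choice trace that decays every trial and
    is incremented for the chosen option; added (suitably scaled) to the
    values at choice time, it favors repetition independently of outcomes."""
    trace = [decay * t for t in trace]
    trace[chosen] += 1.0
    return trace
```

Because both mechanisms predict choice repetition, the reanalysis fits them jointly to test whether the asymmetric learning rates survive once the perseveration trace absorbs outcome-independent repetition.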
Subjects
Psychological Reinforcement, Humans, Feedback
ABSTRACT
BACKGROUND: Tourette syndrome (TS) as well as its most common comorbidities are associated with a higher propensity for risky behaviour in everyday life. However, it is unclear whether this increased risk propensity in real-life contexts translates into a generally increased attitude towards risk. We aimed to assess decision-making under risk and ambiguity based on prospect theory by considering the effects of comorbidities and medication. METHODS: Fifty-four individuals with TS and 32 healthy controls performed risk and ambiguity decision-making tasks under both gains and losses conditions. Behavioural and computational parameters were evaluated using (i) univariate analysis to determine parameter differences taken independently; (ii) supervised multivariate analysis to evaluate whether our parameters could jointly account for between-group differences; and (iii) unsupervised multivariate analysis to explore the potential presence of sub-groups. RESULTS: Except for generally 'noisier' (less consistent) decisions in TS, we found no TS-specific risk-taking behaviour and no relation with tic severity or antipsychotic medication. However, the presence of comorbidities was associated with distortions of decision-making. Specifically, TS with obsessive-compulsive disorder comorbidity was associated with a higher risk-taking profile to increase gain and a higher risk-averse profile to decrease loss. TS with attention-deficit hyperactivity disorder comorbidity was associated with risk-seeking in the ambiguity context to reduce a potential loss. CONCLUSIONS: Impaired valuation of risk and ambiguity was not related to TS per se. Our findings are important for clinical practice: the involvement of individuals with TS in real-life risky situations may rather result from other factors such as psychiatric comorbidities.
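The prospect-theory quantities typically estimated in such tasks can be sketched with the standard functional forms. The default parameter values below are commonly reported population-level estimates, used here purely as illustrative defaults, not the study's fitted values:

```python
def utility(x, alpha=0.88, lam=2.25):
    """Power utility with loss aversion: gains are valued concavely,
    losses are amplified by the loss-aversion coefficient lam."""
    return x ** alpha if x >= 0 else -lam * ((-x) ** alpha)

def weight(p, gamma=0.61):
    """Inverse-S probability weighting: small probabilities are
    overweighted, moderate-to-large ones underweighted."""
    return p ** gamma / (p ** gamma + (1 - p) ** gamma) ** (1 / gamma)

def prospect_value(x, p):
    """Subjective value of the prospect 'receive x with probability p'."""
    return weight(p) * utility(x)
```

Ambiguity attitudes are typically modeled on top of this by distorting or averaging the unknown probability; that extension is omitted here.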
Subjects
Attention Deficit Hyperactivity Disorder, Obsessive-Compulsive Disorder, Tics, Tourette Syndrome, Humans, Adult, Tourette Syndrome/epidemiology, Tourette Syndrome/psychology, Attention Deficit Hyperactivity Disorder/psychology, Tics/complications, Tics/drug therapy, Obsessive-Compulsive Disorder/psychology, Comorbidity
ABSTRACT
BACKGROUND: Value-based decision-making impairment in depression is a complex phenomenon: while some studies found evidence of blunted reward learning and blunted reward-related signals in the brain, others report no effect. Here we test whether such reward sensitivity deficits depend on the overall value of the decision problem. METHODS: We used a two-armed bandit task with two different contexts: one 'rich', where both options were associated with an overall positive expected value, and one 'poor', where both options were associated with an overall negative expected value. We tested patients (N = 30) undergoing a major depressive episode and age-, gender- and socioeconomically matched controls (N = 26). Learning performance and a subsequent transfer phase without feedback were analyzed to disentangle decision mechanisms from value-update mechanisms. Finally, we used computational model simulation and fitting to link behavioral patterns to learning biases. RESULTS: Control subjects showed similar learning performance in the 'rich' and the 'poor' contexts, while patients displayed reduced learning in the 'poor' context. Analysis of the transfer phase showed that the context-dependent impairment in patients generalized, suggesting that the effect of depression has to be traced to outcome encoding. Computational model-based results showed that patients displayed a higher learning rate for negative compared to positive outcomes (the opposite was true in controls). CONCLUSIONS: Our results illustrate that reinforcement learning performance in depression depends on the value of the context. We show that depressed patients have specific difficulty in contexts with an overall negative state value, which in our task is consistent with a negativity bias at the level of learning rates.
Subjects
Depression, Major Depressive Disorder, Humans, Psychological Reinforcement, Reward, Bias
ABSTRACT
Understanding how learning changes during human development has been one of the long-standing objectives of developmental science. Recently, advances in computational biology have demonstrated that humans display a bias when learning to navigate novel environments through rewards and punishments: they learn more from outcomes that confirm their expectations than from outcomes that disconfirm them. Here, we ask whether confirmatory learning is stable across development, or whether it might be attenuated in developmental stages in which exploration is beneficial, such as adolescence. In a reinforcement learning (RL) task, 77 participants aged 11-32 years (four men, mean age = 16.26) attempted to maximize monetary rewards by repeatedly sampling different pairs of novel options, which varied in their reward/punishment probabilities. Mixed-effect models showed an age-related increase in accuracy as long as learning contingencies remained stable across trials, but less so when they reversed halfway through the trials. Age was also associated with a greater tendency to stay with an option that had just delivered a reward than to switch away from an option that had just delivered a punishment. At the computational level, a confirmation model provided an increasingly better fit with age. This model showed that age differences are captured by decreases in noise or exploration, rather than in the magnitude of the confirmation bias. These findings provide new insights into how learning changes during development and could help better tailor learning environments to people of different ages. RESEARCH HIGHLIGHTS: Reinforcement learning shows age-related improvement during adolescence, but more so in stable than in volatile learning environments. People tend to stay with an option after a win more than they shift away from an option after a loss, and this asymmetry increases with age during adolescence. Computationally, these changes are captured by a developing confirmatory learning style, in which people learn more from outcomes that confirm rather than disconfirm their choices. Age-related differences in confirmatory learning are explained by decreases in stochasticity, rather than changes in the magnitude of the confirmation bias.
Subjects
Learning, Psychological Reinforcement, Male, Humans, Adolescent, Reward, Punishment
ABSTRACT
Humans do not integrate new information objectively: outcomes carrying a positive affective value and evidence confirming one's own prior beliefs are overweighted. Until recently, theoretical and empirical accounts of the positivity and confirmation biases assumed them to be specific to 'high-level' belief updates. We present evidence against this account. Learning rates in reinforcement learning (RL) tasks, estimated across different contexts and species, generally present the same characteristic asymmetry, suggesting that belief and value updating processes share key computational principles and distortions. This bias generates over-optimistic expectations about the probability of making the right choices and, consequently, over-optimistic reward expectations. We discuss the normative and neurobiological roots of these RL biases and their position within the greater picture of behavioral decision-making theories.
Subjects
Decision Making, Psychological Reinforcement, Bias, Humans, Learning, Reward
ABSTRACT
American Foulbrood (AFB) is a contagious and severe brood disease of honey bees caused by the spore-forming bacterium Paenibacillus larvae. The identification of honey bee colonies infected by P. larvae is crucial for the effective control of AFB. We studied the possibility of identifying the level of P. larvae infection in honey bee colonies through the examination of powdered sugar samples collected in the hives. The powdered sugar was dusted on the top bars of honeycombs and collected from a sheet of paper placed at the bottom of the hive. Three groups of honey bee colonies were examined: Group A1 - colonies with clinical symptoms of AFB (n = 11); Group A2 - asymptomatic colonies located in apiaries with colonies showing symptoms of AFB (n = 59); Group B - asymptomatic colonies located in apiaries without cases of the disease (n = 49). The results showed a significant difference in spore counts between groups, and the spore load in sugar samples was always consistent with the clinical condition of the colonies and with whether or not they belonged to AFB-affected apiaries. Based on these results, the cultural examination of powdered sugar samples collected from hives could be an effective tool for the quantitative, non-destructive assessment of P. larvae infections in honey bee colonies.
ABSTRACT
Anxiety is a common affective state, characterized by subjectively unpleasant feelings of dread over an anticipated event. Anxiety is suspected to have important negative consequences on cognition, decision-making, and learning. Yet, despite a recent surge in studies investigating the specific effects of anxiety on reinforcement-learning, no coherent picture has emerged. Here, we investigated the effects of incidental anxiety on instrumental reinforcement-learning, while addressing several issues and shortcomings identified in a focused literature review. We used a rich experimental design, featuring both a learning and a transfer phase, and a manipulation of outcome valence (gains vs. losses). In two variants (N = 2 × 50) of this experimental paradigm, incidental anxiety was induced with an established threat-of-shock paradigm. Model-free results show that the effects of incidental anxiety seem limited to a small but specific increase in post-learning performance measured in a transfer task. A comprehensive modeling effort revealed that, irrespective of the effects of anxiety, individuals give more weight to positive than negative outcomes, and tend to experience the omission of a loss as a gain (and vice versa). However, in line with results from our targeted literature survey, isolating specific computational effects of anxiety on learning per se proved to be challenging. Overall, our results suggest that learning mechanisms are more complex than traditionally presumed, and raise important concerns about the robustness of the effects of anxiety previously identified in simple reinforcement-learning studies. (PsycInfo Database Record (c) 2022 APA, all rights reserved).
Subjects
Learning, Psychological Reinforcement, Anxiety, Humans
ABSTRACT
Evidence suggests that economic values are rescaled as a function of the range of the available options. Although locally adaptive, range adaptation has been shown to lead to suboptimal choices, particularly notable in reinforcement learning (RL) situations when options are extrapolated from their original context to a new one. Range adaptation can be seen as the result of an adaptive coding process aiming at increasing the signal-to-noise ratio. However, this hypothesis leads to a counterintuitive prediction: Decreasing task difficulty should increase range adaptation and, consequently, extrapolation errors. Here, we tested the paradoxical relation between range adaptation and performance in a large sample of participants performing variants of an RL task, where we manipulated task difficulty. Results confirmed that range adaptation induces systematic extrapolation errors and is stronger when decreasing task difficulty. Last, we propose a range-adapting model and show that it is able to parsimoniously capture all the behavioral results.
ABSTRACT
BACKGROUND: In this study, we asked whether differences in striatal activity during a reinforcement learning (RL) task with gain and loss domains could be one of the earliest functional imaging features associated with carrying the Huntington's disease (HD) gene. Based on previous work, we hypothesized that HD gene carriers would show either neural or behavioral asymmetry between gain and loss learning. METHODS: We recruited 35 HD gene carriers, expected to demonstrate onset of motor symptoms in an average of 26 years, and 35 well-matched gene-negative control subjects. Participants were placed in a functional magnetic resonance imaging scanner, where they completed an RL task in which they were required to learn to choose between abstract stimuli with the aim of gaining rewards and avoiding losses. Task behavior was modeled using an RL model, and variables from this model were used to probe functional magnetic resonance imaging data. RESULTS: In comparison with well-matched control subjects, gene carriers more than 25 years from motor onset showed exaggerated striatal responses to gain-predicting stimuli compared with loss-predicting stimuli (p = .002) in our RL task. Using computational analysis, we also found group differences in striatal representation of stimulus value (p = .0004). We found no group differences in behavior, cognitive scores, or caudate volumes. CONCLUSIONS: Behaviorally, gene carriers 9 years from predicted onset have been shown to learn better from gains than from losses. Our data suggest that a window exists in which HD-related functional neural changes are detectable long before associated behavioral change and 25 years before predicted motor onset. These represent the earliest functional imaging differences between HD gene carriers and control subjects.