Results 1 - 20 of 58
1.
PLoS One ; 19(4): e0301141, 2024.
Article in English | MEDLINE | ID: mdl-38557590

ABSTRACT

Recent advances in the field of machine learning have yielded novel research perspectives in behavioural economics and financial markets microstructure studies. In this paper we study the impact of individual trader learning characteristics on markets using a stock market simulator designed with a multi-agent architecture. Each agent, representing an autonomous investor, trades stocks through reinforcement learning, using a centralized double-auction limit order book. This approach allows us to study the impact of individual trader traits on the whole stock market at the mesoscale in a bottom-up approach. We chose to test three trader trait aspects: agent learning rate increases, herding behaviour and random trading. As hypothesized, we find that larger learning rates significantly increase the number of crashes. We also find that herding behaviour undermines market stability, while random trading tends to preserve it.


Subjects
Investments , Economic Models , Machine Learning , Phenotype
2.
Biol Psychiatry ; 95(10): 974-984, 2024 May 15.
Article in English | MEDLINE | ID: mdl-38101503

ABSTRACT

BACKGROUND: Drugs like opioids are potent reinforcers thought to co-opt value-based decisions by overshadowing other rewarding outcomes, but how this happens at a neurocomputational level remains elusive. Range adaptation is a canonical process of fine-tuning representations of value based on reward context. Here, we tested whether recent opioid exposure impacts range adaptation in opioid use disorder, potentially explaining why shifting decision making away from drug taking during this vulnerable period is so difficult. METHODS: Participants who had recently (<90 days) used opioids (n = 34) or who had abstained from opioid use for ≥90 days (n = 20) and comparison control participants (n = 44) completed a reinforcement learning task designed to induce robust contextual modulation of value. Two models were used to assess the latent process that participants engaged while making their decisions: 1) a Range model that dynamically tracks context and 2) a standard Absolute model that assumes stationary, objective encoding of value. RESULTS: Control participants and ≥90-days-abstinent participants with opioid use disorder exhibited choice patterns consistent with range-adapted valuation. In contrast, participants with recent opioid use were more prone to learn and encode value on an absolute scale. Computational modeling confirmed that the behavior of most control participants and ≥90-days-abstinent participants with opioid use disorder (75%), but of only a minority in the recent use group (38%), was better fit by the Range model than by the Absolute model. Furthermore, the degree to which participants relied on range adaptation correlated with duration of continuous abstinence and subjective craving/withdrawal. CONCLUSIONS: Reduced context adaptation to available rewards could explain difficulty deciding about smaller (typically nondrug) rewards in the aftermath of drug exposure.


Subjects
Opioid-Related Disorders , Psychological Reinforcement , Humans , Male , Adult , Female , Reward , Young Adult , Decision Making/drug effects , Decision Making/physiology , Opioid Analgesics/administration & dosage , Opioid Analgesics/pharmacology , Choice Behavior/drug effects , Choice Behavior/physiology , Psychological Adaptation/drug effects , Psychological Adaptation/physiology
3.
Nat Commun ; 14(1): 6534, 2023 Oct 17.
Article in English | MEDLINE | ID: mdl-37848435

ABSTRACT

Reinforcement-based adaptive decision-making is believed to recruit fronto-striatal circuits. A critical node of the fronto-striatal circuit is the thalamus. However, direct evidence of its involvement in human reinforcement learning is lacking. We address this gap by analyzing intra-thalamic electrophysiological recordings from eight participants while they performed a reinforcement learning task. We found that in both the anterior thalamus (ATN) and dorsomedial thalamus (DMTN), low frequency oscillations (LFO, 4-12 Hz) correlated positively with expected value estimated from computational modeling during reward-based learning (after outcome delivery) or punishment-based learning (during the choice process). Furthermore, LFO recorded from ATN/DMTN were also negatively correlated with outcomes so that both components of reward prediction errors were signaled in the human thalamus. The observed differences in the prediction signals between rewarding and punishing conditions shed light on the neural mechanisms underlying action inhibition in punishment avoidance learning. Our results provide insight into the role of thalamus in reinforcement-based decision-making in humans.


Subjects
Psychological Reinforcement , Reward , Humans , Avoidance Learning/physiology , Punishment , Thalamus
4.
Nat Commun ; 14(1): 6896, 2023 Oct 28.
Article in English | MEDLINE | ID: mdl-37898640

ABSTRACT

While navigating a fundamentally uncertain world, humans and animals constantly evaluate the probability of their decisions, actions or statements being correct. When explicitly elicited, these confidence estimates typically correlate positively with neural activity in a ventromedial-prefrontal (VMPFC) network and negatively in a dorsolateral and dorsomedial prefrontal network. Here, combining fMRI with a reinforcement-learning paradigm, we leverage the fact that humans are more confident in their choices when seeking gains than when avoiding losses to reveal a functional dissociation: whereas the dorsal prefrontal network correlates negatively with a condition-specific confidence signal, the VMPFC network positively encodes a task-wide confidence signal incorporating the valence-induced bias. Challenging dominant neuro-computational models, we found that decision-related VMPFC activity correlates better with confidence than with option-values inferred from reinforcement-learning models. Altogether, these results identify the VMPFC as a key node in the neuro-computational architecture that builds global feeling-of-confidence signals from latent decision variables and contextual biases during reinforcement-learning.


Subjects
Learning , Prefrontal Cortex , Animals , Humans , Prefrontal Cortex/diagnostic imaging , Psychological Reinforcement , Magnetic Resonance Imaging/methods , Uncertainty
5.
Elife ; 12, 2023 Jul 10.
Article in English | MEDLINE | ID: mdl-37428155

ABSTRACT

Reinforcement learning research in humans and other species indicates that rewards are represented in a context-dependent manner. More specifically, reward representations seem to be normalized as a function of the value of the alternative options. The dominant view postulates that value context-dependence is achieved via a divisive normalization rule, inspired by perceptual decision-making research. However, behavioral and neural evidence points to another plausible mechanism: range normalization. Critically, previous experimental designs were ill-suited to disentangle the divisive and the range normalization accounts, which generate similar behavioral predictions in many circumstances. To address this question, we designed a new learning task where we manipulated, across learning contexts, the number of options and the value ranges. Behavioral and computational analyses falsify the divisive normalization account and rather provide support for the range normalization rule. Together, these results shed new light on the computational mechanisms underlying context-dependence in learning and decision-making.


Subjects
Decision Making , Psychological Reinforcement , Humans , Learning , Reward
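
As an illustration of the two accounts contrasted in the abstract of this record, the following Python sketch compares one common textbook form of each normalization rule. The formulas, function names, and option values are assumptions chosen for illustration; they are not the models fitted in the study.

```python
from typing import List, Sequence


def divisive_normalization(values: Sequence[float]) -> List[float]:
    """Divide each value by the sum of all values in the context.

    Assumes the values are positive so that the sum is non-zero.
    """
    total = sum(values)
    return [v / total for v in values]


def range_normalization(values: Sequence[float]) -> List[float]:
    """Rescale each value to its position between the context minimum and maximum.

    Assumes the context contains at least two distinct values.
    """
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical learning contexts: the second one adds a mid-range option
# without changing the minimum or the maximum.
two_options = [10.0, 50.0]
three_options = [10.0, 30.0, 50.0]

for name, rule in [("divisive", divisive_normalization),
                   ("range", range_normalization)]:
    best_two = rule(two_options)[-1]      # normalized value of the best option
    best_three = rule(three_options)[-1]
    print(f"{name}: best option = {best_two:.3f} (2 options), "
          f"{best_three:.3f} (3 options)")
```

Under these assumed formulas, adding a mid-range option lowers the divisively normalized value of the best option but leaves its range-normalized value unchanged, which is the kind of diverging prediction that a design varying the number of options can exploit.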
6.
Psychol Rev ; 130(4): 1017-1043, 2023 Jul.
Article in English | MEDLINE | ID: mdl-37155268

ABSTRACT

We systematically misjudge our own performance in simple economic tasks. First, we generally overestimate our ability to make correct choices, a bias called overconfidence. Second, we are more confident in our choices when we seek gains than when we try to avoid losses, a bias we refer to as the valence-induced confidence bias. Strikingly, these two biases are also present in reinforcement-learning (RL) contexts, despite the fact that outcomes are provided trial-by-trial and could, in principle, be used to recalibrate confidence judgments online. How confidence biases emerge and are maintained in reinforcement-learning contexts is thus puzzling and still unaccounted for. To explain this paradox, we propose that confidence biases stem from learning biases, and test this hypothesis using data from multiple experiments, where we concomitantly assessed instrumental choices and confidence judgments, during learning and transfer phases. Our results first show that participants' choices in both tasks are best accounted for by a reinforcement-learning model featuring context-dependent learning and confirmatory updating. We then demonstrate that the complex, biased pattern of confidence judgments elicited during both tasks can be explained by an overweighting of the learned value of the chosen option in the computation of confidence judgments. We finally show that, consequently, the individual learning model parameters responsible for the learning biases (confirmatory updating and outcome context-dependency) are predictive of the individual metacognitive biases. We conclude by suggesting that the metacognitive biases originate from fundamentally biased learning computations. (PsycInfo Database Record (c) 2023 APA, all rights reserved).


Subjects
Learning , Metacognition , Humans , Cognition , Psychological Reinforcement , Bias
7.
Neurosci Biobehav Rev ; 151: 105233, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37196926
8.
Res Sq ; 2023 Mar 02.
Article in English | MEDLINE | ID: mdl-36909645

ABSTRACT

Recent evidence indicates that reward value encoding in humans is highly context-dependent, leading to suboptimal decisions in some cases. But whether this computational constraint on valuation is a shared feature of human cognition remains unknown. To address this question, we studied the behavior of individuals from across 11 countries of markedly different socioeconomic and cultural makeup using an experimental approach that reliably captures context effects in reinforcement learning. Our findings show that all samples presented evidence of similar sensitivity to context. Crucially, suboptimal decisions generated by context manipulation were not explained by risk aversion, as estimated through a separate description-based choice task (i.e., lotteries) consisting of matched decision offers. Conversely, risk aversion significantly differed across countries. Overall, our findings suggest that context-dependent reward value encoding is a hardcoded feature of human cognition, while description-based decision-making is significantly sensitive to cultural factors.

9.
Commun Biol ; 6(1): 158, 2023 Feb 08.
Article in English | MEDLINE | ID: mdl-36754989

Subjects
Cognition , Bias
10.
Nat Hum Behav ; 7(4): 611-626, 2023 Apr.
Article in English | MEDLINE | ID: mdl-36604497

ABSTRACT

Standard models of decision-making assume each option is associated with subjective value, regardless of whether this value is inferred from experience (experiential) or explicitly instructed probabilistic outcomes (symbolic). In this study, we present results that challenge the assumption of unified representation of experiential and symbolic value. Across nine experiments, we presented participants with hybrid decisions between experiential and symbolic options. Participants' choices exhibited a pattern consistent with a systematic neglect of the experiential values. This normatively irrational decision strategy held after accounting for alternative explanations, and persisted even when it bore an economic cost. Overall, our results demonstrate that experiential and symbolic values are not symmetrically considered in hybrid decisions, suggesting they recruit different representational systems that may be assigned different priority levels in the decision process. These findings challenge the dominant models commonly used in value-based decision-making research.

11.
Behav Neurosci ; 137(1): 78-88, 2023 Feb.
Article in English | MEDLINE | ID: mdl-36395020

ABSTRACT

Do we preferentially learn from outcomes that confirm our choices? In recent years, we investigated this question in a series of studies implementing increasingly complex behavioral protocols. The learning rates fitted in experiments featuring partial or complete feedback, as well as free and forced choices, were systematically found to be consistent with a choice-confirmation bias. One of the prominent behavioral consequences of the confirmatory learning rate pattern is choice hysteresis: that is, the tendency to repeat previous choices, despite contradictory evidence. However, the choice-confirmatory pattern of learning rates may spuriously arise from not taking into consideration an explicit choice (gradual) perseveration term in the model. In the present study, we reanalyze data from four published papers (nine experiments; 363 subjects; 126,192 trials), originally included in the studies demonstrating or criticizing the choice-confirmation bias in human participants. We fitted two models: one featuring valence-specific updates (i.e., different learning rates for confirmatory and disconfirmatory outcomes) and one additionally including gradual perseveration. Our analysis confirms that the inclusion of the gradual perseveration process in the model significantly reduces the estimated choice-confirmation bias. However, in all considered experiments, the choice-confirmation bias remains present at the meta-analytical level, and significantly different from zero in most experiments. Our results demonstrate that the choice-confirmation bias resists the inclusion of a gradual perseveration term, thus proving to be a robust feature of human reinforcement learning. We conclude by pointing to additional computational processes that may play an important role in estimating and interpreting the computational biases under scrutiny. (PsycInfo Database Record (c) 2023 APA, all rights reserved).


Subjects
Psychological Reinforcement , Humans , Feedback
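
The two model ingredients named in the abstract of this record (valence-specific learning rates and a gradual perseveration term) can be sketched as follows. This is a minimal Python illustration under stated assumptions: the class name, parameter names, default values, and the softmax decision rule are all hypothetical choices, not the authors' fitted model.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ConfirmationLearner:
    """Two-option learner with confirmatory updating and a choice trace."""
    alpha_conf: float = 0.3  # learning rate for choice-confirming prediction errors
    alpha_disc: float = 0.1  # learning rate for choice-disconfirming prediction errors
    tau: float = 0.2         # update rate of the choice trace (gradual perseveration)
    phi: float = 1.0         # weight of the choice trace in the decision
    beta: float = 5.0        # softmax inverse temperature
    q: List[float] = field(default_factory=lambda: [0.0, 0.0])      # option values
    trace: List[float] = field(default_factory=lambda: [0.0, 0.0])  # choice trace

    def p_choose(self, option: int) -> float:
        """Softmax probability of choosing `option` (0 or 1)."""
        logits = [self.beta * self.q[a] + self.phi * self.trace[a] for a in (0, 1)]
        top = max(logits)  # subtract the maximum for numerical stability
        exps = [math.exp(x - top) for x in logits]
        return exps[option] / sum(exps)

    def update(self, chosen: int, reward: float,
               forgone: Optional[float] = None) -> None:
        """Update values and choice trace after one trial."""
        delta = reward - self.q[chosen]
        # A positive prediction error on the chosen option confirms the choice.
        alpha = self.alpha_conf if delta > 0 else self.alpha_disc
        self.q[chosen] += alpha * delta
        if forgone is not None:  # complete feedback: unchosen outcome is shown
            other = 1 - chosen
            delta_u = forgone - self.q[other]
            # A negative prediction error on the unchosen option confirms the choice.
            alpha_u = self.alpha_conf if delta_u < 0 else self.alpha_disc
            self.q[other] += alpha_u * delta_u
        # The trace drifts toward 1 for the chosen option and toward 0 otherwise.
        for a in (0, 1):
            target = 1.0 if a == chosen else 0.0
            self.trace[a] += self.tau * (target - self.trace[a])


learner = ConfirmationLearner()
learner.update(chosen=0, reward=1.0)   # confirming outcome: larger update
learner.update(chosen=0, reward=-1.0)  # disconfirming outcome: smaller update
print(learner.q, learner.trace, learner.p_choose(0))
```

Setting `phi` to zero in this sketch removes the perseveration term, which is one way to see how much of the tendency to repeat a choice is carried by the learning-rate asymmetry alone.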
12.
Psychol Med ; 53(11): 5256-5266, 2023 Aug.
Article in English | MEDLINE | ID: mdl-35899867

ABSTRACT

BACKGROUND: Tourette syndrome (TS) as well as its most common comorbidities are associated with a higher propensity for risky behaviour in everyday life. However, it is unclear whether this increased risk propensity in real-life contexts translates into a generally increased attitude towards risk. We aimed to assess decision-making under risk and ambiguity based on prospect theory by considering the effects of comorbidities and medication. METHODS: Fifty-four individuals with TS and 32 healthy controls performed risk and ambiguity decision-making tasks under both gain and loss conditions. Behavioural and computational parameters were evaluated using (i) univariate analysis to determine parameter differences taken independently; (ii) supervised multivariate analysis to evaluate whether our parameters could jointly account for between-group differences; and (iii) unsupervised multivariate analysis to explore the potential presence of sub-groups. RESULTS: Except for generally 'noisier' (less consistent) decisions in TS, we showed no specific risk-taking behaviour in TS or any relation with tic severity or antipsychotic medication. However, the presence of comorbidities was associated with distortion of decision-making. Specifically, TS with obsessive-compulsive disorder comorbidity was associated with a higher risk-taking profile to increase gain and a higher risk-averse profile to decrease loss. TS with attention-deficit hyperactivity disorder comorbidity was associated with risk-seeking in the ambiguity context to reduce a potential loss. CONCLUSIONS: Impaired valuation of risk and ambiguity was not related to TS per se. Our findings are important for clinical practice: the involvement of individuals with TS in real-life risky situations may actually result from other factors such as psychiatric comorbidities.


Subjects
Attention Deficit Hyperactivity Disorder , Obsessive-Compulsive Disorder , Tics , Tourette Syndrome , Humans , Adult , Tourette Syndrome/epidemiology , Tourette Syndrome/psychology , Attention Deficit Hyperactivity Disorder/psychology , Tics/complications , Tics/drug therapy , Obsessive-Compulsive Disorder/psychology , Comorbidity
13.
Psychol Med ; 53(10): 4696-4706, 2023 Jul.
Article in English | MEDLINE | ID: mdl-35726513

ABSTRACT

BACKGROUND: Value-based decision-making impairment in depression is a complex phenomenon: while some studies did find evidence of blunted reward learning and reward-related signals in the brain, others indicate no effect. Here we test whether such reward sensitivity deficits are dependent on the overall value of the decision problem. METHODS: We used a two-armed bandit task with two different contexts, one 'rich' and one 'poor', in which both options were associated with an overall positive and an overall negative expected value, respectively. We tested patients (N = 30) undergoing a major depressive episode and age, gender and socio-economically matched controls (N = 26). Learning performance and performance in a subsequent transfer phase, without feedback, were analyzed to disentangle a decision mechanism from a value-update mechanism. Finally, we used computational model simulation and fitting to link behavioral patterns to learning biases. RESULTS: Control subjects showed similar learning performance in the 'rich' and the 'poor' contexts, while patients displayed reduced learning in the 'poor' context. Analysis of the transfer phase showed that the context-dependent impairment in patients generalized, suggesting that the effect of depression has to be traced to the outcome encoding. Computational model-based results showed that patients displayed a higher learning rate for negative compared to positive outcomes (the opposite was true in controls). CONCLUSIONS: Our results illustrate that reinforcement learning performance in depression depends on the value of the context. We show that depressive patients have specific trouble in contexts with an overall negative state value, which in our task is consistent with a negativity bias at the level of learning rates.


Subjects
Depression , Major Depressive Disorder , Humans , Psychological Reinforcement , Reward , Bias
14.
Dev Sci ; 26(3): e13330, 2023 May.
Article in English | MEDLINE | ID: mdl-36194156

ABSTRACT

Understanding how learning changes during human development has been one of the long-standing objectives of developmental science. Recently, advances in computational biology have demonstrated that humans display a bias when learning to navigate novel environments through rewards and punishments: they learn more from outcomes that confirm their expectations than from outcomes that disconfirm them. Here, we ask whether confirmatory learning is stable across development, or whether it might be attenuated in developmental stages in which exploration is beneficial, such as in adolescence. In a reinforcement learning (RL) task, 77 participants aged 11-32 years (four men, mean age = 16.26) attempted to maximize monetary rewards by repeatedly sampling different pairs of novel options, which varied in their reward/punishment probabilities. Mixed-effect models showed an age-related increase in accuracy as long as learning contingencies remained stable across trials, but less so when they reversed halfway through the trials. Age was also associated with a greater tendency to stay with an option that had just delivered a reward, more than to switch away from an option that had just delivered a punishment. At the computational level, a confirmation model provided increasingly better fit with age. This model showed that age differences are captured by decreases in noise or exploration, rather than in the magnitude of the confirmation bias. These findings provide new insights into how learning changes during development and could help better tailor learning environments to people of different ages. RESEARCH HIGHLIGHTS: Reinforcement learning shows age-related improvement during adolescence, but more in stable learning environments compared with volatile learning environments. People tend to stay with an option after a win more than they shift from an option after a loss, and this asymmetry increases with age during adolescence. Computationally, these changes are captured by a developing confirmatory learning style, in which people learn more from outcomes that confirm rather than disconfirm their choices. Age-related differences in confirmatory learning are explained by decreases in stochasticity, rather than changes in the magnitude of the confirmation bias.


Subjects
Learning , Psychological Reinforcement , Male , Humans , Adolescent , Reward , Punishment
15.
Trends Cogn Sci ; 26(7): 607-621, 2022 Jul.
Article in English | MEDLINE | ID: mdl-35662490

ABSTRACT

Humans do not integrate new information objectively: outcomes carrying a positive affective value and evidence confirming one's own prior belief are overweighed. Until recently, theoretical and empirical accounts of the positivity and confirmation biases assumed them to be specific to 'high-level' belief updates. We present evidence against this account. Learning rates in reinforcement learning (RL) tasks, estimated across different contexts and species, generally present the same characteristic asymmetry, suggesting that belief and value updating processes share key computational principles and distortions. This bias generates over-optimistic expectations about the probability of making the right choices and, consequently, generates over-optimistic reward expectations. We discuss the normative and neurobiological roots of these RL biases and their position within the greater picture of behavioral decision-making theories.


Subjects
Decision Making , Psychological Reinforcement , Bias , Humans , Learning , Reward
16.
Front Vet Sci ; 9: 853707, 2022.
Article in English | MEDLINE | ID: mdl-35498733

ABSTRACT

American Foulbrood (AFB) is a contagious and severe brood disease of honey bees caused by the spore-forming bacterium Paenibacillus larvae. The identification of honey bee colonies infected by P. larvae is crucial for the effective control of AFB. We studied the possibility of identifying the infection levels by P. larvae in honey bee colonies through the examination of powdered sugar samples collected in the hives. The powdered sugar was dusted on the top bars of honeycombs and collected from a paper sheet placed at the bottom of the hive. Three groups of honey bee colonies were examined: Group A1 - colonies with clinical symptoms of AFB (n = 11); Group A2 - asymptomatic colonies located in apiaries with colonies showing symptoms of AFB (n = 59); Group B - asymptomatic colonies located in apiaries without cases of the disease (n = 49). The results showed that there was a significant difference in spore counts between groups and that the spore load in sugar samples was always consistent with the clinical conditions of the colonies and with whether or not they belonged to AFB-affected apiaries. Based on the obtained results, the cultural examination of powdered sugar samples collected from hives could be an effective tool for the quantitative non-destructive assessment of P. larvae infections in honey bee colonies.

17.
J Exp Psychol Learn Mem Cogn ; 48(5): 619-642, 2022 May.
Article in English | MEDLINE | ID: mdl-34516205

ABSTRACT

Anxiety is a common affective state, characterized by the subjectively unpleasant feelings of dread over an anticipated event. Anxiety is suspected to have important negative consequences on cognition, decision-making, and learning. Yet, despite a recent surge in studies investigating the specific effects of anxiety on reinforcement-learning, no coherent picture has emerged. Here, we investigated the effects of incidental anxiety on instrumental reinforcement-learning, while addressing several issues and shortcomings identified in a focused literature review. We used a rich experimental design, featuring both a learning and a transfer phase, and a manipulation of outcome valence (gains vs losses). In two variants (N = 2 × 50) of this experimental paradigm, incidental anxiety was induced with an established threat-of-shock paradigm. Model-free results show that incidental anxiety effects seem limited to a small, but specific increase in postlearning performance measured by a transfer task. A comprehensive modeling effort revealed that, irrespective of the effects of anxiety, individuals give more weight to positive than negative outcomes, and tend to experience the omission of a loss as a gain (and vice versa). However, in line with results from our targeted literature survey, isolating specific computational effects of anxiety on learning per se proved to be challenging. Overall, our results suggest that learning mechanisms are more complex than traditionally presumed, and raise important concerns about the robustness of the effects of anxiety previously identified in simple reinforcement-learning studies. (PsycInfo Database Record (c) 2022 APA, all rights reserved).


Subjects
Learning , Psychological Reinforcement , Anxiety , Humans
18.
Commun Biol ; 4(1): 1271, 2021 Nov 08.
Article in English | MEDLINE | ID: mdl-34750540
19.
Sci Adv ; 7(14), 2021 Apr.
Article in English | MEDLINE | ID: mdl-33811071

ABSTRACT

Evidence suggests that economic values are rescaled as a function of the range of the available options. Although locally adaptive, range adaptation has been shown to lead to suboptimal choices, particularly notable in reinforcement learning (RL) situations when options are extrapolated from their original context to a new one. Range adaptation can be seen as the result of an adaptive coding process aiming at increasing the signal-to-noise ratio. However, this hypothesis leads to a counterintuitive prediction: Decreasing task difficulty should increase range adaptation and, consequently, extrapolation errors. Here, we tested the paradoxical relation between range adaptation and performance in a large sample of participants performing variants of an RL task, where we manipulated task difficulty. Results confirmed that range adaptation induces systematic extrapolation errors and is stronger when decreasing task difficulty. Last, we propose a range-adapting model and show that it is able to parsimoniously capture all the behavioral results.
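
The extrapolation error described in the abstract of this record can be made concrete with a small Python sketch. The two contexts, the option names, the expected values, and the assumption that each context's range runs from zero to its largest possible outcome are all hypothetical; the sketch works on expected values directly and does not reproduce the range-adapting model proposed in the paper.

```python
from typing import Dict


def range_adapted(value: float, context_min: float, context_max: float) -> float:
    """Rescale a value to its relative position within its learning context."""
    return (value - context_min) / (context_max - context_min)


# Hypothetical learning contexts with expected values for each option.
contexts = {
    "large": {"range": (0.0, 10.0), "options": {"A": 7.5, "B": 2.5}},
    "small": {"range": (0.0, 1.0), "options": {"C": 0.75, "D": 0.25}},
}

absolute: Dict[str, float] = {}  # values stored on an objective scale
relative: Dict[str, float] = {}  # values stored after range adaptation
for ctx in contexts.values():
    lo, hi = ctx["range"]
    for name, expected_value in ctx["options"].items():
        absolute[name] = expected_value
        relative[name] = range_adapted(expected_value, lo, hi)


def prefer(values: Dict[str, float], x: str, y: str) -> str:
    """Return whichever of two options has the higher stored value."""
    return x if values[x] > values[y] else y


# Transfer test: pair the worse option of the large context (B)
# with the better option of the small context (C).
print("absolute learner picks:", prefer(absolute, "B", "C"))
print("range-adapted learner picks:", prefer(relative, "B", "C"))
```

In this toy setup the range-adapted learner prefers C even though B has the higher expected value, because C was the best option in its original context and B was the worst in its own.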

20.
Article in English | MEDLINE | ID: mdl-33795209

ABSTRACT

BACKGROUND: In this study, we asked whether differences in striatal activity during a reinforcement learning (RL) task with gain and loss domains could be one of the earliest functional imaging features associated with carrying the Huntington's disease (HD) gene. Based on previous work, we hypothesized that HD gene carriers would show either neural or behavioral asymmetry between gain and loss learning. METHODS: We recruited 35 HD gene carriers, expected to demonstrate onset of motor symptoms in an average of 26 years, and 35 well-matched gene-negative control subjects. Participants were placed in a functional magnetic resonance imaging scanner, where they completed an RL task in which they were required to learn to choose between abstract stimuli with the aim of gaining rewards and avoiding losses. Task behavior was modeled using an RL model, and variables from this model were used to probe functional magnetic resonance imaging data. RESULTS: In comparison with well-matched control subjects, gene carriers more than 25 years from motor onset showed exaggerated striatal responses to gain-predicting stimuli compared with loss-predicting stimuli (p = .002) in our RL task. Using computational analysis, we also found group differences in striatal representation of stimulus value (p = .0004). We found no group differences in behavior, cognitive scores, or caudate volumes. CONCLUSIONS: Behaviorally, gene carriers 9 years from predicted onset have been shown to learn better from gains than from losses. Our data suggest that a window exists in which HD-related functional neural changes are detectable long before associated behavioral change and 25 years before predicted motor onset. These represent the earliest functional imaging differences between HD gene carriers and control subjects.


Subjects
Huntington Disease , Corpus Striatum , Humans , Huntington Disease/genetics , Magnetic Resonance Imaging , Reward