Results 1 - 20 of 62
1.
Psychol Med ; 53(10): 4696-4706, 2023 07.
Article in English | MEDLINE | ID: mdl-35726513

ABSTRACT

BACKGROUND: Value-based decision-making impairment in depression is a complex phenomenon: while some studies found evidence of blunted reward learning and reward-related signals in the brain, others indicate no effect. Here we test whether such reward sensitivity deficits depend on the overall value of the decision problem. METHODS: We used a two-armed bandit task with two contexts: a 'rich' one, in which both options were associated with an overall positive expected value, and a 'poor' one, in which both were associated with an overall negative expected value. We tested patients (N = 30) undergoing a major depressive episode and age-, gender- and socio-economically matched controls (N = 26). Learning performance, followed by a transfer phase without feedback, was analyzed to disentangle between a decision mechanism and a value-update mechanism. Finally, we used computational model simulation and fitting to link behavioral patterns to learning biases. RESULTS: Control subjects showed similar learning performance in the 'rich' and the 'poor' contexts, while patients displayed reduced learning in the 'poor' context. Analysis of the transfer phase showed that the context-dependent impairment in patients generalized, suggesting that the effect of depression has to be traced to the outcome encoding. Computational model-based results showed that patients displayed a higher learning rate for negative compared to positive outcomes (the opposite was true in controls). CONCLUSIONS: Our results illustrate that reinforcement learning performance in depression depends on the value of the context. We show that depressive patients have a specific impairment in contexts with an overall negative state value, which in our task is consistent with a negativity bias at the level of the learning rates.
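The learning-rate asymmetry reported above can be illustrated with a minimal Q-learning sketch using separate rates for positive and negative prediction errors (the rule form is standard; the parameter values below are hypothetical, not the study's fitted estimates):

```python
def q_update(q, outcome, alpha_pos, alpha_neg):
    """One Q-learning step with valence-dependent learning rates."""
    delta = outcome - q  # reward prediction error
    alpha = alpha_pos if delta > 0 else alpha_neg
    return q + alpha * delta

# A 'negativity bias' (alpha_neg > alpha_pos), as reported for patients:
# the estimate drifts negative even though outcomes are symmetric.
q = 0.0
for outcome in [1.0, -1.0, 1.0, -1.0]:
    q = q_update(q, outcome, alpha_pos=0.2, alpha_neg=0.4)
```

With the rates reversed (alpha_pos > alpha_neg), as in controls, the same outcome sequence would leave a positive value estimate.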


Subject(s)
Depression , Depressive Disorder, Major , Humans , Reinforcement, Psychology , Reward , Bias
2.
Psychol Med ; 53(11): 5256-5266, 2023 08.
Article in English | MEDLINE | ID: mdl-35899867

ABSTRACT

BACKGROUND: Tourette syndrome (TS), as well as its most common comorbidities, is associated with a higher propensity for risky behaviour in everyday life. However, it is unclear whether this increased risk propensity in real-life contexts translates into a generally increased attitude towards risk. We aimed to assess decision-making under risk and ambiguity based on prospect theory, considering the effects of comorbidities and medication. METHODS: Fifty-four individuals with TS and 32 healthy controls performed risk and ambiguity decision-making tasks under both gains and losses conditions. Behavioural and computational parameters were evaluated using (i) univariate analysis to determine differences in parameters taken independently; (ii) supervised multivariate analysis to evaluate whether our parameters could jointly account for between-group differences; and (iii) unsupervised multivariate analysis to explore the potential presence of sub-groups. RESULTS: Apart from generally 'noisier' (less consistent) decisions in TS, we found no specific risk-taking behaviour in TS, nor any relation with tic severity or antipsychotic medication. However, the presence of comorbidities was associated with distortion of decision-making. Specifically, TS with obsessive-compulsive disorder comorbidity was associated with a higher risk-taking profile to increase gain and a higher risk-averse profile to decrease loss. TS with attention-deficit hyperactivity disorder comorbidity was associated with risk-seeking in the ambiguity context to reduce a potential loss. CONCLUSIONS: Impaired valuation of risk and ambiguity was not related to TS per se. Our findings are important for clinical practice: the involvement of individuals with TS in real-life risky situations may rather result from other factors, such as psychiatric comorbidities.


Subject(s)
Attention Deficit Disorder with Hyperactivity , Obsessive-Compulsive Disorder , Tics , Tourette Syndrome , Humans , Adult , Tourette Syndrome/epidemiology , Tourette Syndrome/psychology , Attention Deficit Disorder with Hyperactivity/psychology , Tics/complications , Tics/drug therapy , Obsessive-Compulsive Disorder/psychology , Comorbidity
3.
PLoS Biol ; 18(12): e3001028, 2020 12.
Article in English | MEDLINE | ID: mdl-33290387

ABSTRACT

While there is no doubt that social signals affect human reinforcement learning, there is still no consensus about how this process is computationally implemented. To address this issue, we compared three psychologically plausible hypotheses about the algorithmic implementation of imitation in reinforcement learning. The first hypothesis, decision biasing (DB), postulates that imitation consists in transiently biasing the learner's action selection without affecting their value function. According to the second hypothesis, model-based imitation (MB), the learner infers the demonstrator's value function through inverse reinforcement learning and uses it to bias action selection. Finally, according to the third hypothesis, value shaping (VS), the demonstrator's actions directly affect the learner's value function. We tested these three hypotheses in two experiments (N = 24 and N = 44) featuring a new variant of a social reinforcement learning task. We show through model comparison and model simulation that VS provides the best explanation of learners' behavior. These results were replicated in a third independent experiment featuring a larger cohort and a different design (N = 302). In our experiments, we also manipulated the quality of the demonstrators' choices and found that learners were able to adapt their imitation rate, so that only skilled demonstrators were imitated. We proposed and tested an efficient meta-learning process to account for this effect, whereby imitation is regulated by the agreement between the learner and the demonstrator. In sum, our findings provide new insights and perspectives on the computational mechanisms underlying adaptive imitation in human reinforcement learning.
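A toy sketch may help distinguish value shaping (VS) from decision biasing: under VS, observing the demonstrator's action directly nudges the learner's value for that action, on top of the ordinary reward update. The update form and parameter values here are illustrative assumptions, not the paper's fitted model:

```python
def value_shaping_update(q, action, demo_action, reward, alpha=0.3, kappa=0.2):
    """Standard reward update plus a direct value bonus for the demonstrated action."""
    q = dict(q)
    q[action] += alpha * (reward - q[action])          # ordinary RL update
    q[demo_action] += kappa * (1.0 - q[demo_action])   # demonstration shapes value
    return q

q = {"left": 0.0, "right": 0.0}
q = value_shaping_update(q, action="left", demo_action="right", reward=1.0)
# Under VS, the demonstrated (here unchosen) action gains value without any reward;
# under decision biasing it would only affect action selection, not q itself.
```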


Subject(s)
Imitative Behavior/physiology , Reinforcement, Social , Social Learning/physiology , Adult , Female , Humans , Learning/physiology , Male , Models, Theoretical , Reinforcement, Psychology , Reward , Young Adult
4.
Dev Sci ; 26(3): e13330, 2023 05.
Article in English | MEDLINE | ID: mdl-36194156

ABSTRACT

Understanding how learning changes during human development has been one of the long-standing objectives of developmental science. Recently, advances in computational biology have demonstrated that humans display a bias when learning to navigate novel environments through rewards and punishments: they learn more from outcomes that confirm their expectations than from outcomes that disconfirm them. Here, we ask whether confirmatory learning is stable across development, or whether it might be attenuated in developmental stages in which exploration is beneficial, such as adolescence. In a reinforcement learning (RL) task, 77 participants aged 11-32 years (four men, mean age = 16.26) attempted to maximize monetary rewards by repeatedly sampling different pairs of novel options, which varied in their reward/punishment probabilities. Mixed-effect models showed an age-related increase in accuracy as long as learning contingencies remained stable across trials, but less so when they reversed halfway through the trials. Age was also associated with a greater tendency to stay with an option that had just delivered a reward than to switch away from an option that had just delivered a punishment. At the computational level, a confirmation model provided an increasingly better fit with age. This model showed that age differences are captured by decreases in noise or exploration, rather than in the magnitude of the confirmation bias. These findings provide new insights into how learning changes during development and could help better tailor learning environments to people of different ages. RESEARCH HIGHLIGHTS: Reinforcement learning shows age-related improvement during adolescence, but more in stable learning environments than in volatile ones. People tend to stay with an option after a win more than they shift from an option after a loss, and this asymmetry increases with age during adolescence. Computationally, these changes are captured by a developing confirmatory learning style, in which people learn more from outcomes that confirm rather than disconfirm their choices. Age-related differences in confirmatory learning are explained by decreases in stochasticity, rather than changes in the magnitude of the confirmation bias.
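Since the age effect above was located in decision noise rather than in the bias magnitude, a generic softmax sketch shows how the inverse-temperature parameter alone changes choice stochasticity (an illustration with arbitrary values, not the study's fitted parameters):

```python
import math

def softmax_choice_probs(q_values, beta):
    """Choice probabilities under a softmax with inverse temperature beta."""
    exps = [math.exp(beta * q) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]

# Same learned values, different noise levels:
# a higher beta (less stochasticity) makes the better option far more likely.
noisy = softmax_choice_probs([0.6, 0.4], beta=1.0)    # ~[0.55, 0.45]
precise = softmax_choice_probs([0.6, 0.4], beta=10.0) # ~[0.88, 0.12]
```

On this account, older participants behave as if beta increases with age, while the asymmetry between confirming and disconfirming outcomes stays roughly constant.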


Subject(s)
Learning , Reinforcement, Psychology , Male , Humans , Adolescent , Reward , Punishment
5.
J Neurosci ; 41(23): 5102-5114, 2021 06 09.
Article in English | MEDLINE | ID: mdl-33926998

ABSTRACT

Forrest Gump or The Matrix? Preference-based decisions are subjective and entail self-reflection. However, these self-related features are unaccounted for by known neural mechanisms of valuation and choice. Self-related processes have been linked to a basic interoceptive biological mechanism, the neural monitoring of heartbeats, in particular in ventromedial prefrontal cortex (vmPFC), a region also involved in value encoding. We thus hypothesized a functional coupling between the neural monitoring of heartbeats and the precision of value encoding in vmPFC. Human participants of both sexes were presented with pairs of movie titles. They indicated either which movie they preferred or performed a control objective visual discrimination that did not require self-reflection. Using magnetoencephalography, we measured heartbeat-evoked responses (HERs) before option presentation and confirmed that HERs in vmPFC were larger when preparing for the subjective, self-related task. We retrieved the expected cortical value network during choice with time-resolved statistical modeling. Crucially, we show that larger HERs before option presentation are followed by stronger value encoding during choice in vmPFC. This effect is independent of overall vmPFC baseline activity. The neural interaction between HERs and value encoding predicted preference-based choice consistency over time, accounting for both interindividual differences and trial-to-trial fluctuations within individuals. Neither cardiac activity nor arousal fluctuations could account for any of the effects. HERs did not interact with the encoding of perceptual evidence in the discrimination task. 
Our results show that the self-reflection underlying preference-based decisions involves HERs, and that the integration of HERs into subjective value encoding in vmPFC contributes to preference stability. SIGNIFICANCE STATEMENT: Deciding whether you prefer Forrest Gump or The Matrix is based on subjective values, which only you, the decision-maker, can estimate and compare by asking yourself. Yet how self-reflection is biologically implemented, and how it contributes to subjective valuation, is not known. We show that in ventromedial prefrontal cortex, the neural response to heartbeats, an interoceptive self-related process, influences the cortical representation of subjective value. The neural interaction between the cortical monitoring of heartbeats and value encoding predicts choice consistency (i.e., whether you consistently prefer Forrest Gump over The Matrix over time). Our results pave the way for the quantification of self-related processes in decision-making and may shed new light on the relationship between maladaptive decisions and impaired interoception.


Subject(s)
Decision Making/physiology , Heart Rate/physiology , Interoception/physiology , Prefrontal Cortex/physiology , Adult , Female , Humans , Magnetoencephalography , Male
6.
J Neurosci ; 40(16): 3268-3277, 2020 04 15.
Article in English | MEDLINE | ID: mdl-32156831

ABSTRACT

Adaptive coding of stimuli is well documented in perception, where it supports efficient encoding over a broad range of possible percepts. Recently, a similar neural mechanism has also been reported in value-based decisions, where it allows optimal encoding of vast ranges of values in PFC: neuronal response to value depends on the choice context (relative coding), rather than being invariant across contexts (absolute coding). Additionally, value learning is sensitive to the amount of feedback information: providing complete feedback (both obtained and forgone outcomes) instead of partial feedback (only the obtained outcome) improves learning. However, it is unclear whether relative coding occurs in all PFC regions and how it is affected by feedback information. We systematically investigated univariate and multivariate feedback encoding in various mPFC regions and compared three modes of neural coding: absolute, partially-adaptive and fully-adaptive. Twenty-eight human participants (both sexes) performed a learning task while undergoing fMRI scanning. On each trial, they chose between two symbols associated with a certain outcome. Then, the decision outcome was revealed. Notably, in one-half of the trials participants received partial feedback, whereas in the other half they got complete feedback. We used univariate and multivariate analysis to explore value encoding in different feedback conditions. We found that both obtained and forgone outcomes were encoded in mPFC, but with opposite sign in its ventral and dorsal subdivisions. Moreover, we showed that increasing feedback information induced a switch from absolute to relative coding. Our results suggest that complete feedback information enhances context-dependent outcome encoding. SIGNIFICANCE STATEMENT: This study offers a systematic investigation of the effect of the amount of feedback information (partial vs complete) on univariate and multivariate outcome value encoding, within multiple regions in mPFC and cingulate cortex that are critical for value-based decisions and behavioral adaptation. Moreover, we provide the first comparison of three possible models of neural coding (i.e., absolute, partially-adaptive, and fully-adaptive coding) of value signals in these regions, using commensurable measures of prediction accuracy. Taken together, our results help build a more comprehensive picture of how the human brain encodes and processes outcome value. In particular, our results suggest that simultaneous presentation of obtained and forgone outcomes promotes relative value representation.


Subject(s)
Decision Making/physiology , Feedback, Psychological/physiology , Gyrus Cinguli/diagnostic imaging , Learning/physiology , Prefrontal Cortex/diagnostic imaging , Adult , Female , Functional Neuroimaging , Humans , Magnetic Resonance Imaging , Male , Neuropsychological Tests , Reinforcement, Psychology , Young Adult
7.
Proc Natl Acad Sci U S A ; 115(49): E11446-E11454, 2018 12 04.
Article in English | MEDLINE | ID: mdl-30442672

ABSTRACT

Money is a fundamental and ubiquitous institution in modern economies. However, the question of its emergence remains a central one for economists. The monetary search-theoretic approach studies the conditions under which commodity money emerges as a solution to override frictions inherent to interindividual exchanges in a decentralized economy. Although agents' rationality is classically essential among these conditions and a prerequisite to any theoretical monetary equilibrium, human subjects often fail to adopt optimal strategies in tasks implementing a search-theoretic paradigm when these strategies are speculative, i.e., involve the use of a costly medium of exchange to increase the probability of subsequent and successful trades. In the present work, we hypothesize that implementing such speculative behaviors relies on reinforcement learning instead of lifetime utility calculations, as supposed by classical economic theory. To test this hypothesis, we operationalized the Kiyotaki and Wright paradigm of money emergence in a multistep exchange task and fitted behavioral data from human subjects performing this task with two reinforcement learning models. Each model implements a distinct cognitive hypothesis regarding the weight of future or counterfactual rewards in current decisions. We found that both models outperformed theoretical predictions of subjects' behaviors regarding the implementation of speculative strategies, and that the latter relies on the degree to which opportunity costs are considered in the learning process. Speculating about the marketability advantage of money thus seems to depend on mental simulations of counterfactual events that agents perform in exchange situations.


Subject(s)
Choice Behavior , Learning , Models, Psychological , Reinforcement, Psychology , Reward , Decision Making , Humans
8.
Cogn Affect Behav Neurosci ; 20(6): 1184-1199, 2020 12.
Article in English | MEDLINE | ID: mdl-32875531

ABSTRACT

In simple instrumental-learning tasks, humans learn to seek gains and to avoid losses equally well. Yet two effects of valence are observed. First, decisions in loss contexts are slower. Second, loss contexts decrease individuals' confidence in their choices. Whether these two effects are manifestations of a single mechanism or whether they can be partially dissociated is unknown. Across six experiments, we attempted to disrupt the valence-induced motor bias effects by manipulating the mapping between decisions and actions and imposing constraints on response times (RTs). Our goal was to assess the presence of the valence-induced confidence bias in the absence of the RT bias. We observed both motor and confidence biases despite our disruption attempts, establishing that the effects of valence on motor and metacognitive responses are very robust and replicable. Nonetheless, within- and between-individual inferences reveal that the confidence bias resists the disruption of the RT bias. Therefore, although concomitant in most cases, valence-induced motor and confidence biases seem to be partly dissociable. These results highlight important new mechanistic constraints that should be incorporated into learning models to jointly explain choices, reaction times and confidence.


Subject(s)
Learning , Reinforcement, Psychology , Bias , Humans , Motivation , Reaction Time
9.
PLoS Comput Biol ; 15(7): e1007224, 2019 07.
Article in English | MEDLINE | ID: mdl-31356594

ABSTRACT

Depression is characterized by a marked decrease in social interactions and blunted sensitivity to rewards. Surprisingly, despite the importance of social deficits in depression, non-social aspects have been disproportionately investigated. As a consequence, the cognitive mechanisms underlying atypical decision-making in social contexts in depression are poorly understood. In the present study, we investigate whether deficits in reward processing interact with the social context and how this interaction is affected by self-reported depression and anxiety symptoms in the general population. Two cohorts of subjects (discovery and replication sample: N = 50 each) took part in an experiment involving reward learning in contexts with different levels of social information (absent, partial and complete). Behavioral analyses revealed a specific detrimental effect of depressive symptoms, but not anxiety, on behavioral performance in the presence of social information, i.e., when participants were informed about the choices of another player. Model-based analyses further characterized the computational nature of this deficit as a negative audience effect, rather than a deficit in the way others' choices and rewards are integrated into decision-making. To conclude, our results shed light on the cognitive and computational mechanisms underlying the interaction between social cognition, reward learning and decision-making in depressive disorders.


Subject(s)
Depression/psychology , Interpersonal Relations , Learning , Reward , Adult , Anxiety/psychology , Cohort Studies , Computational Biology , Computer Simulation , Decision Making , Female , Humans , Linear Models , Male , Middle Aged , Models, Psychological , Reinforcement, Psychology , Young Adult
10.
PLoS Comput Biol ; 15(9): e1007326, 2019 09.
Article in English | MEDLINE | ID: mdl-31490934

ABSTRACT

Value-based decision-making involves trading off the cost associated with an action against its expected reward. Research has shown that both physical and mental effort constitute such subjective costs, biasing choices away from effortful actions, and discounting the value of obtained rewards. Facing conflicts between competing action alternatives is considered aversive, as recruiting cognitive control to overcome conflict is effortful. Moreover, engaging control to proactively suppress irrelevant information that could conflict with task-relevant information would presumably also be cognitively costly. Yet, it remains unclear whether the cognitive control demands involved in preventing and resolving conflict also constitute costs in value-based decisions. The present study investigated this question by embedding irrelevant distractors (flanker arrows) within a reversal-learning task, with intermixed free and instructed trials. Results showed that participants learned to adapt their free choices to maximize rewards, but were nevertheless biased to follow the suggestions of irrelevant distractors. Thus, the perceived cost of investing cognitive control to suppress an external suggestion could sometimes trump internal value representations. By adapting computational models of reinforcement learning, we assessed the influence of conflict at both the decision and learning stages. Modelling the decision showed that free choices were more biased when participants were less sure about which action was more rewarding. This supports the hypothesis that the costs linked to conflict management were traded off against expected rewards. During the learning phase, we found that learning rates were reduced in instructed, relative to free, choices. Learning rates were further reduced by conflict between an instruction and subjective action values, whereas learning was not robustly influenced by conflict between one's actions and external distractors. 
Our results show that the subjective cognitive control costs linked to conflict factor into value-based decision-making, and highlight that different types of conflict may have different effects on learning about action outcomes.


Subject(s)
Decision Making/physiology , Learning/physiology , Adult , Computational Biology , Computer Simulation , Female , Humans , Male , Models, Psychological , Reward , Young Adult
11.
PLoS Comput Biol ; 15(4): e1006973, 2019 04.
Article in English | MEDLINE | ID: mdl-30958826

ABSTRACT

The ability to correctly estimate the probability of one's choices being correct is fundamental to optimally re-evaluate previous choices or to arbitrate between different decision strategies. Experimental evidence nonetheless suggests that this metacognitive process, confidence judgment, is susceptible to numerous biases. Here, we investigate the effect of outcome valence (gains or losses) on confidence while participants learned stimulus-outcome associations by trial-and-error. In two experiments, participants were more confident in their choices when learning to seek gains compared to avoiding losses, despite equal difficulty and performance between those two contexts. Computational modelling revealed that this bias is driven by the context-value, a dynamically updated estimate of the average expected value of choice options, necessary to explain equal performance in the gain and loss domains. The biasing effect of context-value on confidence, revealed here for the first time in a reinforcement-learning context, is therefore domain-general, with likely important functional consequences. We show that one such consequence emerges in volatile environments, where the (in)flexibility of individuals' learning strategies differs when outcomes are framed as gains or losses. Despite apparently similar behavior, profound asymmetries might therefore exist between learning to avoid losses and learning to seek gains.


Subject(s)
Choice Behavior/ethics , Decision Making/ethics , Judgment/ethics , Adult , Choice Behavior/physiology , Decision Making/physiology , Female , Humans , Judgment/physiology , Learning , Male , Reinforcement, Psychology , Self Concept , Young Adult
12.
J Neurosci ; 38(48): 10338-10348, 2018 11 28.
Article in English | MEDLINE | ID: mdl-30327418

ABSTRACT

The extent to which subjective awareness influences reward processing, and thereby affects future decisions, is currently largely unknown. In the present report, we investigated this question in a reinforcement learning framework, combining perceptual masking, computational modeling, and electroencephalographic recordings (human male and female participants). Our results indicate that degrading the visibility of the reward decreased, without completely obliterating, the ability of participants to learn from outcomes, but concurrently increased their tendency to repeat previous choices. We dissociated electrophysiological signatures evoked by the reward-based learning processes from those elicited by the reward-independent repetition of previous choices and showed that these neural activities were significantly modulated by reward visibility. Overall, this report sheds new light on the neural computations underlying reward-based learning and decision-making and highlights that awareness is beneficial for the trial-by-trial adjustment of decision-making strategies. SIGNIFICANCE STATEMENT: The notion of reward is strongly associated with subjective evaluation, related to conscious processes such as "pleasure," "liking," and "wanting." Here we show that degrading reward visibility in a reinforcement learning task decreases, without completely obliterating, the ability of participants to learn from outcomes, but concurrently increases subjects' tendency to repeat previous choices. Electrophysiological recordings, in combination with computational modeling, show that neural activities were significantly modulated by reward visibility. Overall, we dissociate different neural computations underlying reward-based learning and decision-making, which highlights a beneficial role of reward awareness in adjusting decision-making strategies.


Subject(s)
Awareness/physiology , Choice Behavior/physiology , Computer Simulation , Learning/physiology , Reinforcement, Psychology , Reward , Adult , Cohort Studies , Electroencephalography/methods , Electrophysiological Phenomena/physiology , Female , Humans , Male , Photic Stimulation/methods , Young Adult
13.
Cogn Affect Behav Neurosci ; 19(3): 490-502, 2019 06.
Article in English | MEDLINE | ID: mdl-31175616

ABSTRACT

Reinforcement learning (RL) models describe how humans and animals learn by trial-and-error to select actions that maximize rewards and minimize punishments. Traditional RL models focus exclusively on choices, thereby ignoring the interactions between choice preference and response time (RT), or how these interactions are influenced by contextual factors. However, in the field of perceptual decision-making, such interactions have proven to be important to dissociate between different underlying cognitive processes. Here, we investigated such interactions to shed new light on overlooked differences between learning to seek rewards and learning to avoid losses. We leveraged behavioral data from four RL experiments, which feature manipulations of two factors: outcome valence (gains vs. losses) and feedback information (partial vs. complete feedback). A Bayesian meta-analysis revealed that these contextual factors differently affect RTs and accuracy: While valence only affects RTs, feedback information affects both RTs and accuracy. To dissociate between the latent cognitive processes, we jointly fitted choices and RTs across all experiments with a Bayesian, hierarchical diffusion decision model (DDM). We found that the feedback manipulation affected drift rate, threshold, and non-decision time, suggesting that it was not a mere difficulty effect. Moreover, valence affected non-decision time and threshold, suggesting a motor inhibition in punishing contexts. To better understand the learning dynamics, we finally fitted a combination of RL and DDM (RLDDM). We found that while the threshold was modulated by trial-specific decision conflict, the non-decision time was modulated by the learned context valence. Overall, our results illustrate the benefits of jointly modeling RTs and choice data during RL, to reveal subtle mechanistic differences underlying decisions in different learning contexts.
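A diffusion decision model of the kind described above can be sketched as noisy evidence accumulation toward one of two bounds, with drift rate, threshold and non-decision time as free parameters. This is a generic single-trial simulation sketch, not the authors' hierarchical Bayesian implementation:

```python
import random

def ddm_trial(drift, threshold, non_decision_time, dt=0.001, noise=1.0):
    """Simulate one diffusion-decision-model trial; returns (choice, RT).

    Evidence x starts at 0 and drifts toward +threshold (choice 1)
    or -threshold (choice 0); RT adds a fixed non-decision component.
    """
    x, t = 0.0, 0.0
    while abs(x) < threshold:
        x += drift * dt + noise * random.gauss(0.0, dt ** 0.5)
        t += dt
    choice = 1 if x > 0 else 0
    return choice, t + non_decision_time

random.seed(0)
choice, rt = ddm_trial(drift=3.0, threshold=1.0, non_decision_time=0.3)
```

In the study's terms, the valence effect would correspond to a higher threshold and longer non-decision time in loss contexts, while the feedback manipulation would also shift the drift rate.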


Subject(s)
Decision Making/physiology , Feedback, Psychological/physiology , Models, Biological , Reaction Time/physiology , Reinforcement, Psychology , Adult , Female , Humans , Male , Meta-Analysis as Topic , Young Adult
14.
PLoS Comput Biol ; 13(8): e1005684, 2017 Aug.
Article in English | MEDLINE | ID: mdl-28800597

ABSTRACT

Previous studies suggest that factual learning, that is, learning from obtained outcomes, is biased, such that participants preferentially take into account positive, as compared to negative, prediction errors. However, whether or not the prediction error valence also affects counterfactual learning, that is, learning from forgone outcomes, is unknown. To address this question, we analysed the performance of two groups of participants on reinforcement learning tasks using a computational model that was adapted to test if prediction error valence influences learning. We carried out two experiments: in the factual learning experiment, participants learned from partial feedback (i.e., the outcome of the chosen option only); in the counterfactual learning experiment, participants learned from complete feedback information (i.e., the outcomes of both the chosen and unchosen option were displayed). In the factual learning experiment, we replicated previous findings of a valence-induced bias, whereby participants learned preferentially from positive, relative to negative, prediction errors. In contrast, for counterfactual learning, we found the opposite valence-induced bias: negative prediction errors were preferentially taken into account, relative to positive ones. When considering valence-induced bias in the context of both factual and counterfactual learning, it appears that people tend to preferentially take into account information that confirms their current choice.
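The choice-confirmatory pattern described above can be sketched as a dual update in which factual learning weights positive prediction errors more heavily, while counterfactual learning weights negative ones more heavily. The learning rates below are illustrative, not the fitted values from these experiments:

```python
def dual_update(q_chosen, q_unchosen, out_chosen, out_unchosen,
                alpha_pos=0.3, alpha_neg=0.1):
    """Confirmatory update for chosen (factual) and unchosen (counterfactual) options.

    For the chosen option, positive prediction errors (confirming the choice)
    get the larger rate; for the unchosen option the asymmetry is reversed.
    """
    d_c = out_chosen - q_chosen
    q_chosen += (alpha_pos if d_c > 0 else alpha_neg) * d_c
    d_u = out_unchosen - q_unchosen
    q_unchosen += (alpha_neg if d_u > 0 else alpha_pos) * d_u
    return q_chosen, q_unchosen

# Identical positive outcomes update the chosen option more than the unchosen one.
qc, qu = dual_update(0.0, 0.0, 1.0, 1.0)
```

Either way, information consistent with the current choice is weighted more, which is the confirmation-bias reading the abstract ends on.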


Subject(s)
Decision Making/physiology , Feedback, Psychological/physiology , Learning/physiology , Reinforcement, Psychology , Adult , Computational Biology , Female , Humans , Male , Task Performance and Analysis , Young Adult
15.
PLoS Comput Biol ; 12(6): e1004953, 2016 06.
Article in English | MEDLINE | ID: mdl-27322574

ABSTRACT

Adolescence is a period of life characterised by changes in learning and decision-making. Learning and decision-making do not rely on a unitary system, but instead require the coordination of different cognitive processes that can be mathematically formalised as dissociable computational modules. Here, we aimed to trace the developmental time-course of the computational modules responsible for learning from reward or punishment, and learning from counterfactual feedback. Adolescents and adults carried out a novel reinforcement learning paradigm in which participants learned the association between cues and probabilistic outcomes, where the outcomes differed in valence (reward versus punishment) and feedback was either partial or complete (either the outcome of the chosen option only, or the outcomes of both the chosen and unchosen option, were displayed). Computational strategies changed during development: whereas adolescents' behaviour was better explained by a basic reinforcement learning algorithm, adults' behaviour integrated increasingly complex computational features, namely a counterfactual learning module (enabling enhanced performance in the presence of complete feedback) and a value contextualisation module (enabling symmetrical reward and punishment learning). Unlike adults, adolescent performance did not benefit from counterfactual (complete) feedback. In addition, while adults learned symmetrically from both reward and punishment, adolescents learned from reward but were less likely to learn from punishment. This tendency to rely on rewards and not to consider alternative consequences of actions might contribute to our understanding of decision-making in adolescence.


Subject(s)
Adolescent Behavior/physiology , Aging/physiology , Decision Making/physiology , Models, Educational , Reward , Adolescent , Adult , Algorithms , Child , Computer Simulation , Decision Support Techniques , Female , Humans , Male , Young Adult
16.
Brain ; 139(Pt 2): 605-15, 2016 Feb.
Article in English | MEDLINE | ID: mdl-26490329

ABSTRACT

Tics are sometimes described as voluntary movements performed in an automatic or habitual way. Here, we addressed the question of balance between goal-directed and habitual behavioural control in Gilles de la Tourette syndrome and formally tested the hypothesis of enhanced habit formation in these patients. To this aim, we administered a three-stage instrumental learning paradigm to 17 unmedicated and 17 antipsychotic-medicated patients with Gilles de la Tourette syndrome and matched controls. In the first stage of the task, participants learned stimulus-response-outcome associations. The subsequent outcome devaluation and 'slip-of-action' tests allowed evaluation of the participants' capacity to flexibly adjust their behaviour to changes in action outcome value. In this task, unmedicated patients relied predominantly on habitual, outcome-insensitive behavioural control. Moreover, in these patients, the engagement in habitual responses correlated with more severe tics. Medicated patients performed at an intermediate level between unmedicated patients and controls. Using diffusion tensor imaging on a subset of patients, we also addressed whether the engagement in habitual responding was related to structural connectivity within cortico-striatal networks. We showed that engagement in habitual behaviour in patients with Gilles de la Tourette syndrome correlated with greater structural connectivity within the right motor cortico-striatal network. In unmedicated patients, stronger structural connectivity of the supplementary motor cortex with the sensorimotor putamen predicted more severe tics. Overall, our results indicate enhanced habit formation in unmedicated patients with Gilles de la Tourette syndrome. Aberrant reinforcement signals to the sensorimotor striatum may be fundamental for the formation of stimulus-response associations and may contribute to the habitual behaviour and tics of this syndrome.


Subject(s)
Brain/physiology , Habits , Nerve Net/physiology , Psychomotor Performance/physiology , Tourette Syndrome/diagnosis , Tourette Syndrome/psychology , Adult , Female , Humans , Male
17.
J Neurosci ; 34(47): 15621-30, 2014 Nov 19.
Article in English | MEDLINE | ID: mdl-25411490

ABSTRACT

The mechanisms of reward maximization have been extensively studied at both the computational and neural levels. By contrast, little is known about how the brain learns to choose the options that minimize action cost. In principle, the brain could have evolved a general mechanism that applies the same learning rule to the different dimensions of choice options. To test this hypothesis, we scanned healthy human volunteers while they performed a probabilistic instrumental learning task that varied in both the physical effort and the monetary outcome associated with choice options. Behavioral data showed that the same computational rule, using prediction errors to update expectations, could account for both reward maximization and effort minimization. However, these learning-related variables were encoded in partially dissociable brain areas. In line with previous findings, the ventromedial prefrontal cortex was found to positively represent expected and actual rewards, regardless of effort. A separate network, encompassing the anterior insula, the dorsal anterior cingulate, and the posterior parietal cortex, correlated positively with expected and actual efforts. These findings suggest that the same computational rule is applied by distinct brain systems, depending on the choice dimension-cost or benefit-that has to be learned.
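The shared computational rule this abstract describes is a delta rule applied separately to the reward and effort dimensions of each option. A minimal sketch, with hypothetical names and an assumed linear cost-benefit trade-off:

```python
def update_expectations(r_hat, e_hat, reward, effort, alpha=0.2):
    """Apply the same prediction-error rule to both choice dimensions."""
    r_hat += alpha * (reward - r_hat)   # reward prediction error
    e_hat += alpha * (effort - e_hat)   # effort prediction error
    return r_hat, e_hat

def net_value(r_hat, e_hat, effort_weight=1.0):
    """Expected benefit minus weighted expected cost of an option."""
    return r_hat - effort_weight * e_hat
```

The point of the sketch is that only the input dimension changes between reward maximization and effort minimization; the update equation itself is identical, even though the two variables are encoded in partially dissociable brain networks.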


Subject(s)
Learning/physiology , Models, Neurological , Physical Exertion/physiology , Reward , Adolescent , Adult , Choice Behavior , Computer Simulation , Cues , Female , Humans , Magnetic Resonance Imaging , Male , Young Adult
18.
Commun Psychol ; 2(1): 51, 2024 Jun 03.
Article in English | MEDLINE | ID: mdl-39242743

ABSTRACT

In the present study, we investigate and compare reasoning in large language models (LLMs) and humans, using a selection of cognitive psychology tools traditionally dedicated to the study of (bounded) rationality. We presented to human participants and an array of pretrained LLMs new variants of classical cognitive experiments, and cross-compared their performances. Our results showed that most of the included models presented reasoning errors akin to those frequently ascribed to error-prone, heuristic-based human reasoning. Notwithstanding this superficial similarity, an in-depth comparison between humans and LLMs indicated important differences from human-like reasoning, with the models' limitations disappearing almost entirely in more recent LLM releases. Moreover, we show that while it is possible to devise strategies to induce better performance, humans and machines are not equally responsive to the same prompting schemes. We conclude by discussing the epistemological implications and challenges of comparing human and machine behavior for both artificial intelligence and cognitive psychology.

19.
PLoS One ; 19(4): e0301141, 2024.
Article in English | MEDLINE | ID: mdl-38557590

ABSTRACT

Recent advances in the field of machine learning have yielded novel research perspectives in behavioural economics and financial market microstructure studies. In this paper we study the impact of individual traders' learning characteristics on markets using a stock market simulator designed with a multi-agent architecture. Each agent, representing an autonomous investor, trades stocks through reinforcement learning, using a centralized double-auction limit order book. This bottom-up approach allows us to study the impact of individual trader traits on the whole stock market at the mesoscale. We chose to test three aspects of trader traits: increases in agent learning rate, herding behaviour and random trading. As hypothesized, we find that larger learning rates significantly increase the number of crashes. We also find that herding behaviour undermines market stability, while random trading tends to preserve it.
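The learning-rate effect hypothesized in this abstract can be illustrated with a toy delta-rule value tracker: a larger learning rate follows price changes faster but overshoots more, which at market scale is the proposed driver of instability. This is an illustrative sketch with made-up inputs, not the paper's multi-agent simulator:

```python
def track_value(prices, alpha):
    """Delta-rule estimate of an asset's value.

    A larger alpha means faster but jumpier tracking of the
    price signal; the estimate swings further on each surprise.
    """
    v, history = 0.0, []
    for p in prices:
        v += alpha * (p - v)   # prediction-error update
        history.append(v)
    return history

prices = [1.0, -1.0] * 50       # stylized oscillating price signal
calm = track_value(prices, 0.1)
jumpy = track_value(prices, 0.9)
```

On this toy signal, the high-learning-rate agent's estimate oscillates with much larger amplitude than the low-learning-rate agent's, mirroring the destabilizing effect the study reports.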


Subject(s)
Investments , Models, Economic , Machine Learning , Phenotype
20.
Biol Psychiatry ; 95(10): 974-984, 2024 May 15.
Article in English | MEDLINE | ID: mdl-38101503

ABSTRACT

BACKGROUND: Drugs like opioids are potent reinforcers thought to co-opt value-based decisions by overshadowing other rewarding outcomes, but how this happens at a neurocomputational level remains elusive. Range adaptation is a canonical process of fine-tuning representations of value based on reward context. Here, we tested whether recent opioid exposure impacts range adaptation in opioid use disorder, potentially explaining why shifting decision making away from drug taking during this vulnerable period is so difficult. METHODS: Participants who had recently (<90 days) used opioids (n = 34) or who had abstained from opioid use for ≥90 days (n = 20) and comparison control participants (n = 44) completed a reinforcement learning task designed to induce robust contextual modulation of value. Two models were used to assess the latent process that participants engaged in while making their decisions: 1) a Range model that dynamically tracks context and 2) a standard Absolute model that assumes stationary, objective encoding of value. RESULTS: Control participants and ≥90-days-abstinent participants with opioid use disorder exhibited choice patterns consistent with range-adapted valuation. In contrast, participants with recent opioid use were more prone to learn and encode value on an absolute scale. Computational modeling confirmed that the behavior of most control participants and ≥90-days-abstinent participants with opioid use disorder (75%), but only a minority in the recent use group (38%), was better fit by the Range model than by the Absolute model. Furthermore, the degree to which participants relied on range adaptation correlated with duration of continuous abstinence and subjective craving/withdrawal. CONCLUSIONS: Reduced context adaptation to available rewards could explain difficulty deciding about smaller (typically nondrug) rewards in the aftermath of drug exposure.
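The two latent processes can be contrasted with a toy value encoder: the Range model rescales each outcome to the contextual minimum and maximum, whereas the Absolute model keeps it on an objective scale. A minimal sketch under assumed names, not the study's fitted model:

```python
def encode_value(outcome, r_min, r_max, range_adapted=True):
    """Range model: outcome rescaled to its context's [min, max].

    Absolute model (range_adapted=False): outcome kept on its
    objective, context-independent scale.
    """
    if not range_adapted:
        return outcome
    if r_max == r_min:          # degenerate context, no range to adapt to
        return 0.0
    return (outcome - r_min) / (r_max - r_min)
```

Under range adaptation, the best available outcome is encoded identically (as 1.0) whether the context offers large or small rewards; an absolute encoder preserves the objective difference, which is the pattern the recent-use group leaned toward.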


Subject(s)
Opioid-Related Disorders , Reinforcement, Psychology , Humans , Male , Adult , Female , Reward , Young Adult , Decision Making/drug effects , Decision Making/physiology , Analgesics, Opioid/administration & dosage , Analgesics, Opioid/pharmacology , Choice Behavior/drug effects , Choice Behavior/physiology , Adaptation, Psychological/drug effects , Adaptation, Psychological/physiology