Results 1 - 20 of 117
1.
Cell ; 177(4): 986-998.e15, 2019 05 02.
Article in English | MEDLINE | ID: mdl-30982599

ABSTRACT

By observing their social partners, primates learn about reward values of objects. Here, we show that monkeys' amygdala neurons derive object values from observation and use these values to simulate a partner monkey's decision process. While monkeys alternated making reward-based choices, amygdala neurons encoded object-specific values learned from observation. Dynamic activities converted these values to representations of the recorded monkey's own choices. Surprisingly, the same activity patterns unfolded spontaneously before partner's choices in separate neurons, as if these neurons simulated the partner's decision-making. These "simulation neurons" encoded signatures of mutual-inhibitory decision computation, including value comparisons and value-to-choice conversions, resulting in accurate predictions of partner's choices. Population decoding identified differential contributions of amygdala subnuclei. Biophysical modeling of amygdala circuits showed that simulation neurons emerge naturally from convergence between object-value neurons and self-other neurons. By simulating decision computations during observation, these neurons could allow primates to reconstruct their social partners' mental states.


Subjects
Amygdala/metabolism , Amygdala/physiology , Decision Making/physiology , Animals , Behavior, Animal/physiology , Choice Behavior/physiology , Interpersonal Relations , Learning/physiology , Macaca mulatta/physiology , Male , Neurons/metabolism , Neurons/physiology , Reward
2.
Cell ; 166(6): 1564-1571.e6, 2016 Sep 08.
Article in English | MEDLINE | ID: mdl-27610576

ABSTRACT

Optogenetic studies in mice have revealed new relationships between well-defined neurons and brain functions. However, there are currently no means to achieve the same cell-type specificity in monkeys, which possess an expanded behavioral repertoire and closer anatomical homology to humans. Here, we present a resource for cell-type-specific channelrhodopsin expression in Rhesus monkeys and apply this technique to modulate dopamine activity and monkey choice behavior. These data show that two viral vectors label dopamine neurons with greater than 95% specificity. Infected neurons were activated by light pulses, indicating functional expression. The addition of optical stimulation to reward outcomes promoted the learning of reward-predicting stimuli at the neuronal and behavioral level. Together, these results demonstrate the feasibility of effective and selective stimulation of dopamine neurons in non-human primates and a resource that could be applied to other cell types in the monkey brain.


Subjects
Choice Behavior/physiology , Dopaminergic Neurons/metabolism , Optogenetics/methods , Animals , Dependovirus/genetics , Dopamine/metabolism , Gene Expression Regulation , Genetic Vectors/genetics , Macaca mulatta , Promoter Regions, Genetic/genetics , Rhodopsin/genetics
3.
Proc Natl Acad Sci U S A ; 121(20): e2316658121, 2024 May 14.
Article in English | MEDLINE | ID: mdl-38717856

ABSTRACT

Individual survival and evolutionary selection require biological organisms to maximize reward. Economic choice theories define the necessary and sufficient conditions, and neuronal signals of decision variables provide mechanistic explanations. Reinforcement learning (RL) formalisms use predictions, actions, and policies to maximize reward. Midbrain dopamine neurons code reward prediction errors (RPE) of subjective reward value suitable for RL. Electrical and optogenetic self-stimulation experiments demonstrate that monkeys and rodents repeat behaviors that result in dopamine excitation. Dopamine excitations reflect positive RPEs that increase reward predictions via RL; against increasing predictions, obtaining similar dopamine RPE signals again requires better rewards than before. The positive RPEs drive predictions higher again and thus advance a recursive reward-RPE-prediction iteration toward better and better rewards. Agents also avoid dopamine inhibitions that lower reward prediction via RL, which allows smaller rewards than before to elicit positive dopamine RPE signals and resume the iteration toward better rewards. In this way, dopamine RPE signals serve a causal mechanism that attracts agents via RL to the best rewards. The mechanism improves daily life and benefits evolutionary selection but may also induce restlessness and greed.
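The recursive reward-RPE-prediction iteration described in this abstract can be sketched with a simple Rescorla-Wagner-style update (a minimal illustration; the learning rate and reward schedule are assumptions, not values from the paper):

```python
# Sketch of the reward-RPE-prediction iteration: positive RPEs raise the
# prediction, so eliciting a similar RPE later requires a better reward.
# 'alpha' and the reward sequence are illustrative assumptions.

def rpe_iteration(rewards, alpha=0.3):
    """Return (reward, RPE, updated prediction) per trial."""
    v = 0.0
    trace = []
    for r in rewards:
        delta = r - v          # dopamine-like reward prediction error
        v += alpha * delta     # prediction grows after positive RPEs
        trace.append((r, delta, v))
    return trace

trace = rpe_iteration([1.0, 1.0, 1.0, 2.0])
# Repeated 1.0 rewards shrink the RPE toward zero; only the larger 2.0
# reward produces a sizable positive RPE again, driving the iteration on.
```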


Subjects
Dopamine , Dopaminergic Neurons , Reward , Animals , Dopamine/metabolism , Dopaminergic Neurons/physiology , Dopaminergic Neurons/metabolism , Humans , Reinforcement, Psychology
4.
J Neurosci ; 43(40): 6796-6806, 2023 Oct 04.
Article in English | MEDLINE | ID: mdl-37625854

ABSTRACT

All life must solve how to allocate limited energy resources to maximize benefits from scarce opportunities. Economic theory posits decision makers optimize choice by maximizing the subjective benefit (utility) of reward minus the subjective cost (disutility) of the required effort. While successful in many settings, this model does not fully account for how experience can alter reward-effort trade-offs. Here, we test how well the subtractive model of effort disutility explains the behavior of two male nonhuman primates (Macaca mulatta) in a binary choice task in which reward quantity and physical effort to obtain were varied. Applying random utility modeling to independently estimate reward utility and effort disutility, we show the subtractive effort model better explains out-of-sample choice behavior when compared with parabolic and exponential effort discounting. Furthermore, we demonstrate that effort disutility depends on previous experience of effort: in analogy to work from behavioral labor economics, we develop a model of reference-dependent effort disutility to explain the increased willingness to expend effort following previous experience of effortful options in a session. The result of this analysis suggests that monkeys discount reward by an effort cost that is measured relative to an expected effort learned from previous trials. When this subjective cost of effort, a function of context and experience, is accounted for, trial-by-trial choices can be explained by the subtractive cost model of effort. Therefore, in searching for net utility signals that may underpin effort-based decision-making in the brain, careful measurement of subjective effort costs is an essential first step.

SIGNIFICANCE STATEMENT All decision-makers need to consider how much effort they need to expend when evaluating potential options. Economic theories suggest that the optimal way to choose is by cost-benefit analysis of reward against effort. To be able to do this efficiently over many decision contexts, this needs to be done flexibly, with appropriate adaptation to context and experience. Therefore, in aiming to understand how this might be achieved in the brain, it is important to first carefully measure the subjective cost of effort. Here, we show monkeys make reward-effort cost-benefit decisions, subtracting the subjective cost of effort from the subjective value of rewards. Moreover, the subjective cost of effort is dependent on the monkeys' experience of effort in previous trials.
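The subtractive, reference-dependent model described here can be sketched as follows (a minimal illustration; the power-utility form, linear cost, and all parameter values are assumptions, not the paper's fitted model):

```python
# Sketch of a subtractive net-utility model with a reference-dependent effort
# cost: effort is costed relative to an expected effort learned from previous
# trials. Function forms and parameters are illustrative assumptions.

def net_utility(reward, effort, effort_ref, k=0.5):
    utility = reward ** 0.8                  # concave reward utility (assumed)
    disutility = k * (effort - effort_ref)   # cost relative to expected effort
    return utility - disutility

# After experiencing effortful options, the reference rises, the same absolute
# effort costs less subjectively, and willingness to expend effort increases.
low_ref = net_utility(2.0, 3.0, effort_ref=1.0)
high_ref = net_utility(2.0, 3.0, effort_ref=2.5)
assert high_ref > low_ref
```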


Subjects
Choice Behavior , Decision Making , Animals , Male , Brain , Learning , Reward
5.
Proc Natl Acad Sci U S A ; 118(30)2021 07 27.
Article in English | MEDLINE | ID: mdl-34285071

ABSTRACT

Sensitivity to satiety constitutes a basic requirement for neuronal coding of subjective reward value. Satiety from natural ongoing consumption affects reward functions in learning and approach behavior. More specifically, satiety reduces the subjective economic value of individual rewards during choice between options that typically contain multiple reward components. The unconfounded assessment of economic reward value requires tests at choice indifference between two options, which is difficult to achieve with sated rewards. By conceptualizing choices between options with multiple reward components ("bundles"), Revealed Preference Theory may offer a solution. Despite satiety, choices against an unaltered reference bundle may remain indifferent when the reduced value of a sated bundle reward is compensated by larger amounts of an unsated reward of the same bundle, and then the value loss of the sated reward is indicated by the amount of the added unsated reward. Here, we show psychophysically titrated choice indifference in monkeys between bundles of differently sated rewards. Neuronal chosen value signals in the orbitofrontal cortex (OFC) followed closely the subjective value change within recording periods of individual neurons. A neuronal classifier distinguishing the bundles and predicting choice substantiated the subjective value change. The choice between conventional single rewards confirmed the neuronal changes seen with two-reward bundles. Thus, reward-specific satiety reduces subjective reward value signals in OFC. With satiety being an important factor of subjective reward value, these results extend the notion of subjective economic reward value coding in OFC neurons.


Subjects
Adaptation, Physiological , Choice Behavior , Neural Pathways , Neurons/physiology , Prefrontal Cortex/physiology , Reward , Satiety Response/physiology , Animals , Learning , Macaca mulatta , Male
6.
J Neurosci ; 42(8): 1510-1528, 2022 02 23.
Article in English | MEDLINE | ID: mdl-34937703

ABSTRACT

Economic choice is thought to involve the elicitation of the subjective values of the choice options. Thus far, value estimation in animals has relied on stochastic choices between multiple options presented in repeated trials and expressed from averages of dozens of trials. However, subjective reward valuations are made moment-to-moment and do not always require alternative options; their consequences are usually felt immediately. Here, we describe a Becker-DeGroot-Marschak (BDM) auction-like mechanism that provides more direct and simple valuations with immediate consequences. The BDM encourages agents to truthfully reveal their true subjective value in individual choices ("incentive compatibility"). Male monkeys reliably placed well-ranked BDM bids for up to five juice volumes while paying from a water budget. The bids closely approximated the average subjective values estimated with conventional binary choices (BCs), thus demonstrating procedural invariance and aligning with the wealth of knowledge acquired with these less direct estimation methods. The feasibility of BDM bidding in monkeys paves the way for an analysis of subjective neuronal value signals in single trials rather than from averages; the feasibility also bridges the gap to the increasingly used BDM method in human neuroeconomics.

SIGNIFICANCE STATEMENT The subjective economic value of rewards cannot be measured directly but must be inferred from observable behavior. Until now, the estimation method in animals was rather complex and required comparison between several choice options during repeated choices; thus, such methods did not respect the imminence of the outcome from individual choices. However, human economic research has developed a simple auction-like procedure that can reveal in a direct and immediate manner the true subjective value in individual choices [Becker-DeGroot-Marschak (BDM) mechanism]. The current study implemented this mechanism in rhesus monkeys and demonstrates its usefulness for eliciting meaningful value estimates of liquid rewards. The mechanism allows future neurophysiological assessment of subjective reward value signals in single trials of controlled animal tasks.
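The incentive compatibility of the BDM mechanism can be illustrated with a short simulation (a sketch under assumed conditions: a uniform random price on [0, 1] and a known true value; the paper's actual bid currency and price distribution differ):

```python
# Sketch of the BDM auction: the agent bids against a random price; if the bid
# meets the price, it pays the price and receives the reward. Truthful bidding
# maximizes expected surplus. The price distribution is an assumption.
import random

def bdm_trial(bid, true_value, rng):
    price = rng.uniform(0.0, 1.0)      # random computer-set price
    if bid >= price:
        return true_value - price      # receive reward, pay the price
    return 0.0                         # no transaction

def expected_surplus(bid, true_value, n=50_000, seed=0):
    rng = random.Random(seed)          # common prices across bids for comparison
    return sum(bdm_trial(bid, true_value, rng) for _ in range(n)) / n

# Bidding the true value (0.6) beats both over-bidding and under-bidding.
```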


Subjects
Choice Behavior , Reward , Animals , Choice Behavior/physiology , Macaca mulatta , Male , Neurons/physiology
7.
Cogn Affect Behav Neurosci ; 23(3): 600-619, 2023 06.
Article in English | MEDLINE | ID: mdl-36823249

ABSTRACT

Despite being unpredictable and uncertain, reward environments often exhibit certain regularities, and animals navigating these environments try to detect and utilize such regularities to adapt their behavior. However, successful learning requires that animals also adjust to uncertainty associated with those regularities. Here, we analyzed choice data from two comparable dynamic foraging tasks in mice and monkeys to investigate mechanisms underlying adjustments to different types of uncertainty. In these tasks, animals selected between two choice options that delivered reward probabilistically, while baseline reward probabilities changed after a variable number (block) of trials without any cues to the animals. To measure adjustments in behavior, we applied multiple metrics based on information theory that quantify consistency in behavior, and fit choice data using reinforcement learning models. We found that in both species, learning and choice were affected by uncertainty about reward outcomes (in terms of determining the better option) and by expectation about when the environment may change. However, these effects were mediated through different mechanisms. First, more uncertainty about the better option resulted in slower learning and forgetting in mice, whereas it had no significant effect in monkeys. Second, expectation of block switches accompanied slower learning, faster forgetting, and increased stochasticity in choice in mice, whereas it only reduced learning rates in monkeys. Overall, while demonstrating the usefulness of metrics based on information theory in examining adaptive behavior, our study provides evidence for multiple types of adjustments in learning and choice behavior according to uncertainty in the reward environment.
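The kind of reinforcement-learning model fit to these foraging choices, with separate learning and forgetting rates plus a softmax choice rule, can be sketched as follows (parameter values are illustrative assumptions, not fitted estimates):

```python
# Sketch of an RL model for two-option dynamic foraging: the chosen option is
# updated by a learning rate, the unchosen option decays by a forgetting rate,
# and choice is a softmax over the value difference. Parameters are assumed.
import math
import random

def update(q, choice, reward, alpha=0.4, forget=0.1):
    q = list(q)
    q[choice] += alpha * (reward - q[choice])   # learn from the chosen option
    q[1 - choice] *= (1 - forget)               # forget the unchosen option
    return q

def softmax_choice(q, beta=3.0, rng=random):
    p0 = 1.0 / (1.0 + math.exp(-beta * (q[0] - q[1])))  # P(choose option 0)
    return 0 if rng.random() < p0 else 1
```

Slower learning (smaller alpha) and faster forgetting (larger forget), as reported for mice expecting block switches, make value estimates noisier and choices less consistent.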


Subjects
Choice Behavior , Reward , Mice , Animals , Uncertainty , Haplorhini , Learning , Decision Making
8.
J Neurosci ; 41(13): 2964-2979, 2021 03 31.
Article in English | MEDLINE | ID: mdl-33542082

ABSTRACT

Expected Utility Theory (EUT), the first axiomatic theory of risky choice, describes choices as a utility maximization process: decision makers assign a subjective value (utility) to each choice option and choose the one with the highest utility. The continuity axiom, central to Expected Utility Theory and its modifications, is a necessary and sufficient condition for the definition of numerical utilities. The axiom requires decision makers to be indifferent between a gamble and a specific probabilistic combination of a more preferred and a less preferred gamble. While previous studies demonstrated that monkeys choose according to combinations of objective reward magnitude and probability, a concept-driven experimental approach for assessing the axiomatically defined conditions for maximizing utility by animals is missing. We experimentally tested the continuity axiom for a broad class of gamble types in 4 male rhesus macaque monkeys, showing that their choice behavior complied with the existence of a numerical utility measure as defined by the economic theory. We used the numerical quantity specified in the continuity axiom to characterize subjective preferences in a magnitude-probability space. This mapping highlighted a trade-off relation between reward magnitudes and probabilities, compatible with the existence of a utility function underlying subjective value computation. These results support the existence of a numerical utility function able to describe choices, allowing for the investigation of the neuronal substrates responsible for coding such rigorously defined quantity.

SIGNIFICANCE STATEMENT A common assumption of several economic choice theories is that decisions result from the comparison of subjectively assigned values (utilities). This study demonstrated the compliance of monkey behavior with the continuity axiom of Expected Utility Theory, implying a subjective magnitude-probability trade-off relation, which supports the existence of numerical utility directly linked to the theoretical economic framework. We determined a numerical utility measure able to describe choices, which can serve as a correlate for the neuronal activity in the quest for brain structures and mechanisms guiding decisions.
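The way the continuity axiom yields a numerical utility can be shown with a small worked example (the utility values below are illustrative, not from the paper):

```python
# Sketch of the continuity axiom's utility construction: for gambles with
# utilities u_a > u_b > u_c, there is a probability p at which the middle
# gamble B is indifferent to the combination p*A + (1-p)*C. That p places B
# numerically on the A-C utility scale. Values are illustrative.

def indifference_p(u_a, u_b, u_c):
    """Solve u_b = p * u_a + (1 - p) * u_c for p."""
    return (u_b - u_c) / (u_a - u_c)

p = indifference_p(u_a=1.0, u_b=0.4, u_c=0.0)
# With the scale anchored at u_c = 0 and u_a = 1, the indifference
# probability doubles as B's numerical utility.
```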


Subjects
Choice Behavior/physiology , Psychomotor Performance/physiology , Reward , Animals , Macaca mulatta , Male , Photic Stimulation/methods , Primates
9.
J Neurosci ; 41(13): 3000-3013, 2021 03 31.
Article in English | MEDLINE | ID: mdl-33568490

ABSTRACT

Rewarding choice options typically contain multiple components, but neural signals in single brain voxels are scalar and primarily vary up or down. In a previous study, we had designed reward bundles that contained the same two milkshakes with independently set amounts; we had used psychophysics and rigorous economic concepts to estimate two-dimensional choice indifference curves (ICs) that represented revealed stochastic preferences for these bundles in a systematic, integrated manner. All bundles on the same ICs were equally revealed preferred (and thus had same utility, as inferred from choice indifference); bundles on higher ICs (higher utility) were preferred to bundles on lower ICs (lower utility). In the current study, we used the established behavior for testing with functional magnetic resonance imaging (fMRI). We now demonstrate neural responses in reward-related brain structures of human female and male participants, including striatum, midbrain, and medial orbitofrontal cortex (mid-OFC) that followed the characteristic pattern of ICs: similar responses along ICs (same utility despite different bundle composition), but monotonic change across ICs (different utility). Thus, these brain structures integrated multiple reward components into a scalar signal, well beyond the known subjective value coding of single-component rewards.

SIGNIFICANCE STATEMENT Rewards have several components, like the taste and size of an apple, but it is unclear how each component contributes to the overall value of the reward. While choice indifference curves (ICs) of economic theory provide behavioral approaches to this question, it is unclear whether brain responses capture the preference and utility integrated from multiple components. We report activations in striatum, midbrain, and orbitofrontal cortex (OFC) that follow choice ICs representing behavioral preferences over and above variations of individual reward components. In addition, the concept-driven approach encourages future studies on natural, multicomponent rewards that are prone to irrational choice of normal and brain-damaged individuals.


Subjects
Brain/diagnostic imaging , Brain/physiology , Choice Behavior/physiology , Economics, Behavioral , Reward , Adult , Female , Humans , Magnetic Resonance Imaging/methods , Male , Photic Stimulation/methods , Young Adult
10.
Physiol Rev ; 95(3): 853-951, 2015 Jul.
Article in English | MEDLINE | ID: mdl-26109341

ABSTRACT

Rewards are crucial objects that induce learning, approach behavior, choices, and emotions. Whereas emotions are difficult to investigate in animals, the learning function is mediated by neuronal reward prediction error signals which implement basic constructs of reinforcement learning theory. These signals are found in dopamine neurons, which emit a global reward signal to striatum and frontal cortex, and in specific neurons in striatum, amygdala, and frontal cortex projecting to select neuronal populations. The approach and choice functions involve subjective value, which is objectively assessed by behavioral choices eliciting internal, subjective reward preferences. Utility is the formal mathematical characterization of subjective value and a prime decision variable in economic choice theory. It is coded as utility prediction error by phasic dopamine responses. Utility can incorporate various influences, including risk, delay, effort, and social interaction. Appropriate for formal decision mechanisms, rewards are coded as object value, action value, difference value, and chosen value by specific neurons. Although all reward, reinforcement, and decision variables are theoretical constructs, their neuronal signals constitute measurable physical implementations and as such confirm the validity of these concepts. The neuronal reward signals provide guidance for behavior while constraining the free will to act.


Subjects
Brain/physiology , Choice Behavior , Learning , Models, Neurological , Reward , Animals , Dopaminergic Neurons/physiology , Emotions , Humans , Neural Pathways/physiology , Synaptic Transmission
11.
Anim Cogn ; 25(2): 385-399, 2022 Apr.
Article in English | MEDLINE | ID: mdl-34568979

ABSTRACT

Decisions can be risky or riskless, depending on the outcomes of the choice. Expected utility theory describes risky choices as a utility maximization process: we choose the option with the highest subjective value (utility), which we compute considering both the option's value and its associated risk. According to the random utility maximization framework, riskless choices could also be based on a utility measure. Neuronal mechanisms of utility-based choice may thus be common to both risky and riskless choices. This assumption would require the existence of a utility function that accounts for both risky and riskless decisions. Here, we investigated whether the choice behavior of two macaque monkeys in risky and riskless decisions could be described by a common underlying utility function. We found that the utility functions elicited in the two choice scenarios were different from each other, even after taking into account the contribution of subjective probability weighting. Our results suggest that distinct utility representations exist for risky and riskless choices, which could reflect distinct neuronal representations of the utility quantities, or distinct brain mechanisms for risky and riskless choices. The different utility functions should be taken into account in neuronal investigations of utility-based choice.


Subjects
Choice Behavior , Risk-Taking , Animals , Brain , Choice Behavior/physiology , Decision Making , Macaca mulatta , Probability
12.
J Neurosci ; 40(46): 8938-8950, 2020 11 11.
Article in English | MEDLINE | ID: mdl-33077553

ABSTRACT

Our ability to evaluate an experience retrospectively is important because it allows us to summarize its total value, and this summary value can then later be used as a guide in deciding whether the experience merits repeating, or whether instead it should rather be avoided. However, when an experience unfolds over time, humans tend to assign disproportionate weight to the later part of the experience, and this can lead to poor choice in repeating, or avoiding experience. Using model-based computational analyses of fMRI recordings in 27 male volunteers, we show that the human brain encodes the summary value of an extended sequence of outcomes in two distinct reward representations. We find that the overall experienced value is encoded accurately in the amygdala, but its merit is excessively marked down by disincentive anterior insula activity if the sequence of experienced outcomes declines temporarily. Moreover, the statistical strength of this neural code can separate efficient decision-makers from suboptimal decision-makers. Optimal decision-makers encode overall value more strongly, and suboptimal decision-makers encode the disincentive markdown (DM) more strongly. The separate neural implementation of the two distinct reward representations confirms that suboptimal choice for temporally extended outcomes can be the result of robust neural representation of a displeasing aspect of the experience such as temporary decline.

SIGNIFICANCE STATEMENT One of the numerous foibles that prompt us to make poor decisions is known as the "Banker's fallacy," the tendency to focus on short-term growth at the expense of long-term value. This effect leads to unwarranted preference for happy endings. Here, we show that the anterior insula in the human brain marks down the overall value of an experience as it unfolds over time if the experience entails a sequence of predominantly negative temporal contrasts. By contrast, the amygdala encodes overall value accurately. These results provide neural indices for the dichotomy of decision utility and experienced utility popularized as Thinking, Fast and Slow by Daniel Kahneman.


Subjects
Amygdala/physiology , Cerebral Cortex/physiology , Adult , Amygdala/diagnostic imaging , Brain Mapping , Cerebral Cortex/diagnostic imaging , Conditioning, Operant , Decision Making , Humans , Magnetic Resonance Imaging , Male , Photic Stimulation , Psychomotor Performance/physiology , Reinforcement Schedule , Reward , Young Adult
13.
Nat Rev Neurosci ; 17(3): 183-95, 2016 Mar.
Article in English | MEDLINE | ID: mdl-26865020

ABSTRACT

Environmental stimuli and objects, including rewards, are often processed sequentially in the brain. Recent work suggests that the phasic dopamine reward prediction-error response follows a similar sequential pattern. An initial brief, unselective and highly sensitive increase in activity unspecifically detects a wide range of environmental stimuli, then quickly evolves into the main response component, which reflects subjective reward value and utility. This temporal evolution allows the dopamine reward prediction-error signal to optimally combine speed and accuracy.


Subjects
Dopamine/metabolism , Neurons/physiology , Reward , Signal Transduction/physiology , Animals , Humans
14.
J Neurosci ; 39(15): 2915-2929, 2019 04 10.
Article in English | MEDLINE | ID: mdl-30705103

ABSTRACT

Humans and other primates share many decision biases, among them our subjective distortion of objective probabilities. When making choices between uncertain rewards we typically treat probabilities nonlinearly: overvaluing low probabilities of reward and undervaluing high ones. A growing body of evidence, however, points to a more flexible pattern of distortion than the classical inverse-S one, highlighting the effect of experimental conditions in shifting the weight assigned to probabilities, such as task feedback, learning, and attention. Here we investigated the role of sequence structure (the order in which gambles are presented in a choice task) in shaping the probability distortion patterns of rhesus macaques: we presented 2 male monkeys with binary choice sequences of MIXED or REPEATED gambles against safe rewards. Parametric modeling revealed that choices in each sequence type were guided by significantly different patterns of probability distortion: whereas we elicited the classical inverse-S-shaped probability distortion in pseudorandomly MIXED trial sequences of gamble-safe choices, we found the opposite pattern consisting of S-shaped distortion, with REPEATED sequences. We extended these results to binary choices between two gambles, without a safe option, and confirmed the unique influence of the sequence structure in which the animals make choices. Finally, we showed that the value of gambles experienced in the past had a significant impact on the subjective value of future ones, shaping probability distortion on a trial-by-trial basis. Together, our results suggest that differences in choice sequence are sufficient to reverse the direction of probability distortion.

SIGNIFICANCE STATEMENT Our lives are peppered with uncertain, probabilistic choices. Recent studies showed how such probabilities are subjectively distorted. In the present study, we show that probability distortions in macaque monkeys differ significantly between sequences in which single gambles are repeated (S-shaped distortion), as opposed to being pseudorandomly intermixed with other gambles (inverse-S-shaped distortion). Our findings challenge the idea of fixed probability distortions resulting from inflexible computations, and point to a more instantaneous evaluation of probabilistic information. Past trial outcomes appeared to drive the "gap" between probability distortions in different conditions. Our data suggest that, as in most adaptive systems, probability values are slowly but constantly updated from prior experience, driving measures of probability distortion to either side of the S/inverse-S debate.
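The inverse-S versus S-shaped distortion contrast can be illustrated with a one-parameter Prelec probability-weighting function (the weighting form and parameter values are illustrative assumptions, not the paper's fitted parameters):

```python
# Sketch of Prelec probability weighting: alpha < 1 gives the classical
# inverse-S distortion (low probabilities overweighted, high underweighted);
# alpha > 1 reverses it into the S-shape reported for REPEATED sequences.
# Parameter values are illustrative.
import math

def prelec(p, alpha):
    """One-parameter Prelec weighting: w(p) = exp(-(-ln p)^alpha)."""
    return math.exp(-(-math.log(p)) ** alpha)

# inverse-S (alpha = 0.6): overweight 0.1, underweight 0.9
assert prelec(0.1, 0.6) > 0.1 and prelec(0.9, 0.6) < 0.9
# S-shape (alpha = 1.6): the pattern reverses
assert prelec(0.1, 1.6) < 0.1 and prelec(0.9, 1.6) > 0.9
```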


Subjects
Choice Behavior/physiology , Probability , Animals , Decision Making , Gambling/psychology , Macaca mulatta , Male , Photic Stimulation , Reinforcement Schedule , Reward , Risk-Taking
15.
Proc Natl Acad Sci U S A ; 114(10): E1766-E1775, 2017 03 07.
Article in English | MEDLINE | ID: mdl-28202727

ABSTRACT

Revealed preference theory provides axiomatic tools for assessing whether individuals make observable choices "as if" they are maximizing an underlying utility function. The theory evokes a tradeoff between goods whereby individuals improve themselves by trading one good for another good to obtain the best combination. Preferences revealed in these choices are modeled as curves of equal choice (indifference curves) and reflect an underlying process of optimization. These notions have far-reaching applications in consumer choice theory and impact the welfare of human and animal populations. However, they lack the empirical implementation in animals that would be required to establish a common biological basis. In a design using basic features of revealed preference theory, we measured in rhesus monkeys the frequency of repeated choices between bundles of two liquids. For various liquids, the animals' choices were compatible with the notion of giving up a quantity of one good to gain one unit of another good while maintaining choice indifference, thereby implementing the concept of marginal rate of substitution. The indifference maps consisted of nonoverlapping, linear, convex, and occasionally concave curves with typically negative, but also sometimes positive, slopes depending on bundle composition. Out-of-sample predictions using homothetic polynomials validated the indifference curves. The animals' preferences were internally consistent in satisfying transitivity. Change of option set size demonstrated choice optimality and satisfied the Weak Axiom of Revealed Preference (WARP). These data are consistent with a version of revealed preference theory in which preferences are stochastic; the monkeys behaved "as if" they had well-structured preferences and maximized utility.
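The Weak Axiom of Revealed Preference (WARP) test mentioned here can be sketched as a simple consistency check on choice data (the data structure is illustrative; the paper's stochastic-preference version works on choice frequencies rather than single deterministic choices):

```python
# Sketch of a WARP check on deterministic choice data: if bundle a was chosen
# when b was available, then b must never be chosen when a is available.
# The observation format is an illustrative assumption.

def satisfies_warp(observations):
    """observations: list of (chosen, available_set) pairs; bundles hashable."""
    revealed = {(c, alt) for c, avail in observations for alt in avail if alt != c}
    # WARP fails if any pair is revealed preferred in both directions.
    return all((b, a) not in revealed for (a, b) in revealed)

obs_consistent = [("a", {"a", "b"}), ("a", {"a", "b", "c"})]   # a over b, twice
obs_violating = [("a", {"a", "b"}), ("b", {"a", "b"})]         # a over b, b over a
```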


Subjects
Choice Behavior , Decision Making , Macaca mulatta/psychology , Animals , Computers , Humans , Macaca mulatta/physiology , Reward
16.
Proc Natl Acad Sci U S A ; 113(30): 8402-7, 2016 07 26.
Article in English | MEDLINE | ID: mdl-27402743

ABSTRACT

Utility is the fundamental variable thought to underlie economic choices. In particular, utility functions are believed to reflect preferences toward risk, a key decision variable in many real-life situations. To assess the validity of utility representations, it is therefore important to examine risk preferences. In turn, this approach requires formal definitions of risk. A standard approach is to focus on the variance of reward distributions (variance-risk). In this study, we also examined a form of risk related to the skewness of reward distributions (skewness-risk). Thus, we tested the extent to which empirically derived utility functions predicted preferences for variance-risk and skewness-risk in macaques. The expected utilities calculated for various symmetrical and skewed gambles served to define formally the direction of stochastic dominance between gambles. In direct choices, the animals' preferences followed both second-order (variance) and third-order (skewness) stochastic dominance. Specifically, for gambles with different variance but identical expected values (EVs), the monkeys preferred high-variance gambles at low EVs and low-variance gambles at high EVs; in gambles with different skewness but identical EVs and variances, the animals preferred positively over symmetrical and negatively skewed gambles in a strongly transitive fashion. Thus, the utility functions predicted the animals' preferences for variance-risk and skewness-risk. Using these well-defined forms of risk, this study shows that monkeys' choices conform to the internal reward valuations suggested by their utility functions. This result implies a representation of utility in monkeys that accounts for both variance-risk and skewness-risk preferences.
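How a utility function orders gambles of equal expected value but different variance can be shown with a short expected-utility calculation (the power-utility form and the gamble values are illustrative assumptions, not the paper's elicited functions):

```python
# Sketch of expected-utility comparison for variance-risk: at fixed expected
# value, concave utility (rho < 1) prefers the low-variance gamble and convex
# utility (rho > 1) the high-variance one. Forms and values are assumed.

def expected_utility(gamble, rho):
    """gamble: list of (probability, magnitude) pairs; u(x) = x ** rho."""
    return sum(p * (x ** rho) for p, x in gamble)

low_var = [(1.0, 0.5)]                  # sure 0.5
high_var = [(0.5, 0.1), (0.5, 0.9)]     # same EV (0.5), higher variance

# rho = 0.5 (concave) ranks low_var higher; rho = 2.0 (convex) ranks high_var
# higher, mirroring the EV-dependent risk attitudes described in the abstract.
```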


Subjects
Choice Behavior/physiology , Decision Making/physiology , Macaca mulatta/physiology , Models, Statistical , Animals , Logistic Models , Macaca mulatta/psychology , Male , Photic Stimulation , Reward , Risk-Taking
17.
J Neurosci ; 37(7): 1708-1720, 2017 02 15.
Article in English | MEDLINE | ID: mdl-28202786

ABSTRACT

Learning to optimally predict rewards requires agents to account for fluctuations in reward value. Recent work suggests that individuals can efficiently learn about variable rewards through adaptation of the learning rate, and coding of prediction errors relative to reward variability. Such adaptive coding has been linked to midbrain dopamine neurons in nonhuman primates, and evidence in support of a similar role for the dopaminergic system in humans is emerging from fMRI data. Here, we sought to investigate the effect of dopaminergic perturbations on adaptive prediction error coding in humans, using a between-subject, placebo-controlled pharmacological fMRI study with a dopaminergic agonist (bromocriptine) and antagonist (sulpiride). Participants performed a previously validated task in which they predicted the magnitude of upcoming rewards drawn from distributions with varying SDs. After each prediction, participants received a reward, yielding trial-by-trial prediction errors. Under placebo, we replicated previous observations of adaptive coding in the midbrain and ventral striatum. Treatment with sulpiride attenuated adaptive coding in both midbrain and ventral striatum, and was associated with a decrease in performance, whereas bromocriptine did not have a significant impact. Although we observed no differential effect of SD on performance between the groups, computational modeling suggested decreased behavioral adaptation in the sulpiride group. These results suggest that normal dopaminergic function is critical for adaptive prediction error coding, a key property of the brain thought to facilitate efficient learning in variable environments. Crucially, these results also offer potential insights for understanding the impact of disrupted dopamine function in mental illness. SIGNIFICANCE STATEMENT: To choose optimally, we have to learn what to expect.
Humans dampen learning when there is a great deal of variability in reward outcome, and two brain regions that are modulated by the brain chemical dopamine are sensitive to reward variability. Here, we aimed to relate dopamine directly to learning about variable rewards and to the neural encoding of the associated teaching signals. We perturbed dopamine in healthy individuals using dopaminergic medication and asked them to predict variable rewards while we scanned their brains. Dopamine perturbations impaired learning and the neural encoding of reward variability, thus establishing a direct link between dopamine and adaptation to reward variability. These results aid our understanding of clinical conditions associated with dopaminergic dysfunction, such as psychosis.
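The core idea of adaptive prediction error coding, where the teaching signal is expressed relative to the SD of the current reward distribution, can be illustrated with a toy delta-rule learner. The learning rule, rates, and reward sequences below are hypothetical illustrations, not the study's actual task parameters.

```python
def run_learner(rewards, sd, v0, lr=0.2):
    """Delta-rule learner whose teaching signal is the prediction error
    rescaled by the expected reward variability (adaptive coding)."""
    v, signals = v0, []
    for r in rewards:
        pe = r - v               # raw prediction error
        signals.append(pe / sd)  # adaptively coded signal, in units of SD
        v += lr * pe             # standard value update
    return v, signals

# Same pattern of deviations from the mean, drawn at two different scales
narrow = [10 + d for d in (1, -1, 2, -2)]      # low-variability context
wide   = [10 + 5 * d for d in (1, -1, 2, -2)]  # high-variability context

_, sig_narrow = run_learner(narrow, sd=2.0, v0=10.0)
_, sig_wide   = run_learner(wide, sd=10.0, v0=10.0)
# After rescaling, the coded signals are identical across contexts even though
# the raw prediction errors differ fivefold: a fixed neural output range can
# serve both environments.
```

Attenuating this rescaling, as sulpiride appears to do in the study, would leave the coded signal miscalibrated to the context's variability.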


Subjects
Adaptation, Physiological/physiology , Corpus Striatum/metabolism , Mesencephalon/metabolism , Adaptation, Physiological/drug effects , Adult , Bromocriptine/pharmacology , Computer Simulation , Corpus Striatum/diagnostic imaging , Corpus Striatum/drug effects , Dopamine Agonists/pharmacology , Dopamine Antagonists/pharmacology , Double-Blind Method , Female , Genetic Testing , Healthy Volunteers , Humans , Image Processing, Computer-Assisted , Male , Mesencephalon/diagnostic imaging , Mesencephalon/drug effects , Motivation/drug effects , Motivation/physiology , Oxygen/blood , Reward , Sulpiride/pharmacology , Young Adult
19.
Exp Brain Res ; 236(6): 1679-1688, 2018 06.
Article in English | MEDLINE | ID: mdl-29610950

ABSTRACT

Reward outcomes are available in many diverse situations, and all involve choice. If multiple rewarding outcomes are available, decisions about their relative value lead to choosing one over another. Important factors related to choice context should be encoded and utilized for this form of adaptive choosing. These factors can include the number of alternatives, the pacing of choice behavior and the possibility to reverse one's choice. An essential step in understanding whether the context of choice is encoded is to directly compare choice with a context in which choice is absent. Neural activity in orbitofrontal cortex and striatum encodes potential value parameters related to reward quality and quantity as well as relative preference. We examined how neural activations in these brain regions are sensitive to choice situations and potentially involved in a prediction for the upcoming outcome selection. Neural activity was recorded and compared between a two-choice spatial delayed response task and an imperative 'one-option' task. Neural activity was obtained that extended from the instruction cue to the movement, similar to previous work using the identical imperative task. Orbitofrontal and striatal neural responses depended upon the decision about the choice of which reward to collect. Moreover, signals to predictive instruction cues that precede choice were selective for the choice situation. These neural responses could reflect chosen value with greater information on relative value of individual options as well as encode choice context itself embedded in the task as a part of the post-decision variable.


Subjects
Choice Behavior/physiology , Corpus Striatum/physiology , Electroencephalography/methods , Macaca/physiology , Prefrontal Cortex/physiology , Psychomotor Performance/physiology , Reward , Animals , Caudate Nucleus/physiology , Electrodes, Implanted , Macaca fascicularis , Macaca mulatta , Microelectrodes , Nucleus Accumbens/physiology , Putamen/physiology
20.
J Neurosci ; 36(39): 10016-25, 2016 09 28.
Article in English | MEDLINE | ID: mdl-27683899

ABSTRACT

Given that the range of rewarding and punishing outcomes of actions is large but neural coding capacity is limited, efficient processing of outcomes by the brain is necessary. One mechanism to increase efficiency is to rescale neural output to the range of outcomes expected in the current context, and process only experienced deviations from this expectation. However, this mechanism comes at the cost of not being able to discriminate between unexpectedly low losses when times are bad and unexpectedly high gains when times are good. Thus, too much adaptation would result in disregarding information about the nature and absolute magnitude of outcomes, preventing learning about the longer-term value structure of the environment. Here we investigate the degree of adaptation in outcome coding brain regions in humans, for directly experienced outcomes and observed outcomes. We scanned participants while they performed a social learning task in gain and loss blocks. Multivariate pattern analysis showed that two distinct networks of brain regions adapt to the most likely outcomes within a block. Frontostriatal areas adapted to directly experienced outcomes, whereas lateral frontal and temporoparietal regions adapted to observed social outcomes. Critically, in both cases, adaptation was incomplete and information about whether the outcomes arose in a gain block or a loss block was retained. Univariate analysis confirmed incomplete adaptive coding in these regions but also detected nonadapting outcome signals. Thus, although neural areas rescale their responses to outcomes for efficient coding, they adapt incompletely and keep track of the longer-term incentives available in the environment. SIGNIFICANCE STATEMENT: Optimal value-based choice requires that the brain precisely and efficiently represents positive and negative outcomes. One way to increase efficiency is to adapt responding to the most likely outcomes in a given context. 
However, too strong adaptation would result in loss of precise representation (e.g., when the avoidance of a loss in a loss-context is coded the same as receipt of a gain in a gain-context). We investigated an intermediate form of adaptation that is efficient while maintaining information about received gains and avoided losses. We found that frontostriatal areas adapted to directly experienced outcomes, whereas lateral frontal and temporoparietal regions adapted to observed social outcomes. Importantly, adaptation was intermediate, in line with influential models of reference dependence in behavioral economics.
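Incomplete adaptation of this kind can be written as a weighted mix of absolute and range-adapted outcome coding. The sketch below is a hypothetical illustration: the weighting parameter and block statistics are invented for the example, not fitted values from the study.

```python
def adapted_response(outcome, block_mean, block_range, w):
    """Outcome code mixing full range adaptation (w=1) with absolute coding (w=0)."""
    relative = (outcome - block_mean) / block_range  # context-rescaled component
    return w * relative + (1 - w) * outcome          # partial (intermediate) adaptation

# Gain block outcomes in {0, +1}; loss block outcomes in {-1, 0}
best_gain_full = adapted_response(1.0, 0.5, 1.0, w=1.0)   # gain obtained
best_loss_full = adapted_response(0.0, -0.5, 1.0, w=1.0)  # loss avoided

best_gain_part = adapted_response(1.0, 0.5, 1.0, w=0.6)
best_loss_part = adapted_response(0.0, -0.5, 1.0, w=0.6)

# Full adaptation codes an obtained gain and an avoided loss identically,
# discarding block identity; partial adaptation keeps them distinguishable
# while still compressing the response range.
```

With w = 1 both best outcomes map to 0.5 and block information is lost; with an intermediate w they remain separable, mirroring the incomplete adaptation reported in the abstract.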


Subjects
Adaptation, Physiological/physiology , Cerebral Cortex/physiology , Choice Behavior/physiology , Discrimination Learning/physiology , Reward , Social Learning/physiology , Adolescent , Adult , Extinction, Psychological/physiology , Female , Humans , Male , Memory/physiology , Nerve Net/physiology , Young Adult