1.
J Neurosci. 2024 Jun 12;44(24).
Article in English | MEDLINE | ID: mdl-38670805

ABSTRACT

Reinforcement learning is a theoretical framework that describes how agents learn to select options that maximize rewards and minimize punishments over time. We often make choices, however, to obtain symbolic reinforcers (e.g., money, points) that are later exchanged for primary reinforcers (e.g., food, drink). Although symbolic reinforcers are ubiquitous in our daily lives and widely used in laboratory tasks because they can be motivating, the mechanisms by which they become motivating are less well understood. In the present study, we examined how monkeys learn to make choices that maximize fluid rewards through reinforcement with tokens. The question addressed here is how the value of a state, which is a function of multiple task features (e.g., the current number of accumulated tokens, choice options, task epoch, and trials since the last delivery of a primary reinforcer), affects motivation. We constructed a Markov decision process model that computes the value of task states from task features, which we then correlated with the motivational state of the animal. Fixation times, choice reaction times, and abort frequency were all significantly related to the values of task states during the tokens task (n = 5 monkeys, three males and two females). Furthermore, the model makes predictions for how neural responses could change on a moment-by-moment basis relative to changes in the state value. Together, this task and model allow us to capture learning and behavior related to symbolic reinforcement.
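
To make the modeling approach concrete, here is a minimal, self-contained Python sketch of how a Markov decision process can assign values to states of a token-accumulation task. This is an illustration of the general technique, not the authors' implementation; the token cap, action set, payoffs, and probabilities are assumptions.

```python
# Sketch: value iteration over token-count states in a token-accumulation task.
# State = number of accumulated tokens; reaching TOKEN_CAP exchanges the tokens
# for a primary (fluid) reward and resets the count. All parameters are
# illustrative assumptions, not the published model.
import numpy as np

TOKEN_CAP = 6        # tokens exchanged for fluid reward at this count
GAMMA = 0.9          # discount factor
ACTIONS = {          # action -> (token gain on success, success probability)
    "safe":  (1, 0.9),
    "risky": (2, 0.5),
}

def value_iteration(tol=1e-8):
    """Compute V(s) for s = 0 .. TOKEN_CAP-1 accumulated tokens."""
    V = np.zeros(TOKEN_CAP)
    while True:
        V_new = np.empty_like(V)
        for s in range(TOKEN_CAP):
            action_values = []
            for gain, p in ACTIONS.values():
                if s + gain >= TOKEN_CAP:
                    win_val = 1.0 + GAMMA * V[0]   # cash in, tokens reset
                else:
                    win_val = GAMMA * V[s + gain]
                lose_val = GAMMA * V[s]            # failed trial: no change
                action_values.append(p * win_val + (1 - p) * lose_val)
            V_new[s] = max(action_values)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new

print(value_iteration())   # state value rises as tokens accumulate
```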


Subject(s)
Choice Behavior , Macaca mulatta , Motivation , Reinforcement, Psychology , Reward , Animals , Motivation/physiology , Male , Choice Behavior/physiology , Reaction Time/physiology , Markov Chains , Female
2.
bioRxiv. 2024 Feb 14.
Article in English | MEDLINE | ID: mdl-38410482

ABSTRACT

Pupillometry is a popular method because pupil size is an easily measured and sensitive marker of neural activity that is associated with behavior, cognition, emotion, and perception. Currently, there is no method for monitoring the phases of pupillary fluctuation in real time. We introduce rtPupilPhase, a software tool that automatically detects trends in pupil size in real time, enabling novel implementations of real-time pupillometry toward numerous research and translational goals.
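
As an illustration of the general idea (not the published rtPupilPhase algorithm), the following sketch classifies the current pupil trend from a sliding window of streaming samples; the window length and slope threshold are assumptions.

```python
# Sketch: streaming pupil-trend detection via a least-squares slope over a
# sliding window. Illustrative only; not the rtPupilPhase algorithm.
from collections import deque
import numpy as np

class PupilTrendDetector:
    def __init__(self, window=50, slope_thresh=0.005):
        self.samples = deque(maxlen=window)   # most recent pupil sizes
        self.slope_thresh = slope_thresh      # units: pupil size per sample

    def update(self, pupil_size):
        """Add one sample; return the current trend label."""
        self.samples.append(pupil_size)
        if len(self.samples) < self.samples.maxlen:
            return "warming_up"
        y = np.asarray(self.samples)
        slope = np.polyfit(np.arange(len(y)), y, 1)[0]   # linear trend
        if slope > self.slope_thresh:
            return "dilation"
        if slope < -self.slope_thresh:
            return "constriction"
        return "plateau"                      # near a peak or trough

# Example on a synthetic slow pupil oscillation plus noise:
rng = np.random.default_rng(0)
detector = PupilTrendDetector()
for t in range(300):
    size = 3.0 + 0.5 * np.sin(2 * np.pi * t / 200) + rng.normal(0, 0.01)
    label = detector.update(size)
print(label)   # trend of the most recent window
```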

3.
bioRxiv. 2023 Oct 11.
Article in English | MEDLINE | ID: mdl-37873311

ABSTRACT

Reinforcement learning (RL) is a theoretical framework that describes how agents learn to select options that maximize rewards and minimize punishments over time. We often make choices, however, to obtain symbolic reinforcers (e.g., money, points) that can later be exchanged for primary reinforcers (e.g., food, drink). Although symbolic reinforcers are motivating, little is understood about the neural or computational mechanisms underlying the motivation to earn them. In the present study, we examined how monkeys learn to make choices that maximize fluid rewards through reinforcement with tokens. The question addressed here is how the value of a state, which is a function of multiple task features (e.g., the current number of accumulated tokens, choice options, task epoch, and trials since the last delivery of a primary reinforcer), affects motivation. We constructed a Markov decision process model that computes the value of task states from task features to capture the motivational state of the animal. Fixation times, choice reaction times, and abort frequency were all significantly related to the values of task states during the tokens task (n = 5 monkeys). Furthermore, the model makes predictions for how neural responses could change on a moment-by-moment basis relative to changes in state value. Together, this task and model allow us to capture learning and behavior related to symbolic reinforcement.
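
The behavioral analyses described here relate model-derived state values to measures such as reaction times. A minimal sketch of that kind of correlation, using synthetic data in place of the study's model outputs and monkey behavior:

```python
# Sketch: regressing trial-by-trial reaction times on model-derived state
# values. Data are synthetic; in the study, values come from the MDP model
# and reaction times from the monkeys' choices.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_trials = 500
state_value = rng.uniform(0, 1, n_trials)        # V(s) on each trial
# Assumed relationship: higher state value -> faster responses (shorter RT).
reaction_time = 0.45 - 0.12 * state_value + rng.normal(0, 0.03, n_trials)

res = stats.linregress(state_value, reaction_time)
print(f"slope = {res.slope:.3f}, r = {res.rvalue:.3f}, p = {res.pvalue:.2g}")
```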

4.
PLoS Comput Biol. 2023 Jan;19(1):e1010873.
Article in English | MEDLINE | ID: mdl-36716320

ABSTRACT

Choice impulsivity is characterized by the choice of immediate, smaller reward options over future, larger reward options, and is often thought to be associated with negative life outcomes. However, some environments make future rewards more uncertain, and in these environments impulsive choices can be beneficial. Here we examined the conditions under which impulsive vs. non-impulsive decision strategies would be advantageous. We used Markov decision processes (MDPs) to model three common decision-making tasks: Temporal Discounting, Information Sampling, and an Explore-Exploit task. We manipulated environmental variables to create circumstances in which future outcomes were relatively uncertain. We then manipulated the discount factor of an MDP agent, which affects the value of immediate versus future rewards, to model impulsive and non-impulsive behavior. This allowed us to examine the performance of impulsive and non-impulsive agents in more or less predictable environments. In Temporal Discounting, we manipulated the transition probability to delayed rewards and found that the agent with the lower discount factor (i.e., the impulsive agent) collected more average reward than the agent with a higher discount factor (the non-impulsive agent) by selecting immediate reward options when the probability of receiving the future reward was low. In the Information Sampling task, we manipulated the amount of information obtained with each sample. When sampling led to small information gains, the impulsive MDP agent collected more average reward than the non-impulsive agent. Finally, in the Explore-Exploit task, we manipulated the substitution rate for novel options. When the substitution rate was high, the impulsive agent again performed better than the non-impulsive agent, as it explored the novel options less and instead exploited options with known reward values. The results of these analyses show that impulsivity can be advantageous in environments that are unexpectedly uncertain.
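
A minimal sketch of the Temporal Discounting comparison: an impulsive (low discount factor) and a non-impulsive (high discount factor) agent evaluate an immediate small reward against a delayed larger reward that is delivered only with probability p. All rewards, delays, and discount factors below are illustrative assumptions, not the study's parameters.

```python
# Sketch: how the discount factor and the reliability of the delayed reward
# jointly determine which option has higher discounted value.

def option_values(gamma, p_delayed, small=1.0, large=3.0, delay=5):
    """Discounted value of the immediate vs. delayed option."""
    immediate = small                               # delivered now
    delayed = p_delayed * (gamma ** delay) * large  # reaches reward with prob p
    return immediate, delayed

for label, gamma in [("impulsive (gamma=0.6)", 0.6),
                     ("non-impulsive (gamma=0.95)", 0.95)]:
    for p in (0.3, 0.9):
        imm, dly = option_values(gamma, p)
        best = "immediate" if imm > dly else "delayed"
        print(f"{label}, p={p}: immediate={imm:.2f}, "
              f"delayed={dly:.2f} -> choose {best}")
# When the delayed reward is unreliable (low p), preferring the immediate
# option, as the impulsive agent does, yields more reward on average.
```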


Subject(s)
Impulsive Behavior , Reward , Uncertainty , Probability , Markov Chains , Choice Behavior
5.
Cereb Cortex Commun. 2022;3(3):tgac034.
Article in English | MEDLINE | ID: mdl-36168516

ABSTRACT

Our brains continuously acquire sensory information and make judgments even when visual information is limited. In some circumstances, an ambiguous object can be recognized from how it moves, such as an animal hopping or a plane flying overhead. Yet it remains unclear how movement is processed by brain areas involved in visual object recognition. Here we investigate whether inferior temporal (IT) cortex, an area known for its relevance in visual form processing, has access to motion information during recognition. We developed a matching task that required monkeys to recognize moving shapes with variable levels of shape degradation. Neural recordings in area IT showed that, surprisingly, some IT neurons responded more strongly to degraded shapes than to clear ones. Furthermore, neurons exhibited motion sensitivity at different times during the presentation of the blurry target. Population decoding analyses showed that motion patterns could be decoded from IT neuron pseudo-populations. These results suggest that neurons in IT can integrate visual motion and shape information, particularly when shape information is degraded, a capacity that previous studies have overlooked. Our results highlight the importance of using challenging multifeature recognition tasks to understand the role of area IT in naturalistic visual object recognition.
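
A minimal sketch of pseudo-population decoding of the kind described here, using a cross-validated linear classifier on synthetic firing rates; the study's actual analysis pipeline is not specified in the abstract, and all parameters below are assumptions.

```python
# Sketch: decoding motion pattern from a pseudo-population of IT neurons.
# Firing rates are synthetic (direction-dependent tuning plus noise).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
n_neurons, trials_per_dir = 60, 40
directions = [0, 1]                    # e.g., two motion patterns

tuning = rng.normal(0, 1, (len(directions), n_neurons))  # per-neuron preference
X = np.vstack([tuning[d] + rng.normal(0, 2.0, (trials_per_dir, n_neurons))
               for d in directions])                     # trials x neurons
y = np.repeat(directions, trials_per_dir)                # motion label per trial

decoder = LogisticRegression(max_iter=1000)
scores = cross_val_score(decoder, X, y, cv=5)            # 5-fold accuracy
print(f"decoding accuracy: {scores.mean():.2f} +/- {scores.std():.2f}")
```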

6.
Front Syst Neurosci. 2015;9:185.
Article in English | MEDLINE | ID: mdl-26834581

ABSTRACT

Our ability to plan and execute a series of tasks leading to a desired goal requires remarkable coordination between sensory, motor, and decision-related systems. Prefrontal cortex (PFC) is thought to play a central role in this coordination, especially when actions must be assembled extemporaneously and cannot be programmed as a rote series of movements. A central component of this flexible behavior is the moment-by-moment allocation of working memory and attention. The ubiquity of sequence planning in our everyday lives belies the neural complexity that supports this capacity, and little is known about how frontal cortical regions orchestrate the monitoring and control of sequential behaviors. For example, it remains unclear if and how sensory cortical areas, which provide essential driving inputs for behavior, are modulated by the frontal cortex during these tasks. Here, we review what is known about moment-to-moment monitoring as it relates to visually guided, rule-driven behaviors that change over time. We highlight recent human work that shows how the rostrolateral prefrontal cortex (RLPFC) participates in monitoring during task sequences. Neurophysiological data from monkeys suggest that monitoring may be accomplished by neurons that respond to items within the sequence and may in turn influence the tuning properties of neurons in posterior sensory areas. Understanding the interplay between proceduralized or habitual acts and supervised control of sequences is key to our understanding of sequential task execution. A crucial bridge will be the use of experimental protocols that allow for the examination of the functional homology between monkeys and humans. We illustrate how task sequences may be parceled into components and examined experimentally, thereby opening future avenues of investigation into the neural basis of sequential monitoring and control.
