ABSTRACT
Learning to make adaptive decisions involves making choices, assessing their consequences, and leveraging this assessment to attain higher-rewarding states. Despite a vast literature on value-based decision-making, relatively little is known about the cognitive processes underlying decisions in highly uncertain contexts. Real-world decisions are rarely accompanied by immediate feedback, explicit rewards, or complete knowledge of the environment. Making informed decisions in such contexts requires significant knowledge about the environment, which can only be gained through exploration. Here we aim to understand and formalize the brain mechanisms underlying these processes. To this end, we first designed and performed an experimental task in which human participants had to learn to maximize reward while making sequences of decisions with only basic knowledge of the environment and in the absence of explicit performance cues. Participants had to rely on their own internal assessment of performance to reveal a covert relationship between their choices and the subsequent consequences, and to find the strategy leading to the highest cumulative reward. Our results show that participants' reaction times were longer whenever a decision involved a future consequence, suggesting greater introspection whenever a delayed value had to be considered. Learning time varied significantly across participants. Second, we formalized the neurocognitive processes underlying decision-making in this task by combining mean-field representations of competing neural populations with a reinforcement learning mechanism. This model provided a plausible characterization of the brain dynamics underlying these processes and reproduced each aspect of the participants' behavior, from their reaction times and choices to their learning rates.
In summary, both the experimental results and the model provide a principled explanation of how delayed value may be computed and incorporated into the neural dynamics of decision-making, and of how learning occurs in these uncertain scenarios.
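The combination of competing mean-field populations with a reinforcement learning mechanism can be illustrated with a minimal sketch. All parameters, the rate equation, and the value-to-input mapping below are illustrative assumptions, not the model described in the abstract:

```python
import numpy as np

def simulate_decision(q_values, dt=1e-3, t_max=2.0, noise=0.02,
                      threshold=0.5, seed=0):
    """Race between two mutually inhibiting mean-field populations whose
    inputs are biased by learned action values (illustrative parameters)."""
    rng = np.random.default_rng(seed)
    r = np.zeros(2)                          # population firing rates
    tau, w_self, w_inh = 0.1, 1.6, 1.2       # time constant, self-excitation, cross-inhibition
    inp = 0.3 + 0.5 * np.asarray(q_values, dtype=float)  # value-biased drive
    for step in range(int(t_max / dt)):
        drive = w_self * r - w_inh * r[::-1] + inp
        r = r + dt / tau * (-r + np.maximum(drive, 0.0))   # leaky rate dynamics
        r = np.clip(r + noise * np.sqrt(dt) * rng.standard_normal(2), 0.0, None)
        if r.max() >= threshold:
            return int(np.argmax(r)), (step + 1) * dt      # (choice, reaction time)
    return int(np.argmax(r)), t_max                        # no winner: pick the leader

def q_update(q_values, action, reward, alpha=0.2):
    """Temporal-difference update of the chosen action's value."""
    q = list(q_values)
    q[action] += alpha * (reward - q[action])
    return q
```

In this toy setup the reaction time emerges from how long the winning population takes to reach threshold, while the learning mechanism reshapes the inputs across trials.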
ABSTRACT
Brain-computer interfaces have seen an extraordinary surge in development in recent years, and a significant discrepancy now exists between the abundance of available data and the limited headway made toward a unified theoretical framework. This discrepancy becomes particularly pronounced when examining collective neural activity at the micro and meso scales, where a coherent formalization that adequately describes neural interactions is still lacking. Here, we introduce a mathematical framework to analyze systems of natural neurons and to interpret the related empirical observations in terms of lattice field theory, an established paradigm from theoretical particle physics and statistical mechanics. Our methods are tailored to interpreting data from chronic neural interfaces, especially spike rasters from measurements of single-neuron activity, and generalize the maximum entropy model for neural networks so that the time evolution of the system is also taken into account. This is achieved by bridging particle physics and neuroscience, paving the way for particle physics-inspired models of the neocortex.
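For reference, the static maximum entropy model that such a framework generalizes is the pairwise (Ising-like) model over binary spike patterns, constrained to match the raster's mean rates and pairwise correlations. The sketch below evaluates that model exactly for small populations; the fields and couplings are illustrative inputs, not the paper's formalism:

```python
import itertools
import numpy as np

def empirical_moments(raster):
    """Mean rates and pairwise moments the model is constrained to match.
    raster: (time_bins, n_neurons) binary array of spike counts."""
    raster = np.asarray(raster, dtype=float)
    return raster.mean(axis=0), (raster.T @ raster) / raster.shape[0]

def ising_energy(s, h, J):
    """Energy of a binary pattern s under a pairwise maximum entropy model."""
    s = np.asarray(s, dtype=float)
    return -float(h @ s) - 0.5 * float(s @ J @ s)

def pattern_probabilities(h, J):
    """Exact Boltzmann distribution over all 2^N patterns (small N only)."""
    n = len(h)
    patterns = list(itertools.product([0, 1], repeat=n))
    weights = np.array([np.exp(-ising_energy(p, h, J)) for p in patterns])
    return patterns, weights / weights.sum()
```

Extending such a model with explicit time evolution, as the abstract proposes, amounts to placing the binary variables on a space-time lattice rather than fitting a single equal-time distribution.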
ABSTRACT
The prefrontal cortex maintains information in memory through static or dynamic population codes depending on task demands, but whether the population coding schemes used are learning-dependent and differ between cell types is currently unknown. We investigate the population coding properties and temporal stability of neurons recorded from male macaques in two mapping tasks during and after stimulus-response associative learning, and we then use a Strategy task with the same stimuli and responses as a control. We identify heterogeneous population coding for stimuli, responses, and novel associations: static for putative pyramidal cells and dynamic for putative interneurons, which show the strongest selectivity for all variables. The population coding of learned associations shows the highest overall stability, driven by cell type, with interneurons changing from dynamic to static coding after successful learning. These results support the view that prefrontal microcircuitry expresses mixed population coding governed by cell type and changes its stability during associative learning.
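The static-versus-dynamic distinction drawn here is typically quantified with cross-temporal similarity (temporal generalization) analyses: a static code keeps the same population selectivity pattern across time bins, whereas a dynamic code does not. The following is a generic sketch of that idea, not the authors' analysis pipeline:

```python
import numpy as np

def cross_temporal_stability(selectivity):
    """Cross-temporal similarity of a population code (generic sketch).
    selectivity: (time_bins, n_neurons) array of per-neuron selectivity values.
    Returns a (time, time) matrix; uniformly high off-diagonal values indicate
    a static code, a diagonal-dominated matrix indicates a dynamic code."""
    X = np.asarray(selectivity, dtype=float)
    Xc = X - X.mean(axis=1, keepdims=True)            # center each time bin
    norms = np.linalg.norm(Xc, axis=1, keepdims=True)
    Xn = Xc / np.where(norms == 0, 1.0, norms)        # unit-norm patterns
    return Xn @ Xn.T                                  # pairwise correlations
```

Applied separately to putative pyramidal cells and interneurons, such a matrix would make the reported change from dynamic to static coding after learning directly visible.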
Subject(s)
Neurons, Prefrontal Cortex, Animals, Male, Prefrontal Cortex/physiology, Neurons/physiology, Learning/physiology, Pyramidal Cells/physiology, Interneurons/physiology, Macaca
ABSTRACT
A vast literature agrees that rank-ordered information such as A>B>C>D>E>F is mentally represented in spatially organized schemas after learning. This organization significantly influences decision-making using the acquired premises: deciding whether B is higher than D is equivalent to comparing their positions in this space. The implementation of non-verbal versions of the transitive inference task has provided the basis for ascertaining that different animal species explore a mental space when deciding among hierarchically organized memories. In the present work, we review several transitive inference studies that highlight this ability in animals and, consequently, the animal models developed to study the underlying cognitive processes and the main neural structures supporting this ability. We then present the literature investigating the underlying neuronal mechanisms, and discuss how non-human primates represent an excellent model for future studies, providing ideal resources for better understanding the neuronal correlates of decision-making through transitive inference tasks.
Subject(s)
Learning, Neurophysiology, Animals, Haplorhini, Learning/physiology, Neurons, Decision Making
ABSTRACT
Interaction with the environment requires us to predict the potential reward that will follow our choices. Rewards can change depending on the context, and our behavior adapts accordingly. Previous studies have shown that, depending on the reward regime, actions can be facilitated (i.e., by increasing the reward for responding) or interfered with (i.e., by increasing the reward for suppression). Here we studied how a change in reward perspective can influence subjects' adaptation strategy. Students were asked to perform a modified version of the Stop-Signal task. Specifically, at the beginning of each trial, a Cue Signal informed subjects of the value of the reward they would receive; in one condition, Go Trials were rewarded more than Stop Trials; in another, Stop Trials were rewarded more than Go Trials; and in the last, both trial types were rewarded equally. Subjects participated in a virtual competition, and the reward consisted of points to be earned to climb a leaderboard and win (as in a video game contest). The sum of points earned was updated on each trial. After a learning phase in which the three conditions were presented separately, each subject performed a 600-trial testing phase in which the three conditions were randomly mixed. Based on previous studies, we hypothesized that subjects could employ different strategies to perform the task, including modulating inhibition efficiency, adjusting response speed, or maintaining constant behavior across contexts. We found that, to perform the task, subjects preferentially adjusted their response speed, while the duration of the inhibition process did not change significantly across conditions.
The investigation of strategic motor adjustments to the prospect of reward is relevant not only to understanding how action control is typically regulated, but also to work with patient groups who exhibit cognitive control deficits, suggesting that the ability to inhibit can be modulated by employing reward prospects as motivational factors.
ABSTRACT
Goal-oriented actions often require the coordinated movement of two or more effectors. Sometimes multi-effector movements need to be adjusted according to a continuously changing environment, which requires stopping one effector without interrupting the movement of the others. This form of control has been investigated with the selective Stop-Signal Task (SST), which requires inhibiting one effector of a multicomponent action. This form of selective inhibition has been hypothesized to act through a two-step process, in which a temporary global inhibition deactivating all ongoing motor responses is followed by a restarting process that reactivates only the still-moving effector. When this form of inhibition takes place, the reaction time (RT) of the moving effector pays the cost of the preceding global inhibition. However, whether and how this cost delays the RT of the effector that was required to stop but erroneously moved (Stop Error trials) remains poorly investigated. Here we measured Stop Error RTs in a group of participants instructed to simultaneously rotate the wrist and lift the foot when a Go Signal occurred, and to interrupt both movements (non-selective Stop version) or only one of them (selective Stop version) when a Stop Signal was presented. We presented this task in two experimental conditions to evaluate how different contexts can influence possible proactive inhibition of the moving effector's RT in the selective Stop versions. In one context, we provided foreknowledge of the effector to be inhibited by presenting the same selective or non-selective Stop version within a block of trials. In the other context, no foreknowledge of the effector(s) to be stopped was provided: the selective and non-selective Stop versions were intermingled, and the information on the effector to be stopped was delivered at the time of the Stop Signal presentation.
We detected a cost in both Correct and Error selective Stop RTs that was influenced by the different task conditions. The results are discussed within the framework of the race model of the SST and its relationship to a restart model developed for selective versions of this paradigm.
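The race model invoked here is also the basis of the standard integration-method estimate of the stop-signal reaction time (SSRT), in which the unobservable stop latency is read off the Go-RT distribution at the quantile given by the probability of responding on Stop trials. A minimal, generic sketch of that estimate (not the authors' analysis code):

```python
import numpy as np

def ssrt_integration(go_rts, p_respond, mean_ssd):
    """Integration-method SSRT under the independent race model.
    go_rts: RTs from Go trials; p_respond: P(respond | stop signal);
    mean_ssd: mean stop-signal delay, in the same units as the RTs."""
    go_rts = np.sort(np.asarray(go_rts, dtype=float))
    n = len(go_rts)
    # Index of the Go-RT quantile at which the stop process finishes on average.
    idx = min(max(int(np.ceil(p_respond * n)) - 1, 0), n - 1)
    return go_rts[idx] - mean_ssd
```

For example, with Go RTs of 300-700 ms, a 40% response rate on Stop trials, and a mean SSD of 200 ms, the estimated SSRT is the 40th-percentile Go RT minus 200 ms.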