ABSTRACT
Humans have the remarkable ability to achieve goals in a complex and constantly changing world, still surpassing modern machine-learning algorithms in flexibility and learning speed. It is generally accepted that a crucial factor for this ability is the use of abstract, hierarchical representations, which use structure in the environment to guide learning and decision making. Nevertheless, how we create and use these hierarchical representations is poorly understood. This study presents evidence that human behavior can be characterized as hierarchical reinforcement learning (RL). We designed an experiment to test specific predictions of hierarchical RL using a series of subtasks in the realm of context-based learning, and observed several behavioral markers of hierarchical RL, such as asymmetric switch costs between changes in higher-level versus lower-level features, faster learning in higher-valued compared to lower-valued contexts, and a preference for higher-valued over lower-valued contexts. We replicated these results across three independent samples. We simulated three models (a classic, flat RL model, a hierarchical RL model, and a hierarchical Bayesian model) and compared their behavior to human results. While the flat RL model captured some aspects of participants' sensitivity to outcome values, and the hierarchical Bayesian model captured some markers of transfer, only hierarchical RL accounted for all patterns observed in human behavior. This work shows that hierarchical RL, a biologically inspired and computationally simple algorithm, can capture human behavior in complex, hierarchical environments, and it opens avenues for future research in this field.
Subjects
Machine Learning, Psychological Models, Psychological Reinforcement, Adolescent, Adult, Bayes Theorem, Female, Humans, Learning Curve, Male, Young Adult
ABSTRACT
In the real world, many relationships between events are uncertain and probabilistic. Uncertainty is also likely to be a more common feature of daily experience for youth, because they have less experience to draw from than adults. Some studies suggest that probabilistic learning may be less efficient in youth than in adults, while others suggest that it may be more efficient in mid-adolescence. Here we used a probabilistic reinforcement learning task to test how youth aged 8-17 (N = 187) and adults aged 18-30 (N = 110) learn about stable probabilistic contingencies. Performance increased with age through the early twenties, then stabilized. Using hierarchical Bayesian methods to fit computational reinforcement learning models, we show that all participants' performance was better explained by models in which negative outcomes had minimal to no impact on learning. The performance increase with age was driven by 1) an increase in learning rate (i.e., a decrease in the integration time scale) and 2) a decrease in noisy/exploratory choices. In mid-adolescence (ages 13-15), salivary testosterone and learning rate were positively related. We discuss our findings in the context of other studies and hypotheses about adolescent brain development.
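The model class described above can be illustrated with a minimal sketch, assuming a standard delta-rule (Q-learning) update with separate learning rates for positive and negative prediction errors, and a softmax choice rule mixed with uniform random noise. The parameter names `alpha_pos`, `alpha_neg`, `beta`, and `epsilon` are hypothetical and are not taken from the study. Setting `alpha_neg` near zero mimics models in which negative outcomes have little impact on learning, and a larger learning rate corresponds to a shorter integration time scale.

```python
import math


def update_q(q, action, reward, alpha_pos, alpha_neg):
    """Delta-rule update with separate learning rates for positive and
    negative prediction errors (hypothetical parameterization)."""
    q = list(q)                          # copy so the caller's list is untouched
    delta = reward - q[action]           # reward prediction error
    alpha = alpha_pos if delta > 0 else alpha_neg
    q[action] += alpha * delta
    return q


def choice_probs(q, beta, epsilon):
    """Softmax over values, mixed with uniform random choice (noise)."""
    m = max(q)                           # subtract max for numerical stability
    exps = [math.exp(beta * (v - m)) for v in q]
    total = sum(exps)
    n = len(q)
    return [(1 - epsilon) * e / total + epsilon / n for e in exps]


# Usage: one rewarded and one unrewarded choice of option 0.
q = [0.5, 0.5]
q = update_q(q, action=0, reward=1.0, alpha_pos=0.3, alpha_neg=0.0)
q = update_q(q, action=0, reward=0.0, alpha_pos=0.3, alpha_neg=0.0)
print(q, choice_probs(q, beta=5.0, epsilon=0.1))
```

With `alpha_neg = 0.0`, the rewarded choice raises the chosen value while the unrewarded choice leaves it unchanged; `epsilon` adds choice noise independent of the learned values.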
Subjects
Psychological Models, Adolescent Psychology, Psychological Reinforcement, Adolescent, Adult, Child, Computational Biology, Female, Humans, Learning/physiology, Male, Saliva/chemistry, Testosterone/analysis, Young Adult
ABSTRACT
Humans are learning agents that acquire social group representations from experience. Here, we discuss how to construct artificial agents capable of this feat. One approach, based on deep reinforcement learning, allows the necessary representations to self-organize. This minimizes the need for hand-engineering, improving robustness and scalability. It also enables "virtual neuroscience" research on the learned representations.
Subjects
Learning, Neurosciences, Humans
ABSTRACT
Dopamine release in the nucleus accumbens has been hypothesized to signal reward prediction error, the difference between observed and predicted reward, suggesting a biological implementation for reinforcement learning. Rigorous tests of this hypothesis require assumptions about how the brain maps sensory signals to reward predictions, yet this mapping is still poorly understood. In particular, the mapping is non-trivial when sensory signals provide ambiguous information about the hidden state of the environment. Previous work using classical conditioning tasks has suggested that reward predictions are generated conditional on probabilistic beliefs about the hidden state, such that dopamine implicitly reflects these beliefs. Here we test this hypothesis in the context of an instrumental task (a two-armed bandit), where the hidden state switches repeatedly. We measured choice behavior and recorded dLight signals reflecting dopamine release in the nucleus accumbens core. Model comparison based on the behavioral data favored models that used Bayesian updating of probabilistic beliefs. These same models also quantitatively matched the dopamine measurements better than non-Bayesian alternatives. We conclude that probabilistic belief computation plays a fundamental role in instrumental performance and associated mesolimbic dopamine signaling.
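Bayesian belief updating in a switching two-armed bandit can be sketched as follows, assuming binary rewards, known reward probabilities for the better and worse arm, and a fixed per-trial switch probability (hazard rate). The names `p_high`, `p_low`, and `hazard` are hypothetical and are not the study's parameterization. The belief-weighted reward prediction yields a prediction error of the kind hypothesized to be reflected in dopamine release.

```python
def belief_update(b, action, reward, p_high, p_low, hazard):
    """Return P(arm 0 is the high-reward arm) for the next trial.

    b: current probability that arm 0 is the high-reward arm.
    reward: 1 (rewarded) or 0 (unrewarded).
    """
    def lik(p):
        # probability of the observed binary outcome given reward probability p
        return p if reward == 1 else 1 - p

    # likelihood of the outcome if arm 0 is good (s0) vs. arm 1 is good (s1)
    if action == 0:
        lik_s0, lik_s1 = lik(p_high), lik(p_low)
    else:
        lik_s0, lik_s1 = lik(p_low), lik(p_high)
    post = b * lik_s0 / (b * lik_s0 + (1 - b) * lik_s1)   # Bayes' rule
    # the hidden state may switch before the next trial
    return (1 - hazard) * post + hazard * (1 - post)


def expected_reward(b, action, p_high, p_low):
    """Belief-weighted reward prediction for the chosen arm."""
    p_good = b if action == 0 else 1 - b
    return p_good * p_high + (1 - p_good) * p_low


# Usage: prediction error on a rewarded choice of arm 0, then the belief update.
b = 0.5
rpe = 1 - expected_reward(b, action=0, p_high=0.8, p_low=0.2)
b = belief_update(b, action=0, reward=1, p_high=0.8, p_low=0.2, hazard=0.1)
```

Because the reward prediction depends on the belief `b`, the same outcome produces different prediction errors depending on how confident the agent is about the hidden state.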
ABSTRACT
During adolescence, youth venture out, explore the wider world, and are challenged to learn how to navigate novel and uncertain environments. We investigated how performance changes across adolescent development in a stochastic, volatile reversal-learning task that uniquely taxes the balance between persistence and flexibility. In a sample of 291 participants aged 8-30, we found that adolescents in their mid-teens outperformed both younger and older participants. We developed two independent cognitive models, based on reinforcement learning (RL) and Bayesian inference (BI). The RL parameter for learning from negative outcomes and the BI parameters specifying participants' mental models were closest to optimal in mid-teen adolescents, suggesting that these processes play a central role in adolescent cognitive processing. By contrast, persistence and noise parameters improved monotonically with age. We distilled the insights of RL and BI using principal component analysis and found that three shared components interacted to form the adolescent performance peak: adult-like behavioral quality, child-like time scales, and developmentally unique processing of positive feedback. This research highlights adolescence as a neurodevelopmental window that can create performance advantages in volatile and uncertain environments. It also shows how detailed insights can be gleaned by using cognitive models in new ways.
Subjects
Attention, Psychological Reinforcement, Adolescent, Adolescent Development, Adult, Bayes Theorem, Humans, Reversal Learning
ABSTRACT
Humans have the astonishing capacity to quickly adapt to varying environmental demands and reach complex goals in the absence of extrinsic rewards. Part of what underlies this capacity is the ability to flexibly reuse and recombine previous experiences, and to plan future courses of action in a psychological space that is shaped by these experiences. Decades of research have suggested that humans use hierarchical representations for efficient planning and flexibility, but the origin of these representations has remained elusive. This study investigates how 73 participants learned hierarchical representations through experience, in a task in which they had to perform complex action sequences to obtain rewards. Complex action sequences were composed of simpler action sequences, which were not rewarded, but whose completion was signaled to participants. We investigated the process by which participants learned to perform simpler action sequences and combined them into complex action sequences. After learning the action sequences, participants completed a transfer phase in which either simple sequences or complex sequences were manipulated without notice. Relearning progressed more slowly when simple rather than complex sequences were changed, consistent with a hierarchical representation in which lower levels are quickly consolidated, potentially stabilizing exploration, while higher levels remain malleable, with benefits for flexible recombination.
ABSTRACT
Reinforcement learning (RL) is a concept that has been invaluable to fields including machine learning, neuroscience, and cognitive science. However, what RL entails differs between fields, leading to difficulties when interpreting and translating findings. After laying out these differences, this paper focuses on cognitive (neuro)science to discuss how we as a field might over-interpret RL modeling results. We too often assume, implicitly, that modeling results generalize between tasks, models, and participant populations, despite empirical evidence against this assumption. We also often assume that parameters measure specific, unique (neuro)cognitive processes, a concept we call interpretability, when evidence suggests that they capture different functions across studies and tasks. We conclude that future computational research needs to pay increased attention to implicit assumptions when using RL models, and suggest that a more systematic understanding of contextual factors will help address these issues and improve the ability of RL to explain brain and behavior.
ABSTRACT
Multiple neurocognitive systems contribute simultaneously to learning. For example, dopamine and basal ganglia (BG) systems are thought to support reinforcement learning (RL) by incrementally updating the value of choices, while the prefrontal cortex (PFC) contributes different computations, such as actively maintaining precise information in working memory (WM). It is commonly thought that WM and PFC show more protracted development than RL and BG systems, yet their contributions are rarely assessed in tandem. Here, we used a simple learning task to test how RL and WM contribute to changes in learning across adolescence. We tested 187 subjects aged 8 to 17 and 53 adults (ages 25-30). Participants learned stimulus-action associations from feedback; the learning load was varied to be within or to exceed WM capacity. Participants aged 8-12 learned more slowly than participants aged 13-17 and were more sensitive to load. We used computational modeling to estimate subjects' use of WM and RL processes. Surprisingly, we found more protracted changes in RL than in WM during development. The RL learning rate increased with age until age 18, and WM parameters showed more subtle, gender- and puberty-dependent changes early in adolescence. These results can inform education and intervention strategies based on the developmental science of learning.
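One common way to formalize joint RL and WM contributions, assumed here rather than taken from the study, mixes a slow, incremental RL policy with a fast, one-shot but decaying WM policy, where reliance on WM shrinks once the set size exceeds a capacity limit. All names (`rho`, `capacity`, `alpha`, `decay`) are hypothetical, and values are tracked for a single stimulus to keep the sketch short.

```python
import math


def softmax(values, beta):
    m = max(values)                      # subtract max for numerical stability
    exps = [math.exp(beta * (v - m)) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def wm_weight(rho, capacity, set_size):
    """Reliance on WM: rho while the set size fits within capacity,
    scaled down in proportion when the load exceeds it."""
    return rho * min(1.0, capacity / set_size)


def mixed_policy(q_rl, w_wm, rho, capacity, set_size, beta):
    """Choice probabilities for one stimulus as a weighted WM/RL mixture."""
    w = wm_weight(rho, capacity, set_size)
    p_rl = softmax(q_rl, beta)
    p_wm = softmax(w_wm, beta)
    return [w * pw + (1 - w) * pr for pw, pr in zip(p_wm, p_rl)]


def update(q_rl, w_wm, action, reward, alpha, decay):
    """Update RL and WM values for one stimulus after feedback."""
    n = len(q_rl)
    q_rl = list(q_rl)
    q_rl[action] += alpha * (reward - q_rl[action])    # slow, incremental RL
    w_wm = [v + decay * (1 / n - v) for v in w_wm]     # WM decays toward uniform
    w_wm[action] = reward                              # WM stores the outcome in one shot
    return q_rl, w_wm
```

Raising `set_size` above `capacity` shifts choices toward the slower RL component, which is one way such a model can express sensitivity to load.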
Subjects
Learning/physiology, Psychological Reinforcement, Adolescent, Female, Humans, Male, Short-Term Memory/physiology
ABSTRACT
Inductive reasoning, which entails reaching conclusions that are based on but go beyond available evidence, has long been of interest in cognitive science. Nevertheless, knowledge is still lacking as to the specific cognitive processes that underlie inductive reasoning. Here, we shed light on these processes in two ways. First, we characterized the timecourse of inductive reasoning in a rule induction task, using pupil dilation as a moment-by-moment measure of cognitive load. Participants' patterns of behavior and pupillary responses indicated that they engaged in rule inference online, and were surprised when additional evidence violated their inferred rules. Second, we sought to gain insight into how participants represented rules on this task, specifically whether they would structure the rules hierarchically when possible. We predicted the cognitive load imposed by hierarchical representations, as well as by non-hierarchical, flat ones. We used task-evoked pupil dilation as a metric of cognitive load to infer, based on these predictions, which participants represented rules with flat or hierarchical structures. Participants categorized as representing the rules hierarchically or flat differed in task performance and self-reports of strategy. Hierarchical rule representation was associated with more efficient performance and more pronounced pupillary responses to rule violations on trials that afford a higher-order regularity, but with less efficient performance on trials that do not. Thus, differences in rule representation can be inferred from a physiological measure of cognitive load, and are associated with differences in performance. These results illustrate how pupillometry can provide a window into reasoning as it unfolds over time.
Subjects
Learning/physiology, Psychomotor Performance/physiology, Pupil/physiology, Reaction Time/physiology, Thinking/physiology, Adolescent, Adult, Female, Humans, Individuality, Male, Time Factors, Young Adult
ABSTRACT
This review provides an introduction to two eyetracking measures that can be used to study cognitive development and plasticity: pupil dilation and spontaneous blink rate. We begin by outlining the rich history of gaze analysis, which can reveal the current focus of attention as well as cognitive strategies. We then turn to the two lesser-utilized ocular measures. Pupil dilation is modulated by the brain's locus coeruleus-norepinephrine system, which controls physiological arousal and attention, and has been used as a measure of subjective task difficulty, mental effort, and neural gain. Spontaneous eyeblink rate correlates with levels of dopamine in the central nervous system, and can reveal processes underlying learning and goal-directed behavior. Taken together, gaze, pupil dilation, and blink rate are three non-invasive and complementary measures of cognition with high temporal resolution and well-understood neural foundations. Here we review the neural foundations of pupil dilation and blink rate, provide examples of their usage, describe analytic methods and methodological considerations, and discuss their potential for research on learning, cognitive development, and plasticity.