Results 1 - 20 of 157
1.
Cell ; 183(6): 1600-1616.e25, 2020 12 10.
Article in English | MEDLINE | ID: mdl-33248024

ABSTRACT

Rapid phasic activity of midbrain dopamine neurons is thought to signal reward prediction errors (RPEs), resembling temporal difference errors used in machine learning. However, recent studies describing slowly increasing dopamine signals have instead proposed that they represent state values and arise independently of somatic spiking activity. Here we developed experimental paradigms using virtual reality that disambiguate RPEs from values. We examined dopamine circuit activity at various stages, including somatic spiking, calcium signals at somata and axons, and striatal dopamine concentrations. Our results demonstrate that ramping dopamine signals are consistent with RPEs rather than value, and this ramping is observed at all stages examined. Ramping dopamine signals can be driven by a dynamic stimulus that indicates a gradual approach to a reward. We provide a unified computational understanding of rapid phasic and slowly ramping dopamine signals: dopamine neurons perform a derivative-like computation over values on a moment-by-moment basis.
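The derivative-like computation described in this abstract can be sketched in a few lines (an illustrative toy, not the paper's code): with discount gamma = 1 and no interim reward, the one-step TD error reduces to the discrete time-derivative of value, so a convex value ramp toward reward produces ramping TD errors.

```python
import numpy as np

def td_errors(values, rewards, gamma=1.0):
    """One-step temporal-difference errors: delta_t = r_t + gamma * V_{t+1} - V_t."""
    v = np.asarray(values, dtype=float)
    r = np.asarray(rewards, dtype=float)
    return r[:-1] + gamma * v[1:] - v[:-1]

# Value rising convexly as the (virtual) reward location is approached.
T = 10
value = (np.arange(T) / (T - 1)) ** 2
deltas = td_errors(value, np.zeros(T))  # monotonically increasing: a "ramp"
```

Because the errors telescope, their sum equals the total change in value over the approach; the ramp shape comes entirely from the convexity of the value trajectory.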


Subjects
Dopamine/metabolism; Signal Transduction; Action Potentials/physiology; Animals; Axons/metabolism; Calcium/metabolism; Calcium Signaling; Cell Body/metabolism; Cues; Dopaminergic Neurons/physiology; Fluorometry; Male; Mice, Inbred C57BL; Models, Neurological; Photic Stimulation; Reward; Sensation; Time Factors; Ventral Tegmental Area/metabolism; Virtual Reality
2.
Proc Natl Acad Sci U S A ; 120(22): e2215015120, 2023 05 30.
Article in English | MEDLINE | ID: mdl-37216526

ABSTRACT

Teaching enables humans to impart vast stores of culturally specific knowledge and skills. However, little is known about the neural computations that guide teachers' decisions about what information to communicate. Participants (N = 28) played the role of teachers while being scanned using fMRI; their task was to select examples that would teach learners how to answer abstract multiple-choice questions. Participants' examples were best described by a model that selects evidence that maximizes the learner's belief in the correct answer. Consistent with this idea, participants' predictions about how well learners would do closely tracked the performance of an independent sample of learners (N = 140) who were tested on the examples they had provided. In addition, regions that play specialized roles in processing social information, namely the bilateral temporoparietal junction and middle and dorsal medial prefrontal cortex, tracked learners' posterior belief in the correct answer. Our results shed light on the computational and neural architectures that support our extraordinary abilities as teachers.


Subjects
Learning; Mentalization; Teaching; Humans; Brain/diagnostic imaging
3.
PLoS Comput Biol ; 20(4): e1012057, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38669280

ABSTRACT

Policy compression is a computational framework that describes how capacity-limited agents trade reward for simpler action policies to reduce cognitive cost. In this study, we present behavioral evidence that humans prefer simpler policies, as predicted by a capacity-limited reinforcement learning model. Across a set of tasks, we find that people exploit structure in the relationships between states, actions, and rewards to "compress" their policies. In particular, compressed policies are systematically biased towards actions with high marginal probability, thereby discarding some state information. This bias is greater when there is redundancy in the reward-maximizing action policy across states, and increases with memory load. These results could not be explained qualitatively or quantitatively by models that did not make use of policy compression under a capacity limit. We also confirmed the prediction that time pressure should further reduce policy complexity and increase action bias, based on the hypothesis that actions are selected via time-dependent decoding of a compressed code. These findings contribute to a deeper understanding of how humans adapt their decision-making strategies under cognitive resource constraints.
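Policy complexity as used in this line of work is the mutual information I(S;A) between states and actions. A minimal sketch (illustrative, not the authors' code): a policy that assigns a distinct action to each state has high complexity, while a state-independent policy (pure action bias) has zero complexity.

```python
import numpy as np

def policy_complexity(p_s, policy):
    """Mutual information I(S;A) in bits, where policy[s, a] = P(a|s), p_s[s] = P(s)."""
    p_sa = p_s[:, None] * policy          # joint P(s, a)
    p_a = p_sa.sum(axis=0)                # marginal action distribution P(a)
    ratio = p_sa / (p_s[:, None] * p_a[None, :])
    nz = p_sa > 0                         # sum only over nonzero joint entries
    return float((p_sa[nz] * np.log2(ratio[nz])).sum())

p_s = np.full(4, 0.25)                    # four equiprobable states
state_dependent = np.eye(4)               # distinct action per state: 2 bits
compressed = np.full((4, 4), 0.25)        # ignores the state entirely: 0 bits
```

A capacity-limited agent must keep this quantity below its bound, which is exactly what biases compressed policies toward high-marginal-probability actions.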


Subjects
Decision Making; Reward; Humans; Decision Making/physiology; Computational Biology; Male; Adult; Female; Reinforcement, Psychology; Models, Psychological; Young Adult; Cognition/physiology
4.
J Neurosci ; 43(3): 447-457, 2023 01 18.
Article in English | MEDLINE | ID: mdl-36639891

ABSTRACT

The matching law describes the tendency of agents to match the ratio of choices allocated to the ratio of rewards received when choosing among multiple options (Herrnstein, 1961). Perfect matching, however, is infrequently observed. Instead, agents tend to undermatch, or bias choices toward the poorer option. Overmatching, or the tendency to bias choices toward the richer option, is rarely observed. Despite the ubiquity of undermatching, it has received an inadequate normative justification. Here, we assume agents not only seek to maximize reward, but also seek to minimize cognitive cost, which we formalize as policy complexity (the mutual information between actions and states of the environment). Policy complexity measures the extent to which the policy of an agent is state dependent. Our theory states that capacity-constrained agents (i.e., agents that must compress their policies to reduce complexity) can only undermatch or perfectly match, but not overmatch, consistent with the empirical evidence. Moreover, using mouse behavioral data (male), we validate a novel prediction about which task conditions exaggerate undermatching. Finally, in patients with Parkinson's disease (male and female), we argue that a reduction in undermatching with higher dopamine levels is consistent with increased policy complexity.

SIGNIFICANCE STATEMENT The matching law describes the tendency of agents to match the ratio of choices allocated to different options to the ratio of rewards received. For example, if option a yields twice as much reward as option b, matching states that agents will choose option a twice as often. However, agents typically undermatch: they choose the poorer option more frequently than expected. Here, we assume that agents seek to simultaneously maximize reward and minimize the complexity of their action policies. We show that this theory explains when and why undermatching occurs. Neurally, we show that policy complexity, and by extension undermatching, is controlled by tonic dopamine, consistent with other evidence that dopamine plays an important role in cognitive resource allocation.
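The matching and undermatching regimes can be illustrated with the standard generalized matching law, which writes the choice ratio as a power function of the reward ratio, C1/C2 = b * (R1/R2)**s (the parameter values below are hypothetical, chosen only to show the regimes; s = 1 is perfect matching, s < 1 is undermatching, and s > 1 would be the overmatching that the capacity-limit theory above rules out).

```python
def choice_ratio(reward_ratio, sensitivity=1.0, bias=1.0):
    """Generalized matching law: choice ratio as a power function of reward ratio."""
    return bias * reward_ratio ** sensitivity

# Option a pays twice as much as option b:
perfect = choice_ratio(2.0, sensitivity=1.0)  # chooses a exactly twice as often
under = choice_ratio(2.0, sensitivity=0.7)    # closer to indifference than 2:1
```

Undermatching shows up as a choice ratio compressed toward 1 relative to the reward ratio, i.e., the poorer option is sampled more than reward maximization alone would predict.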


Subjects
Dopamine; Parkinson Disease; Male; Female; Animals; Mice; Reward; Parkinson Disease/psychology; Decision Making
5.
Nat Rev Neurosci ; 20(11): 703-714, 2019 11.
Article in English | MEDLINE | ID: mdl-31570826

ABSTRACT

Midbrain dopamine signals are widely thought to report reward prediction errors that drive learning in the basal ganglia. However, dopamine has also been implicated in various probabilistic computations, such as encoding uncertainty and controlling exploration. Here, we show how these different facets of dopamine signalling can be brought together under a common reinforcement learning framework. The key idea is that multiple sources of uncertainty impinge on reinforcement learning computations: uncertainty about the state of the environment, the parameters of the value function and the optimal action policy. Each of these sources plays a distinct role in the prefrontal cortex-basal ganglia circuit for reinforcement learning and is ultimately reflected in dopamine activity. The view that dopamine plays a central role in the encoding and updating of beliefs brings the classical prediction error theory into alignment with more recent theories of Bayesian reinforcement learning.


Subjects
Basal Ganglia/metabolism; Dopamine/metabolism; Learning/physiology; Nerve Net/metabolism; Prefrontal Cortex/metabolism; Animals; Humans
6.
PLoS Comput Biol ; 19(9): e1011067, 2023 09.
Article in English | MEDLINE | ID: mdl-37695776

ABSTRACT

To behave adaptively, animals must learn to predict future reward, or value. To do this, animals are thought to learn reward predictions using reinforcement learning. However, in contrast to classical models, animals must learn to estimate value using only incomplete state information. Previous work suggests that animals estimate value in partially observable tasks by first forming "beliefs" - optimal Bayesian estimates of the hidden states in the task. Although this is one way to solve the problem of partial observability, it is not the only way, nor is it the most computationally scalable solution in complex, real-world environments. Here we show that a recurrent neural network (RNN) can learn to estimate value directly from observations, generating reward prediction errors that resemble those observed experimentally, without any explicit objective of estimating beliefs. We integrate statistical, functional, and dynamical systems perspectives on beliefs to show that the RNN's learned representation encodes belief information, but only when the RNN's capacity is sufficiently large. These results illustrate how animals can estimate value in tasks without explicitly estimating beliefs, yielding a representation useful for systems with limited capacity.


Subjects
Learning; Reinforcement, Psychology; Animals; Bayes Theorem; Reward; Neural Networks, Computer
7.
Cogn Psychol ; 150: 101653, 2024 May.
Article in English | MEDLINE | ID: mdl-38503178

ABSTRACT

In order to efficiently divide labor with others, it is important to understand what our collaborators can do (i.e., their competence). However, competence is not static - people get better at particular jobs the more often they perform them. This plasticity of competence creates a challenge for collaboration: For example, is it better to assign tasks to whoever is most competent now, or to the person who can be trained most efficiently "on-the-job"? We conducted four experiments (N = 396) that examine how people make decisions about whom to train (Experiments 1 and 3) and whom to recruit (Experiments 2 and 4) to a collaborative task, based on the simulated collaborators' starting expertise, the training opportunities available, and the goal of the task. We found that participants' decisions were best captured by a planning model that attempts to maximize the returns from collaboration while minimizing the costs of hiring and training individual collaborators. This planning model outperformed alternative models that based these decisions on the agents' current competence, or on how much agents stood to improve in a single training step, without considering whether this training would enable agents to succeed at the task in the long run. Our findings suggest that people do not recruit and train collaborators based solely on their current competence, nor solely on the opportunities for their collaborators to improve. Instead, people use an intuitive theory of competence to balance the costs of hiring and training others against the benefits to the collaboration.

8.
Biol Cybern ; 118(1-2): 1-5, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38337064

ABSTRACT

Neuroscience and artificial intelligence (AI) share a long, intertwined history. It has been argued that discoveries in neuroscience were (and continue to be) instrumental in driving the development of new AI technology. Scrutinizing these historical claims yields a more nuanced story, where AI researchers were loosely inspired by the brain, but ideas flowed mostly in the other direction.


Subjects
Artificial Intelligence; Brain; Neurosciences; Humans; Brain/physiology; Neurosciences/trends; History, 20th Century; Animals; History, 21st Century
9.
J Cogn Neurosci ; 35(9): 1508-1520, 2023 09 01.
Article in English | MEDLINE | ID: mdl-37382476

ABSTRACT

Exploration is an important part of decision making and is crucial to maximizing long-term rewards. Past work has shown that people use different forms of uncertainty to guide exploration. In this study, we investigate the role of the pupil-linked arousal system in uncertainty-guided exploration. We measured participants' (n = 48) pupil dilation while they performed a two-armed bandit task. Consistent with previous work, we found that people adopted a hybrid of directed, random, and undirected exploration, which are sensitive to relative uncertainty, total uncertainty, and value difference between options, respectively. We also found a positive correlation between pupil size and total uncertainty. Furthermore, augmenting the choice model with subject-specific total uncertainty estimates decoded from the pupil size improved predictions of held-out choices, suggesting that people used the uncertainty estimate encoded in pupil size to decide which option to explore. Together, the data shed light on the computations underlying uncertainty-driven exploration. Under the assumption that pupil size reflects locus coeruleus-norepinephrine neuromodulatory activity, these results also extend the theory of the locus coeruleus-norepinephrine function in exploration, highlighting its selective role in driving uncertainty-guided random exploration.
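A hybrid exploration rule of the kind described above can be sketched as follows (the functional form and unit weights here are illustrative assumptions, not the fitted model): directed exploration adds a relative-uncertainty (RU) bonus, while random exploration scales the value difference (V) by total uncertainty (TU) before passing it through a choice function.

```python
from math import erf, sqrt

def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def p_choose_first(m1, m2, s1, s2, w_v=1.0, w_ru=1.0, w_vtu=1.0):
    """Probability of choosing option 1 given posterior means m and std devs s."""
    v = m1 - m2                    # estimated value difference
    ru = s1 - s2                   # relative uncertainty (directed exploration)
    tu = sqrt(s1**2 + s2**2)       # total uncertainty (random exploration)
    return normal_cdf(w_v * v + w_ru * ru + w_vtu * v / tu)
```

At equal means, the more uncertain option is chosen more often (directed exploration); at equal uncertainties, inflating total uncertainty pushes choice toward 50/50 (random exploration).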


Subjects
Arousal; Pupil; Humans; Uncertainty; Reward; Norepinephrine
10.
Cogn Affect Behav Neurosci ; 23(3): 465-475, 2023 06.
Article in English | MEDLINE | ID: mdl-36168079

ABSTRACT

Can you reduce uncertainty by thinking? Intuition suggests that this happens through the elusive process of attention: if we expend mental effort, we can increase the reliability of our sensory data. Models based on "rational inattention" formalize this idea in terms of a trade-off between the costs and benefits of attention. This paper surveys the origin of these models in economics, their connection to rate-distortion theory, and some of their recent applications to psychology and neuroscience. We also report new data from a numerosity judgment task in which we manipulate performance incentives. Consistent with rational inattention, people are able to improve performance on this task when incentivized, in part by increasing the reliability of their sensory data.


Subjects
Intuition; Judgment; Humans; Uncertainty; Reproducibility of Results; Attention
11.
Proc Natl Acad Sci U S A ; 117(39): 24581-24589, 2020 09 29.
Article in English | MEDLINE | ID: mdl-32938799

ABSTRACT

In the real world, complex dynamic scenes often arise from the composition of simpler parts. The visual system exploits this structure by hierarchically decomposing dynamic scenes: When we see a person walking on a train or an animal running in a herd, we recognize the individual's movement as nested within a reference frame that is, itself, moving. Despite its ubiquity, surprisingly little is understood about the computations underlying hierarchical motion perception. To address this gap, we developed a class of stimuli that grant tight control over statistical relations among object velocities in dynamic scenes. We first demonstrate that structured motion stimuli benefit human multiple object tracking performance. Computational analysis revealed that the performance gain is best explained by human participants making use of motion relations during tracking. A second experiment, using a motion prediction task, reinforced this conclusion and provided fine-grained information about how the visual system flexibly exploits motion structure.


Subjects
Motion Perception; Visual Perception; Adult; Base Sequence; Female; Humans; Male; Movement; Young Adult
12.
Proc Natl Acad Sci U S A ; 117(23): 12750-12755, 2020 06 09.
Article in English | MEDLINE | ID: mdl-32461363

ABSTRACT

In many real-life decisions, options are distributed in space and time, making it necessary to search sequentially through them, often without a chance to return to a rejected option. The optimal strategy in these tasks is to choose the first option that is above a threshold that depends on the current position in the sequence. The implicit decision-making strategies by humans vary but largely diverge from this optimal strategy. The reasons for this divergence remain unknown. We present a model of human stopping decisions in sequential decision-making tasks based on a linear threshold heuristic. The first two studies demonstrate that the linear threshold model accounts better for sequential decision making than existing models. Moreover, we show that the model accurately predicts participants' search behavior in different environments. In the third study, we confirm that the model generalizes to a real-world problem, thus providing an important step toward understanding human sequential decision making.
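The linear threshold heuristic described above can be sketched directly (the intercept and slope below are hypothetical illustrations, not the fitted parameters): accept the first option whose value exceeds a threshold that declines linearly with position in the sequence, and take the last option if forced to the end.

```python
def linear_threshold_stop(values, intercept=0.8, slope=-0.05):
    """Return (position, value) of the first option exceeding the linear threshold."""
    for i, v in enumerate(values):
        if v >= intercept + slope * i:
            return i, v
    return len(values) - 1, values[-1]  # end of sequence: forced to take the last

chosen = linear_threshold_stop([0.50, 0.90, 0.95])  # accepts 0.90 at position 1
```

The declining threshold captures the intuition that searchers become less demanding as remaining options dwindle, while remaining simpler than the position-dependent optimal policy.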


Subjects
Decision Making; Models, Psychological; Adolescent; Adult; Female; Humans; Male; Middle Aged
13.
J Neurosci ; 41(32): 6892-6904, 2021 08 11.
Article in English | MEDLINE | ID: mdl-34244363

ABSTRACT

Attributing outcomes to your own actions or to external causes is essential for appropriately learning which actions lead to reward and which actions do not. Our previous work showed that this type of credit assignment is best explained by a Bayesian reinforcement learning model which posits that beliefs about the causal structure of the environment modulate reward prediction errors (RPEs) during action value updating. In this study, we investigated the brain networks underlying reinforcement learning that are influenced by causal beliefs using functional magnetic resonance imaging while human participants (n = 31; 13 males, 18 females) completed a behavioral task that manipulated beliefs about causal structure. We found evidence that RPEs modulated by causal beliefs are represented in dorsal striatum, while standard (unmodulated) RPEs are represented in ventral striatum. Further analyses revealed that beliefs about causal structure are represented in anterior insula and inferior frontal gyrus. Finally, structural equation modeling revealed effective connectivity from anterior insula to dorsal striatum. Together, these results are consistent with a possible neural architecture in which causal beliefs in anterior insula are integrated with prediction error signals in dorsal striatum to update action values.

SIGNIFICANCE STATEMENT Learning which actions lead to reward - a process known as reinforcement learning - is essential for survival. Inferring the causes of observed outcomes - a process known as causal inference - is crucial for appropriately assigning credit to one's own actions and restricting learning to effective action-outcome contingencies. Previous studies have linked reinforcement learning to the striatum, and causal inference to prefrontal regions, yet how these neural processes interact to guide adaptive behavior remains poorly understood. Here, we found evidence that causal beliefs represented in the prefrontal cortex modulate action value updating in posterior striatum, separately from the unmodulated action value update in ventral striatum posited by standard reinforcement learning models.


Subjects
Brain/physiology; Learning/physiology; Reinforcement, Psychology; Reward; Adolescent; Bayes Theorem; Brain Mapping/methods; Female; Humans; Magnetic Resonance Imaging/methods; Male; Nerve Net/physiology; Young Adult
14.
PLoS Comput Biol ; 17(3): e1008659, 2021 03.
Article in English | MEDLINE | ID: mdl-33760806

ABSTRACT

Slow-timescale (tonic) changes in dopamine (DA) contribute to a wide variety of processes in reinforcement learning, interval timing, and other domains. Furthermore, changes in tonic DA exert distinct effects depending on when they occur (e.g., during learning vs. performance) and what task the subject is performing (e.g., operant vs. classical conditioning). Two influential theories of tonic DA - the average reward theory and the Bayesian theory in which DA controls precision - have each been successful at explaining a subset of empirical findings. But how the same DA signal performs two seemingly distinct functions without creating crosstalk is not well understood. Here we reconcile the two theories under the unifying framework of 'rational inattention,' which (1) conceptually links average reward and precision, (2) outlines how DA manipulations affect this relationship, and in so doing, (3) captures new empirical phenomena. In brief, rational inattention asserts that agents can increase their precision in a task (and thus improve their performance) by paying a cognitive cost. Crucially, whether this cost is worth paying depends on average reward availability, reported by DA. The monotonic relationship between average reward and precision means that the DA signal contains the information necessary to retrieve the precision. When this information is needed after the task is performed, as presumed by Bayesian inference, acute manipulations of DA will bias behavior in predictable ways. We show how this framework reconciles a remarkably large collection of experimental findings. In reinforcement learning, the rational inattention framework predicts that learning from positive and negative feedback should be enhanced in high and low DA states, respectively, and that DA should tip the exploration-exploitation balance toward exploitation. In interval timing, this framework predicts that DA should increase the speed of the internal clock and decrease the extent of interference by other temporal stimuli during temporal reproduction (the central tendency effect). Finally, rational inattention makes the new predictions that these effects should be critically dependent on the controllability of rewards, that post-reward delays in intertemporal choice tasks should be underestimated, and that average reward manipulations should affect the speed of the clock - thus capturing empirical findings that are unexplained by either theory alone. Our results suggest that a common computational repertoire may underlie the seemingly heterogeneous roles of DA.


Subjects
Attention/physiology; Dopamine; Models, Neurological; Bayes Theorem; Cognition/physiology; Computational Biology; Dopamine/metabolism; Dopamine/physiology; Humans; Reinforcement, Psychology
15.
PLoS Comput Biol ; 17(2): e1008553, 2021 02.
Article in English | MEDLINE | ID: mdl-33566831

ABSTRACT

Pavlovian associations drive approach towards reward-predictive cues, and avoidance of punishment-predictive cues. These associations "misbehave" when they conflict with correct instrumental behavior. This raises the question of how Pavlovian and instrumental influences on behavior are arbitrated. We test a computational theory according to which Pavlovian influence will be stronger when inferred controllability of outcomes is low. Using a model-based analysis of a Go/NoGo task with human subjects, we show that theta-band oscillatory power in frontal cortex tracks inferred controllability, and that these inferences predict Pavlovian action biases. Functional MRI data revealed an inferior frontal gyrus correlate of action probability and a ventromedial prefrontal correlate of outcome valence, both of which were modulated by inferred controllability.


Subjects
Conditioning, Operant; Electroencephalography/methods; Magnetic Resonance Imaging/methods; Adolescent; Adult; Bayes Theorem; Computer Simulation; Decision Making; Frontal Lobe; Humans; Models, Neurological; Neuroimaging/methods; Prefrontal Cortex/physiology; Punishment; Reward; Young Adult
16.
Cogn Psychol ; 138: 101509, 2022 11.
Article in English | MEDLINE | ID: mdl-36152355

ABSTRACT

Understanding the inductive biases that allow humans to learn in complex environments has been an important goal of cognitive science. Yet, while we have discovered much about human biases in specific learning domains, much of this research has focused on simple tasks that lack the complexity of the real world. In contrast, video games involving agents and objects embedded in richly structured systems provide an experimentally tractable proxy for real-world complexity. Recent work has suggested that key aspects of human learning in domains like video games can be captured by model-based reinforcement learning (RL) with object-oriented relational models - what we term theory-based RL. Restricting the model class in this way provides an inductive bias that dramatically increases learning efficiency, but in this paper we show that humans employ a stronger set of biases in addition to syntactic constraints on the structure of theories. In particular, we catalog a set of semantic biases that constrain the content of theories. Building these semantic biases into a theory-based RL system produces more human-like learning in video game environments.


Subjects
Reinforcement, Psychology; Video Games; Bias; Humans; Learning; Semantics
17.
Proc Natl Acad Sci U S A ; 116(13): 6035-6044, 2019 03 26.
Article in English | MEDLINE | ID: mdl-30862738

ABSTRACT

Evaluating stimuli along a good-bad dimension is a fundamental computation performed by the human mind. In recent decades, research has documented dissociations and associations between explicit (i.e., self-reported) and implicit (i.e., indirectly measured) forms of evaluations. However, it is unclear whether such dissociations arise from relatively more superficial differences in measurement techniques or from deeper differences in the processes by which explicit and implicit evaluations are acquired and represented. The present project (total N = 2,354) relies on the computationally well-specified distinction between model-based and model-free reinforcement learning to investigate the unique and shared aspects of explicit and implicit evaluations. Study 1 used a revaluation procedure to reveal that, whereas explicit evaluations of novel targets are updated via model-free and model-based processes, implicit evaluations depend on the former but are impervious to the latter. Studies 2 and 3 demonstrated the robustness of this effect to (i) the number of stimulus exposures in the revaluation phase and (ii) the deterministic vs. probabilistic nature of initial reinforcement. These findings provide a framework, going beyond traditional dual-process and single-process accounts, to highlight the context-sensitivity and long-term recalcitrance of implicit evaluations as well as variations in their relationship with their explicit counterparts. These results also suggest avenues for designing theoretically guided interventions to produce change in implicit evaluations.


Subjects
Judgment; Learning; Models, Psychological; Adult; Choice Behavior; Humans; Reinforcement, Psychology
18.
Proc Natl Acad Sci U S A ; 116(28): 13903-13908, 2019 07 09.
Article in English | MEDLINE | ID: mdl-31235598

ABSTRACT

Making good decisions requires people to appropriately explore their available options and generalize what they have learned. While computational models can explain exploratory behavior in constrained laboratory tasks, it is unclear to what extent these models generalize to real-world choice problems. We investigate the factors guiding exploratory behavior in a dataset consisting of 195,333 customers placing 1,613,967 orders from a large online food delivery service. We find important hallmarks of adaptive exploration and generalization, which we analyze using computational models. In particular, customers seem to engage in uncertainty-directed exploration and use feature-based generalization to guide their exploration. Our results provide evidence that people use sophisticated strategies to explore complex, real-world environments.


Subjects
Choice Behavior/physiology; Decision Making; Generalization, Psychological; Reinforcement, Psychology; Computer Simulation; Consumer Behavior; Decision Making/physiology; Exploratory Behavior/physiology; Female; Humans; Learning/physiology; Male; Uncertainty
19.
Entropy (Basel) ; 24(12)2022 Dec 08.
Article in English | MEDLINE | ID: mdl-36554196

ABSTRACT

Neurons in the medial entorhinal cortex exhibit multiple, periodically organized, firing fields which collectively appear to form an internal representation of space. Neuroimaging data suggest that this grid coding is also present in other cortical areas such as the prefrontal cortex, indicating that it may be a general principle of neural functionality in the brain. In a recent analysis through the lens of dynamical systems theory, we showed how grid coding can lead to the generation of a diversity of empirically observed sequential reactivations of hippocampal place cells corresponding to traversals of cognitive maps. Here, we extend this sequence generation model by describing how the synthesis of multiple dynamical systems can support compositional cognitive computations. To empirically validate the model, we simulate two experiments demonstrating compositionality in space or in time during sequence generation. Finally, we describe several neural network architectures supporting various types of compositionality based on grid coding and highlight connections to recent work in machine learning leveraging analogous techniques.

20.
PLoS Comput Biol ; 16(4): e1007594, 2020 04.
Article in English | MEDLINE | ID: mdl-32251444

ABSTRACT

We propose that humans spontaneously organize environments into clusters of states that support hierarchical planning, enabling them to tackle challenging problems by breaking them down into sub-problems at various levels of abstraction. People constantly rely on such hierarchical representations to accomplish tasks big and small - from planning one's day, to organizing a wedding, to getting a PhD - often succeeding on the very first attempt. We formalize a Bayesian model of hierarchy discovery that explains how humans discover such useful abstractions. Building on principles developed in structure learning and robotics, the model predicts that hierarchy discovery should be sensitive to the topological structure, reward distribution, and distribution of tasks in the environment. In five simulations, we show that the model accounts for previously reported effects of environment structure on planning behavior, such as detection of bottleneck states and transitions. We then test the novel predictions of the model in eight behavioral experiments, demonstrating how the distribution of tasks and rewards can influence planning behavior via the discovered hierarchy, sometimes facilitating and sometimes hindering performance. We find evidence that the hierarchy discovery process unfolds incrementally across trials. Finally, we propose how hierarchy discovery and hierarchical planning might be implemented in the brain. Together, these findings present an important advance in our understanding of how the brain might use Bayesian inference to discover and exploit the hidden hierarchical structure of the environment.


Subjects
Bayes Theorem; Brain/physiology; Learning/physiology; Algorithms; Computer Simulation; Female; Humans; Male; Markov Chains; Models, Neurological; Monte Carlo Method; Reward; Video Games