A distributional code for value in dopamine-based reinforcement learning.

Dabney, Will; Kurth-Nelson, Zeb; Uchida, Naoshige; Starkweather, Clara Kwon; Hassabis, Demis; Munos, Rémi; Botvinick, Matthew

Affiliation

Dabney W; DeepMind, London, UK. wdabney@google.com.
Kurth-Nelson Z; DeepMind, London, UK; Max Planck UCL Centre for Computational Psychiatry and Ageing Research, University College London, London, UK.
Uchida N; Center for Brain Science, Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA, USA.
Starkweather CK; Center for Brain Science, Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA, USA.
Hassabis D; DeepMind, London, UK.
Munos R; DeepMind, London, UK.
Botvinick M; DeepMind, London, UK.

Nature; 577(7792): 671-675, January 2020.

Article in English | MEDLINE | ID: mdl-31942076

ABSTRACT

Since its introduction, the reward prediction error theory of dopamine has explained a wealth of empirical phenomena, providing a unifying framework for understanding the representation of reward and value in the brain [1-3]. According to the now-canonical theory, reward predictions are represented as a single scalar quantity, which supports learning about the expectation, or mean, of stochastic outcomes. Here we propose an account of dopamine-based reinforcement learning inspired by recent artificial intelligence research on distributional reinforcement learning [4-6]. We hypothesized that the brain represents possible future rewards not as a single mean, but instead as a probability distribution, effectively representing multiple future outcomes simultaneously and in parallel. This idea implies a set of empirical predictions, which we tested using single-unit recordings from mouse ventral tegmental area. Our findings provide strong evidence for a neural realization of distributional reinforcement learning.
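
The contrast between a single mean prediction and a distribution of predictions can be illustrated with a minimal sketch. This is not the authors' model or analysis code. It assumes one simple way to obtain a distributional code: a population of value predictors that scale positive and negative prediction errors differently. The function name `learn_values`, the asymmetry parameter `tau`, the learning rate, and the bimodal reward are all hypothetical choices made for illustration.

```python
import random


def learn_values(rewards, taus, lr=0.01):
    """Train one value predictor per asymmetry level tau in (0, 1).

    Assumption (not from the abstract): each predictor scales positive
    prediction errors by tau and negative ones by (1 - tau).
    """
    values = [0.0 for _ in taus]
    for r in rewards:
        for i, tau in enumerate(taus):
            delta = r - values[i]  # reward prediction error for this unit
            # Optimistic units (tau > 0.5) weight positive errors more;
            # pessimistic units (tau < 0.5) weight negative errors more.
            scale = tau if delta > 0 else (1.0 - tau)
            values[i] += lr * scale * delta
    return values


rng = random.Random(0)
# Hypothetical bimodal reward: 1.0 or 9.0 with equal probability (mean 5.0).
rewards = [rng.choice([1.0, 9.0]) for _ in range(20000)]
taus = [0.1, 0.3, 0.5, 0.7, 0.9]
values = learn_values(rewards, taus)

for tau, v in zip(taus, values):
    print(f"tau={tau:.1f}  learned value={v:.2f}")
```

In this sketch the symmetric unit (`tau = 0.5`) behaves like the classical scalar learner and hovers around the mean reward, while the other units settle at different points between the two outcomes (for this two-point reward, roughly 1 + 8·tau). Taken together, the population carries information about the spread of the reward distribution that a single mean cannot.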

Subjects

Dopamine/metabolism; Learning/physiology; Models, Neurological; Reinforcement, Psychology; Reward; Animals; Artificial Intelligence; Dopaminergic Neurons/metabolism; GABAergic Neurons/metabolism; Mice; Optimism; Pessimism; Probability; Statistical Distributions; Ventral Tegmental Area/cytology; Ventral Tegmental Area/physiology

Full text: 1 Database: MEDLINE Main subject: Reinforcement, Psychology / Reward / Dopamine / Learning / Models, Neurological Language: English Publication year: 2020 Document type: Article
