Multi-timescale reinforcement learning in the brain.
Masset, Paul; Tano, Pablo; Kim, HyungGoo R; Malik, Athar N; Pouget, Alexandre; Uchida, Naoshige.
Affiliation
  • Masset P; Department of Molecular and Cellular Biology, Harvard University, USA.
  • Tano P; Department of Basic Neuroscience, University of Geneva, Switzerland.
  • Kim HR; Department of Biomedical Engineering, Sungkyunkwan University, Suwon 16419, Republic of Korea.
  • Malik AN; Department of Molecular and Cellular Biology, Harvard University, USA.
  • Pouget A; Department of Basic Neuroscience, University of Geneva, Switzerland.
  • Uchida N; Department of Molecular and Cellular Biology, Harvard University, USA.
bioRxiv; 2023 Nov 14.
Article in English | MEDLINE | ID: mdl-38014166
To thrive in complex environments, animals and artificial agents must learn to act adaptively to maximize fitness and rewards. Such adaptive behavior can be learned through reinforcement learning [1], a class of algorithms that has been successful both at training artificial agents [2-6] and at characterizing the firing of dopamine neurons in the midbrain [7-9]. In classical reinforcement learning, agents discount future rewards exponentially according to a single timescale, controlled by the discount factor. Here, we explore the presence of multiple timescales in biological reinforcement learning. We first show that reinforcement learning agents operating at a multitude of timescales gain distinct computational benefits. Next, we report that dopamine neurons in mice performing two behavioral tasks encode reward prediction errors with a diversity of discount time constants. Our model explains the heterogeneity of temporal discounting in both cue-evoked transient responses and slower-timescale fluctuations known as dopamine ramps. Crucially, the measured discount factor of individual neurons is correlated across the two tasks, suggesting that it is a cell-specific property. Together, our results provide a new paradigm for understanding functional heterogeneity in dopamine neurons, a mechanistic basis for the empirical observation that humans and animals use non-exponential discounting in many situations [10-14], and new avenues for the design of more efficient reinforcement learning algorithms.
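To make the multi-timescale idea concrete, below is a minimal sketch in Python (not code from the preprint): tabular TD(0) learners that share the same experience but each discount with its own factor gamma, loosely analogous to dopamine neurons with cell-specific discount factors. The chain task, parameter values, and variable names are illustrative assumptions.

    import numpy as np

    # Illustrative only: a chain of states with a single reward at the end,
    # learned in parallel by TD(0) value learners with different discount factors.
    gammas = np.array([0.5, 0.7, 0.9, 0.99])  # one discount timescale per "neuron"
    n_states = 10                             # states 0..9; reward delivered at state 9
    V = np.zeros((len(gammas), n_states))     # one value table per timescale
    alpha = 0.1                               # learning rate

    for episode in range(2000):
        for s in range(n_states):
            r = 1.0 if s == n_states - 1 else 0.0
            v_next = V[:, s + 1] if s < n_states - 1 else np.zeros(len(gammas))
            delta = r + gammas * v_next - V[:, s]  # per-timescale TD errors (the RPEs)
            V[:, s] += alpha * delta

    # Each row of V converges to gamma ** (delay to reward): a pure exponential.
    # The population average is a mixture of exponentials, which is no longer
    # exponential in the delay -- a standard route to hyperbolic-like discounting.
    delays = np.arange(n_states - 1, -1, -1)
    print(V.mean(axis=0))   # non-exponential effective discount curve
    print(0.5 ** delays)    # compare: a single-timescale exponential

Averaging the learned exponential value curves across the population yields an effective discount that is no longer exponential in the delay, consistent with the non-exponential discounting the abstract attributes to humans and animals; the preprint further argues that such a diversity of timescales carries computational benefits that a single discount factor cannot provide.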

Full text: 1 Collection: 01-international Database: MEDLINE Language: English Journal: bioRxiv Year: 2023 Document type: Article Country of affiliation: United States
