Multi-timescale reinforcement learning in the brain.
Masset, Paul; Tano, Pablo; Kim, HyungGoo R; Malik, Athar N; Pouget, Alexandre; Uchida, Naoshige.
Affiliation
  • Masset P; Department of Molecular and Cellular Biology, Harvard University, USA.
  • Tano P; Department of Basic Neuroscience, University of Geneva, Switzerland.
  • Kim HR; Department of Biomedical Engineering, Sungkyunkwan University, Suwon 16419, Republic of Korea.
  • Malik AN; Department of Molecular and Cellular Biology, Harvard University, USA.
  • Pouget A; Department of Basic Neuroscience, University of Geneva, Switzerland.
  • Uchida N; Department of Molecular and Cellular Biology, Harvard University, USA.
bioRxiv; 2023 Nov 14.
Article in English | MEDLINE | ID: mdl-38014166
To thrive in complex environments, animals and artificial agents must learn to act adaptively to maximize fitness and rewards. Such adaptive behavior can be learned through reinforcement learning [1], a class of algorithms that has been successful both at training artificial agents [2-6] and at characterizing the firing of dopamine neurons in the midbrain [7-9]. In classical reinforcement learning, agents discount future rewards exponentially according to a single timescale, controlled by the discount factor. Here, we explore the presence of multiple timescales in biological reinforcement learning. We first show that reinforcement learning agents operating at a multitude of timescales gain distinct computational benefits. Next, we report that dopamine neurons in mice performing two behavioral tasks encode reward prediction errors with a diversity of discount time constants. Our model explains the heterogeneity of temporal discounting in both cue-evoked transient responses and slower-timescale fluctuations known as dopamine ramps. Crucially, the measured discount factor of individual neurons is correlated across the two tasks, suggesting that it is a cell-specific property. Together, our results provide a new paradigm for understanding functional heterogeneity in dopamine neurons, a mechanistic basis for the empirical observation that humans and animals use non-exponential discounting in many situations [10-14], and new avenues for the design of more efficient reinforcement learning algorithms.
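To make the multi-timescale idea concrete, below is a minimal sketch in Python (not code from the preprint): tabular TD(0) learners that share the same experience but each discount with its own factor gamma, loosely analogous to dopamine neurons with cell-specific discount factors. The chain task, parameter values, and variable names are illustrative assumptions.

    import numpy as np

    # Illustrative only: a chain of states with a single reward at the end,
    # learned in parallel by TD(0) value learners with different discount factors.
    gammas = np.array([0.5, 0.7, 0.9, 0.99])  # one discount timescale per "neuron"
    n_states = 10                             # states 0..9; reward delivered at state 9
    V = np.zeros((len(gammas), n_states))     # one value table per timescale
    alpha = 0.1                               # learning rate

    for episode in range(2000):
        for s in range(n_states):
            r = 1.0 if s == n_states - 1 else 0.0
            v_next = V[:, s + 1] if s < n_states - 1 else np.zeros(len(gammas))
            delta = r + gammas * v_next - V[:, s]  # per-timescale TD errors (the RPEs)
            V[:, s] += alpha * delta

    # Each row of V converges to gamma ** (delay to reward): a pure exponential.
    # The population average is a mixture of exponentials, which is no longer
    # exponential in the delay -- a standard route to hyperbolic-like discounting.
    delays = np.arange(n_states - 1, -1, -1)
    print(V.mean(axis=0))   # non-exponential effective discount curve
    print(0.5 ** delays)    # compare: a single-timescale exponential

Averaging the learned exponential value curves across the population yields an effective discount that is no longer exponential in the delay, consistent with the non-exponential discounting the abstract attributes to humans and animals; the preprint further argues that such a diversity of timescales carries computational benefits that a single discount factor cannot provide.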

Full text: 1 Collection: 01-international Database: MEDLINE Language: English Journal: bioRxiv Year: 2023 Document type: Article Country of affiliation: United States
