Meta-reinforcement learning via orbitofrontal cortex.
Hattori, Ryoma; Hedrick, Nathan G; Jain, Anant; Chen, Shuqi; You, Hanjia; Hattori, Mariko; Choi, Jun-Hyeok; Lim, Byung Kook; Yasuda, Ryohei; Komiyama, Takaki.
Affiliation
  • Hattori R; Department of Neurobiology, University of California San Diego, La Jolla, CA, USA. rhattori0204@gmail.com.
  • Hedrick NG; Center for Neural Circuits and Behavior, University of California San Diego, La Jolla, CA, USA. rhattori0204@gmail.com.
  • Jain A; Department of Neurosciences, University of California San Diego, La Jolla, CA, USA. rhattori0204@gmail.com.
  • Chen S; Halicioglu Data Science Institute, University of California San Diego, La Jolla, CA, USA. rhattori0204@gmail.com.
  • You H; Department of Neuroscience, The Herbert Wertheim UF Scripps Institute for Biomedical Innovation & Technology, University of Florida, Jupiter, FL, USA. rhattori0204@gmail.com.
  • Hattori M; Department of Neurobiology, University of California San Diego, La Jolla, CA, USA.
  • Choi JH; Center for Neural Circuits and Behavior, University of California San Diego, La Jolla, CA, USA.
  • Lim BK; Department of Neurosciences, University of California San Diego, La Jolla, CA, USA.
  • Yasuda R; Halicioglu Data Science Institute, University of California San Diego, La Jolla, CA, USA.
  • Komiyama T; Max Planck Florida Institute for Neuroscience, Jupiter, FL, USA.
Nat Neurosci; 26(12): 2182-2191, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37957318
ABSTRACT
The meta-reinforcement learning (meta-RL) framework, which involves RL over multiple timescales, has been successful in training deep RL models that generalize to new environments. It has been hypothesized that the prefrontal cortex may mediate meta-RL in the brain, but the evidence is scarce. Here we show that the orbitofrontal cortex (OFC) mediates meta-RL. We trained mice and deep RL models on a probabilistic reversal learning task across sessions during which they improved their trial-by-trial RL policy through meta-learning. Ca2+/calmodulin-dependent protein kinase II-dependent synaptic plasticity in OFC was necessary for this meta-learning but not for the within-session trial-by-trial RL in experts. After meta-learning, OFC activity robustly encoded value signals, and OFC inactivation impaired the RL behaviors. Longitudinal tracking of OFC activity revealed that meta-learning gradually shapes population value coding to guide the ongoing behavioral policy. Our results indicate that two distinct RL algorithms with distinct neural mechanisms and timescales coexist in OFC to support adaptive decision-making.
Subject(s)

Full text: 1 | Collection: 01-internacional | Database: MEDLINE | Main subject: Reinforcement, Psychology / Reward | Limits: Animals | Language: En | Journal: Nat Neurosci | Journal subject: NEUROLOGY | Year: 2023 | Document type: Article | Country of affiliation: United States