Goal-oriented inference of environment from redundant observations.

Takahashi, Kazuki; Fukai, Tomoki; Sakai, Yutaka; Takekawa, Takashi

Affiliation

Takahashi K; Informatics Program, Graduate School of Engineering, Kogakuin University of Technology and Engineering, Japan.
Fukai T; Neural Coding and Brain Computing Unit, Okinawa Institute of Science and Technology, Japan.
Sakai Y; Brain Science Institute, Tamagawa University, Japan.
Takekawa T; Informatics Program, Graduate School of Engineering, Kogakuin University of Technology and Engineering, Japan. Electronic address: takekawa@cc.kogakuin.ac.jp.

Neural Netw ; 174: 106246, 2024 Jun.

Article in English | MEDLINE | ID: mdl-38547801

ABSTRACT

An agent learns to organize its decision behavior to achieve a behavioral goal, such as reward maximization, and reinforcement learning is often used for this optimization. Learning an optimal behavioral strategy is difficult when the events necessary for learning are only partially observable, a setting known as a Partially Observable Markov Decision Process (POMDP). However, the real-world environment also presents many events that are irrelevant to reward delivery and to an optimal behavioral strategy. Conventional POMDP methods, which attempt to infer transition rules among all observations, including irrelevant states, are ineffective in such an environment. Assuming a Redundantly Observable Markov Decision Process (ROMDP), we propose a method for goal-oriented reinforcement learning that efficiently learns state transition rules among reward-related "core states" from redundant observations. Starting with a small number of initial core states, our model gradually adds new core states to the transition diagram until it achieves an optimal behavioral strategy consistent with the Bellman equation. We demonstrate that the resultant inference model outperforms the conventional method for POMDP. We emphasize that our model, which contains only the core states, has high explainability. Furthermore, the proposed method suits online learning as it suppresses memory consumption and improves learning speed.
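
As a rough illustration of what "consistent with the Bellman equation" can mean for a small tabular model, the sketch below runs value iteration on a toy two-state problem and measures the Bellman residual. It is not the authors' algorithm: the states, actions, rewards, discount factor, and every function name are hypothetical and chosen only for illustration, and the sketch does not attempt the inference of core states from redundant observations.

```python
# Hypothetical toy model; none of these names or numbers come from the paper.
GAMMA = 0.9  # assumed discount factor

# TRANSITIONS[state][action] = list of (probability, next_state, reward)
TRANSITIONS = {
    "start": {
        "left": [(1.0, "start", 0.0)],
        "right": [(1.0, "goal", 1.0)],
    },
    "goal": {
        "stay": [(1.0, "goal", 0.0)],
    },
}


def backup(values, state):
    """One Bellman optimality backup: best expected return over actions."""
    return max(
        sum(p * (r + GAMMA * values[s2]) for p, s2, r in outcomes)
        for outcomes in TRANSITIONS[state].values()
    )


def value_iteration(tol=1e-10, max_sweeps=10_000):
    """Repeat synchronous backups until the values stop changing."""
    values = {s: 0.0 for s in TRANSITIONS}
    for _ in range(max_sweeps):
        new_values = {s: backup(values, s) for s in TRANSITIONS}
        delta = max(abs(new_values[s] - values[s]) for s in TRANSITIONS)
        values = new_values
        if delta < tol:
            break
    return values


def bellman_residual(values):
    """Largest gap between a state's value and its Bellman backup."""
    return max(abs(backup(values, s) - values[s]) for s in TRANSITIONS)


if __name__ == "__main__":
    v = value_iteration()
    print(v, bellman_residual(v))
```

A residual near zero means the value estimates satisfy the Bellman optimality equation for the assumed transition model; a large residual means they do not.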

Subjects

Goals; Learning; Reinforcement, Psychology; Reward; Markov Chains

Keywords

Nonstationarity; Reinforcement learning; State abstraction; Variational inference

Full text: 1 Collection: 01-international Health context: 1_ASSA2030 Database: MEDLINE Main subject: Goals / Learning Language: En Journal: Neural Netw Year of publication: 2024 Document type: Article