Article in English | MEDLINE | ID: mdl-38507376

ABSTRACT

Options, temporally extended courses of action that can be taken at varying time scales, have provided a concrete, key framework for learning levels of temporal abstraction in hierarchical tasks. While methods for learning options end-to-end are well researched, how to explore good options and actions simultaneously remains challenging. We address this issue by maximizing reward augmented with the entropies of both the option-selection and the action-selection policies during options learning. To this end, we derive a novel optimization objective by reformulating options learning from the perspective of probabilistic inference, and we propose a soft options iteration method that guarantees convergence to the optimum. As a practical implementation, we propose an off-policy algorithm, the maximum-entropy options critic (MEOC), and evaluate it on a series of continuous control benchmarks. Comparative results demonstrate that our method outperforms baselines in both efficiency and final performance on most benchmarks, with especially strong and robust gains on complex tasks. Ablation studies further show that entropy maximization in hierarchical exploration improves learning performance by encouraging efficient option specialization and multimodal exploration at the action level.
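The abstract does not state the objective explicitly; a plausible form of the entropy-augmented objective it describes, written with hypothetical temperature coefficients \alpha and \beta (not given in the abstract) weighting the option-level and action-level entropy bonuses, is:

    J(\pi_\omega, \pi_a) = \mathbb{E}\Big[ \sum_t \gamma^t \big( r(s_t, a_t) + \alpha \, \mathcal{H}\big(\pi_\omega(\cdot \mid s_t)\big) + \beta \, \mathcal{H}\big(\pi_a(\cdot \mid s_t, \omega_t)\big) \big) \Big]

where \pi_\omega is the option-selection policy, \pi_a is the intra-option action policy, \omega_t is the active option, and \mathcal{H} denotes Shannon entropy. Setting \alpha = \beta = 0 recovers the standard options objective, while \beta > 0 alone reduces to a soft actor-critic style bonus within each option. This is a sketch under the stated assumptions, not the paper's exact formulation.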
