Article in English | MEDLINE | ID: mdl-38507376

ABSTRACT

Options, temporally extended courses of action that can be taken at varying time scales, have provided a concrete, key framework for learning levels of temporal abstraction in hierarchical tasks. While methods for learning options end-to-end are well researched, how to explore good options and actions simultaneously remains challenging. We address this issue by maximizing reward augmented with the entropies of both the option-selection and the action-selection policies during options learning. To this end, we derive a novel optimization objective by reformulating options learning from the perspective of probabilistic inference, and we propose a soft options iteration method that guarantees convergence to the optimum. As a practical implementation, we propose an off-policy algorithm, the maximum-entropy options critic (MEOC), and evaluate it on a series of continuous control benchmarks. Comparative results demonstrate that our method outperforms baselines in both efficiency and final performance on most benchmarks, with especially strong and robust gains on complex tasks. Ablation studies further show that entropy maximization in hierarchical exploration improves learning performance by encouraging efficient option specialization and multimodal exploration at the action level.
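The abstract does not state the objective explicitly; a plausible form of the entropy-augmented objective it describes, written with hypothetical temperature coefficients \alpha and \beta (not given in the abstract) weighting the option-level and action-level entropy bonuses, is:

    J(\pi_\omega, \pi_a) = \mathbb{E}\Big[ \sum_t \gamma^t \big( r(s_t, a_t) + \alpha \, \mathcal{H}\big(\pi_\omega(\cdot \mid s_t)\big) + \beta \, \mathcal{H}\big(\pi_a(\cdot \mid s_t, \omega_t)\big) \big) \Big]

where \pi_\omega is the option-selection policy, \pi_a is the intra-option action policy, \omega_t is the active option, and \mathcal{H} denotes Shannon entropy. Setting \alpha = \beta = 0 recovers the standard options objective, while \beta > 0 alone reduces to a soft actor-critic style bonus within each option. This is a sketch under the stated assumptions, not the paper's exact formulation.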
