Results 1 - 2 of 2
1.
Science; 378(6623): 990-996, 2022 Dec 02.
Article in English | MEDLINE | ID: mdl-36454847

ABSTRACT

We introduce DeepNash, an autonomous agent that plays the imperfect information game Stratego at a human expert level. Stratego is one of the few iconic board games that artificial intelligence (AI) has not yet mastered. It is a game characterized by a twin challenge: It requires long-term strategic thinking as in chess, but it also requires dealing with imperfect information as in poker. The technique underpinning DeepNash uses a game-theoretic, model-free deep reinforcement learning method, without search, that learns to master Stratego through self-play from scratch. DeepNash beat existing state-of-the-art AI methods in Stratego and achieved a year-to-date (2022) and all-time top-three ranking on the Gravon games platform, competing with human expert players.
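The abstract names only the ingredients of the approach (game-theoretic, model-free, no search, self-play from scratch). As a rough illustration of the game-theoretic self-play idea alone, and not of DeepNash's actual deep reinforcement learning method, the Python sketch below runs regret matching in self-play on rock-paper-scissors, a toy zero-sum game; the time-averaged strategy approaches the game's Nash equilibrium (uniform play). All names and constants are illustrative.

import numpy as np

# Toy sketch only: regret matching in self-play on rock-paper-scissors.
# This is NOT DeepNash's algorithm; it merely illustrates the game-theoretic
# idea that self-play can drive the time-averaged strategy toward a Nash
# equilibrium of a zero-sum game without any search.

PAYOFF = np.array([[ 0, -1,  1],   # row player's payoff for rock, paper, scissors
                   [ 1,  0, -1],
                   [-1,  1,  0]])

def regret_matching(cumulative_regret):
    positive = np.maximum(cumulative_regret, 0.0)
    total = positive.sum()
    return positive / total if total > 0 else np.full(3, 1.0 / 3.0)

cumulative_regret = np.array([1.0, 0.0, 0.0])  # start biased toward rock so the dynamics are visible
strategy_sum = np.zeros(3)

for _ in range(10_000):
    strategy = regret_matching(cumulative_regret)
    strategy_sum += strategy
    # The opponent is a copy of the current strategy (self-play).
    action_values = PAYOFF @ strategy            # expected payoff of each pure action
    expected_value = strategy @ action_values    # expected payoff of the current mixed strategy
    cumulative_regret += action_values - expected_value

average_strategy = strategy_sum / strategy_sum.sum()
print(average_strategy)   # approaches [1/3, 1/3, 1/3], the Nash equilibrium of rock-paper-scissors

DeepNash itself replaces this tabular, full-information update with a model-free deep reinforcement learning rule trained purely from self-play games of Stratego.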


Subjects
Artificial Intelligence, Reinforcement (Psychology), Video Games, Humans
2.
IEEE Trans Neural Netw Learn Syst; 28(8): 1814-1826, 2017 Aug.
Article in English | MEDLINE | ID: mdl-27164607

ABSTRACT

Learning from demonstrations is a paradigm in which an apprentice agent learns a control policy for a dynamic environment by observing demonstrations delivered by an expert agent. It is usually implemented as either imitation learning (IL) or inverse reinforcement learning (IRL) in the literature. On the one hand, IRL is a paradigm relying on Markov decision processes, where the goal of the apprentice agent is to find, from the expert demonstrations, a reward function that could explain the expert behavior. On the other hand, IL consists of directly generalizing the expert strategy, observed in the demonstrations, to unvisited states (and is therefore close to classification when there is a finite set of possible decisions). While these two views are often considered opposites, the purpose of this paper is to exhibit a formal link between them from which new algorithms can be derived. We show that IL and IRL can be redefined so that they are equivalent, in the sense that there exists an explicit bijective operator (namely, the inverse optimal Bellman operator) between their respective spaces of solutions. To do so, we introduce the set-policy framework, which creates a clear link between IL and IRL. As a result, IL and IRL solutions that make the best of both worlds are obtained. In addition, it is a unifying framework from which existing IL and IRL algorithms can be derived and which opens the way for IL methods able to account for the environment's dynamics. Finally, the IRL algorithms derived from the set-policy framework are compared with algorithms from the more common trajectory-matching family. Experiments demonstrate that the set-policy-based algorithms outperform both standard IRL and IL algorithms and yield more robust solutions.
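A toy sketch of the inversion the abstract refers to, under the usual reading of an inverse optimal Bellman operator: given a Q-function that is assumed optimal, the Bellman optimality equation can be inverted to recover the reward that would make it so. The snippet assumes a small finite MDP with a known transition model, purely for illustration (the paper's set-policy framework is more general); all names are illustrative, not taken from the paper.

import numpy as np

def inverse_optimal_bellman(Q, P, gamma):
    """Recover a reward function from a Q-function treated as optimal.

    Inverts the Bellman optimality equation
        Q(s, a) = R(s, a) + gamma * E_{s' ~ P(.|s, a)}[max_a' Q(s', a')]
    to obtain
        R(s, a) = Q(s, a) - gamma * E_{s' ~ P(.|s, a)}[max_a' Q(s', a')].

    Q: (S, A) array, candidate optimal action-value function
    P: (S, A, S) array of transition probabilities P[s, a, s']
    gamma: discount factor in [0, 1)
    """
    V = Q.max(axis=1)          # V(s') = max_a' Q(s', a')
    expected_V = P @ V         # (S, A) array: E_{s' ~ P(.|s, a)}[V(s')]
    return Q - gamma * expected_V

# Illustrative two-state, two-action MDP with deterministic transitions.
P = np.zeros((2, 2, 2))
P[0, 0, 1] = P[0, 1, 0] = P[1, 0, 0] = P[1, 1, 1] = 1.0
Q = np.array([[1.0, 0.5],
              [0.2, 0.8]])
R = inverse_optimal_bellman(Q, P, gamma=0.9)
print(R)    # the reward under which this Q satisfies the Bellman optimality equation

Because the Bellman optimality operator has a unique fixed point for each reward, this map from Q-functions to rewards is one-to-one, which gives the flavor of the bijection between IL and IRL solution spaces described in the abstract.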
