Mastering Atari, Go, chess and shogi by planning with a learned model.

Schrittwieser, Julian; Antonoglou, Ioannis; Hubert, Thomas; Simonyan, Karen; Sifre, Laurent; Schmitt, Simon; Guez, Arthur; Lockhart, Edward; Hassabis, Demis; Graepel, Thore; Lillicrap, Timothy; Silver, David

Affiliation

Schrittwieser J; DeepMind, London, UK.
Antonoglou I; DeepMind, London, UK.
Hubert T; University College London, London, UK.
Simonyan K; DeepMind, London, UK.
Sifre L; DeepMind, London, UK.
Schmitt S; DeepMind, London, UK.
Guez A; DeepMind, London, UK.
Lockhart E; DeepMind, London, UK.
Hassabis D; DeepMind, London, UK.
Graepel T; DeepMind, London, UK.
Lillicrap T; DeepMind, London, UK.
Silver D; University College London, London, UK.

Nature; 588(7839): 604-609, 2020 Dec.

Article in English | MEDLINE | ID: mdl-33361790

ABSTRACT

Constructing agents with planning capabilities has long been one of the main challenges in the pursuit of artificial intelligence. Tree-based planning methods have enjoyed huge success in challenging domains, such as chess [1] and Go [2], where a perfect simulator is available. However, in real-world problems, the dynamics governing the environment are often complex and unknown. Here we present the MuZero algorithm, which, by combining a tree-based search with a learned model, achieves superhuman performance in a range of challenging and visually complex domains, without any knowledge of their underlying dynamics. The MuZero algorithm learns an iterable model that produces predictions relevant to planning: the action-selection policy, the value function and the reward. When evaluated on 57 different Atari games [3], the canonical video game environment for testing artificial intelligence techniques, in which model-based planning approaches have historically struggled [4], the MuZero algorithm achieved state-of-the-art performance. When evaluated on Go, chess and shogi, canonical environments for high-performance planning, the MuZero algorithm matched, without any knowledge of the game dynamics, the superhuman performance of the AlphaZero algorithm [5] that was supplied with the rules of the game.
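
The "iterable model" described in the abstract can be pictured with a small sketch. The Python below is illustrative only and is not the authors' implementation: the function names (represent, dynamics, predict, unroll) are hypothetical, and fixed arithmetic stands in for what would be trained neural networks. It shows only the general shape of the idea: encode an observation into a hidden state once, then step that state forward action by action, emitting a policy, a value and a reward at each step without consulting the real environment. The tree search that would consume these predictions is omitted.

```python
import math
from typing import List, Sequence, Tuple

State = List[float]


def represent(observation: Sequence[float]) -> State:
    """Hypothetical encoder: map a raw observation to a hidden state."""
    return [math.tanh(x) for x in observation]


def dynamics(state: State, action: int) -> Tuple[State, float]:
    """Hypothetical transition: next hidden state and a predicted reward."""
    next_state = [math.tanh(s + 0.1 * (action + 1)) for s in state]
    reward = sum(next_state) / len(next_state)
    return next_state, reward


def predict(state: State, num_actions: int) -> Tuple[List[float], float]:
    """Hypothetical prediction head: action probabilities and a scalar value."""
    logits = [sum(state) * (a + 1) / num_actions for a in range(num_actions)]
    peak = max(logits)  # subtract the maximum for a numerically stable softmax
    exps = [math.exp(logit - peak) for logit in logits]
    total = sum(exps)
    policy = [e / total for e in exps]
    value = math.tanh(sum(state))
    return policy, value


def unroll(
    observation: Sequence[float], actions: Sequence[int], num_actions: int
) -> List[Tuple[List[float], float, float]]:
    """Iterate the model over a sequence of actions, entirely in hidden-state space."""
    state = represent(observation)
    outputs = []
    for action in actions:
        state, reward = dynamics(state, action)
        policy, value = predict(state, num_actions)
        outputs.append((policy, value, reward))
    return outputs


if __name__ == "__main__":
    for step, (policy, value, reward) in enumerate(unroll([0.5, -0.2], [0, 1, 0], 2)):
        print(step, [round(p, 3) for p in policy], round(value, 3), round(reward, 3))
```

In this sketch the environment is consulted only for the initial observation; every later step runs on the hidden state alone, which is what would allow a planner to look ahead without access to the rules of the game.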

Full text: 1 | Collections: 01-international | Database: MEDLINE | Study type: Prognostic_studies | Language: English | Journal: Nature | Publication year: 2020 | Document type: Article | Affiliation country: United Kingdom