Mastering Atari, Go, chess and shogi by planning with a learned model.
Schrittwieser, Julian; Antonoglou, Ioannis; Hubert, Thomas; Simonyan, Karen; Sifre, Laurent; Schmitt, Simon; Guez, Arthur; Lockhart, Edward; Hassabis, Demis; Graepel, Thore; Lillicrap, Timothy; Silver, David.
Affiliation
  • Schrittwieser J; DeepMind, London, UK.
  • Antonoglou I; DeepMind, London, UK.
  • Hubert T; DeepMind, London, UK.
  • Simonyan K; DeepMind, London, UK.
  • Sifre L; DeepMind, London, UK.
  • Schmitt S; DeepMind, London, UK.
  • Guez A; DeepMind, London, UK.
  • Lockhart E; DeepMind, London, UK.
  • Hassabis D; DeepMind, London, UK.
  • Graepel T; DeepMind, London, UK.
  • Lillicrap T; DeepMind, London, UK.
  • Silver D; DeepMind, London, UK.
Nature 588(7839): 604-609, December 2020.
Article in English | MEDLINE | ID: mdl-33361790
ABSTRACT
Constructing agents with planning capabilities has long been one of the main challenges in the pursuit of artificial intelligence. Tree-based planning methods have enjoyed huge success in challenging domains, such as chess [1] and Go [2], where a perfect simulator is available. However, in real-world problems, the dynamics governing the environment are often complex and unknown. Here we present the MuZero algorithm, which, by combining a tree-based search with a learned model, achieves superhuman performance in a range of challenging and visually complex domains, without any knowledge of their underlying dynamics. The MuZero algorithm learns an iterable model that produces predictions relevant to planning: the action-selection policy, the value function and the reward. When evaluated on 57 different Atari games [3], the canonical video-game environment for testing artificial-intelligence techniques, in which model-based planning approaches have historically struggled [4], the MuZero algorithm achieved state-of-the-art performance. When evaluated on Go, chess and shogi, canonical environments for high-performance planning, the MuZero algorithm matched, without any knowledge of the game dynamics, the superhuman performance of the AlphaZero algorithm [5], which was supplied with the rules of the game.
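To make the abstract's description concrete, here is a minimal, hypothetical sketch of the three-function model interface it describes: a representation function h maps an observation to a hidden state, a dynamics function g maps a hidden state and action to a next hidden state and predicted reward, and a prediction function f maps a hidden state to a policy and value. All names, shapes, and the tiny linear "networks" below are illustrative assumptions, not the paper's architecture, and the depth-1 lookahead stands in for MuZero's full Monte Carlo tree search.

```python
import numpy as np

rng = np.random.default_rng(0)
OBS_DIM, STATE_DIM, NUM_ACTIONS = 8, 4, 3  # toy sizes, chosen arbitrarily

# Stand-in parameters; a real implementation trains these end to end.
W_h = rng.normal(size=(STATE_DIM, OBS_DIM))                 # representation
W_g = rng.normal(size=(NUM_ACTIONS, STATE_DIM, STATE_DIM))  # dynamics
w_r = rng.normal(size=(NUM_ACTIONS, STATE_DIM))             # reward head
W_p = rng.normal(size=(NUM_ACTIONS, STATE_DIM))             # policy head
w_v = rng.normal(size=STATE_DIM)                            # value head

def represent(obs):
    """h: map a raw observation to a hidden state."""
    return np.tanh(W_h @ obs)

def dynamics(state, action):
    """g: map (hidden state, action) to (next hidden state, reward)."""
    next_state = np.tanh(W_g[action] @ state)
    reward = float(w_r[action] @ state)
    return next_state, reward

def predict(state):
    """f: map a hidden state to (action policy, value estimate)."""
    logits = W_p @ state
    policy = np.exp(logits - logits.max())
    policy /= policy.sum()
    value = float(w_v @ state)
    return policy, value

def plan_one_step(obs, discount=0.997):
    """Toy depth-1 lookahead entirely inside the learned model: expand
    each action with g, score it by predicted reward plus the discounted
    value from f, and pick the best. No environment simulator is used."""
    root = represent(obs)
    scores = []
    for a in range(NUM_ACTIONS):
        child, reward = dynamics(root, a)
        _, value = predict(child)
        scores.append(reward + discount * value)
    return int(np.argmax(scores))

print(plan_one_step(rng.normal(size=OBS_DIM)))
```

The key design point the abstract emphasizes is visible even in this sketch: planning never touches the true environment dynamics, only the learned functions g and f, which is what lets the same method run where no perfect simulator exists.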

Full text: 1 | Collections: 01-international | Database: MEDLINE | Study type: Prognostic studies | Language: En | Journal: Nature | Year of publication: 2020 | Document type: Article | Country of affiliation: United Kingdom