Results 1 - 5 of 5
1.
Sci Rep; 12(1): 8638, 2022 May 23.
Article in English | MEDLINE | ID: mdl-35606400

ABSTRACT

In multiagent worlds, several decision-making individuals interact while adhering to the dynamics constraints imposed by the environment. These interactions, combined with the potential stochasticity of the agents' dynamic behaviors, make such systems complex and interesting to study from a decision-making perspective. Significant research has been conducted on learning models for forward-direction estimation of agent behaviors, for example, pedestrian predictions used for collision avoidance in self-driving cars. In many settings, only sporadic observations of agents may be available in a given trajectory sequence. In football, subsets of players may come in and out of view of broadcast video footage, while unobserved players continue to interact off-screen. In this paper, we study the problem of multiagent time-series imputation in the context of human football play, where available past and future observations of subsets of agents are used to estimate missing observations for other agents. Our approach, called the Graph Imputer, uses past and future information in combination with graph networks and variational autoencoders to enable learning of a distribution of imputed trajectories. We demonstrate our approach on multiagent settings involving partially observable players, using the Graph Imputer to predict the behaviors of off-screen players. To quantitatively evaluate the approach, we conduct experiments on football matches with ground truth trajectory data, using a camera module to simulate the off-screen player state estimation setting. We subsequently use our approach for downstream football analytics under partial observability using the well-established framework of pitch control, which traditionally relies on fully observed data. We illustrate that our method outperforms several state-of-the-art approaches, including those hand-crafted for football, across all considered metrics.


Subjects
Football; Soccer; Humans; Learning
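
The Graph Imputer described above combines graph networks with variational autoencoders to impute missing trajectories from both past and future observations. As a rough illustration of the variational part only, here is a minimal sketch; the bidirectional GRU encoder stands in for the paper's graph-network message passing, and the dimensions and masking scheme are assumptions, not the authors' architecture:

```python
import torch
import torch.nn as nn

class TrajectoryImputer(nn.Module):
    """Bidirectional variational imputer for multiagent trajectories (sketch)."""
    def __init__(self, n_agents: int, dim: int = 2, hidden: int = 64, latent: int = 16):
        super().__init__()
        inp = n_agents * dim
        # Bidirectional encoder sees both past and future observations,
        # mirroring the forward/backward information flow described above.
        self.encoder = nn.GRU(inp, hidden, batch_first=True, bidirectional=True)
        self.to_mu = nn.Linear(2 * hidden, latent)
        self.to_logvar = nn.Linear(2 * hidden, latent)
        self.decoder = nn.Sequential(
            nn.Linear(latent, hidden), nn.ReLU(), nn.Linear(hidden, inp)
        )

    def forward(self, obs: torch.Tensor, mask: torch.Tensor):
        # obs, mask: (batch, time, n_agents * dim); mask is 1 where observed.
        h, _ = self.encoder(obs * mask)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        recon = self.decoder(z)
        # Keep observed values; impute only the missing entries.
        return mask * obs + (1 - mask) * recon, mu, logvar
```

Training such a model would maximize an evidence lower bound: a reconstruction loss on the observed entries plus a KL penalty on the per-step latents; the per-agent graph structure is omitted here for brevity.
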
2.
Sci Robot; 7(69): eabo0235, 2022 Aug 31.
Article in English | MEDLINE | ID: mdl-36044556

ABSTRACT

Learning to combine control at the level of joint torques with longer-term goal-directed behavior is a long-standing challenge for physically embodied artificial agents. Intelligent behavior in the physical world unfolds across multiple spatial and temporal scales: Although movements are ultimately executed at the level of instantaneous muscle tensions or joint torques, they must be selected to serve goals that are defined on much longer time scales and that often involve complex interactions with the environment and other agents. Recent research has demonstrated the potential of learning-based approaches applied to the respective problems of complex movement, long-term planning, and multiagent coordination. However, their integration traditionally required the design and optimization of independent subsystems and remains challenging. In this work, we tackled the integration of motor control and long-horizon decision-making in the context of simulated humanoid football, which requires agile motor control and multiagent coordination. We optimized teams of agents to play simulated football via reinforcement learning, constraining the solution space to that of plausible movements learned using human motion capture data. They were trained to maximize several environment rewards and to imitate pretrained football-specific skills if doing so led to improved performance. The result is a team of coordinated humanoid football players that exhibit complex behavior at different scales, quantified by a range of analyses and statistics, including those used in real-world sport analytics. Our work constitutes a complete demonstration of learned integrated decision-making at multiple scales in a multiagent setting.


Subjects
Football; Soccer; Humans; Learning; Movement; Reinforcement, Psychology; Soccer/physiology
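
The training recipe above mixes environment rewards with imitation of pretrained, motion-capture-derived skills. The sketch below shows one crude form of such reward mixing; the quadratic imitation term, the weight, and the skill-policy interface are assumptions for illustration, not the paper's objective (which gates imitation on whether it improves performance):

```python
import numpy as np

def shaped_reward(env_reward: float,
                  policy_action: np.ndarray,
                  skill_action: np.ndarray,
                  w_imitation: float = 0.1) -> float:
    """Task reward plus a bonus for staying close to the action a
    pretrained football-specific skill policy would have taken."""
    imitation_bonus = -float(np.sum((policy_action - skill_action) ** 2))
    return env_reward + w_imitation * imitation_bonus

# Example: a joint-torque action vector compared against a skill policy's
# suggestion (the 56-dimensional action space is a hypothetical choice).
r = shaped_reward(1.0, np.zeros(56), 0.05 * np.ones(56))
```
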
3.
Science; 378(6623): 990-996, 2022 Dec 02.
Article in English | MEDLINE | ID: mdl-36454847

ABSTRACT

We introduce DeepNash, an autonomous agent that plays the imperfect information game Stratego at a human expert level. Stratego is one of the few iconic board games that artificial intelligence (AI) has not yet mastered. It is a game characterized by a twin challenge: It requires long-term strategic thinking as in chess, but it also requires dealing with imperfect information as in poker. The technique underpinning DeepNash is a game-theoretic, model-free deep reinforcement learning method, without search, that learns to master Stratego through self-play from scratch. DeepNash beat existing state-of-the-art AI methods in Stratego and achieved a year-to-date (2022) and all-time top-three ranking on the Gravon games platform, competing with human expert players.


Subjects
Artificial Intelligence; Reinforcement, Psychology; Video Games; Humans
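
The specific algorithm behind DeepNash is not detailed in this abstract, but the flavor of learning toward a Nash equilibrium through self-play, without search, can be shown on a toy zero-sum game. A sketch (an illustration, not DeepNash's method): multiplicative-weights self-play on rock-paper-scissors, whose time-averaged strategies approach the uniform Nash equilibrium:

```python
import numpy as np

payoff = np.array([[0, -1, 1],
                   [1, 0, -1],
                   [-1, 1, 0]])          # row player's payoffs (zero-sum)
p = np.ones(3) / 3                       # row player's mixed strategy
q = np.ones(3) / 3                       # column player's mixed strategy
avg_p = np.zeros(3)
eta, steps = 0.1, 5000
for _ in range(steps):
    p = p * np.exp(eta * (payoff @ q)); p /= p.sum()     # row update
    q = q * np.exp(eta * (-payoff.T @ p)); q /= q.sum()  # column update
    avg_p += p
print(avg_p / steps)  # approaches the uniform Nash equilibrium [1/3, 1/3, 1/3]
```

Only the time average converges here; the iterates themselves cycle, which is one reason equilibrium learning in games is harder than single-agent reinforcement learning.
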
4.
Nat Commun; 11(1): 5603, 2020 Nov 05.
Article in English | MEDLINE | ID: mdl-33154362

ABSTRACT

Multiplayer games have long been used as testbeds in artificial intelligence research, aptly referred to as the Drosophila of artificial intelligence. Traditionally, researchers have focused on using well-known games to build strong agents. This progress, however, can be better informed by characterizing games and their topological landscape. Tackling this latter question can facilitate understanding of agents and help determine what game an agent should target next as part of its training. Here, we show how network measures applied to response graphs of large-scale games enable the creation of a landscape of games, quantifying relationships between games of varying sizes and characteristics. We illustrate our findings in domains ranging from canonical games to complex empirical games capturing the performance of trained agents pitted against one another. Our results culminate in a demonstration leveraging this information to generate new and interesting games, including mixtures of empirical games synthesized from real-world games.
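
As a toy instantiation of applying network measures to response graphs (a simplified assumption, not the paper's actual pipeline), one can build a better-response graph over the pure strategies of a symmetric game and read off degree statistics:

```python
import numpy as np

payoff = np.array([[0, -1, 1],
                   [1, 0, -1],
                   [-1, 1, 0]])  # rock-paper-scissors, symmetric zero-sum
n = payoff.shape[0]
adj = np.zeros((n, n), dtype=int)
for i in range(n):
    for j in range(n):
        # Edge i -> j: switching to j improves payoff against a resident i.
        if i != j and payoff[j, i] > payoff[i, i]:
            adj[i, j] = 1

print(adj)             # a 3-cycle: rock -> paper -> scissors -> rock
print(adj.sum(axis=0)) # in-degrees, one simple network measure over the graph
```
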

5.
Sci Rep; 9(1): 9937, 2019 Jul 09.
Article in English | MEDLINE | ID: mdl-31289288

ABSTRACT

We introduce α-Rank, a principled evolutionary dynamics methodology for the evaluation and ranking of agents in large-scale multi-agent interactions, grounded in a novel dynamical game-theoretic solution concept called Markov-Conley chains (MCCs). The approach leverages continuous-time and discrete-time evolutionary dynamical systems applied to empirical games, and scales tractably in the number of agents, in the type of interactions (beyond dyadic), and in the type of empirical games (symmetric and asymmetric). Current models are fundamentally limited in one or more of these dimensions and are not guaranteed to converge to the desired game-theoretic solution concept (typically the Nash equilibrium). α-Rank automatically provides a ranking over the set of agents under evaluation and provides insights into their strengths, weaknesses, and long-term dynamics in terms of basins of attraction and sink components. This is a direct consequence of the correspondence we establish to the dynamical MCC solution concept when the underlying evolutionary model's ranking-intensity parameter, α, is chosen to be large, which exactly forms the basis of α-Rank. In contrast to the Nash equilibrium, which is a static solution concept based solely on fixed points, MCCs are a dynamical solution concept based on the Markov chain formalism, Conley's Fundamental Theorem of Dynamical Systems, and the core ingredients of dynamical systems: fixed points, recurrent sets, periodic orbits, and limit cycles. Our α-Rank method runs in polynomial time with respect to the total number of pure strategy profiles, whereas computing a Nash equilibrium for a general-sum game is known to be intractable. We introduce mathematical proofs that not only provide an overarching and unifying perspective of existing continuous- and discrete-time evolutionary evaluation models, but also reveal the formal underpinnings of the α-Rank methodology. We illustrate the method in canonical games and empirically validate it in several domains, including AlphaGo, AlphaZero, MuJoCo Soccer, and Poker.
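
A minimal single-population sketch of the computation described above, under stated assumptions: monomorphic strategy states, the evolutionary model's fixation probabilities as transitions, and the Markov chain's stationary distribution as the ranking. The values of α and the population size m below are arbitrary illustrative choices:

```python
import numpy as np

def alpharank(payoff: np.ndarray, alpha: float = 5.0, m: int = 50) -> np.ndarray:
    """Stationary distribution of a single-population α-Rank chain (sketch)."""
    n = payoff.shape[0]
    T = np.zeros((n, n))
    for i in range(n):          # resident strategy
        for j in range(n):      # invading strategy
            if i == j:
                continue
            df = payoff[j, i] - payoff[i, i]   # invader's fitness advantage
            if abs(df) < 1e-12:
                rho = 1.0 / m                  # neutral fixation probability
            else:
                rho = (1 - np.exp(-alpha * df)) / (1 - np.exp(-alpha * m * df))
            T[i, j] = rho / (n - 1)
        T[i, i] = 1.0 - T[i].sum()
    # Stationary distribution: left eigenvector of T with eigenvalue 1.
    vals, vecs = np.linalg.eig(T.T)
    pi = np.real(vecs[:, np.argmax(np.real(vals))])
    return pi / pi.sum()

rps = np.array([[0, -1, 1], [1, 0, -1], [-1, 1, 0]])
print(alpharank(rps))  # ~[1/3, 1/3, 1/3]: the cycle has no dominant strategy
```

For rock-paper-scissors the ranking is uniform by symmetry; in general the mass concentrates on the sink components of the response graph, matching the abstract's description of long-term dynamics.
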
