Pesquisa | BVS - MINISTÉRIO DA SAÚDE

1.

Negotiation and honesty in artificial intelligence methods for the board game of Diplomacy.

Kramár, János; Eccles, Tom; Gemp, Ian; Tacchetti, Andrea; McKee, Kevin R; Malinowski, Mateusz; Graepel, Thore; Bachrach, Yoram.

Nat Commun ; 13(1): 7214, 2022 12 06.

Artigo em Inglês | MEDLINE | ID: mdl-36473833

RESUMO

The success of human civilization is rooted in our ability to cooperate by communicating and making joint plans. We study how artificial agents may use communication to better cooperate in Diplomacy, a long-standing AI challenge. We propose negotiation algorithms allowing agents to agree on contracts regarding joint plans, and show they outperform agents lacking this ability. For humans, misleading others about our intentions forms a barrier to cooperation. Diplomacy requires reasoning about our opponents' future plans, enabling us to study broken commitments between agents and the conditions for honest cooperation. We find that artificial agents face a similar problem as humans: communities of communicating agents are susceptible to peers who deviate from agreements. To defend against this, we show that the inclination to sanction peers who break contracts dramatically reduces the advantage of such deviators. Hence, sanctioning helps foster mostly truthful communication, despite conditions that initially favor deviations from agreements.

Assuntos

Inteligência Artificial , Humanos

2.

From motor control to team play in simulated humanoid football.

Liu, Siqi; Lever, Guy; Wang, Zhe; Merel, Josh; Eslami, S M Ali; Hennes, Daniel; Czarnecki, Wojciech M; Tassa, Yuval; Omidshafiei, Shayegan; Abdolmaleki, Abbas; Siegel, Noah Y; Hasenclever, Leonard; Marris, Luke; Tunyasuvunakool, Saran; Song, H Francis; Wulfmeier, Markus; Muller, Paul; Haarnoja, Tuomas; Tracey, Brendan; Tuyls, Karl; Graepel, Thore; Heess, Nicolas.

Sci Robot ; 7(69): eabo0235, 2022 08 31.

Artigo em Inglês | MEDLINE | ID: mdl-36044556

RESUMO

Learning to combine control at the level of joint torques with longer-term goal-directed behavior is a long-standing challenge for physically embodied artificial agents. Intelligent behavior in the physical world unfolds across multiple spatial and temporal scales: Although movements are ultimately executed at the level of instantaneous muscle tensions or joint torques, they must be selected to serve goals that are defined on much longer time scales and that often involve complex interactions with the environment and other agents. Recent research has demonstrated the potential of learning-based approaches applied to the respective problems of complex movement, long-term planning, and multiagent coordination. However, their integration traditionally required the design and optimization of independent subsystems and remains challenging. In this work, we tackled the integration of motor control and long-horizon decision-making in the context of simulated humanoid football, which requires agile motor control and multiagent coordination. We optimized teams of agents to play simulated football via reinforcement learning, constraining the solution space to that of plausible movements learned using human motion capture data. They were trained to maximize several environment rewards and to imitate pretrained football-specific skills if doing so led to improved performance. The result is a team of coordinated humanoid football players that exhibit complex behavior at different scales, quantified by a range of analysis and statistics, including those used in real-world sport analytics. Our work constitutes a complete demonstration of learned integrated decision-making at multiple scales in a multiagent setting.

Assuntos

Futebol Americano , Futebol , Humanos , Aprendizagem , Movimento , Reforço Psicológico , Futebol/fisiologia

3.

Cooperative AI: machines must learn to find common ground.

Dafoe, Allan; Bachrach, Yoram; Hadfield, Gillian; Horvitz, Eric; Larson, Kate; Graepel, Thore.

Nature ; 593(7857): 33-36, 2021 05.

Artigo em Inglês | MEDLINE | ID: mdl-33947992

Assuntos

Inteligência Artificial/tendências , Comunicação , Comportamento Cooperativo , Ciências Sociais , Criança , Comportamento Competitivo , Diplomacia , Empatia , Jogos Experimentais , Humanos , Lactente , Cooperação Internacional , Processamento de Linguagem Natural , Política

4.

Mastering Atari, Go, chess and shogi by planning with a learned model.

Schrittwieser, Julian; Antonoglou, Ioannis; Hubert, Thomas; Simonyan, Karen; Sifre, Laurent; Schmitt, Simon; Guez, Arthur; Lockhart, Edward; Hassabis, Demis; Graepel, Thore; Lillicrap, Timothy; Silver, David.

Nature ; 588(7839): 604-609, 2020 12.

Artigo em Inglês | MEDLINE | ID: mdl-33361790

RESUMO

Constructing agents with planning capabilities has long been one of the main challenges in the pursuit of artificial intelligence. Tree-based planning methods have enjoyed huge success in challenging domains, such as chess1 and Go2, where a perfect simulator is available. However, in real-world problems, the dynamics governing the environment are often complex and unknown. Here we present the MuZero algorithm, which, by combining a tree-based search with a learned model, achieves superhuman performance in a range of challenging and visually complex domains, without any knowledge of their underlying dynamics. The MuZero algorithm learns an iterable model that produces predictions relevant to planning: the action-selection policy, the value function and the reward. When evaluated on 57 different Atari games3-the canonical video game environment for testing artificial intelligence techniques, in which model-based planning approaches have historically struggled4-the MuZero algorithm achieved state-of-the-art performance. When evaluated on Go, chess and shogi-canonical environments for high-performance planning-the MuZero algorithm matched, without any knowledge of the game dynamics, the superhuman performance of the AlphaZero algorithm5 that was supplied with the rules of the game.

5.

Human-level performance in 3D multiplayer games with population-based reinforcement learning.

Jaderberg, Max; Czarnecki, Wojciech M; Dunning, Iain; Marris, Luke; Lever, Guy; Castañeda, Antonio Garcia; Beattie, Charles; Rabinowitz, Neil C; Morcos, Ari S; Ruderman, Avraham; Sonnerat, Nicolas; Green, Tim; Deason, Louise; Leibo, Joel Z; Silver, David; Hassabis, Demis; Kavukcuoglu, Koray; Graepel, Thore.

Science ; 364(6443): 859-865, 2019 May 31.

Artigo em Inglês | MEDLINE | ID: mdl-31147514

RESUMO

Reinforcement learning (RL) has shown great success in increasingly complex single-agent environments and two-player turn-based games. However, the real world contains multiple agents, each learning and acting independently to cooperate and compete with other agents. We used a tournament-style evaluation to demonstrate that an agent can achieve human-level performance in a three-dimensional multiplayer first-person video game, Quake III Arena in Capture the Flag mode, using only pixels and game points scored as input. We used a two-tier optimization process in which a population of independent RL agents are trained concurrently from thousands of parallel matches on randomly generated environments. Each agent learns its own internal reward signal and rich representation of the world. These results indicate the great potential of multiagent reinforcement learning for artificial intelligence research.

Assuntos

Aprendizado de Máquina , Reforço Psicológico , Jogos de Vídeo , Recompensa

6.

A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play.

Silver, David; Hubert, Thomas; Schrittwieser, Julian; Antonoglou, Ioannis; Lai, Matthew; Guez, Arthur; Lanctot, Marc; Sifre, Laurent; Kumaran, Dharshan; Graepel, Thore; Lillicrap, Timothy; Simonyan, Karen; Hassabis, Demis.

Science ; 362(6419): 1140-1144, 2018 12 07.

Artigo em Inglês | MEDLINE | ID: mdl-30523106

RESUMO

The game of chess is the longest-studied domain in the history of artificial intelligence. The strongest programs are based on a combination of sophisticated search techniques, domain-specific adaptations, and handcrafted evaluation functions that have been refined by human experts over several decades. By contrast, the AlphaGo Zero program recently achieved superhuman performance in the game of Go by reinforcement learning from self-play. In this paper, we generalize this approach into a single AlphaZero algorithm that can achieve superhuman performance in many challenging games. Starting from random play and given no domain knowledge except the game rules, AlphaZero convincingly defeated a world champion program in the games of chess and shogi (Japanese chess), as well as Go.

Assuntos

Inteligência Artificial , Reforço Psicológico , Jogos de Vídeo , Algoritmos , Humanos , Software

7.

Symmetric Decomposition of Asymmetric Games.

Tuyls, Karl; Pérolat, Julien; Lanctot, Marc; Ostrovski, Georg; Savani, Rahul; Leibo, Joel Z; Ord, Toby; Graepel, Thore; Legg, Shane.

Sci Rep ; 8(1): 1015, 2018 01 17.

Artigo em Inglês | MEDLINE | ID: mdl-29343692

RESUMO

We introduce new theoretical insights into two-population asymmetric games allowing for an elegant symmetric decomposition into two single population symmetric games. Specifically, we show how an asymmetric bimatrix game (A,B) can be decomposed into its symmetric counterparts by envisioning and investigating the payoff tables (A and B) that constitute the asymmetric game, as two independent, single population, symmetric games. We reveal several surprising formal relationships between an asymmetric two-population game and its symmetric single population counterparts, which facilitate a convenient analysis of the original asymmetric game due to the dimensionality reduction of the decomposition. The main finding reveals that if (x,y) is a Nash equilibrium of an asymmetric game (A,B), this implies that y is a Nash equilibrium of the symmetric counterpart game determined by payoff table A, and x is a Nash equilibrium of the symmetric counterpart game determined by payoff table B. Also the reverse holds and combinations of Nash equilibria of the counterpart games form Nash equilibria of the asymmetric game. We illustrate how these formal relationships aid in identifying and analysing the Nash structure of asymmetric games, by examining the evolutionary dynamics of the simpler counterpart games in several canonical examples.

Assuntos

Jogos Experimentais , Modelos Estatísticos , Feminino , Teoria dos Jogos , Humanos , Masculino

8.

Mastering the game of Go without human knowledge.

Silver, David; Schrittwieser, Julian; Simonyan, Karen; Antonoglou, Ioannis; Huang, Aja; Guez, Arthur; Hubert, Thomas; Baker, Lucas; Lai, Matthew; Bolton, Adrian; Chen, Yutian; Lillicrap, Timothy; Hui, Fan; Sifre, Laurent; van den Driessche, George; Graepel, Thore; Hassabis, Demis.

Nature ; 550(7676): 354-359, 2017 10 18.

Artigo em Inglês | MEDLINE | ID: mdl-29052630

RESUMO

A long-standing goal of artificial intelligence is an algorithm that learns, tabula rasa, superhuman proficiency in challenging domains. Recently, AlphaGo became the first program to defeat a world champion in the game of Go. The tree search in AlphaGo evaluated positions and selected moves using deep neural networks. These neural networks were trained by supervised learning from human expert moves, and by reinforcement learning from self-play. Here we introduce an algorithm based solely on reinforcement learning, without human data, guidance or domain knowledge beyond game rules. AlphaGo becomes its own teacher: a neural network is trained to predict AlphaGo's own move selections and also the winner of AlphaGo's games. This neural network improves the strength of the tree search, resulting in higher quality move selection and stronger self-play in the next iteration. Starting tabula rasa, our new program AlphaGo Zero achieved superhuman performance, winning 100-0 against the previously published, champion-defeating AlphaGo.

Assuntos

Jogos Recreativos , Software , Aprendizado de Máquina não Supervisionado , Humanos , Redes Neurais de Computação , Reforço Psicológico , Aprendizado de Máquina Supervisionado

9.

Mastering the game of Go with deep neural networks and tree search.

Silver, David; Huang, Aja; Maddison, Chris J; Guez, Arthur; Sifre, Laurent; van den Driessche, George; Schrittwieser, Julian; Antonoglou, Ioannis; Panneershelvam, Veda; Lanctot, Marc; Dieleman, Sander; Grewe, Dominik; Nham, John; Kalchbrenner, Nal; Sutskever, Ilya; Lillicrap, Timothy; Leach, Madeleine; Kavukcuoglu, Koray; Graepel, Thore; Hassabis, Demis.

Nature ; 529(7587): 484-9, 2016 Jan 28.

Artigo em Inglês | MEDLINE | ID: mdl-26819042

RESUMO

The game of Go has long been viewed as the most challenging of classic games for artificial intelligence owing to its enormous search space and the difficulty of evaluating board positions and moves. Here we introduce a new approach to computer Go that uses 'value networks' to evaluate board positions and 'policy networks' to select moves. These deep neural networks are trained by a novel combination of supervised learning from human expert games, and reinforcement learning from games of self-play. Without any lookahead search, the neural networks play Go at the level of state-of-the-art Monte Carlo tree search programs that simulate thousands of random games of self-play. We also introduce a new search algorithm that combines Monte Carlo simulation with value and policy networks. Using this search algorithm, our program AlphaGo achieved a 99.8% winning rate against other Go programs, and defeated the human European Go champion by 5 games to 0. This is the first time that a computer program has defeated a human professional player in the full-sized game of Go, a feat previously thought to be at least a decade away.

Assuntos

Jogos Recreativos , Redes Neurais de Computação , Software , Aprendizado de Máquina Supervisionado , Computadores , Europa (Continente) , Humanos , Método de Monte Carlo , Reforço Psicológico

10.

Private traits and attributes are predictable from digital records of human behavior.

Kosinski, Michal; Stillwell, David; Graepel, Thore.

Proc Natl Acad Sci U S A ; 110(15): 5802-5, 2013 Apr 09.

Artigo em Inglês | MEDLINE | ID: mdl-23479631

RESUMO

We show that easily accessible digital records of behavior, Facebook Likes, can be used to automatically and accurately predict a range of highly sensitive personal attributes including: sexual orientation, ethnicity, religious and political views, personality traits, intelligence, happiness, use of addictive substances, parental separation, age, and gender. The analysis presented is based on a dataset of over 58,000 volunteers who provided their Facebook Likes, detailed demographic profiles, and the results of several psychometric tests. The proposed model uses dimensionality reduction for preprocessing the Likes data, which are then entered into logistic/linear regression to predict individual psychodemographic profiles from Likes. The model correctly discriminates between homosexual and heterosexual men in 88% of cases, African Americans and Caucasian Americans in 95% of cases, and between Democrat and Republican in 85% of cases. For the personality trait "Openness," prediction accuracy is close to the test-retest accuracy of a standard personality test. We give examples of associations between attributes and Likes and discuss implications for online personalization and privacy.

Assuntos

Comportamento , Apoio Social , Inteligência Artificial , Mineração de Dados/métodos , Emoções , Feminino , Heterossexualidade , Homossexualidade , Humanos , Masculino , Modelos Teóricos , Personalidade , Inventário de Personalidade , Política , Psicometria , Análise de Regressão , Reprodutibilidade dos Testes

RESUMO

Assuntos

RESUMO

Assuntos

Assuntos

RESUMO

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

DETALHE DA PESQUISA