ABSTRACT
Learning to combine control at the level of joint torques with longer-term goal-directed behavior is a long-standing challenge for physically embodied artificial agents. Intelligent behavior in the physical world unfolds across multiple spatial and temporal scales: Although movements are ultimately executed at the level of instantaneous muscle tensions or joint torques, they must be selected to serve goals that are defined on much longer time scales and that often involve complex interactions with the environment and other agents. Recent research has demonstrated the potential of learning-based approaches applied to the respective problems of complex movement, long-term planning, and multiagent coordination. However, their integration traditionally required the design and optimization of independent subsystems and remains challenging. In this work, we tackled the integration of motor control and long-horizon decision-making in the context of simulated humanoid football, which requires agile motor control and multiagent coordination. We optimized teams of agents to play simulated football via reinforcement learning, constraining the solution space to that of plausible movements learned using human motion capture data. The agents were trained to maximize several environment rewards and to imitate pretrained football-specific skills if doing so led to improved performance. The result is a team of coordinated humanoid football players that exhibit complex behavior at different scales, quantified by a range of analyses and statistics, including those used in real-world sport analytics. Our work constitutes a complete demonstration of learned integrated decision-making at multiple scales in a multiagent setting.
Subject(s)
Football; Soccer; Humans; Learning; Movement; Reinforcement, Psychology; Soccer/physiology
ABSTRACT
Nuclear fusion using magnetic confinement, in particular in the tokamak configuration, is a promising path towards sustainable energy. A core challenge is to shape and maintain a high-temperature plasma within the tokamak vessel. This requires high-dimensional, high-frequency, closed-loop control using magnetic actuator coils, further complicated by the diverse requirements across a wide range of plasma configurations. In this work, we introduce a previously undescribed architecture for tokamak magnetic controller design that autonomously learns to command the full set of control coils. This architecture meets control objectives specified at a high level, at the same time satisfying physical and operational constraints. This approach has unprecedented flexibility and generality in problem specification and yields a notable reduction in design effort to produce new plasma configurations. We successfully produce and control a diverse set of plasma configurations on the Tokamak à Configuration Variable (TCV) [1,2], including elongated, conventional shapes, as well as advanced configurations, such as negative triangularity and 'snowflake' configurations. Our approach achieves accurate tracking of the location, current and shape for these configurations. We also demonstrate sustained 'droplets' on TCV, in which two separate plasmas are maintained simultaneously within the vessel. This represents a notable advance for tokamak feedback control, showing the potential of reinforcement learning to accelerate research in the fusion domain, and is one of the most challenging real-world systems to which reinforcement learning has been applied.
ABSTRACT
Throughout the Holocene, societies developed additional layers of administration and more information-rich instruments for managing and recording transactions and events as they grew in population and territory. Yet, while such increases seem inevitable, they are not. Here we use the Seshat database to investigate the development of hundreds of polities, from multiple continents, over thousands of years. We find that sociopolitical development is dominated first by growth in polity scale, then by improvements in information processing and economic systems, and then by further increases in scale. We thus define a Scale Threshold for societies, beyond which growth in information processing becomes paramount, and an Information Threshold, which, once crossed, facilitates additional growth in scale. Polities diverge in sociopolitical features below the Information Threshold, but reconverge beyond it. We suggest an explanation for the evolutionary divergence between Old and New World polities based on phased growth in scale and information processing. We also suggest a mechanism to help explain social collapses with no evident external causes.
ABSTRACT
Pathogens can spread epidemically through populations. Beneficial contagions, such as viruses that enhance host survival or technological innovations that improve quality of life, also have the potential to spread epidemically. How do the dynamics of beneficial biological and social epidemics differ from those of detrimental epidemics? We investigate this question using a breadth-first modeling approach involving three distinct theoretical models. First, in the context of population genetics, we show that a horizontally-transmissible element that increases fitness, such as viral DNA, spreads superexponentially through a population, more quickly than a beneficial mutation. Second, in the context of behavioral epidemiology, we show that infections that cause increased connectivity lead to superexponential fixation in the population. Third, in the context of dynamic social networks, we find that preferences for increased global infection accelerate spread and produce superexponential fixation, but preferences for local assortativity halt epidemics by disconnecting the infected from the susceptible. We conclude that the dynamics of beneficial biological and social epidemics are characterized by the rapid spread of beneficial elements, which is facilitated in biological systems by horizontal transmission and in social systems by active spreading behavior of infected individuals.