Results 1 - 20 of 674
1.
Sci Rep ; 14(1): 23093, 2024 Oct 04.
Article in English | MEDLINE | ID: mdl-39367072

ABSTRACT

This paper studies the consensus problem for a class of unknown heterogeneous nonlinear multi-agent systems communicating over a network with random packet dropouts. Based on the dynamic linearization technique, novel model-free adaptive consensus protocols with a data compensation mechanism are designed for both the leaderless and leader-following cases. The advantage of this approach is that the protocol design requires only the input and output data of each agent's neighborhood. For the stability analysis, a new method based on the Squeeze Theorem is developed to derive the theoretical results, in place of the contraction mapping principle traditionally used in model-free adaptive control. It is shown that consensus can be achieved in both the leaderless and leader-following cases if the communication topology is strongly connected. Finally, numerical simulations verifying the theoretical results are given.
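The abstract does not give the protocol equations. As a rough illustration only, the sketch below assumes the textbook compact-form dynamic linearization (CFDL) scheme from model-free adaptive control for single-input single-output agents, a "hold the last received value" rule as one plausible form of data compensation for dropped packets, and the usual distributed consensus error. The function names and parameters (eta, mu, rho, lam, eps) are hypothetical, not the authors'.

```python
import numpy as np


def update_pseudo_gradient(phi_prev, du_prev, dy, phi_init=1.0,
                           eta=0.5, mu=1.0, eps=1e-5):
    """CFDL estimate of the pseudo partial derivative from I/O increments only."""
    phi = phi_prev + eta * du_prev / (mu + du_prev ** 2) * (dy - phi_prev * du_prev)
    # Standard MFAC reset: keep the estimate away from zero and sign-consistent.
    if abs(phi) <= eps or abs(du_prev) <= eps or np.sign(phi) != np.sign(phi_init):
        phi = phi_init
    return phi


def compensate(received, last_held, delivered):
    """Hold each neighbor's last successfully received output when its packet drops."""
    return np.where(delivered, received, last_held)


def distributed_error(y_i, y_neighbors, a_i, b_i=0.0, y_leader=0.0):
    """xi_i = sum_j a_ij (y_j - y_i) + b_i (y_leader - y_i); b_i = 0 when leaderless."""
    return float(np.dot(a_i, y_neighbors - y_i) + b_i * (y_leader - y_i))


def control_update(u_prev, phi, xi, rho=0.6, lam=1.0):
    """Incremental model-free control law driven by the distributed error."""
    return u_prev + rho * phi / (lam + phi ** 2) * xi
```

For example, if the packet from the second neighbor is lost, `compensate(np.array([1.0, 2.0]), np.array([5.0, 6.0]), np.array([True, False]))` keeps its previous value 6.0, and only neighborhood I/O data enter the update.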

2.
Sensors (Basel) ; 24(17)2024 Aug 28.
Article in English | MEDLINE | ID: mdl-39275469

ABSTRACT

Mobile Edge Computing (MEC) is crucial for reducing latency by bringing computational resources closer to the network edge, thereby enhancing quality of service (QoS). However, the broad deployment of cloudlets makes efficient network slicing difficult, particularly when traffic is unevenly distributed. The challenges include managing diverse resource requirements across widely distributed cloudlets, minimizing resource conflicts and delays, and maintaining service quality amid fluctuating request rates. Addressing them requires intelligent strategies to predict request types (common or urgent), assess resource needs, and allocate resources efficiently. Emerging technologies such as edge computing and 5G with network slicing can handle delay-sensitive IoT requests rapidly, but a robust mechanism for real-time resource and utility optimization is still needed. To address these challenges, we designed an end-to-end network slicing approach that predicts common and urgent user requests using a t-distribution. We formulated the problem as a multi-agent Markov decision process (MDP) and introduced a multi-agent soft actor-critic (MAgSAC) algorithm. This algorithm prevents the waste of scarce resources by intelligently activating and deactivating virtual network function (VNF) instances, thereby balancing the allocation process. Our approach aims to optimize overall utility by balancing the trade-offs among revenue, energy consumption costs, and latency. We evaluated MAgSAC through simulations against six benchmark schemes: MAA3C, SACT, DDPG, S2Vec, Random, and Greedy. Compared with MAA3C, the closest related multi-agent approach, MAgSAC improves utility by 30%, reduces energy consumption costs by 12.4%, and reduces execution time by 21.7%.

3.
J Imaging Inform Med ; 2024 Sep 09.
Article in English | MEDLINE | ID: mdl-39249582

ABSTRACT

PelviNet introduces a groundbreaking multi-agent convolutional network architecture tailored for enhancing pelvic image registration. This innovative framework leverages shared convolutional layers, enabling synchronized learning among agents and ensuring an exhaustive analysis of intricate 3D pelvic structures. The architecture combines max pooling, parametric ReLU activations, and agent-specific layers to optimize both individual and collective decision-making processes. A communication mechanism efficiently aggregates outputs from these shared layers, enabling agents to make well-informed decisions by harnessing combined intelligence. PelviNet's evaluation centers on both quantitative accuracy metrics and visual representations to elucidate agents' performance in pinpointing optimal landmarks. Empirical results demonstrate PelviNet's superiority over traditional methods, achieving an average image-wise error of 2.8 mm, a subject-wise error of 3.2 mm, and a mean Euclidean distance error of 3.0 mm. These quantitative results highlight the model's efficiency and precision in landmark identification, crucial for medical contexts such as radiation therapy, where exact landmark identification significantly influences treatment outcomes. By reliably identifying critical structures, PelviNet advances pelvic image analysis and offers potential enhancements for broader medical imaging applications, marking a significant step forward in computational healthcare.

4.
Accid Anal Prev ; 208: 107789, 2024 Sep 18.
Article in English | MEDLINE | ID: mdl-39299179

ABSTRACT

Several studies have developed pedestrian-vehicle interaction models. However, these studies failed to consider pedestrian distraction, which considerably influences the safety of these interactions. Using data from two intersections in Vancouver, Canada, this research applies the Multi-agent Adversarial Inverse Reinforcement Learning (MA-AIRL) framework to infer the behavioral dynamics of distracted and non-distracted pedestrians interacting with vehicles. Results showed that distracted pedestrians maintained closer proximity to vehicles, moved at reduced speeds, and rarely yielded to oncoming vehicles. In addition, they rarely changed their interaction angles regardless of lateral proximity to vehicles, indicating that they mostly remained unaware of the surrounding environment and navigated less efficiently. Conversely, non-distracted pedestrians executed safer maneuvers, kept greater distances from vehicles, yielded more frequently, and adjusted their speeds accordingly. For example, non-distracted pedestrian-vehicle interactions showed a 46.5% decrease in traffic conflict severity (as measured by the average Time-to-Collision (TTC) values) and an average 30.2% increase in minimum distances compared with distracted pedestrian-vehicle interactions. Vehicle drivers also behaved differently in response to distracted pedestrians: they often decelerated around them, indicating recognition of potential risks. Furthermore, the MA-AIRL framework performed differently depending on the type of interaction. The distracted pedestrian-vehicle model performed worse than the non-distracted model, suggesting that non-distracted behavior may be relatively easier to predict. These findings emphasize the importance of refining pedestrian simulation models to include the distinct behavioral patterns associated with pedestrian distraction. This should assist in further examining the safety impacts of pedestrian distraction on the road environment.
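Conflict severity above is measured through TTC. The abstract does not say how TTC was computed, so the sketch below uses one common constant-velocity definition, assuming both road users are modeled as discs that "collide" once their centers come within a combined radius; the radius and the function name are illustrative assumptions. Averaging this value over the frames of an interaction would give an average-TTC indicator.

```python
import math


def time_to_collision(p1, v1, p2, v2, radius=1.0):
    """Earliest t >= 0 at which two constant-velocity discs come within `radius`.

    Returns 0.0 if they already overlap and math.inf if they never meet.
    """
    dpx, dpy = p2[0] - p1[0], p2[1] - p1[1]  # relative position
    dvx, dvy = v2[0] - v1[0], v2[1] - v1[1]  # relative velocity
    c = dpx ** 2 + dpy ** 2 - radius ** 2
    if c <= 0.0:
        return 0.0
    a = dvx ** 2 + dvy ** 2
    b = 2.0 * (dpx * dvx + dpy * dvy)
    disc = b * b - 4.0 * a * c
    if a == 0.0 or disc < 0.0:
        return math.inf  # no relative motion, or paths never get close enough
    t = (-b - math.sqrt(disc)) / (2.0 * a)  # smaller root = first contact
    return t if t >= 0.0 else math.inf      # negative roots mean moving apart
```

For two agents 10 m apart closing at 2 m/s with a 2 m combined radius, the first contact occurs after 4 s.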

5.
ISA Trans ; : 1-14, 2024 Sep 16.
Article in English | MEDLINE | ID: mdl-39299846

ABSTRACT

This article studies finite-time formation tracking control of multi-agent systems under challenging conditions such as strong nonlinearity, aperiodic intermittent communication, and time delays, all within a hybrid impulsive framework. The impulses are categorized as either stabilizing control impulses or disruptive impulses. By integrating Lyapunov stability theory, graph theory, and the linear matrix inequality (LMI) method, new stability criteria are established. These criteria ensure finite-time intermittent formation tracking under weak Lyapunov inequality conditions, intermittent communication rates, and time-varying gain strengths. Additionally, the approach handles an indefinite number of impulsive instants and adjusts the width of the control domain based on the average impulsive interval and a state-dependent control width. Numerical simulations are provided to validate the applicability and effectiveness of the proposed formation tracking control protocols.

6.
ISA Trans ; : 1-15, 2024 Aug 28.
Article in English | MEDLINE | ID: mdl-39261266

ABSTRACT

A global Nash equilibrium is an optimal solution for every player in a graphical game. This paper proposes an iterative adaptive dynamic programming-based algorithm that obtains the global Nash equilibrium solution of the optimal containment control problem, together with a robustness analysis with respect to the iterative error. The containment control problem is transformed into a graphical game formulation. Sufficient conditions are given to decouple the Hamilton-Jacobi equations, which guarantee the solvability of the global Nash equilibrium solution. The iterative algorithm is designed to obtain the solution without any knowledge of the system dynamics. Conditions on the iterative error that ensure global stability are given with rigorous proof. Compared with existing works, the design procedures of the control gain and the coupling strength are separated, which avoids trivial cases in the design procedure. The robustness analysis precisely quantifies the effect of iterative errors arising from various sources in engineering practice. The theoretical results are validated by two numerical examples in which the leader has marginally stable and unstable dynamics, respectively.

7.
Sensors (Basel) ; 24(17)2024 Aug 30.
Article in English | MEDLINE | ID: mdl-39275567

ABSTRACT

Platooning of cars and trucks is a pertinent approach for autonomous driving because it makes effective use of roadways, and the resulting reduction in fuel consumption is an added benefit for sustainability. Conventional platooning relied on Dedicated Short-Range Communication (DSRC)-based vehicle-to-vehicle communications, with computations executed by the platoon members using their limited capabilities. The advent of 5G has enabled Intelligent Transportation Systems (ITS) to adopt Multi-access Edge Computing (MEC) in platooning by offloading computational tasks to the edge server. In this research, vital aspects of vehicular platooning systems, namely latency-sensitive radio resource management schemes and Age of Information (AoI), are investigated. In addition, the delivery rates of Cooperative Awareness Messages (CAM), which ensure expeditious reception of safety-critical messages at the roadside units (RSU), are examined. Latency-sensitive applications such as vehicular networks must address multiple correlated objectives simultaneously, and solving them effectively requires extending the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) framework with a more sophisticated model. In this paper, a novel Cascaded MADDPG framework, CMADDPG, is proposed to train cascaded target critics, aiming to achieve expected rewards through the collaborative conduct of agents. The cascaded algorithm circumvents the estimation bias phenomenon, which hinders a system's overall performance. Experimental analysis demonstrates the potential of the proposed algorithm through a convergence factor that stabilizes quickly with minimal distortion and through reliable CAM dissemination with 99% probability. The average AoI is maintained within the 5-10 ms range, guaranteeing better QoS. The technique has proven robust in decentralized resource allocation against channel uncertainties caused by high mobility in the environment. Most importantly, the performance of the proposed algorithm remains unaffected by increasing platoon size and the resulting channel uncertainties.
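AoI at time t is commonly defined as t minus the generation time of the freshest update received so far, and the average AoI is its time average over an observation window. The minimal sketch below computes that average for a single sender-receiver pair, assuming an initial update generated at time 0 and consistent time units (for example, ms); the function names are hypothetical and this is not the paper's simulation pipeline.

```python
def _area_under_age(age_start, duration):
    """Integral of an age curve that grows with slope 1 over `duration`."""
    return age_start * duration + 0.5 * duration ** 2


def average_aoi(deliveries, horizon, initial_gen_time=0.0):
    """Time-average Age of Information over [0, horizon].

    deliveries: (generation_time, reception_time) pairs for one receiver.
    """
    t, latest_gen, area = 0.0, initial_gen_time, 0.0
    for gen, recv in sorted(deliveries, key=lambda d: d[1]):
        if recv > horizon:
            break
        area += _area_under_age(t - latest_gen, recv - t)
        t = recv
        latest_gen = max(latest_gen, gen)  # an out-of-date packet does not reset the age
    area += _area_under_age(t - latest_gen, horizon - t)
    return area / horizon
```

With updates generated at times 0 and 2 and received at times 1 and 3, the age is a sawtooth whose area over [0, 4] is 6, giving an average AoI of 1.5 time units.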

8.
ISA Trans ; : 1-9, 2024 Sep 03.
Article in English | MEDLINE | ID: mdl-39266336

ABSTRACT

This paper presents a novel hierarchical control scheme for solving the data-driven optimal cooperative tracking control problem of heterogeneous multi-agent systems. Since followers cannot communicate with the leader, a prescribed-time fully distributed observer is devised for each follower to estimate the leader's state. Then, a data-driven decentralized controller is designed to ensure that each follower's output tracks the leader's output. Compared with existing results, the designed distributed observer has the advantages that the prescribed convergence time is completely predetermined by the designer and that the design of the observer gain is independent of global topology information. In addition, the designed decentralized controller requires neither the follower's system model nor a known initial stabilizing control policy. Finally, simulation results illustrate the advantages of the proposed method.

9.
Neural Netw ; 180: 106691, 2024 Sep 02.
Article in English | MEDLINE | ID: mdl-39255635

ABSTRACT

This research addresses the challenges of achieving secure consensus tracking in multi-agent systems with directed hypergraph topologies under hybrid deception attacks. The hybrid discrete and continuous deception attacks target the controller communication channels and the hyperedges, respectively. To counter these threats, an impulsive control mechanism based on hypergraph theory is introduced, and sufficient conditions under which consensus is maintained in a mean-square bounded sense are established with rigorous mathematical proofs. Furthermore, the investigation quantifies the relationship between the mean-square bounded consensus of the multi-agent system and the intensity of the deception attacks, delineating a specific range for this error metric. The robustness and effectiveness of the proposed control method are verified through comprehensive simulation experiments, demonstrating its applicability in varied scenarios affected by these sophisticated attacks. This study underscores the potential of hypergraph-based strategies for enhancing system resilience against complex hybrid attacks.

10.
J Theor Biol ; 595: 111952, 2024 Sep 24.
Article in English | MEDLINE | ID: mdl-39322113

ABSTRACT

Cooperation is a cornerstone of social harmony and group success. Environmental feedback that provides information about resource availability plays a crucial role in encouraging cooperation. Previous work indicates that the impact of resource heterogeneity on cooperation depends on the incentive to act in self-interest that a situation presents, so heterogeneity can both hinder and facilitate cooperation. However, little is known about the evolutionary drivers underlying this phenomenon. Leveraging agent-based modeling and game theory, we explore how differences in resource availability across environments influence the evolution of cooperation. Our results show that resource variation hinders cooperation when resources are replenished slowly but supports cooperation when resources are more readily available. Furthermore, simulations in different scenarios suggest that discerning the rate at which natural selection acts on strategies under distinct evolutionary dynamics is instrumental in elucidating the intricate nexus between resource variability and cooperation. When selection is strong, resource heterogeneity tends to work against cooperation, whereas relaxed selection enables it to facilitate cooperation. Inspired by these findings, we also propose a potential application for improving the performance of artificial intelligence systems through policy optimization in multi-agent reinforcement learning. These explorations offer a novel perspective on the evolution of social organisms and the impact of different interactions on the function of natural systems.
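The contrast between strong and relaxed selection is often encoded in agent-based evolutionary games by a selection-strength parameter, for example in the pairwise Fermi imitation rule. As `selection_strength` approaches 0, imitation becomes a coin flip (relaxed selection); as it grows, agents almost always copy higher-payoff strategies (strong selection). The abstract does not state which update rule the authors use, so the sketch below is an illustrative assumption with hypothetical function names.

```python
import math
import random


def fermi_imitation_prob(payoff_self, payoff_model, selection_strength):
    """Probability that a focal agent copies a model's strategy (pairwise Fermi rule)."""
    z = selection_strength * (payoff_model - payoff_self)
    # Numerically stable logistic function 1 / (1 + exp(-z)).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def imitation_step(strategies, payoffs, selection_strength, rng=random):
    """One asynchronous update: a random agent considers copying a random other agent."""
    i, j = rng.sample(range(len(strategies)), 2)
    if rng.random() < fermi_imitation_prob(payoffs[i], payoffs[j], selection_strength):
        strategies[i] = strategies[j]
    return strategies
```

Equal payoffs always give probability 0.5, so with selection_strength = 0 the dynamics reduce to neutral drift regardless of how resources affect payoffs.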

11.
Sci Rep ; 14(1): 22622, 2024 Sep 30.
Article in English | MEDLINE | ID: mdl-39349932

ABSTRACT

With the proliferation of services and the vast amount of data produced on the Internet, numerous services with comparable functionality but varying Quality of Service (QoS) attributes are candidates for meeting user needs. Consequently, selecting the most suitable services has become increasingly challenging. To address this issue, multiple services are synthesized through a composition process to create more sophisticated services. In recent years, there has been growing interest in QoS uncertainty, given its potential impact on determining an optimal composite service, where each service is characterized by multiple QoS properties (e.g., response time and cost) that change frequently, primarily due to environmental factors. Here, we introduce a novel approach based on the Multi-Agent Whale Optimization Algorithm (MA-WOA) for the web service composition problem. The proposed algorithm uses a multi-agent system to represent and control candidate services and applies MA-WOA to identify the optimal composition that meets the user's requirements. It accounts for multiple quality factors and employs a weighted aggregation function to combine them into a single fitness function. The efficiency of the method is evaluated on real and artificial web service composition data (comprising a total of 52,000 web services), with results indicating its superiority over other state-of-the-art methods in terms of composition quality and computational efficiency. The proposed strategy therefore presents a feasible and effective solution to the web service composition challenge, representing a significant advancement in service-oriented computing.
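The abstract combines several QoS attributes into one fitness value with a weighted aggregation function. A minimal sketch of one conventional way to do this follows, assuming a purely sequential composition (response times and costs add up, availabilities multiply), min-max normalization against known bounds, and hypothetical attribute names, bounds, and weights. The MA-WOA search that would call such a fitness function is not shown.

```python
import math
from dataclasses import dataclass


@dataclass
class QoS:
    response_time: float  # lower is better
    cost: float           # lower is better
    availability: float   # probability in [0, 1], higher is better


def aggregate_sequential(services):
    """QoS of services invoked one after another in a sequential workflow."""
    return QoS(
        response_time=sum(s.response_time for s in services),
        cost=sum(s.cost for s in services),
        availability=math.prod(s.availability for s in services),
    )


def normalize(value, lo, hi, higher_is_better):
    """Min-max scale to [0, 1] so that 1 is always the best value."""
    if hi == lo:
        return 1.0
    scaled = (value - lo) / (hi - lo)
    return scaled if higher_is_better else 1.0 - scaled


def fitness(q, bounds, weights):
    """Weighted aggregation of normalized QoS attributes into one score."""
    scores = {
        "response_time": normalize(q.response_time, *bounds["response_time"], False),
        "cost": normalize(q.cost, *bounds["cost"], False),
        "availability": normalize(q.availability, *bounds["availability"], True),
    }
    return sum(weights[name] * scores[name] for name in weights)
```

For instance, chaining `QoS(100, 2, 0.9)` and `QoS(50, 3, 0.8)` gives a response time of 150, a cost of 5, and an availability of 0.72; with bounds (0, 300), (0, 10), (0, 1) and weights 0.4, 0.3, 0.3 the fitness is 0.566.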

12.
Comput Methods Programs Biomed ; 257: 108416, 2024 Sep 21.
Article in English | MEDLINE | ID: mdl-39342877

ABSTRACT

BACKGROUND: In predicting post-operative outcomes for patients with end-stage renal disease, our study faced challenges related to class imbalance and a high-dimensional feature space. Therefore, with a focus on overcoming class imbalance and improving interpretability, we propose a novel feature selection approach using multi-agent reinforcement learning. METHODS: We proposed a multi-agent feature selection model based on a comprehensive reward function that combines classification model performance, Shapley additive explanations values, and the mutual information. The definition of rewards in reinforcement learning is crucial for model convergence and performance improvement. Initially, we set a deterministic reward based on the mutual information between variables and the target class, selecting variables that are highly dependent on the class, thus accelerating convergence. We then prioritized variables that influence the minority class on a sample basis and introduced a dynamic reward distribution strategy using Shapley additive explanations values to improve interpretability and solve the class imbalance problem. RESULTS: Involving the integration of electronic medical records, anesthesia records, operating room vital signs, and pre-operative anesthesia evaluations, our approach effectively mitigated class imbalance and demonstrated superior performance in ablation analysis. Our model achieved a 16% increase in the minority class F1 score and an 8.2% increase in the overall F1 score compared to the baseline model without feature selection. CONCLUSION: This study contributes important research findings that show that the multi-agent-based feature selection method can be a promising approach for solving the class imbalance problem.
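The deterministic reward is described as the mutual information between each variable and the target class. A minimal plug-in estimator for discrete or pre-binned features is sketched below, using natural logarithms (nats). How continuous clinical variables would be binned, and the SHAP-based dynamic reward, are outside this sketch; the function names are hypothetical.

```python
import numpy as np


def mutual_information(x, y):
    """Plug-in estimate of I(X; Y) in nats for discrete (or pre-binned) variables."""
    _, xi = np.unique(np.asarray(x), return_inverse=True)
    _, yi = np.unique(np.asarray(y), return_inverse=True)
    joint = np.zeros((xi.max() + 1, yi.max() + 1))
    np.add.at(joint, (xi, yi), 1.0)        # contingency table of counts
    joint /= joint.sum()                   # joint probabilities p(x, y)
    px = joint.sum(axis=1, keepdims=True)  # marginal p(x), shape (nx, 1)
    py = joint.sum(axis=0, keepdims=True)  # marginal p(y), shape (1, ny)
    nz = joint > 0                         # treat 0 * log 0 as 0
    return float(np.sum(joint[nz] * np.log(joint[nz] / (px * py)[nz])))


def mi_rewards(features, target):
    """Deterministic per-feature reward: dependence of each column on the class."""
    features = np.asarray(features)
    return np.array([mutual_information(features[:, k], target)
                     for k in range(features.shape[1])])
```

A feature identical to a balanced binary label scores ln 2 (about 0.693 nats), while a feature independent of the label scores 0, so class-dependent variables are favored early in training.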

13.
Sensors (Basel) ; 24(16)2024 Aug 08.
Article in English | MEDLINE | ID: mdl-39204838

ABSTRACT

Device-to-device (D2D) is a pivotal technology in the next generation of communication, allowing for direct task offloading between mobile devices (MDs) to improve the efficient utilization of idle resources. This paper proposes a novel algorithm for dynamic task offloading between the active MDs and the idle MDs in a D2D-MEC (mobile edge computing) system by deploying multi-agent deep reinforcement learning (DRL) to minimize the long-term average delay of delay-sensitive tasks under deadline constraints. Our core innovation is a dynamic partitioning scheme for idle and active devices in the D2D-MEC system, accounting for stochastic task arrivals and multi-time-slot task execution, which has been insufficiently explored in the existing literature. We adopt a queue-based system to formulate a dynamic task offloading optimization problem. To address the challenges of large action space and the coupling of actions across time slots, we model the problem as a Markov decision process (MDP) and perform multi-agent DRL through multi-agent proximal policy optimization (MAPPO). We employ a centralized training with decentralized execution (CTDE) framework to enable each MD to make offloading decisions solely based on its local system state. Extensive simulations demonstrate the efficiency and fast convergence of our algorithm. In comparison to the existing sub-optimal results deploying single-agent DRL, our algorithm reduces the average task completion delay by 11.0% and the ratio of dropped tasks by 17.0%. Our proposed algorithm is particularly pertinent to sensor networks, where mobile devices equipped with sensors generate a substantial volume of data that requires timely processing to ensure quality of experience (QoE) and meet the service-level agreements (SLAs) of delay-sensitive applications.
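MAPPO applies the PPO clipped surrogate objective to each agent. Under CTDE, the advantages come from a value function that may use global information during training, while each actor's probability ratio depends only on that agent's local observation, which is what allows decentralized execution. The sketch below shows the objective only, with the commonly used clip range of 0.2 as an assumed default and per-agent losses summed for simplicity. The paper's networks, state features, and advantage estimator are not reproduced.

```python
import numpy as np


def clipped_surrogate(ratio, advantage, clip_eps=0.2):
    """PPO clipped objective (to be maximized), averaged over samples.

    ratio: pi_new(a|o) / pi_old(a|o) for each sample of one agent.
    advantage: advantage estimates, e.g. from a centralized critic under CTDE.
    """
    ratio = np.asarray(ratio, dtype=float)
    advantage = np.asarray(advantage, dtype=float)
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantage
    return float(np.mean(np.minimum(unclipped, clipped)))


def mappo_actor_loss(ratios_per_agent, advantages_per_agent, clip_eps=0.2):
    """Sum of per-agent losses; each actor only needs its own local-observation ratios."""
    return sum(-clipped_surrogate(r, a, clip_eps)
               for r, a in zip(ratios_per_agent, advantages_per_agent))
```

Taking the minimum of the clipped and unclipped terms means a ratio of 1.5 with a positive advantage is credited only as 1.2, which keeps each policy update close to the behavior policy.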

14.
SN Comput Sci ; 5(6): 749, 2024.
Article in English | MEDLINE | ID: mdl-39100973

ABSTRACT

In a world where many activities are carried out digitally, it is increasingly urgent to formally represent the norms, policies, and contracts that regulate these activities so that they can be understood and processed by machines. In multi-agent systems, choosing a formal model of norms and using it to transform a norm written in natural language into a formal one is a demanding task. In this paper, we introduce a methodology that guides people through the fundamental elements they should consider in this transformation. We focus mainly on formalizing norms with the T-Norm model, because it can express a rich set of different types of norms. Nevertheless, the proposed methodology is general enough that some of its steps can also be used to formalize norms in other formal languages. In defining the methodology, we explicitly state which types of norms can and cannot be expressed with a given model. Since there is not yet a sufficiently expressive set of norm types recognized as valid by the Normative Multiagent Systems (NorMAS) community, another goal of this paper is to propose and discuss a rich set of norm types that could be used to study the expressive power of different formal models of norms, to compare them, and to translate norms formalized in one language into another.

15.
Heliyon ; 10(14): e33975, 2024 Jul 30.
Article in English | MEDLINE | ID: mdl-39108846

ABSTRACT

The goal of this paper is to mitigate disturbances and input delays while optimizing controller actuation updates for discrete-time multi-agent systems through event-triggered containment control, especially in resource-constrained scenarios. Combined with event-triggered control techniques, this approach has every follower update its state only at instants determined by a proposed event-triggering condition. The containment control problem in the presence of disturbances and input delays is tackled with both decentralized and centralized event-triggered control schemes. Using matrix theory and the Lyapunov technique, a convergence analysis shows that the proposed strategy is free of Zeno behavior. Numerical examples further illustrate the theoretical results.
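The abstract does not state the triggering condition. A common relative-threshold form, assumed here for illustration, lets a follower broadcast a new state only when the gap between its current state and its last broadcast state exceeds sigma times the norm of its local containment error. The threshold `sigma` and the function names are hypothetical.

```python
import numpy as np


def should_trigger(x_now, x_last_broadcast, z_now, sigma=0.5):
    """Fire an event when the measurement error outgrows a fraction of the local error.

    x_now: follower's current state; x_last_broadcast: state at its last event;
    z_now: local containment error built from neighbors' broadcast states.
    """
    measurement_error = np.linalg.norm(np.asarray(x_now) - np.asarray(x_last_broadcast))
    return bool(measurement_error > sigma * np.linalg.norm(z_now))


def run_follower(states, local_errors, sigma=0.5):
    """Return the time steps at which a follower would update and broadcast."""
    events = [0]  # the first sample is always transmitted
    last = states[0]
    for k in range(1, len(states)):
        if should_trigger(states[k], last, local_errors[k], sigma):
            events.append(k)
            last = states[k]
    return events
```

With states 0, 0.1, 0.2, 1.0 and a constant local error of norm 1, only steps 0 and 3 trigger, so the controller is updated twice instead of four times.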

16.
Front Robot AI ; 11: 1353870, 2024.
Article in English | MEDLINE | ID: mdl-39109321

ABSTRACT

Understanding the emergence of symbol systems, especially language, requires the construction of a computational model that reproduces both the developmental learning process in everyday life and the evolutionary dynamics of symbol emergence throughout history. This study introduces the collective predictive coding (CPC) hypothesis, which emphasizes and models the interdependence between forming internal representations through physical interactions with the environment and sharing and utilizing meanings through social semiotic interactions within a symbol emergence system. The total system dynamics is theorized from the perspective of predictive coding. The hypothesis draws inspiration from computational studies grounded in probabilistic generative models and language games, including the Metropolis-Hastings naming game. Thus, playing such games among agents in a distributed manner can be interpreted as a decentralized Bayesian inference of representations shared by a multi-agent system. Moreover, this study explores the potential link between the CPC hypothesis and the free-energy principle, positing that symbol emergence adheres to the society-wide free-energy principle. Furthermore, this paper provides a new explanation for why large language models appear to possess knowledge about the world based on experience, even though they have neither sensory organs nor bodies. This paper reviews past approaches to symbol emergence systems, offers a comprehensive survey of related prior studies, and presents a discussion on CPC-based generalizations. Future challenges and potential cross-disciplinary research avenues are highlighted.

17.
Front Robot AI ; 11: 1375393, 2024.
Article in English | MEDLINE | ID: mdl-39193080

ABSTRACT

Cooperative multi-agent systems make it possible to employ miniature robots for experiments ranging from data collection in wide open areas to physical interactions with test subjects in confined environments such as a hive. This paper proposes a new multi-agent path-planning approach that determines a set of trajectories along which the agents collide neither with each other nor with any obstacle. The proposed algorithm uses a risk-aware probabilistic roadmap algorithm to generate a map, employs node classification to delineate exploration regions, and incorporates a customized genetic framework to solve the combinatorial optimization, with the ultimate goal of computing safe trajectories for the team. Furthermore, the planning algorithm makes the agents explore all subdomains of the workspace together as a formation, allowing the team to perform different tasks or collect multiple datasets for reliable localization or hazard detection. The objective function to be minimized has two major parts: the traveling distance of all agents over the entire mission and the probability of collisions between agents or between agents and obstacles. A sampling method is used to evaluate the objective function, accounting for the agents' dynamic behavior under environmental disturbances and uncertainties. The algorithm's performance is evaluated for different group sizes in a simulation environment, and two benchmark scenarios are introduced to compare the exploration behavior. The proposed optimization method exhibits stable and convergent behavior regardless of group size.
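The objective combines the agents' total traveling distance with a sampled estimate of collision probability under disturbances. A minimal sketch follows, assuming isotropic Gaussian position noise, point agents that collide when closer than a given radius, time alignment by waypoint index, and a hypothetical weight between the two terms. Agent-obstacle collisions, the risk-aware roadmap, and the genetic search are omitted.

```python
import numpy as np


def collision_probability(p_a, p_b, radius, sigma, n_samples=10_000, seed=0):
    """Monte Carlo estimate of P(||(p_a + w_a) - (p_b + w_b)|| < radius), w ~ N(0, sigma^2 I)."""
    rng = np.random.default_rng(seed)
    p_a = np.asarray(p_a, dtype=float)
    p_b = np.asarray(p_b, dtype=float)
    noisy_a = p_a + rng.normal(0.0, sigma, size=(n_samples, p_a.size))
    noisy_b = p_b + rng.normal(0.0, sigma, size=(n_samples, p_b.size))
    return float(np.mean(np.linalg.norm(noisy_a - noisy_b, axis=1) < radius))


def mission_cost(paths, radius, sigma, collision_weight=10.0):
    """Total travel distance plus weighted collision probabilities between agent pairs.

    paths: one (T, d) waypoint array per agent, time-aligned by index.
    """
    paths = [np.asarray(p, dtype=float) for p in paths]
    distance = sum(np.linalg.norm(np.diff(p, axis=0), axis=1).sum() for p in paths)
    risk = 0.0
    for i in range(len(paths)):
        for j in range(i + 1, len(paths)):
            for pa, pb in zip(paths[i], paths[j]):
                risk += collision_probability(pa, pb, radius, sigma)
    return float(distance + collision_weight * risk)
```

Because both terms enter one scalar cost, a genetic optimizer can trade a slightly longer route for a lower collision risk; the fixed seed makes repeated evaluations of the same candidate reproducible.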

18.
Neural Netw ; 180: 106667, 2024 Aug 26.
Article in English | MEDLINE | ID: mdl-39216294

ABSTRACT

This paper addresses the tracking control problem of nonlinear discrete-time multi-agent systems (MASs). First, a local neighborhood error system (LNES) is constructed. Then, a novel tracking algorithm based on asynchronous iterative Q-learning (AIQL) is developed, which transforms the tracking problem into the optimal regulation of the LNES. The AIQL-based algorithm maintains two Q-values, Q_i^A and Q_i^B, for each agent i: Q_i^A is used to improve the control policy, and Q_i^B is used to evaluate it. Moreover, the convergence of the LNES is established: the LNES converges to 0, and hence the tracking problem is solved. A neural network-based actor-critic framework is used to implement AIQL, in which the critic consists of two neural networks that approximate Q_i^A and Q_i^B, respectively. Finally, simulation results verify the performance of the developed algorithm, showing that the AIQL-based tracking algorithm achieves a lower cost value and faster convergence than the IQL-based tracking algorithm.
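The LNES is built from each agent's local neighborhood tracking error. The sketch below uses the standard definition from the cooperative tracking literature, stacked over agents as e = (L + B)(x - 1 x0), where L is the graph Laplacian and B holds the pinning gains; this weighting is assumed rather than taken from the paper. When the communication graph contains a spanning tree rooted at the leader, L + B is nonsingular, so e = 0 implies that every agent's state equals the leader's. This is why driving the LNES to 0 solves the tracking problem.

```python
import numpy as np


def local_neighborhood_errors(x, x0, A, b):
    """e_i = sum_j a_ij (x_i - x_j) + b_i (x_i - x0) for every follower i.

    x: (N, n) follower states, x0: (n,) leader state,
    A: (N, N) adjacency weights, b: (N,) pinning gains (b_i > 0 if i sees the leader).
    """
    x = np.asarray(x, dtype=float)
    A = np.asarray(A, dtype=float)
    L = np.diag(A.sum(axis=1)) - A  # graph Laplacian
    return (L + np.diag(b)) @ (x - np.asarray(x0, dtype=float))
```

For two mutually connected agents at states 1 and 3, with only the first pinned to a leader at 0, the errors are e_1 = (1 - 3) + (1 - 0) = -1 and e_2 = 3 - 1 = 2.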

19.
Proc Natl Acad Sci U S A ; 121(36): e2313191121, 2024 Sep 03.
Article in English | MEDLINE | ID: mdl-39196625

ABSTRACT

Achieving more sustainable adaptation to social-environmental change demands the transformation of the narratives that provide the rationale for risk governance. These narratives often reflect long-standing beliefs about social and political relationships, ascribe actions and responsibilities, and specify solutions to risk. When such solutions are implemented through material investments in landscapes, these narratives become embedded in physical infrastructure with long legacies. Dominant narratives can mask a range of divergent problem framings. By masking alternatives, narratives can contribute to the persistence of unsustainable governance trajectories. Decision-support tools have begun to represent narratives as drivers of system dynamics; making narratives visible can reveal opportunities for more sustainable governance. We present the results of the project "The Dynamics of Multi-Scalar Adaptation in the Megalopolis", a dynamic, exploratory model of socio-hydrological risks in Mexico City that was designed to both endogenize and simultaneously challenge the dominant narratives that characterize water-risk governance in the city. Qualitative data characterize dominant narratives at city and borough scales. An agent-based model, informed by multicriteria decision analysis and coupled with hydrological, urbanization, and climatic model inputs, permitted the development of exploratory governance scenarios designed to challenge dominant narratives. Scenarios revealed how dominant narratives may contribute to the persistence of vulnerability "hotspots" in the city, despite stated goals of equity and vulnerability alleviation. Participatory workshops with representatives of the city government illustrate how making such narratives visible through exploratory modeling can lead to a questioning of prior assumptions and causal relations, recognition of a need for intersectoral collaboration, and insights into potential management strategies.

20.
Neural Netw ; 179: 106552, 2024 Nov.
Article in English | MEDLINE | ID: mdl-39089154

ABSTRACT

Multi-agent reinforcement learning (MARL) can effectively improve the learning speed of agents in sparse-reward tasks with the guidance of subgoals. However, existing works break the consistency between the learning objectives of the subgoal-generation and subgoal-reaching stages, which significantly inhibits the effectiveness of subgoal learning. To address this problem, we propose a novel Potential field Subgoal-based Multi-Agent reinforcement learning (PSMA) method, which introduces the potential field (PF) to unify the learning objectives of the two stages. Specifically, we design a state-to-PF representation model that describes agents' states as potential fields, allowing easy measurement of the interaction effects of both allied and enemy agents. With the PF representation, a subgoal selector is designed to automatically generate multiple subgoals for each agent, drawn from an experience replay buffer that contains both individual and total PF values. Based on the determined subgoals, we define an intrinsic reward function that guides each agent to reach its subgoal while maximizing the joint action-value. Experimental results show that our method outperforms state-of-the-art MARL methods on both StarCraft II micro-management (SMAC) and Google Research Football (GRF) tasks with sparse-reward settings.
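The abstract does not give the intrinsic reward formula. One standard way to turn a potential into a reward is potential-based shaping (Ng, Harada and Russell, 1999), F = gamma * Phi(s') - Phi(s), which rewards progress toward a goal without changing the optimal policy of the original task. The sketch below assumes that Phi is the negative Euclidean distance to an agent's subgoal; PSMA's actual PF representation, which also accounts for allied and enemy agents, is not reproduced, and the function names are hypothetical.

```python
import numpy as np


def subgoal_potential(position, subgoal):
    """Hypothetical potential: negative Euclidean distance to the agent's subgoal."""
    return -float(np.linalg.norm(np.asarray(position, float) - np.asarray(subgoal, float)))


def intrinsic_reward(position, next_position, subgoal, gamma=0.99):
    """Potential-based shaping term F = gamma * Phi(s') - Phi(s)."""
    return gamma * subgoal_potential(next_position, subgoal) - subgoal_potential(position, subgoal)
```

With gamma = 1, moving from (3, 4) to a subgoal at the origin earns +5, and moving back the same distance costs -5, so a detour yields no net intrinsic reward.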


Subjects
Reinforcement, Psychology , Reward , Neural Networks, Computer , Humans , Algorithms , Machine Learning