Results 1 - 20 of 53
1.
J Theor Biol ; 595: 111952, 2024 Sep 24.
Article in English | MEDLINE | ID: mdl-39322113

ABSTRACT

Cooperation is a cornerstone of social harmony and group success. Environmental feedbacks that provide information about resource availability play a crucial role in encouraging cooperation. Previous work indicates that the impact of resource heterogeneity on cooperation depends on the incentive to act in self-interest presented by a situation, demonstrating its potential to both hinder and facilitate cooperation. However, little is known about the underlying evolutionary drivers behind this phenomenon. Leveraging agent-based modeling and game theory, we explore how differences in resource availability across environments influence the evolution of cooperation. Our results show that resource variation hinders cooperation when resources are slowly replenished but supports cooperation when resources are more readily available. Furthermore, simulations in different scenarios suggest that discerning the rate at which natural selection acts on strategies under distinct evolutionary dynamics is instrumental in elucidating the intricate nexus between resource variability and cooperation. When evolutionary forces are strong, resource heterogeneity tends to work against cooperation, yet relaxed selection conditions enable it to facilitate cooperation. Inspired by these findings, we also propose a potential application in improving the performance of artificial intelligence systems through policy optimization in multi-agent reinforcement learning. These explorations promise a novel perspective in understanding the evolution of social organisms and the impact of different interactions on the function of natural systems.
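The kind of agent-based model this abstract describes can be sketched as a public-goods round with environmental feedback, where payoffs scale with resource availability and consumption depletes the pool. The functional forms and parameters below are illustrative assumptions of mine, not the paper's model:

```python
def play_round(strategies, resource, enhancement=3.0, cost=1.0, replenish=0.2):
    """One public-goods round with environmental feedback (illustrative toy).

    Cooperators (True) contribute `cost`; the pooled contribution is multiplied
    by `enhancement` and by current resource availability, then shared equally.
    Consumption depletes the resource; `replenish` models its recovery rate.
    """
    n = len(strategies)
    contributions = sum(cost for s in strategies if s)
    pool = enhancement * contributions * resource  # payoff depends on resources
    share = pool / n
    payoffs = [share - (cost if s else 0.0) for s in strategies]
    # feedback: consumption lowers availability, replenishment restores it
    resource = max(0.0, min(1.0, resource - 0.1 * contributions / n + replenish))
    return payoffs, resource
```

With a fast replenishment rate the resource stays near its cap and cooperation stays profitable; shrinking `replenish` reproduces the scarce-resource regime the abstract contrasts against.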

2.
Sensors (Basel) ; 24(11)2024 May 27.
Article in English | MEDLINE | ID: mdl-38894236

ABSTRACT

Frequency agility refers to the rapid variation of the carrier frequency of adjacent pulses, which is an effective radar active antijamming method against frequency spot jamming. Variation patterns of traditional pseudo-random frequency hopping methods are susceptible to analysis and decryption, rendering them ineffective against increasingly sophisticated jamming strategies. Although existing reinforcement learning-based methods can adaptively optimize frequency hopping strategies, they are limited in adapting to the diversity and dynamics of jamming strategies, resulting in poor performance in the face of complex unknown jamming strategies. This paper proposes an AK-MADDPG (Adaptive K-th order history-based Multi-Agent Deep Deterministic Policy Gradient) method for designing frequency hopping strategies in frequency agile radar. Signal pulses within a coherent processing interval are treated as agents, learning to optimize their hopping strategies in the case of unknown jamming strategies. Agents dynamically adjust their carrier frequencies to evade jamming and collaborate with others to enhance antijamming efficacy. This approach exploits cooperative relationships among the pulses, providing additional information for optimized frequency hopping strategies. In addition, an adaptive K-th order history method has been introduced into the algorithm to capture long-term dependencies in sequential data. Simulation results demonstrate the superior performance of the proposed method.
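The K-th order history idea can be sketched as a fixed-length buffer of recent observations flattened into the agent's state, so the policy can condition on sequential dependencies. The class below is a toy illustration with names of my choosing; the paper's adaptive-K mechanism is not reproduced:

```python
from collections import deque

class KthOrderHistory:
    """Keeps the last K observations and flattens them into one state vector,
    a minimal sketch of conditioning a hopping policy on recent history."""

    def __init__(self, k, obs_dim):
        # zero-padded so the state has a fixed length from the first step
        self.buffer = deque([(0,) * obs_dim] * k, maxlen=k)

    def push(self, obs):
        self.buffer.append(tuple(obs))  # oldest entry drops automatically

    def state(self):
        # concatenate the K most recent observations into one flat tuple
        return tuple(x for obs in self.buffer for x in obs)
```

Larger K lets the agent detect longer jamming patterns at the cost of a bigger state space, which is the trade-off an adaptive K targets.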

3.
Sensors (Basel) ; 24(9)2024 Apr 26.
Article in English | MEDLINE | ID: mdl-38732885

ABSTRACT

Delay-sensitive task offloading in a device-to-device assisted mobile edge computing (D2D-MEC) system with energy harvesting devices is a critical challenge due to the dynamic load level at edge nodes and the variability in harvested energy. In this paper, we propose a joint dynamic task offloading and CPU frequency control scheme for delay-sensitive tasks in a D2D-MEC system, taking into account the intricacies of multi-slot tasks, characterized by diverse processing speeds and data transmission rates. Our methodology involves meticulous modeling of task arrival and service processes using queuing systems, coupled with the strategic utilization of D2D communication to alleviate edge server load and prevent network congestion effectively. Central to our solution is the formulation of average task delay optimization as a challenging nonlinear integer programming problem, requiring intelligent decision making regarding task offloading for each generated task at active mobile devices and CPU frequency adjustments at discrete time slots. To navigate the intricate landscape of the extensive discrete action space, we design an efficient multi-agent DRL learning algorithm named MAOC, which is based on MAPPO, to minimize the average task delay by dynamically determining task-offloading decisions and CPU frequencies. MAOC operates within a centralized training with decentralized execution (CTDE) framework, empowering individual mobile devices to make decisions autonomously based on their unique system states. Experimental results demonstrate its swift convergence and operational efficiency, and it outperforms other baseline algorithms.
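The CTDE pattern the abstract relies on can be sketched with two toy linear functions: at execution time each actor uses only its local state, while at training time the critic evaluates the joint state and actions of all agents. All names and weights below are illustrative, not the paper's networks:

```python
def decentralized_action(actor_weights, local_state):
    """Execution phase: a device scores its discrete actions using ONLY its
    own local state (toy linear policy) and picks the argmax."""
    scores = [sum(w * s for w, s in zip(row, local_state))
              for row in actor_weights]
    return max(range(len(scores)), key=scores.__getitem__)

def centralized_critic(critic_weights, joint_state, joint_actions):
    """Training phase: the critic sees every agent's state and action,
    which stabilizes multi-agent learning (toy linear value function)."""
    features = list(joint_state) + list(joint_actions)
    return sum(w * f for w, f in zip(critic_weights, features))
```

The critic exists only during training; deployed devices carry just their actor, which is why each mobile device can decide autonomously from its local system state.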

4.
Sensors (Basel) ; 24(16)2024 Aug 08.
Article in English | MEDLINE | ID: mdl-39204838

ABSTRACT

Device-to-device (D2D) is a pivotal technology in the next generation of communication, allowing for direct task offloading between mobile devices (MDs) to improve the efficient utilization of idle resources. This paper proposes a novel algorithm for dynamic task offloading between the active MDs and the idle MDs in a D2D-MEC (mobile edge computing) system by deploying multi-agent deep reinforcement learning (DRL) to minimize the long-term average delay of delay-sensitive tasks under deadline constraints. Our core innovation is a dynamic partitioning scheme for idle and active devices in the D2D-MEC system, accounting for stochastic task arrivals and multi-time-slot task execution, which has been insufficiently explored in the existing literature. We adopt a queue-based system to formulate a dynamic task offloading optimization problem. To address the challenges of large action space and the coupling of actions across time slots, we model the problem as a Markov decision process (MDP) and perform multi-agent DRL through multi-agent proximal policy optimization (MAPPO). We employ a centralized training with decentralized execution (CTDE) framework to enable each MD to make offloading decisions solely based on its local system state. Extensive simulations demonstrate the efficiency and fast convergence of our algorithm. In comparison to the existing sub-optimal results deploying single-agent DRL, our algorithm reduces the average task completion delay by 11.0% and the ratio of dropped tasks by 17.0%. Our proposed algorithm is particularly pertinent to sensor networks, where mobile devices equipped with sensors generate a substantial volume of data that requires timely processing to ensure quality of experience (QoE) and meet the service-level agreements (SLAs) of delay-sensitive applications.

5.
Sensors (Basel) ; 24(2)2024 Jan 14.
Article in English | MEDLINE | ID: mdl-38257602

ABSTRACT

As a promising paradigm, mobile crowdsensing (MCS) takes advantage of sensing abilities and cooperates with multi-agent reinforcement learning technologies to provide services for users in large sensing areas, such as smart transportation, environment monitoring, etc. In most cases, strategy training for multi-agent reinforcement learning requires substantial interaction with the sensing environment, which results in unaffordable costs. Thus, environment reconstruction via extraction of the causal effect model from past data is an effective way to smoothly accomplish environment monitoring. However, the sensing environment is often so complex that the observable and unobservable data collected are sparse and heterogeneous, affecting the accuracy of the reconstruction. In this paper, we focus on developing a robust multi-agent environment monitoring framework, called self-interested coalitional crowdsensing for multi-agent interactive environment monitoring (SCC-MIE), including environment reconstruction and worker selection. In SCC-MIE, we start from a multi-agent generative adversarial imitation learning framework to introduce a new self-interested coalitional learning strategy, which forges cooperation between a reconstructor and a discriminator to learn the sensing environment together with the hidden confounder while providing interpretability on the results of environment monitoring. Based on this, we utilize the secretary problem to select suitable workers to collect data for accurate environment monitoring in a real-time manner. It is shown that SCC-MIE realizes a significant performance improvement in environment monitoring compared to the existing models.
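The worker-selection step invokes the classic secretary problem. Its textbook 1/e stopping rule, shown below as a sketch of the underlying idea rather than the paper's exact online procedure, observes a calibration prefix and then commits to the first candidate that beats it:

```python
import math

def secretary_select(scores):
    """Classic 1/e rule: skip the first n/e candidates, then take the first
    one better than everything seen so far (fall back to the last)."""
    n = len(scores)
    k = max(1, int(n / math.e))          # size of the observation phase
    best_seen = max(scores[:k])
    for i in range(k, n):
        if scores[i] > best_seen:        # first candidate beating the prefix
            return i
    return n - 1                         # forced to accept the final candidate
```

This rule selects the single best candidate with probability about 1/e without knowing the score distribution in advance, which is what makes it attractive for real-time worker arrival streams.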

6.
Sensors (Basel) ; 24(17)2024 Aug 30.
Article in English | MEDLINE | ID: mdl-39275567

ABSTRACT

The platooning of cars and trucks is a pertinent approach for autonomous driving due to the effective utilization of roadways. The decreased gas consumption levels are an added merit owing to sustainability. Conventional platooning depended on Dedicated Short-Range Communication (DSRC)-based vehicle-to-vehicle communications. The computations were executed by the platoon members with their constrained capabilities. The advent of 5G has favored Intelligent Transportation Systems (ITS) to adopt Multi-access Edge Computing (MEC) in platooning paradigms by offloading the computational tasks to the edge server. In this research, vital parameters in vehicular platooning systems, viz. latency-sensitive radio resource management schemes and the Age of Information (AoI), are investigated. In addition, the delivery rates of Cooperative Awareness Messages (CAM) that ensure expeditious reception of safety-critical messages at the roadside units (RSU) are also examined. However, for latency-sensitive applications like vehicular networks, it is essential to address multiple and correlated objectives. To solve such objectives effectively and simultaneously, the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) framework necessitates a more sophisticated model to enhance its capability. In this paper, a novel Cascaded MADDPG framework, CMADDPG, is proposed to train cascaded target critics, which aims at achieving expected rewards through the collaborative conduct of agents. The estimation bias phenomenon, which hinders a system's overall performance, is effectively circumvented in this cascaded algorithm. Eventually, experimental analysis also demonstrates the potential of the proposed algorithm by evaluating the convergence factor, which stabilizes quickly with minimal distortion, and reliable CAM message dissemination with 99% probability. The average AoI is maintained within the 5-10 ms range, guaranteeing better QoS. This technique has proven its robustness in decentralized resource allocation against channel uncertainties caused by higher mobility in the environment. Most importantly, the performance of the proposed algorithm remains unaffected by increasing platoon size and channel uncertainties.
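Cascading target critics attacks the same estimation-bias problem as clipped double-Q learning. A minimal two-critic target in the TD3 style, shown here as a hedged analogue of the idea rather than the paper's cascade, is:

```python
def min_critic_target(q1, q2, reward, gamma=0.99):
    """Bellman target using the MINIMUM of two independent target critics.

    A single critic's max-based bootstrap tends to overestimate values; taking
    the pessimistic minimum of two estimates curbs that bias (the clipped
    double-Q idea that a cascade of target critics generalizes).
    """
    return reward + gamma * min(q1, q2)
```

Each critic is then regressed toward this shared pessimistic target, so neither critic's optimistic errors can compound through bootstrapping.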

7.
Sensors (Basel) ; 23(11)2023 May 29.
Article in English | MEDLINE | ID: mdl-37299907

ABSTRACT

Fast convergence routing is a critical issue for Low Earth Orbit (LEO) constellation networks because these networks have dynamic topology changes, and transmission requirements can vary over time. However, most of the previous research has focused on the Open Shortest Path First (OSPF) routing algorithm, which is not well-suited to handle the frequent changes in the link state of the LEO satellite network. In this regard, we propose a Fast-Convergence Reinforcement Learning Satellite Routing Algorithm (FRL-SR) for LEO satellite networks, where the satellite can quickly obtain the network link status and adjust its routing strategy accordingly. In FRL-SR, each satellite node is considered an agent, and the agent selects the appropriate port for packet forwarding based on its routing policy. Whenever the satellite network state changes, the agent sends "hello" packets to the neighboring nodes to update their routing policy. Compared to traditional reinforcement learning algorithms, FRL-SR can perceive network information faster and converge faster. Additionally, FRL-SR can mask the dynamics of the satellite network topology and adaptively adjust the forwarding strategy based on the link state. The experimental results demonstrate that the proposed FRL-SR algorithm outperforms the Dijkstra algorithm in terms of average delay, packet arrival ratio, and network load balance.
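Per-node learned routing of this kind is in the spirit of classic Q-routing, where each node keeps delay estimates per (destination, neighbor) and refines them from neighbor feedback. The Boyan-Littman-style update below is an illustrative sketch of that idea, not FRL-SR's exact rule:

```python
def q_routing_update(q, node, dest, neighbor, link_delay, alpha=0.5):
    """One Q-routing step: q[node][dest][neighbor] estimates total delivery
    delay to `dest` when forwarding via `neighbor`.

    The neighbor reports its own best remaining-delay estimate (the kind of
    information a "hello" packet can carry), and the node moves its estimate
    toward link_delay + that report, at learning rate alpha.
    """
    remaining = q[neighbor][dest]
    best_from_neighbor = min(remaining.values()) if remaining else 0.0
    old = q[node][dest][neighbor]
    q[node][dest][neighbor] = old + alpha * (link_delay + best_from_neighbor - old)
```

Forwarding then simply picks the neighbor with the lowest current estimate, so routes adapt as link delays change without any global topology recomputation.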


Subjects
Algorithms, Computer Communication Networks
8.
Sensors (Basel) ; 23(10)2023 May 12.
Article in English | MEDLINE | ID: mdl-37430623

ABSTRACT

Connected and automated vehicles (CAVs) require multiple tasks in their seamless maneuverings. Some essential tasks that require simultaneous management and actions are motion planning, traffic prediction, traffic intersection management, etc. A few of them are complex in nature. Multi-agent reinforcement learning (MARL) can solve complex problems involving simultaneous controls. Recently, many researchers applied MARL in such applications. However, there is a lack of extensive surveys on the ongoing research to identify the current problems, proposed methods, and future research directions in MARL for CAVs. This paper provides a comprehensive survey on MARL for CAVs. A classification-based paper analysis is performed to identify the current developments and highlight the various existing research directions. Finally, the challenges in current works are discussed, and some potential areas are given for exploration to overcome those challenges. Future readers will benefit from this survey and can apply the ideas and findings in their research to solve complex problems.

9.
Sensors (Basel) ; 23(12)2023 Jun 15.
Article in English | MEDLINE | ID: mdl-37420781

ABSTRACT

This paper presents a multi-agent reinforcement learning (MARL) algorithm to address the scheduling and routing problems of multiple automated guided vehicles (AGVs), with the goal of minimizing overall energy consumption. The proposed algorithm is developed based on the multi-agent deep deterministic policy gradient (MADDPG) algorithm, with modifications made to the action and state space to fit the setting of AGV activities. While previous studies overlooked the energy efficiency of AGVs, this paper develops a well-designed reward function that helps to optimize the overall energy consumption required to fulfill all tasks. Moreover, we incorporate the ε-greedy exploration strategy into the proposed algorithm to balance exploration and exploitation during training, which helps it converge faster and achieve better performance. The proposed MARL algorithm is equipped with carefully selected parameters that aid in avoiding obstacles, speeding up path planning, and achieving minimal energy consumption. To demonstrate the effectiveness of the proposed algorithm, three types of numerical experiments including the ε-greedy MADDPG, MADDPG, and Q-Learning methods were conducted. The results show that the proposed algorithm can effectively solve the multi-AGV task assignment and path planning problems, and the energy consumption results show that the planned routes can effectively improve energy efficiency.
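The ε-greedy exploration scheme the abstract incorporates is standard; a minimal version over discrete action-value estimates is:

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon pick a uniformly random action (explore),
    otherwise pick the action with the highest estimated value (exploit)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)
```

In practice epsilon is usually decayed over training, so early episodes explore routes broadly while later episodes settle on the energy-efficient paths the learned values favor.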


Subjects
Learning, Reward, Algorithms, Autonomous Vehicles, Physical Phenomena
10.
Sensors (Basel) ; 23(5)2023 Feb 21.
Article in English | MEDLINE | ID: mdl-36904577

ABSTRACT

Intelligent traffic management systems have become one of the main applications of Intelligent Transportation Systems (ITS). There is a growing interest in Reinforcement Learning (RL)-based control methods in ITS applications such as autonomous driving and traffic management solutions. Deep learning helps in approximating substantially complex nonlinear functions from complicated data sets and tackling complex control issues. In this paper, we propose an approach based on Multi-Agent Reinforcement Learning (MARL) and smart routing to improve the flow of autonomous vehicles on road networks. We evaluate Multi-Agent Advantage Actor-Critic (MA2C) and Independent Advantage Actor-Critic (IA2C), recently suggested Multi-Agent Reinforcement Learning techniques, with smart routing for traffic signal optimization to determine their potential. We investigate the framework offered by non-Markov decision processes, enabling a more in-depth understanding of the algorithms. We conduct a critical analysis to observe the robustness and effectiveness of the method. The method's efficacy and reliability are demonstrated by simulations using SUMO, a software modeling tool for traffic simulations. We used a road network that contains seven intersections. Our findings show that MA2C, when trained on pseudo-random vehicle flows, is a viable methodology that outperforms competing techniques.
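Both MA2C and IA2C weight their policy-gradient updates by an advantage estimate; its one-step temporal-difference form (a standard definition, not a detail specific to this paper) is:

```python
def advantage(reward, v_s, v_next, gamma=0.99):
    """One-step advantage A = r + gamma * V(s') - V(s): how much better the
    taken action turned out than the critic's baseline expectation."""
    return reward + gamma * v_next - v_s
```

A positive advantage increases the probability of the signal-phase action just taken, a negative one decreases it; MA2C additionally shares neighboring intersections' information in the state, whereas IA2C trains each intersection fully independently.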
