1.
Neural Netw; 167: 450-459, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37683459

ABSTRACT

Effective exploration is key to achieving high returns in reinforcement learning. In multi-agent systems, agents must explore jointly to find the optimal joint policy. Because of the exploration problem and the shared reward, policy-based multi-agent reinforcement learning algorithms suffer from policy overfitting, which can cause the joint policy to fall into a local optimum. This paper introduces a novel general framework called Learning Joint-Action Intrinsic Reward (LJIR) for improving the joint exploration ability and performance of multi-agent reinforcement learners. LJIR observes the agents' state and joint actions and learns online to construct an intrinsic reward that guides effective joint exploration. Through a novel combination of a Transformer and random network distillation, LJIR identifies novel states and assigns them larger intrinsic rewards, which helps agents find the best joint actions. LJIR dynamically adjusts the balance between exploration and exploitation during training and ultimately preserves policy invariance. To let LJIR integrate seamlessly with existing MARL algorithms, we also provide a flexible method for combining intrinsic and external rewards. Empirical results on the SMAC benchmark show that the proposed method achieves state-of-the-art performance on challenging tasks.


Subject(s)
Algorithms; Reinforcement, Psychology; Reward
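
Below is a minimal, hypothetical sketch of the kind of random-network-distillation-style intrinsic reward the abstract describes, not the authors' implementation. It uses plain MLPs in place of the paper's Transformer encoder, and all class, function, and parameter names (RNDIntrinsicReward, combined_reward, beta) are illustrative assumptions.

```python
# Illustrative sketch only: RND-style novelty bonus over (state, joint-action)
# inputs, combined with the external reward via an annealed weight.
import torch
import torch.nn as nn

class RNDIntrinsicReward(nn.Module):
    """Assigns larger intrinsic rewards to novel (state, joint-action) inputs.

    A fixed, randomly initialised target network is distilled by a predictor
    network; the prediction error serves as the intrinsic reward signal.
    """
    def __init__(self, input_dim: int, embed_dim: int = 64):
        super().__init__()
        self.target = nn.Sequential(nn.Linear(input_dim, 128), nn.ReLU(),
                                    nn.Linear(128, embed_dim))
        self.predictor = nn.Sequential(nn.Linear(input_dim, 128), nn.ReLU(),
                                       nn.Linear(128, embed_dim))
        for p in self.target.parameters():  # the target network stays fixed
            p.requires_grad_(False)

    def forward(self, state_joint_action: torch.Tensor) -> torch.Tensor:
        # Prediction error is large for inputs the predictor has rarely seen,
        # so rarely visited states receive a larger exploration bonus.
        with torch.no_grad():
            target_feat = self.target(state_joint_action)
        pred_feat = self.predictor(state_joint_action)
        return ((pred_feat - target_feat) ** 2).mean(dim=-1)

def combined_reward(extrinsic: torch.Tensor,
                    intrinsic: torch.Tensor,
                    beta: float) -> torch.Tensor:
    """Mix external and intrinsic rewards. Annealing beta toward 0 over
    training shifts the agents from exploration back to the task reward,
    mirroring the policy-invariance property the abstract mentions."""
    return extrinsic + beta * intrinsic
```

In such a setup only the predictor is trained (to regress the target's features), and beta would be decayed on a schedule so the learned joint policy is eventually optimised against the external reward alone.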
2.
IEEE Trans Pattern Anal Mach Intell; 45(10): 12098-12112, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37285257

ABSTRACT

Deep reinforcement learning (RL) has been applied extensively to complex decision-making problems. In many real-world scenarios, tasks have several conflicting objectives and may require multiple agents to cooperate; these are multi-objective multi-agent decision-making problems. However, few works have addressed this intersection. Existing approaches are confined to one field or the other: they handle either multi-agent decision-making with a single objective or multi-objective decision-making with a single agent. In this paper, we propose MO-MIX to solve the multi-objective multi-agent reinforcement learning (MOMARL) problem. Our approach is based on the centralized training with decentralized execution (CTDE) framework. A weight vector representing the preference over objectives is fed into the decentralized agent network as a condition for local action-value estimation, while a mixing network with a parallel architecture estimates the joint action-value function. In addition, an exploration-guide approach is applied to improve the uniformity of the final non-dominated solutions. Experiments demonstrate that the proposed method effectively solves the multi-objective multi-agent cooperative decision-making problem and generates an approximation of the Pareto set. Our approach not only significantly outperforms the baseline method on all four evaluation metrics but also incurs a lower computational cost.
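
The following is a minimal sketch of the preference-conditioning idea described in the abstract: a decentralized agent network that takes an objective-preference weight vector as an additional input. It is an illustrative assumption, not MO-MIX itself; the mixing network and exploration guide are omitted, and all names (PreferenceConditionedAgent, n_objectives) are hypothetical.

```python
# Illustrative sketch only: per-agent local action-value estimation
# conditioned on a preference weight vector over the objectives.
import torch
import torch.nn as nn

class PreferenceConditionedAgent(nn.Module):
    """Estimates local action values given an observation and an
    objective-preference weight vector, in the spirit of the CTDE
    agent network sketched in the abstract."""
    def __init__(self, obs_dim: int, n_actions: int,
                 n_objectives: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + n_objectives, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions))

    def forward(self, obs: torch.Tensor, weights: torch.Tensor) -> torch.Tensor:
        # The preference vector is concatenated to the observation, so a
        # single network can cover many trade-offs between objectives.
        return self.net(torch.cat([obs, weights], dim=-1))

# Usage: sample a preference, normalise it onto the simplex, query Q-values.
obs = torch.randn(4, 32)                  # 4 agents, 32-dim observations
w = torch.rand(4, 2)
w = w / w.sum(dim=-1, keepdim=True)       # preference over 2 objectives
agent = PreferenceConditionedAgent(obs_dim=32, n_actions=6, n_objectives=2)
local_q = agent(obs, w)                   # per-agent action values, shape (4, 6)
```

In a full CTDE pipeline these local values would be fed to a centralized mixing network during training, while execution uses only each agent's local network and its preference vector.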
