Search | VHL Regional Portal

Strangeness-driven exploration in multi-agent reinforcement learning.

Kim, Ju-Bong; Choi, Ho-Bin; Han, Youn-Hee.

Neural Netw; 172: 106149, 2024 Apr.

Article in English | MEDLINE | ID: mdl-38306786

ABSTRACT

This study introduces a novel exploration method for multi-agent reinforcement learning (MARL) based on centralized training and decentralized execution (CTDE). The method uses the concept of strangeness, which is determined by evaluating (1) how unfamiliar the observations an agent encounters are and (2) how unfamiliar the entire state the agents visit is. An exploration bonus derived from strangeness is combined with the extrinsic reward obtained from the environment to form a mixed reward, which is then used to train CTDE-based MARL algorithms. A separate action-value function is also proposed to prevent a high exploration bonus from overwhelming the sensitivity to extrinsic rewards during MARL training. This separate function is used to design the behavioral policy that generates transitions. The proposed method is largely unaffected by the stochastic transitions commonly observed in MARL tasks and improves the stability of CTDE-based MARL algorithms when they are used with an exploration method. Didactic examples illustrate the advantages of the approach, and evaluations show a substantial performance improvement for CTDE-based MARL algorithms: the method outperforms state-of-the-art MARL baselines on challenging tasks within the StarCraft II micromanagement benchmark.

Subject(s)

Learning; Reinforcement, Psychology; Reward; Algorithms; Benchmarking
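
The mixed reward described in the abstract above can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how strangeness is computed, so a simple visit-count score stands in for it, and the names `CountNovelty`, `mixed_reward`, and the weight `beta` are hypothetical. The separate action-value function mentioned in the abstract is not modeled here.

```python
import math
from collections import Counter


class CountNovelty:
    """Count-based unfamiliarity score (a hypothetical stand-in for strangeness)."""

    def __init__(self):
        self.visits = Counter()

    def score(self, key):
        # Record the visit, then return a bonus that decays with repeat visits.
        self.visits[key] += 1
        return 1.0 / math.sqrt(self.visits[key])


def mixed_reward(extrinsic, obs_scores, state_score, beta=0.1):
    """Combine the extrinsic reward with a scaled exploration bonus.

    obs_scores: one unfamiliarity score per agent observation.
    state_score: unfamiliarity score of the joint state.
    beta: hypothetical weight of the exploration bonus (an assumption).
    """
    bonus = sum(obs_scores) / len(obs_scores) + state_score
    return extrinsic + beta * bonus


# Two agents, each with its own observation; the joint state is their tuple.
obs_novelty, state_novelty = CountNovelty(), CountNovelty()
observations = [("agent0", "obs_a"), ("agent1", "obs_b")]
obs_scores = [obs_novelty.score(o) for o in observations]
state_score = state_novelty.score(tuple(observations))
print(mixed_reward(1.0, obs_scores, state_score))
```

Because the score decays with the visit count, the bonus for a repeatedly visited observation or state shrinks over time and the extrinsic reward comes to dominate the mixed reward.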

Compound Context-Aware Bayesian Inference Scheme for Smart IoT Environment.

Ullah, Ihsan; Kim, Ju-Bong; Han, Youn-Hee.

Sensors (Basel); 22(8), 2022 Apr 14.

Article in English | MEDLINE | ID: mdl-35459007

ABSTRACT

The objective of smart cities is to improve the quality of life of citizens by using Information and Communication Technology (ICT). A smart IoT environment consists of multiple sensor devices that continuously produce large amounts of data. In an IoT system, accurate inference from multi-sensor data is imperative for making correct decisions. Sensor data are often imprecise, which leads to low-quality inference results and wrong decisions. Likewise, single-context data are insufficient for making an accurate decision. In this paper, a novel compound context-aware scheme based on Bayesian inference is proposed to achieve accurate fusion and inference from sensor data. In the proposed scheme, multi-sensor data are fused according to the relations and contexts of the sensor data, taking into account whether or not they depend on each other. Extensive computer simulations show that the proposed technique significantly improves inference accuracy compared with two other representative Bayesian inference techniques.

Subject(s)

Communication; Quality of Life; Bayes Theorem; Cities; Computer Simulation
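
As a rough illustration of Bayesian fusion of multi-sensor data, the sketch below covers only the simplest case, in which sensor readings are assumed to be conditionally independent given the hypothesis. It is not the compound context-aware scheme of the paper, which also handles dependent sensor data; the function name `fuse_independent` and the example hypotheses and probabilities are hypothetical.

```python
def fuse_independent(prior, likelihoods):
    """Fuse sensor evidence with Bayes' rule, assuming independent sensors.

    prior: dict mapping each hypothesis to its prior probability.
    likelihoods: list of dicts, one per sensor, mapping each hypothesis to
        P(sensor reading | hypothesis).
    Returns the normalized posterior over hypotheses.
    """
    posterior = dict(prior)
    for sensor_likelihood in likelihoods:
        for hypothesis in posterior:
            # Multiply in each sensor's likelihood (independence assumption).
            posterior[hypothesis] *= sensor_likelihood[hypothesis]
    total = sum(posterior.values())
    if total == 0:
        raise ValueError("evidence has zero probability under every hypothesis")
    return {h: p / total for h, p in posterior.items()}


# Hypothetical example: two sensors reporting on a binary event.
prior = {"event": 0.5, "no_event": 0.5}
readings = [
    {"event": 0.8, "no_event": 0.2},  # sensor 1 likelihoods
    {"event": 0.6, "no_event": 0.4},  # sensor 2 likelihoods
]
print(fuse_independent(prior, readings))
```

When sensors are dependent, the product of per-sensor likelihoods overcounts shared evidence, which is one reason a scheme that distinguishes dependent from independent data can be more accurate.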

Sortation Control Using Multi-Agent Deep Reinforcement Learning in N-Grid Sortation System.

Kim, Ju-Bong; Choi, Ho-Bin; Hwang, Gyu-Young; Kim, Kwihoon; Hong, Yong-Geun; Han, Youn-Hee.

Sensors (Basel); 20(12), 2020 Jun 16.

Article in English | MEDLINE | ID: mdl-32560217

ABSTRACT

Intralogistics is a technology that optimizes, integrates, automates, and manages the logistics flow of goods within a logistics transportation and sortation center. As the demand for parcel transportation increases, many sortation systems have been developed. In general, the goal of a sortation system is to route (or sort) parcels correctly and quickly. We design an n-grid sortation system that can be flexibly deployed and used in an intralogistics warehouse, and we develop a collaborative multi-agent reinforcement learning (RL) algorithm to control the behavior of emitters or sorters in the system. We present two types of RL agents, emission agents and routing agents, which are trained to achieve the given sortation goals together. To verify the proposed system and algorithm, we implement them in a full-fledged cyber-physical system simulator and describe the RL agents' learning performance. The learning results show that well-trained collaborative RL agents can optimize their performance effectively. In particular, the routing agents eventually learn to route parcels along their optimal paths, while the emission agents eventually learn to balance the inflow and outflow of parcels.

Federated Reinforcement Learning for Training Control Policies on Multiple IoT Devices.

Lim, Hyun-Kyo; Kim, Ju-Bong; Heo, Joo-Seong; Han, Youn-Hee.

Sensors (Basel); 20(5), 2020 Mar 02.

Article in English | MEDLINE | ID: mdl-32121671

ABSTRACT

Reinforcement learning has recently been studied in various fields and has also been used to optimally control IoT devices, which extend Internet connectivity beyond the usual standard devices. In this paper, we aim to let multiple reinforcement learning agents learn optimal control policies on their own IoT devices, which are of the same type but have slightly different dynamics. For such devices, there is no guarantee that an agent that interacts with only one IoT device and learns its optimal control policy will also control another IoT device well. Therefore, we may need to apply independent reinforcement learning to each IoT device individually, which is costly or time-consuming. To solve this problem, we propose a new federated reinforcement learning architecture in which each agent, working on its own independent IoT device, shares its learning experience (i.e., the gradient of the loss function) with the other agents and transfers mature policy model parameters to them. The agents accelerate their learning processes by using the mature parameters. We incorporate the actor-critic proximal policy optimization (Actor-Critic PPO) algorithm into each agent in the proposed collaborative architecture and propose an efficient procedure for gradient sharing and model transfer. Using multiple rotary inverted pendulum devices interconnected via a network switch, we demonstrate that the proposed federated reinforcement learning scheme can effectively facilitate the learning process for multiple IoT devices and that learning can be faster when more agents are involved.
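
The two mechanisms named in the abstract above, gradient sharing and model transfer, can be sketched in a few lines. This is a minimal illustration, not the paper's procedure: the abstract does not specify how shared gradients are aggregated, so plain averaging is an assumption here, and the function names `average_gradients`, `sgd_step`, and `transfer_parameters` are hypothetical. Parameters are plain lists of floats rather than Actor-Critic PPO networks.

```python
def average_gradients(agent_gradients):
    """Average per-agent gradient vectors element-wise (aggregation is an assumption)."""
    n_agents = len(agent_gradients)
    n_params = len(agent_gradients[0])
    return [sum(g[i] for g in agent_gradients) / n_agents for i in range(n_params)]


def sgd_step(params, gradient, lr=0.1):
    """Apply one gradient-descent update to a parameter vector."""
    return [p - lr * g for p, g in zip(params, gradient)]


def transfer_parameters(mature_params, n_agents):
    """Give every agent its own copy of the mature parameters."""
    return [list(mature_params) for _ in range(n_agents)]


# Hypothetical example: two agents share gradients, then each applies the average.
agent_params = [[1.0, 1.0], [1.0, 1.0]]
agent_grads = [[1.0, 2.0], [3.0, 4.0]]
shared = average_gradients(agent_grads)
agent_params = [sgd_step(p, shared, lr=0.5) for p in agent_params]

# Later, one agent's parameters are treated as mature and copied to all agents.
agent_params = transfer_parameters(agent_params[0], n_agents=2)
print(agent_params)
```

Copying the parameters, rather than sharing one list, keeps each agent free to continue learning on its own device with its slightly different dynamics.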
