1.
Article in English | MEDLINE | ID: mdl-37999964

ABSTRACT

We propose a novel master-slave architecture to solve the top-K combinatorial multiarmed bandit (CMAB) problem with nonlinear bandit feedback and diversity constraints; to the best of our knowledge, this is the first combinatorial bandit setting to consider diversity constraints under bandit feedback. Specifically, to efficiently explore the combinatorial and constrained action space, we introduce six slave models with distinct merits to generate diversified samples that balance rewards, constraints, and efficiency. Moreover, we propose teacher-learning-based optimization and a policy co-training technique to boost the performance of the multiple slave models. The master model then collects the elite samples provided by the slave models and selects the best sample, as estimated by a neural contextual UCB-based network (NeuralUCB), to trade off exploration and exploitation. Thanks to the elaborate design of the slave models, the co-training mechanism among them, and the novel interactions between the master and slave models, our approach significantly surpasses existing state-of-the-art algorithms on both synthetic and real datasets for recommendation tasks. The code is available at https://github.com/huanghanchi/Master-slave-Algorithm-for-Top-K-Bandits.
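As a rough illustration of the master model's selection step: the sketch below replaces the paper's NeuralUCB network with a plain linear-UCB score (estimated reward plus an uncertainty bonus) over the slave models' candidate samples. The feature matrix, weight vector, and covariance are illustrative stand-ins, not the authors' implementation.

```python
import numpy as np

def ucb_select(features, theta, A_inv, alpha=1.0):
    """Pick the elite sample with the highest upper-confidence-bound score.

    features: (n, d) array, one feature vector per candidate sample.
    theta:    (d,) current reward-model weights.
    A_inv:    (d, d) inverse covariance of previously observed features.
    alpha:    exploration coefficient (larger = more exploration).
    """
    means = features @ theta  # estimated reward of each candidate
    # per-candidate uncertainty bonus: sqrt(x^T A_inv x)
    bonuses = np.sqrt(np.einsum("nd,dk,nk->n", features, A_inv, features))
    scores = means + alpha * bonuses  # optimism in the face of uncertainty
    return int(np.argmax(scores))
```

With `alpha = 0` this reduces to pure exploitation (pick the highest estimated reward); the bonus term is what lets an under-explored candidate win despite a lower mean estimate.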

2.
Article in English | MEDLINE | ID: mdl-37285252

ABSTRACT

A key challenge of continual reinforcement learning (CRL) in dynamic environments is to promptly adapt the reinforcement learning (RL) agent's behavior as the environment changes over its lifetime, while minimizing catastrophic forgetting of the learned information. To address this challenge, we propose DaCoRL (dynamics-adaptive continual RL). DaCoRL learns a context-conditioned policy using progressive contextualization, which incrementally clusters a stream of stationary tasks in the dynamic environment into a series of contexts and uses an expandable multihead neural network to approximate the policy. Specifically, we define a set of tasks with similar dynamics as an environmental context and formalize context inference as online Bayesian infinite Gaussian mixture clustering on environment features, resorting to online Bayesian inference to infer the posterior distribution over contexts. Under a Chinese restaurant process (CRP) prior, this technique can accurately classify the current task as a previously seen context or instantiate a new context as needed, without relying on any external indicator to signal environmental changes in advance. Furthermore, we employ an expandable multihead neural network whose output layer is expanded in step with each newly instantiated context, together with a knowledge-distillation regularization term for retaining performance on learned tasks. As a general framework that can be coupled with various deep RL algorithms, DaCoRL consistently outperforms existing methods in stability, overall performance, and generalization ability, as verified by extensive experiments on several robot navigation and MuJoCo locomotion tasks.
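The context-inference step can be sketched as follows. This is a minimal stand-in, not DaCoRL's implementation: it scores each existing context by a CRP prior weight (proportional to how often that context has been seen) times an isotropic Gaussian likelihood around the context's mean feature vector, and lets a "new context" option compete with weight `alpha` times a vague base-measure likelihood. The likelihood form and the broad zero-centered base measure are simplifying assumptions.

```python
import numpy as np

def crp_assign(x, means, counts, alpha=1.0, sigma=1.0):
    """Assign environment-feature vector x to an existing context or a new one.

    means:  list of per-context mean feature vectors.
    counts: how many tasks each context has absorbed (CRP prior weights).
    alpha:  CRP concentration; larger = more willing to open new contexts.
    Returns (context index, is_new).
    """
    n = sum(counts)
    scores = []
    for mu, c in zip(means, counts):
        # isotropic Gaussian likelihood around the context mean
        lik = np.exp(-np.sum((x - mu) ** 2) / (2 * sigma ** 2))
        scores.append(c / (n + alpha) * lik)
    # new-context score: CRP weight times a broad base-measure likelihood
    base = np.exp(-np.sum(x ** 2) / (2 * (10 * sigma) ** 2))
    scores.append(alpha / (n + alpha) * base)
    k = int(np.argmax(scores))
    return k, k == len(means)
```

Features near an existing context's mean fall back into that context; features far from every mean make the new-context score dominate, which is what lets the method detect an unseen environment without an external change signal.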

3.
IEEE Trans Pattern Anal Mach Intell ; 45(6): 7258-7269, 2023 Jun.
Article in English | MEDLINE | ID: mdl-36417748

ABSTRACT

We introduce CAMRL, the first curriculum-based asymmetric multi-task learning (AMTL) algorithm for handling multiple reinforcement learning (RL) tasks jointly. To mitigate the negative influence of a one-off, customized training order in curriculum-based AMTL, CAMRL switches its training mode between parallel single-task RL and asymmetric multi-task RL (MTRL), according to an indicator based on training time, overall performance, and the performance gap among tasks. To flexibly leverage multi-sourced prior knowledge and reduce negative transfer in AMTL, we customize a composite loss with multiple differentiable ranking functions and optimize it through alternating optimization and the Frank-Wolfe algorithm. Uncertainty-based automatic adjustment of hyperparameters is also applied to eliminate the need for laborious hyperparameter tuning during optimization. By optimizing the composite loss, CAMRL predicts the next training task and continuously revisits the transfer matrix and network weights. We have conducted experiments on a wide range of multi-task RL benchmarks, covering Gym-minigrid, Meta-world, Atari video games, vision-based PyBullet tasks, and RLBench, showing the improvements of CAMRL over the corresponding single-task RL algorithms and state-of-the-art MTRL algorithms. The code is available at: https://github.com/huanghanchi/CAMRL.
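To make the Frank-Wolfe step concrete, here is a generic sketch, not CAMRL's actual optimizer: the composite ranking loss is abstracted into a gradient callback, and the probability simplex stands in for the paper's transfer-matrix constraints (for a matrix one would run this per row). Frank-Wolfe solves a linear subproblem over the feasible set at each step; over the simplex, that subproblem is solved by putting all mass on the coordinate with the smallest gradient.

```python
import numpy as np

def frank_wolfe_simplex(grad_fn, x0, n_iters=100):
    """Minimize a smooth function over the probability simplex.

    grad_fn: callable returning the gradient at the current point.
    x0:      feasible starting point (nonnegative, sums to 1).
    """
    x = x0.copy()
    for t in range(n_iters):
        g = grad_fn(x)
        s = np.zeros_like(x)
        s[np.argmin(g)] = 1.0  # simplex vertex minimizing the linearization
        gamma = 2.0 / (t + 2.0)  # classic diminishing step size
        x = (1 - gamma) * x + gamma * s  # convex combination stays feasible
    return x
```

Because each iterate is a convex combination of simplex vertices, no projection step is ever needed, which is the appeal of Frank-Wolfe for structured constraint sets like transfer matrices.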

4.
IEEE Trans Neural Netw Learn Syst ; 33(3): 908-918, 2022 Mar.
Article in English | MEDLINE | ID: mdl-33147150

ABSTRACT

We present JueWu-SL, the first supervised-learning-based artificial intelligence (AI) program to achieve human-level performance in playing multiplayer online battle arena (MOBA) games. Unlike prior attempts, we integrate both the macro-strategy and the micromanagement of MOBA game playing into neural networks in a supervised, end-to-end manner. Tested on Honor of Kings, currently the most popular MOBA, our AI performs competitively at the level of High King players in standard 5v5 games.


Subjects
Video Games, Artificial Intelligence, Humans, Neural Networks (Computer), Supervised Machine Learning