Results 1 - 3 of 3
1.
Article in English | MEDLINE | ID: mdl-37581972

ABSTRACT

Deep reinforcement learning (RL) typically requires a tremendous number of training samples, which is not practical in many applications. State abstraction and world models are two promising approaches for improving sample efficiency in deep RL. However, both state abstraction and world models may degrade the learning performance. In this article, we propose an abstracted model-based policy learning (AMPL) algorithm, which improves the sample efficiency of deep RL. In AMPL, a novel state abstraction method via multistep bisimulation is first developed to learn task-related latent state spaces. Hence, the original Markov decision processes (MDPs) are compressed into abstracted MDPs. Then, a causal transformer model predictor (CTMP) is designed to approximate the abstracted MDPs and generate long-horizon simulated trajectories with a smaller multistep prediction error. Policies are efficiently learned through these trajectories within the abstracted MDPs via a modified multistep soft actor-critic algorithm with a λ-target. Moreover, theoretical analysis shows that the AMPL algorithm can improve sample efficiency during the training process. On Atari games and the DeepMind Control (DMControl) suite, AMPL surpasses current state-of-the-art deep RL algorithms in terms of sample efficiency. Furthermore, experiments on DMControl tasks with moving noise are conducted, and the results demonstrate that AMPL is robust to task-irrelevant observational distractors and significantly outperforms the existing approaches.
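
A minimal illustrative sketch of the multistep λ-target mentioned above, computed over an imagined latent rollout; the array shapes, variable names, and bootstrapping details are assumptions for illustration, not the AMPL implementation.

# Sketch only: lambda-targets for actor-critic learning over an imagined latent trajectory.
# Shapes and bootstrapping scheme are assumptions, not the authors' code.
import numpy as np

def lambda_returns(rewards, values, gamma=0.99, lam=0.95):
    """rewards: array [H]   -- predicted rewards r_1..r_H from the world model
       values:  array [H+1] -- critic values V(z_0)..V(z_H) of the latent states"""
    H = len(rewards)
    targets = np.zeros(H)
    next_target = values[H]  # bootstrap from the last latent state
    for t in reversed(range(H)):
        # Blend the one-step bootstrap with the longer lambda-return:
        # G_t = r_t + gamma * ((1 - lam) * V(z_{t+1}) + lam * G_{t+1})
        targets[t] = rewards[t] + gamma * ((1 - lam) * values[t + 1] + lam * next_target)
        next_target = targets[t]
    return targets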

2.
Article in English | MEDLINE | ID: mdl-37022807

ABSTRACT

In recent years, the multiple traveling salesmen problem (MTSP or multiple TSP) has received increasing research interest, and one of its main applications is coordinated multirobot mission planning, such as cooperative search and rescue tasks. However, it is still challenging to solve MTSP with good inference efficiency and solution quality in varying situations, e.g., different city positions or different numbers of cities and agents. In this article, we propose an attention-based multiagent reinforcement learning (AMARL) approach, which is based on gated transformer feature representations, for min-max multiple TSPs. The state feature extraction network in our proposed approach adopts the gated transformer architecture with reordered layer normalization (LN) and a new gate mechanism. It aggregates fixed-dimensional attention-based state features irrespective of the number of agents and cities. The action space of our proposed approach is designed to decouple the agents' simultaneous decision-making. At each time step, only one agent is assigned a non-zero action, so that the action selection strategy can be transferred across tasks with different numbers of agents and cities. Extensive experiments on min-max multiple TSPs were conducted to illustrate the effectiveness and advantages of the proposed approach. Compared with six representative algorithms, our proposed approach achieves state-of-the-art performance in solution quality and inference efficiency. In particular, the proposed approach is suitable for tasks with different numbers of agents or cities without extra learning, and experimental results demonstrate that it exhibits strong transfer capability across tasks.
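
A rough sketch of what a gated transformer block with reordered layer normalization can look like, in the spirit of the state feature extraction network described above; the dimensions, gating form, and module names are assumptions for illustration, not the AMARL code.

# Sketch only: gated transformer block with LN applied before each sublayer
# and learned gates on the residual stream. Details are assumptions.
import torch
import torch.nn as nn

class GatedTransformerBlock(nn.Module):
    def __init__(self, d_model=128, n_heads=8):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)  # LN moved to the sublayer input (reordered LN)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(d_model)
        self.ff = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.ReLU(),
                                nn.Linear(4 * d_model, d_model))
        # Learned gates blending the residual stream with each sublayer output
        self.gate1 = nn.Linear(2 * d_model, d_model)
        self.gate2 = nn.Linear(2 * d_model, d_model)

    def _gate(self, gate, x, y):
        g = torch.sigmoid(gate(torch.cat([x, y], dim=-1)))
        return g * y + (1 - g) * x

    def forward(self, x):  # x: [batch, n_cities, d_model]
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h)
        x = self._gate(self.gate1, x, attn_out)  # gated residual instead of plain add
        h = self.norm2(x)
        return self._gate(self.gate2, x, self.ff(h))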

3.
Comput Intell Neurosci; 2021: 7588221, 2021.
Article in English | MEDLINE | ID: mdl-34603434

ABSTRACT

Reinforcement learning from demonstration (RLfD) is considered a promising approach to improve reinforcement learning (RL) by leveraging expert demonstrations as additional decision-making guidance. However, most existing RLfD methods only regard demonstrations as low-level knowledge instances for a given task. Demonstrations are generally used either to provide additional rewards or to pretrain the neural network-based RL policy in a supervised manner, usually resulting in poor generalization and weak robustness. Considering that human knowledge is not only interpretable but also suitable for generalization, we propose to exploit the potential of demonstrations by extracting knowledge from them via Bayesian networks and develop a novel RLfD method called Reinforcement Learning from demonstration via Bayesian Network-based Knowledge (RLBNK). The proposed RLBNK method uses the node influence with Wasserstein distance (NIW) algorithm to obtain abstract concepts from demonstrations; a Bayesian network then performs knowledge learning and inference on the abstracted data set, yielding a coarse policy with a corresponding confidence estimate. When the coarse policy's confidence is low, an RL-based refinement module further optimizes and fine-tunes the policy to form a (near-)optimal hybrid policy. Experimental results show that the proposed RLBNK method improves the learning efficiency of the corresponding baseline RL algorithms under both normal and sparse reward settings. Furthermore, we demonstrate that our RLBNK method delivers better generalization capability and robustness than baseline methods.
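
A minimal sketch of the confidence-gated hybrid policy idea described above; the Bayesian-network and RL policy interfaces and the threshold value are hypothetical stand-ins, not the RLBNK implementation.

# Sketch only: dispatch between a knowledge-based coarse policy and an RL refinement
# policy based on the coarse policy's confidence. Interfaces are assumed placeholders.
from dataclasses import dataclass

@dataclass
class HybridPolicy:
    bn_policy: object               # wraps the Bayesian network learned from demonstrations
    rl_policy: object               # neural RL policy used for refinement
    confidence_threshold: float = 0.8   # placeholder value, not from the paper

    def act(self, observation):
        # Coarse policy: query the Bayesian network over abstracted (concept-level) features
        abstract_state = self.bn_policy.abstract(observation)        # assumed interface
        action, confidence = self.bn_policy.infer(abstract_state)    # assumed interface
        if confidence >= self.confidence_threshold:
            return action                        # trust the knowledge-based coarse policy
        return self.rl_policy.act(observation)   # otherwise defer to the refined RL policy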


Subjects
Reinforcement, Psychology; Reward; Algorithms; Bayes Theorem; Humans; Knowledge Bases