Representation learning for continuous action spaces is beneficial for efficient policy learning.
Zhao, Tingting; Wang, Ying; Sun, Wei; Chen, Yarui; Niu, Gang; Sugiyama, Masashi.
Affiliation
  • Zhao T; College of Artificial Intelligence, Tianjin University of Science and Technology, Tianjin 300457, PR China. Electronic address: tingting@tust.edu.cn.
  • Wang Y; College of Artificial Intelligence, Tianjin University of Science and Technology, Tianjin 300457, PR China. Electronic address: ilse_wang@mail.tust.edu.cn.
  • Sun W; College of Artificial Intelligence, Tianjin University of Science and Technology, Tianjin 300457, PR China. Electronic address: sunweitust@mail.tust.edu.cn.
  • Chen Y; College of Artificial Intelligence, Tianjin University of Science and Technology, Tianjin 300457, PR China. Electronic address: yrchen@tust.edu.cn.
  • Niu G; RIKEN Center for Advanced Intelligence Project (AIP), Tokyo, Japan. Electronic address: gang.niu.ml@gmail.com.
  • Sugiyama M; RIKEN Center for Advanced Intelligence Project (AIP), Tokyo, Japan; Graduate School of Frontier Sciences, The University of Tokyo, Tokyo, Japan. Electronic address: sugi@k.u-tokyo.ac.jp.
Neural Netw ; 159: 137-152, 2023 Feb.
Article in En | MEDLINE | ID: mdl-36566604
Deep reinforcement learning (DRL) breaks through the bottlenecks of traditional reinforcement learning (RL) with the help of the perception capability of deep learning and has been widely applied to real-world problems. Model-free RL, a class of efficient DRL methods, learns state representations jointly with the policy in an end-to-end manner when facing large-scale continuous state and action spaces. However, training such a large policy model requires a large number of trajectory samples and a long training time, and the learned policy often fails to generalize to large-scale action spaces, especially continuous ones. To address this issue, in this paper we propose an efficient policy learning method in latent state and action spaces. More specifically, we extend the idea of state representations to action representations for better policy generalization. Meanwhile, we divide the whole learning task into two parts: the large-scale representation models are trained in an unsupervised manner, while the small-scale policy model is trained by RL. The small policy model facilitates policy learning without sacrificing generalization or expressiveness, thanks to the large representation models. Finally, the effectiveness of the proposed method is demonstrated in MountainCar, CarRacing and Cheetah experiments.
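The abstract's split between a large, unsupervised representation model and a small policy acting in a latent action space can be sketched as follows. This is a minimal illustration, not the authors' implementation: the dimensions, the linear PCA-style action decoder, and the random linear policy are all assumptions chosen only to show the two-stage structure (unsupervised decoder first, small latent-space policy second).

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions: a large continuous action space compressed
# to a small latent action space (all values are assumptions).
STATE_DIM, ACTION_DIM, LATENT_DIM = 8, 16, 2

# Stage 1 (unsupervised): fit a linear decoder from latent actions to
# full actions via PCA on a corpus of previously observed actions --
# a stand-in for the large representation model trained outside RL.
actions = rng.normal(size=(1000, ACTION_DIM)) @ rng.normal(size=(ACTION_DIM, ACTION_DIM))
mean = actions.mean(axis=0)
_, _, vt = np.linalg.svd(actions - mean, full_matrices=False)
decoder = vt[:LATENT_DIM]            # rows span the top principal directions

def decode(z):
    """Map a latent action z back to the full continuous action space."""
    return z @ decoder + mean

# Stage 2 (RL): the policy is a small model from state to latent action;
# a random linear map stands in for the policy learned by RL.
policy_weights = rng.normal(size=(STATE_DIM, LATENT_DIM))

def act(state):
    z = state @ policy_weights       # small policy outputs a latent action
    return decode(z)                 # large decoder expands it for the env

state = rng.normal(size=STATE_DIM)
action = act(state)                  # full-dimensional continuous action
```

The point of the split is that only the small `policy_weights` map is trained with costly RL interaction, while the expressive `decoder` is fit cheaply from unlabeled action data.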
Full text: 1 Collection: 01-internacional Database: MEDLINE Main subject: Machine Learning Language: En Journal: Neural Netw Journal subject: NEUROLOGIA Year: 2023 Document type: Article Country of publication: United States