Adaptive Discount Factor for Deep Reinforcement Learning in Continuing Tasks with Uncertainty.
Kim, MyeongSeop; Kim, Jung-Su; Choi, Myoung-Su; Park, Jae-Han.
Affiliation
  • Kim M; Research Center for Electrical and Information Technology, Department of Electrical and Information Engineering, Seoul National University of Science and Technology, Seoul 01811, Korea.
  • Kim JS; Applied Robot R&D Department, Korea Institute of Industrial Technology (KITECH), Ansan 15588, Korea.
  • Choi MS; Research Center for Electrical and Information Technology, Department of Electrical and Information Engineering, Seoul National University of Science and Technology, Seoul 01811, Korea.
  • Park JH; Applied Robot R&D Department, Korea Institute of Industrial Technology (KITECH), Ansan 15588, Korea.
Sensors (Basel); 22(19), 2022 Sep 25.
Article in English | MEDLINE | ID: mdl-36236366
ABSTRACT
Reinforcement learning (RL) trains an agent by maximizing the sum of discounted rewards. Because the discount factor strongly affects the learning performance of an RL agent, it must be chosen carefully. When uncertainties are involved in training, a constant discount factor can limit learning performance. To obtain acceptable learning performance consistently, this paper proposes an adaptive rule for the discount factor based on the advantage function. It also shows how to use the advantage function in both on-policy and off-policy algorithms. To demonstrate the proposed adaptive rule, it is applied to PPO (Proximal Policy Optimization) for Tetris to validate the on-policy case, and to SAC (Soft Actor-Critic) for the motion planning of a robot manipulator to validate the off-policy case. In both cases, the proposed method achieves better or similar performance compared with the best constant discount factors found by exhaustive search. Hence, the proposed adaptive discount factor automatically finds a discount factor that yields comparable training performance, and it can be applied to representative deep reinforcement learning problems.
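The abstract relies on two standard quantities: the discounted return that the agent maximizes, and the advantage function on which the proposed rule is based. The sketch below computes both in plain Python, using the common one-step temporal-difference estimate of the advantage, A(s_t, a_t) ≈ r_t + γV(s_{t+1}) − V(s_t). It is not the paper's adaptive rule, which the abstract does not specify; the function names are hypothetical, and no terminal-state handling is included because the paper targets continuing tasks.

```python
from typing import List, Sequence


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Return G = r_0 + gamma*r_1 + gamma**2*r_2 + ... for a reward sequence."""
    g = 0.0
    # Walk backwards so each earlier step applies one more factor of gamma.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def td_advantages(
    rewards: Sequence[float],
    values: Sequence[float],
    next_values: Sequence[float],
    gamma: float,
) -> List[float]:
    """One-step advantage estimates A_t ~= r_t + gamma*V(s_{t+1}) - V(s_t).

    `values` and `next_values` are critic estimates V(s_t) and V(s_{t+1});
    here they are plain numbers supplied by the caller (hypothetical inputs).
    """
    return [r + gamma * v_next - v for r, v, v_next in zip(rewards, values, next_values)]


if __name__ == "__main__":
    rewards = [1.0, 0.0, 0.0, 10.0]
    # A small gamma discounts the delayed reward of 10 heavily.
    print(discounted_return(rewards, 0.5))  # 2.25
    # A larger gamma gives the same delayed reward much more weight.
    print(discounted_return(rewards, 0.9))
    print(td_advantages([1.0, 0.0], [2.0, 1.0], [1.0, 0.5], 0.5))  # [-0.5, -0.75]
```

Because γ appears in both functions, changing it changes the target the agent optimizes and the sign and size of the advantage estimates. This dependence is why a fixed discount factor has to be tuned per task, and it motivates adapting γ during training.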
Full text: 1 | Collection: 01-internacional | Database: MEDLINE | Main subject: Reinforcement, Psychology / Algorithms | Language: English | Journal: Sensors (Basel) | Year: 2022 | Document type: Article

Texto completo: 1 Colección: 01-internacional Banco de datos: MEDLINE Asunto principal: Refuerzo en Psicología / Algoritmos Idioma: En Revista: Sensors (Basel) Año: 2022 Tipo del documento: Article