Q-ADER: An Effective Q-Learning for Recommendation With Diminishing Action Space.
Article in English | MEDLINE | ID: mdl-39012739
ABSTRACT
Deep reinforcement learning (RL) has been widely applied to personalized recommender systems (PRSs) because it can progressively capture user preferences. Among RL-based techniques, the deep Q-network (DQN) stands out as the most popular choice due to its simple update strategy and strong performance. Many recommendation scenarios involve a diminishing action space, in which the set of available actions gradually shrinks to avoid recommending duplicate items. However, existing DQN-based recommender systems grapple with a discrepancy between the fixed, full action space built into the Q-network and the shrinking action space available during recommendation. This article elucidates how this discrepancy induces an issue, termed the action diminishing error, in the vanilla temporal difference (TD) operator. Because of this error, standard DQN methods cannot learn accurate value estimates, rendering them ineffective under a diminishing action space. To mitigate this issue, we propose the Q-learning-based action diminishing error reduction (Q-ADER) algorithm, which corrects the value-estimation error at each step. In practice, Q-ADER augments standard TD learning with an error-reduction term that is straightforward to implement on top of existing DQN algorithms. Experiments on four real-world datasets verify the effectiveness of the proposed algorithm.
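
The core discrepancy the abstract describes, a TD target maxed over the Q-network's fixed full action space versus one maxed over the actions still available, can be made concrete with a minimal sketch. The snippet below is not the authors' implementation: the function name td_target, the available_mask argument, and the toy numbers are illustrative assumptions, and the paper's actual error-reduction term is not reproduced here.

    import numpy as np

    def td_target(q_next, reward, gamma, available_mask=None):
        """One-step TD target for DQN.

        q_next: Q-values for the next state over the *full* action space.
        available_mask: boolean array marking actions still recommendable
            (items not yet shown to the user). If None, the vanilla TD
            target maxes over the full action space.
        """
        if available_mask is None:
            best = q_next.max()                  # vanilla TD: full action space
        else:
            best = q_next[available_mask].max()  # restricted to available actions
        return reward + gamma * best

    # Toy example: five items, two of which have already been recommended.
    q_next = np.array([0.9, 0.4, 0.7, 0.2, 0.6])
    mask = np.array([False, True, False, True, True])  # items 0 and 2 consumed

    full_target = td_target(q_next, reward=1.0, gamma=0.99)
    masked_target = td_target(q_next, reward=1.0, gamma=0.99, available_mask=mask)

    # The gap between the two targets is the per-step discrepancy the
    # abstract labels the "action diminishing error".
    print(full_target - masked_target)

The difference between full_target and masked_target is the per-step error that accumulates through bootstrapped TD updates; Q-ADER adds an error-reduction term to standard TD learning to correct it, with the exact form of that term given in the paper.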

Full text: 1 Collection: 01-international Database: MEDLINE Language: English Journal: IEEE Trans Neural Netw Learn Syst Year: 2024 Document type: Article Country of publication: United States