Variance Reduced Domain Randomization for Reinforcement Learning With Policy Gradient.

Jiang, Yuankun; Li, Chenglin; Dai, Wenrui; Zou, Junni; Xiong, Hongkai

IEEE Trans Pattern Anal Mach Intell ; 46(2): 1031-1048, 2024 Feb.

Article in En | MEDLINE | ID: mdl-37930910

ABSTRACT

By introducing randomness into the environments, domain randomization (DR) adds diversity to the policy training of deep reinforcement learning and thus improves its generalization. The randomization of environments, however, introduces another source of variability into the estimate of policy gradients, in addition to the already high variance incurred by trajectory sampling. Therefore, with standard state-dependent baselines, policy gradient methods may still suffer from high variance, leading to low sample efficiency when training with DR. In this paper, we theoretically derive a bias-free and state/environment-dependent optimal baseline for DR, and analytically show that it achieves further variance reduction over the standard constant and state-dependent baselines for DR. Based on our theory, we further propose a variance reduced domain randomization (VRDR) approach for policy gradient methods that strikes a tradeoff between variance reduction and computational complexity in practical implementations. By dividing the entire space of environments into subspaces and then estimating a state/subspace-dependent baseline, VRDR enjoys a theoretical guarantee of variance reduction and faster convergence than state-dependent baselines. Empirical evaluations on six robot control tasks with randomized dynamics demonstrate that VRDR not only accelerates the convergence of policy training but also consistently achieves a better final policy with improved training stability.
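
The central idea can be illustrated with a toy one-step problem. A baseline that depends on the sampled environment, or on the subspace containing it, leaves the policy gradient unbiased while removing environment-induced variance. This is a minimal sketch under stated assumptions, not the authors' VRDR implementation. The environment parameter xi ~ U[0, 1], the Gaussian policy N(theta, 1), the reward 10 + 10*xi - (a - 0.5)^2, the eight equal-width subspaces, and all function names are hypothetical. The per-subspace baseline is simply the mean return in that subspace, a stand-in rather than the optimal baseline derived in the paper. With a single step there is no state, so the baseline depends on the subspace only.

```python
import numpy as np

def sample_batch(rng, theta, n):
    """Draw n one-step episodes: hypothetical env parameter xi ~ U[0, 1], action a ~ N(theta, 1)."""
    xi = rng.uniform(0.0, 1.0, size=n)
    a = rng.normal(theta, 1.0, size=n)
    # Hypothetical reward: constant offset + environment-dependent term + action-dependent term.
    r = 10.0 + 10.0 * xi - (a - 0.5) ** 2
    return xi, a, r

def subspace_index(xi, num_subspaces):
    """Map each xi in [0, 1] to one of num_subspaces equal-width bins (hypothetical partition)."""
    return np.minimum((xi * num_subspaces).astype(int), num_subspaces - 1)

def fit_subspace_baseline(xi, r, num_subspaces):
    """Mean return per subspace; a simple stand-in, not the paper's optimal baseline."""
    idx = subspace_index(xi, num_subspaces)
    return np.array([r[idx == k].mean() for k in range(num_subspaces)])

def gradient_samples(theta, a, r, baseline):
    """Per-episode REINFORCE terms (d/dtheta log pi(a)) * (r - b) for a unit-variance Gaussian policy."""
    score = a - theta  # d/dtheta log N(a; theta, 1)
    return score * (r - baseline)

rng = np.random.default_rng(0)
theta, n, K = 0.0, 200_000, 8

# Fit baselines on one batch and evaluate on a fresh batch, so no baseline
# depends on the actions it is paired with and each estimator stays unbiased.
xi_fit, _, r_fit = sample_batch(rng, theta, n)
const_b = r_fit.mean()
sub_b = fit_subspace_baseline(xi_fit, r_fit, K)

xi, a, r = sample_batch(rng, theta, n)
results = {
    "no baseline": gradient_samples(theta, a, r, 0.0),
    "constant": gradient_samples(theta, a, r, const_b),
    "subspace": gradient_samples(theta, a, r, sub_b[subspace_index(xi, K)]),
}
for name, g in results.items():
    print(f"{name:12s} mean={g.mean():+.3f} var={g.var():.2f}")
```

All three estimators should average near the true gradient, d/dtheta E[r] = -2(theta - 0.5) = 1 at theta = 0, while the printed variance should fall from no baseline to the constant baseline to the subspace baseline. A finer partition shrinks the within-subspace spread of the environment term further but leaves fewer samples to estimate each baseline, loosely mirroring the variance/complexity tradeoff the abstract describes.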

Collection: 01-internacional | Database: MEDLINE | Language: En | Journal: IEEE Trans Pattern Anal Mach Intell | Journal subject: Medical Informatics | Year: 2024 | Document type: Article | Publication country: United States
