1.
Article in English | MEDLINE | ID: mdl-38669170

ABSTRACT

In this article, we propose a distributional policy-gradient method based on distributional reinforcement learning (RL) and policy gradient. Conventional RL algorithms typically estimate the expectation of the return given a state-action pair. In contrast, distributional RL algorithms treat the return as a random variable and estimate the return distribution, which characterizes the probability of different returns arising from environmental uncertainties. The return distribution therefore provides more information than its expectation, generally leading to superior policies. Although distributional RL has been widely investigated in value-based methods, very few policy-gradient methods take advantage of it. To bridge this gap, we propose a distributional policy-gradient method that introduces a distributional value function into the policy gradient (DVDPG). We estimate the distribution of the policy gradient rather than the expectation estimated by conventional policy-gradient methods. We further propose two policy-gradient value sampling mechanisms for policy improvement. First, a distribution-probability-sampling method samples the policy-gradient value according to the quantile probabilities of the return distribution. Second, a uniform sampling mechanism is proposed. With these sampling mechanisms, the proposed distributional policy-gradient method increases the stochasticity of the policy gradient, improving exploration efficiency and helping to avoid local optima. In sparse-reward tasks, the distribution-probability-sampling method outperforms the uniform sampling mechanism; in dense-reward tasks, the two mechanisms perform similarly. Moreover, we show that the conventional policy-gradient method is a special case of the proposed method. Experimental results on various sparse-reward and dense-reward OpenAI Gym tasks illustrate the efficiency of the proposed method, which outperforms the baselines in almost all environments.
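
As a rough illustration of the two sampling mechanisms described in the abstract, the Python sketch below draws a single policy-gradient weight from a return distribution represented by quantiles. The function name sample_return_quantile, the toy quantile values, and the probability array are assumptions made for illustration only; the abstract does not specify DVDPG's actual parameterization, and this is not the authors' code.

import numpy as np

def sample_return_quantile(quantiles, probs=None, uniform=False, rng=None):
    # quantiles : array (N,) of estimated return quantiles for one (state, action)
    # probs     : optional array (N,) of probability mass per quantile, used by the
    #             distribution-probability-sampling variant
    # uniform   : if True, use the uniform sampling variant instead
    rng = np.random.default_rng() if rng is None else rng
    n = len(quantiles)
    if uniform or probs is None:
        idx = rng.integers(n)                        # uniform sampling mechanism
    else:
        idx = rng.choice(n, p=probs / probs.sum())   # probability-weighted sampling
    return quantiles[idx]

# Usage: weight the log-probability gradient by the sampled return quantile
# instead of the expected return, as a conventional policy gradient would.
quantiles = np.array([-1.0, 0.5, 2.0, 4.5])   # toy quantile values
probs = np.array([0.1, 0.2, 0.4, 0.3])        # toy quantile probabilities
g_weight = sample_return_quantile(quantiles, probs)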

2.
ISA Trans; 128(Pt B): 207-216, 2022 Sep.
Article in English | MEDLINE | ID: mdl-34953579

ABSTRACT

In this paper, the rotating consensus problem for multi-agent systems with double-integrator dynamics is considered, both with and without communication delay. A fully distributed control strategy is given on the more general complex plane. For the case without communication delay, we design a distributed control protocol using only local relative information and obtain a necessary and sufficient condition, including a lower bound on the control parameter, for a directed communication topology. When communication delay is taken into account, a necessary and sufficient condition, including an upper bound on the communication delay, is obtained by frequency-domain analysis. Compared with some existing results, the communication topology considered here is a more general directed graph. Finally, simulation examples are presented to verify the correctness of our results.
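
The Python sketch below is a toy simulation of a rotating consensus protocol for double-integrator agents on the complex plane. The specific control law u_i = j*omega*v_i + k*sum_j a_ij((v_j - v_i) + j*omega*(x_j - x_i)), the gains, the time step, and the directed ring topology are illustrative assumptions; they are not the protocol, parameter bound, or delay condition derived in the paper.

import numpy as np

def simulate(adjacency, omega=1.0, k=2.0, dt=0.01, steps=5000, rng=None):
    # Positions x and velocities v are complex numbers (points in the plane).
    rng = np.random.default_rng(0) if rng is None else rng
    n = adjacency.shape[0]
    x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
    v = rng.standard_normal(n) + 1j * rng.standard_normal(n)
    for _ in range(steps):
        rel = adjacency @ x - adjacency.sum(axis=1) * x    # sum_j a_ij (x_j - x_i)
        relv = adjacency @ v - adjacency.sum(axis=1) * v   # sum_j a_ij (v_j - v_i)
        # Assumed rotating consensus law: rotation term plus relative feedback.
        u = 1j * omega * v + k * (relv + 1j * omega * rel)
        x = x + dt * v
        v = v + dt * u
    return x, v

# Usage on a small directed ring topology (no communication delay modeled).
A = np.array([[0, 1, 0],
              [0, 0, 1],
              [1, 0, 0]], dtype=float)
x_final, v_final = simulate(A)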
