Policy Iteration-Based Learning Design for Linear Continuous-Time Systems Under Initial Stabilizing OPFB Policy.

Zhang, Chengye; Chen, Ci; Lewis, Frank L; Xie, Shengli

IEEE Trans Cybern; PP, 2024 Jul 22.

Article in English | MEDLINE | ID: mdl-39037879

ABSTRACT

Policy iteration (PI), an iterative method in reinforcement learning, learns a decision law by interacting with a largely unknown environment through alternating policy evaluation and policy improvement. However, existing PI-based results for output-feedback (OPFB) continuous-time systems rely heavily on an initial stabilizing full state-feedback (FSFB) policy, which raises the question of whether they violate the OPFB principle. This article addresses that question and establishes PI under an initial stabilizing OPFB policy. We prove that an off-policy Bellman equation can transform any OPFB policy into an FSFB policy. Based on this transformation property, we revise the traditional PI by appending an additional iteration, which proves efficient in approximating the optimal control under the initial OPFB policy. We show the effectiveness of the proposed learning methods through theoretical analysis and a case study.
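
For readers unfamiliar with the evaluation and improvement loop that the abstract refers to, the following is a minimal sketch of classical model-based PI (Kleinman-style) for a scalar linear continuous-time system with a quadratic cost. It is not the off-policy OPFB method of the article: it assumes the model (a, b) is known and the full state is measured, and it is shown only to illustrate the two alternating steps. The function names, the cost weights q and r, and the numerical values are hypothetical choices made for this illustration.

```python
import math


def policy_iteration_scalar(a, b, q, r, k0, iterations=20):
    """Classical model-based PI for dx/dt = a*x + b*u, u = -k*x,
    with cost integral of (q*x^2 + r*u^2) dt.

    Assumption (illustrative only): a and b are known and the state x
    is measured. The initial gain k0 must be stabilizing.
    """
    if a - b * k0 >= 0:
        raise ValueError("k0 must stabilize the closed loop: a - b*k0 < 0")
    k = k0
    history = []
    for _ in range(iterations):
        a_cl = a - b * k  # closed-loop dynamics under the current policy
        # Policy evaluation: solve the scalar Lyapunov equation
        #   2*a_cl*p + q + r*k^2 = 0   for the value parameter p.
        p = (q + r * k * k) / (-2.0 * a_cl)
        # Policy improvement: new gain from the evaluated value.
        k = b * p / r
        history.append((p, k))
    return p, k, history


def riccati_scalar(a, b, q, r):
    """Positive root of the scalar Riccati equation
    2*a*p - (b^2/r)*p^2 + q = 0, used here only as a reference value."""
    return r * (a + math.sqrt(a * a + b * b * q / r)) / (b * b)


if __name__ == "__main__":
    # Hypothetical unstable plant (a > 0) with a stabilizing initial gain.
    p, k, history = policy_iteration_scalar(a=1.0, b=1.0, q=1.0, r=1.0, k0=3.0)
    print("PI value parameter:", p)
    print("Riccati reference: ", riccati_scalar(1.0, 1.0, 1.0, 1.0))
```

In this sketch the iteration requires a stabilizing initial gain, which mirrors the role of the initial stabilizing policy discussed in the abstract; the article's contribution concerns the case where that initial policy uses output feedback rather than the full state.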

Full text: 1 | Collections: 01-internacional | Database: MEDLINE | Language: En | Journal: IEEE Trans Cybern | Publication year: 2024 | Document type: Article
