Policy Iteration-Based Learning Design for Linear Continuous-Time Systems Under Initial Stabilizing OPFB Policy.
IEEE Trans Cybern; PP: 2024 Jul 22.
Article | En | MEDLINE | ID: mdl-39037879
ABSTRACT
Policy iteration (PI), an iterative method in reinforcement learning, has the merit of interacting with a little-known environment to learn a decision law through policy evaluation and policy improvement. However, existing PI-based results for output-feedback (OPFB) continuous-time systems rely heavily on an initial stabilizing full-state-feedback (FSFB) policy, which raises the question of whether the OPFB principle is violated. This article addresses that question and establishes PI under an initial stabilizing OPFB policy. We prove that an off-policy Bellman equation can transform any OPFB policy into an FSFB policy. Based on this transformation property, we revise the traditional PI by appending an additional iteration, which proves efficient in approximating the optimal control under the initial OPFB policy. We show the effectiveness of the proposed learning methods through theoretical analysis and a case study.
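For context, below is a minimal sketch of the classical model-based policy iteration for continuous-time linear-quadratic control (Kleinman's algorithm), the baseline scheme that the article revises. The matrices A, B, Q, R and the initial stabilizing gain K0 are illustrative assumptions, not taken from the paper, and the paper's data-driven OPFB variant is not reproduced here.

```python
# Sketch of classical policy iteration (Kleinman) for continuous-time LQR:
# minimize the integral of x'Qx + u'Ru subject to xdot = A x + B u.
# Illustrative only: A, B, Q, R, K0 are assumed, not from the article,
# and the article's off-policy OPFB-initialized variant is not shown.
import numpy as np
from scipy.linalg import solve_continuous_lyapunov, solve_continuous_are

A = np.array([[0.0, 1.0],
              [-0.5, 1.0]])        # open-loop unstable example system
B = np.array([[0.0],
              [1.0]])
Q = np.eye(2)                      # state cost weight
R = np.array([[1.0]])              # input cost weight
K = np.array([[0.0, 3.0]])         # initial stabilizing FSFB gain (assumed)

for k in range(20):
    Ak = A - B @ K                 # closed-loop matrix under current policy
    # Policy evaluation: solve Ak' P + P Ak + Q + K' R K = 0 for P.
    P = solve_continuous_lyapunov(Ak.T, -(Q + K.T @ R @ K))
    # Policy improvement: K <- R^{-1} B' P.
    K_new = np.linalg.solve(R, B.T @ P)
    if np.linalg.norm(K_new - K) < 1e-10:
        K = K_new
        break
    K = K_new

# Sanity check against the algebraic Riccati equation solution.
P_star = solve_continuous_are(A, B, Q, R)
print("PI gain:     ", K)
print("Riccati gain:", np.linalg.solve(R, B.T @ P_star))
```

Per the abstract, the article's contribution replaces the model-based evaluation step with an off-policy Bellman equation learned from data and appends an additional iteration, so that the scheme can start from a stabilizing OPFB policy rather than the FSFB gain assumed above.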
Full text: 1
Collection: 01-internacional
Database: MEDLINE
Language: En
Journal: IEEE Trans Cybern
Year: 2024
Document type: Article
Country of publication: United States