Results 1 - 8 of 8
1.
IEEE Trans Neural Netw Learn Syst; 34(2): 635-649, 2023 Feb.
Article in English | MEDLINE | ID: mdl-34379597

ABSTRACT

This article presents a model-free λ-policy iteration (λ-PI) algorithm for the discrete-time linear quadratic regulation (LQR) problem. To solve the algebraic Riccati equation arising from the LQR problem in an iterative manner, we define two novel matrix operators, named the weighted Bellman operator and the composite Bellman operator. The λ-PI algorithm is first designed as a recursion with the weighted Bellman operator, and its equivalent formulation as a fixed-point iteration with the composite Bellman operator is then shown. The contraction and monotonicity properties of the composite Bellman operator guarantee the convergence of the λ-PI algorithm. In contrast to the PI algorithm, λ-PI does not require an admissible initial policy, and its convergence rate outperforms that of the value iteration (VI) algorithm. A model-free extension of the λ-PI algorithm is developed using the off-policy reinforcement learning technique, and the off-policy variants are shown to be robust against probing noise. Finally, simulation examples validate the efficacy of the λ-PI algorithm.
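To make the iteration concrete, the sketch below implements classical model-based policy iteration for the discrete-time LQR problem (Hewer's recursion), which is the baseline that λ-PI generalizes; unlike λ-PI, this plain version still needs an admissible (stabilizing) initial gain and full knowledge of (A, B). The system matrices and gains are illustrative placeholders, not taken from the article.

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

def lqr_policy_iteration(A, B, Q, R, K0, iters=30):
    """Model-based policy iteration for x_{k+1} = A x_k + B u_k with stage
    cost x'Qx + u'Ru.  K0 must be stabilizing; the lambda-PI of the article
    removes that requirement via its weighted Bellman operator."""
    K = K0
    for _ in range(iters):
        Acl = A - B @ K
        # Policy evaluation: solve P = Acl' P Acl + Q + K'RK.
        P = solve_discrete_lyapunov(Acl.T, Q + K.T @ R @ K)
        # Policy improvement: greedy gain from the evaluated cost matrix P.
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    return K, P

# Illustrative example: a discretized double integrator.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q, R = np.eye(2), np.eye(1)
K0 = np.array([[1.0, 2.0]])  # stabilizing initial gain (checked by hand)
K, P = lqr_policy_iteration(A, B, Q, R, K0)
print("converged LQR gain:", K)
```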

2.
IEEE Trans Cybern; 50(7): 3147-3156, 2020 Jul.
Article in English | MEDLINE | ID: mdl-30703054

ABSTRACT

This paper presents a model-free optimal approach based on reinforcement learning for solving the output regulation problem for discrete-time systems under disturbances. This problem is first broken down into two optimization problems: 1) a constrained static optimization problem, established to find the solution to the output regulator equations (i.e., the feedforward control input), and 2) a dynamic optimization problem, established to find the optimal feedback control input. Solving these optimization problems requires knowledge of the system dynamics. To obviate this requirement, a model-free off-policy algorithm is presented that finds the solution to the dynamic optimization problem using only measured data. Based on this solution, a model-free approach is then provided for the static optimization problem. It is shown that the proposed algorithm is insensitive to the probing noise added to the control input to satisfy the persistence of excitation condition. Simulation results verify the effectiveness of the proposed approach.
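As a point of reference for the static optimization step, the sketch below solves the output regulator (Francis) equations directly when a model is available; the paper's contribution is to recover the same feedforward solution from measured data only. The plant/exosystem notation (Pi, Gamma, E, F, S) is the standard one and is assumed here, since the abstract does not spell it out.

```python
import numpy as np

def solve_regulator_equations(A, B, E, C, F, S):
    """Least-squares solve of  Pi S = A Pi + B Gamma + E,  0 = C Pi + F
    for the plant x+ = A x + B u + E w, error e = C x + F w, and exosystem
    w+ = S w.  The feedforward input is then u_ff = Gamma w."""
    n, m, q = A.shape[0], B.shape[1], S.shape[0]
    In, Iq = np.eye(n), np.eye(q)
    # Unknowns stacked as z = [vec(Pi); vec(Gamma)] (column-major vec).
    M1 = np.hstack([np.kron(S.T, In) - np.kron(Iq, A), -np.kron(Iq, B)])
    M2 = np.hstack([np.kron(Iq, C), np.zeros((C.shape[0] * q, m * q))])
    M = np.vstack([M1, M2])
    rhs = np.concatenate([E.flatten(order="F"), -F.flatten(order="F")])
    z, *_ = np.linalg.lstsq(M, rhs, rcond=None)
    Pi = z[: n * q].reshape((n, q), order="F")
    Gamma = z[n * q:].reshape((m, q), order="F")
    return Pi, Gamma
```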

3.
IEEE Trans Cybern; 50(3): 1240-1250, 2020 Mar.
Article in English | MEDLINE | ID: mdl-30908252

ABSTRACT

Resilient and robust distributed control protocols are designed for multiagent systems under attacks on sensors and actuators. A distributed H∞ control protocol is designed to attenuate the effects of disturbances or attacks. However, the H∞ controller alone is too conservative in the presence of attacks; it is therefore augmented with a distributed adaptive compensator that mitigates their adverse effects. The proposed controller makes the synchronization error arbitrarily small in the presence of faulty attacks and satisfies a global L2-gain performance bound in the presence of malicious attacks or disturbances. A significant advantage of the proposed method is that it imposes no restriction on the number of agents, or of agents' neighbors, that are under sensor and/or actuator attacks, and it even recovers agents compromised by actuator attacks. Simulation examples verify the effectiveness of the proposed method.
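For readers unfamiliar with the setup, the snippet below computes only the generic neighborhood synchronization error on which distributed protocols of this kind act; it is not the paper's H∞/adaptive controller, and the variable names are assumptions.

```python
import numpy as np

def neighborhood_error(X, x0, Adj, g):
    """e_i = sum_j a_ij (x_j - x_i) + g_i (x0 - x_i) for each agent i.
    X: (N, n) agent states, x0: (n,) leader state,
    Adj: (N, N) adjacency matrix, g: (N,) pinning gains to the leader."""
    E = np.zeros_like(X)
    for i in range(X.shape[0]):
        E[i] = Adj[i] @ (X - X[i]) + g[i] * (x0 - X[i])
    return E
```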

4.
IEEE Trans Neural Netw Learn Syst; 29(6): 2042-2062, 2018 Jun.
Article in English | MEDLINE | ID: mdl-29771662

ABSTRACT

This paper reviews the current state of the art in reinforcement learning (RL)-based feedback control solutions to optimal regulation and tracking of single-agent and multiagent systems. Existing RL solutions to optimal control problems, as well as graphical games, are reviewed. RL methods learn the solution to optimal control and game problems online, using measured data along the system trajectories. We discuss Q-learning and the integral RL algorithm as core algorithms for discrete-time (DT) and continuous-time (CT) systems, respectively. Moreover, we discuss a new direction of off-policy RL for both CT and DT systems. Finally, we review several applications.
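As an illustration of the DT core algorithm mentioned above, here is a minimal data-driven Q-learning policy iteration for the LQR case, with a quadratic Q-function Q(x,u) = [x;u]' H [x;u] fitted by least squares; (A, B) are used only to simulate data, and all settings are illustrative rather than taken from the survey.

```python
import numpy as np

def quad_features(z):
    """Features of the quadratic form z'Hz: terms z_i z_j with i <= j."""
    n = len(z)
    return np.array([z[i] * z[j] * (1.0 if i == j else 2.0)
                     for i in range(n) for j in range(i, n)])

def unpack_sym(theta, n):
    """Rebuild the symmetric matrix H from its packed parameter vector."""
    H = np.zeros((n, n))
    k = 0
    for i in range(n):
        for j in range(i, n):
            H[i, j] = H[j, i] = theta[k]
            k += 1
    return H

def q_learning_lqr(A, B, Q, R, K0, n_iters=10, n_samples=300, noise=0.1, seed=0):
    """Policy iteration on the Q-function: fit H from the Bellman equation
    Q(x_k, u_k) = x'Qx + u'Ru + Q(x_{k+1}, -K x_{k+1}), then improve the gain."""
    rng = np.random.default_rng(seed)
    n, m = A.shape[0], B.shape[1]
    K = K0
    for _ in range(n_iters):
        Phi, targets = [], []
        x = rng.standard_normal(n)
        for _ in range(n_samples):
            u = -K @ x + noise * rng.standard_normal(m)     # probing noise
            x_next = A @ x + B @ u
            z = np.concatenate([x, u])
            z_next = np.concatenate([x_next, -K @ x_next])  # on-policy successor
            Phi.append(quad_features(z) - quad_features(z_next))
            targets.append(x @ Q @ x + u @ R @ u)
            x = x_next if np.linalg.norm(x_next) < 1e3 else rng.standard_normal(n)
        theta, *_ = np.linalg.lstsq(np.array(Phi), np.array(targets), rcond=None)
        H = unpack_sym(theta, n + m)
        K = np.linalg.solve(H[n:, n:], H[n:, :n])           # policy improvement
    return K, H
```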

5.
IEEE Trans Cybern; 48(1): 29-40, 2018 Jan.
Article in English | MEDLINE | ID: mdl-27831897

ABSTRACT

In this paper, motivated by human neurocognitive experiments, a model-free off-policy reinforcement learning algorithm is developed to solve the optimal tracking control problem for multiple-model linear discrete-time systems. First, an adaptive self-organizing map neural network is used to determine the system behavior from measured data and to assign a responsibility signal to each of the system's possible behaviors. A new model is added if a sudden change of system behavior is detected from the measured data and that behavior has not been observed before. The overall value function is represented by partially weighted value functions. The off-policy iteration algorithm is then generalized to multiple-model learning to find a solution without any knowledge of the system dynamics or the reference trajectory dynamics. The off-policy approach improves data efficiency and tuning speed, since a stream of experiences obtained from executing a behavior policy is reused to sequentially update several value functions corresponding to different learning policies. Two numerical examples demonstrate the performance of the off-policy algorithm.
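The responsibility weighting can be pictured with the toy computation below, which scores candidate models by their one-step prediction error and mixes their value estimates accordingly; this softmax stand-in is an assumption for illustration, not the self-organizing-map assignment actually used in the paper.

```python
import numpy as np

def responsibility_signals(x_next, predictions, beta=5.0):
    """Soft responsibility weights over candidate models, based on how well
    each model predicted the observed next state x_next."""
    errors = np.array([np.sum((x_next - p) ** 2) for p in predictions])
    w = np.exp(-beta * errors)
    return w / w.sum()

def weighted_value(weights, model_values):
    """Responsibility-weighted combination of the per-model value estimates."""
    return float(np.dot(weights, model_values))
```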

6.
IEEE Trans Cybern; 47(12): 4547-4558, 2017 Dec.
Article in English | MEDLINE | ID: mdl-29125464

ABSTRACT

Industrial flow lines are composed of unit processes operating on a fast time scale and performance measurements, known as operational indices, that are measured on a slower time scale. This paper presents a model-free optimal solution to a class of two-time-scale industrial processes using off-policy reinforcement learning (RL). First, the lower-layer unit-process control loop with a fast sampling period and the upper-layer operational-index dynamics at the slow time scale are modeled. Second, a general optimal operational control problem is formulated to optimally prescribe the set-points for the unit industrial process. Then, a zero-sum game off-policy RL algorithm is developed to find the optimal set-points by using data measured in real time. Finally, a simulation experiment on an industrial flotation process shows the effectiveness of the proposed method.
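The two-time-scale structure itself can be summarized by the loop skeleton below, in which an upper layer re-prescribes set-points on a slow grid while a fast inner loop regulates the unit process toward them; all callables are user-supplied placeholders, not the paper's flotation-process models or its zero-sum RL algorithm.

```python
def run_two_timescale(plant_step, index_step, setpoint_policy,
                      x0, w0, n_slow=20, fast_per_slow=50):
    """Nested simulation loop: fast unit-process regulation inside a slow
    operational-index / set-point update."""
    x, w = x0, w0
    for _ in range(n_slow):
        sp = setpoint_policy(w)              # upper layer: prescribe set-point
        for _ in range(fast_per_slow):       # lower layer: fast control loop
            x = plant_step(x, sp)
        w = index_step(w, x)                 # slow operational-index update
    return x, w
```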

7.
IEEE Trans Cybern; 45(12): 2770-2779, 2015 Dec.
Article in English | MEDLINE | ID: mdl-25576591

ABSTRACT

In this paper, an output-feedback solution to the infinite-horizon linear quadratic tracking (LQT) problem for unknown discrete-time systems is proposed. An augmented system composed of the system dynamics and the reference trajectory dynamics is constructed, and its state is reconstructed from a limited number of past input, output, and reference trajectory measurements. A novel Bellman equation is developed that evaluates the value function associated with a fixed policy by using only the input, output, and reference trajectory data of the augmented system. Using approximate dynamic programming, a class of reinforcement learning methods, the LQT problem is solved online without requiring knowledge of the augmented system dynamics, with only the input, output, and reference trajectory being measured. Both policy iteration (PI) and value iteration (VI) algorithms are developed that converge to an optimal controller while requiring only these measurements, and their convergence is shown. A simulation example verifies the effectiveness of the proposed control scheme.
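The idea of replacing the unmeasurable augmented state with past data can be sketched as below: the "state" fed to the learning algorithm is simply a stack of the last N input, output, and reference samples. The exact parameterization in the paper is derived from the augmented system; the construction here is only the generic pattern, with illustrative names.

```python
import numpy as np

def build_measured_state(u_hist, y_hist, r_hist, N):
    """Stack the last N inputs, outputs, and reference samples into one
    vector that serves as the measured state for output-feedback learning."""
    return np.concatenate([np.ravel(u_hist[-N:]),
                           np.ravel(y_hist[-N:]),
                           np.ravel(r_hist[-N:])])
```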

8.
IEEE Trans Neural Netw Learn Syst; 26(1): 140-151, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25312944

ABSTRACT

This paper presents a partially model-free adaptive optimal control solution to the deterministic nonlinear discrete-time (DT) tracking control problem in the presence of input constraints. The tracking error dynamics and the reference trajectory dynamics are first combined to form an augmented system. Then, a new discounted performance function based on the augmented system is presented for the optimal nonlinear tracking problem. In contrast to the standard solution, which finds the feedforward and feedback terms of the control input separately, minimizing the proposed discounted performance function yields both the feedback and feedforward parts of the control input simultaneously. This makes it possible to encode the input constraints into the optimization problem using a nonquadratic performance function. The DT tracking Bellman equation and the tracking Hamilton-Jacobi-Bellman (HJB) equation are derived. An actor-critic-based reinforcement learning algorithm is used to learn the solution to the tracking HJB equation online without requiring knowledge of the system drift dynamics. That is, two neural networks (NNs), namely an actor NN and a critic NN, are tuned online and simultaneously to generate the optimal bounded control policy. A simulation example shows the effectiveness of the proposed method.
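The nonquadratic performance function referred to above is, in the standard constrained-input formulation, an integral of an inverse-tanh penalty, which keeps the resulting optimal control inside |u_i| <= lam. The closed-form evaluation below follows that standard construction and is only an illustration; the paper's exact weights and discounting may differ.

```python
import numpy as np

def nonquadratic_input_cost(u, R_diag, lam):
    """W(u) = 2 * sum_i r_i * (integral from 0 to u_i of lam * artanh(v/lam) dv),
    evaluated in closed form (valid for |u_i| < lam)."""
    u = np.asarray(u, dtype=float)
    inner = u * np.arctanh(u / lam) + (lam / 2.0) * np.log(1.0 - (u / lam) ** 2)
    return 2.0 * lam * float(np.sum(np.asarray(R_diag) * inner))

def bounded_policy(raw_actor_output, lam):
    """Squash an unconstrained actor output through lam * tanh(.) so that the
    generated control always satisfies the input constraint |u_i| <= lam."""
    return lam * np.tanh(np.asarray(raw_actor_output, dtype=float))
```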
