Results 1 - 15 of 15
1.
Article in English | MEDLINE | ID: mdl-35877793

ABSTRACT

In this article, we solve a class of mixed zero-sum games for nonlinear systems with unknown dynamics. A policy iteration algorithm that adopts integral reinforcement learning (IRL), and therefore does not depend on system information, is proposed to obtain the optimal controls of the competitor and the collaborators. An adaptive update law that combines a critic-actor structure with experience replay is proposed. The actor not only approximates the optimal control of every player but also estimates an auxiliary control, which does not participate in the actual control process and exists only in theory. The parameters of the actor-critic structure are updated simultaneously. It is then proven that the parameter errors of the polynomial approximation are uniformly ultimately bounded. Finally, the effectiveness of the proposed algorithm is verified by two simulations.
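
The experience-replay ingredient of the adaptive update law can be illustrated with a small sketch. The snippet below is a generic, simplified rendering, not the article's update law: the buffer size, learning rate, and the names ReplayBuffer and critic_update are illustrative assumptions. Past regression samples are stored and replayed alongside the current one when the critic weights are updated, which is the usual way replay relaxes the persistence-of-excitation requirement; the coupled zero-sum structure and the auxiliary control are omitted.

```python
import numpy as np
from collections import deque

class ReplayBuffer:
    """Fixed-size store of past (feature, target) pairs for concurrent updates."""
    def __init__(self, capacity=200, seed=0):
        self.buf = deque(maxlen=capacity)
        self.rng = np.random.default_rng(seed)

    def add(self, phi, target):
        self.buf.append((np.asarray(phi, dtype=float), float(target)))

    def sample(self, batch_size):
        idx = self.rng.choice(len(self.buf), size=min(batch_size, len(self.buf)),
                              replace=False)
        phis = np.stack([self.buf[i][0] for i in idx])
        targets = np.array([self.buf[i][1] for i in idx])
        return phis, targets

def critic_update(w, phi_now, target_now, buffer, lr=0.05, batch_size=16):
    """One gradient step on the squared Bellman residual, evaluated on the current
    sample plus replayed past samples so old data keeps exciting the regressor."""
    phis = np.asarray(phi_now, dtype=float)[None, :]
    targets = np.array([target_now], dtype=float)
    if len(buffer.buf) > 0:
        old_phis, old_targets = buffer.sample(batch_size)
        phis = np.vstack([old_phis, phis])
        targets = np.concatenate([old_targets, targets])
    residuals = phis @ w - targets          # one residual per (re)played sample
    grad = phis.T @ residuals / len(targets)
    return w - lr * grad
```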

2.
Neural Netw ; 154: 1-12, 2022 Oct.
Article in English | MEDLINE | ID: mdl-35839533

ABSTRACT

A distributed optimized dynamic event-triggered controller is investigated for completely unknown heterogeneous nonlinear multi-agent systems (MASs) on a directed graph subject to input constraints. First, a distributed observer is designed to estimate the leader's information for each follower, and a network of augmented systems is constructed from the dynamics of the followers and the observers. An identifier with a compensator is designed to approximate each unknown augmented system (agent) with an arbitrarily small identification error. Then, considering that the input-constrained optimal controller, along with the Hamilton-Jacobi-Bellman (HJB) equation, is difficult to execute in systems with bottlenecks such as communication and computing burdens, a critic-actor-based optimized dynamic event-triggered controller, which tunes the parameters of the critic-actor neural networks (NNs) via the dynamic triggering mechanism, is leveraged to determine the aperiodic sampling rule and maintain the desired synchronization performance. In addition, the existence of a positive minimum inter-event time (MIET) between consecutive events is proved. Finally, applications to a non-identical nonlinear MAS and 2-DOF robots illustrate the effectiveness of the proposed theoretical results.
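
A dynamic triggering mechanism of the general kind referred to above can be sketched as follows. This is the common dynamic event-triggered control template in which an internal variable filters a static triggering condition; the parameter names sigma, theta, and lam and the exact condition are illustrative assumptions, not the controller designed in the paper.

```python
import numpy as np

def dynamic_trigger(x, x_last_sent, eta, sigma=0.5, theta=1.0, lam=0.9, dt=0.01):
    """Generic dynamic event trigger: transmit when the internal variable eta can
    no longer 'pay for' the measurement error e = x - x_last_sent (illustrative rule)."""
    e = x - x_last_sent
    slack = sigma * x @ x - e @ e            # margin of the underlying static condition
    fire = eta + theta * slack < 0.0         # dynamic condition: fire => sample/transmit
    eta = eta + dt * (-lam * eta + slack)    # forward-Euler step of the auxiliary dynamics
    if fire:
        x_last_sent = x.copy()               # a transmission resets the measurement error
    return fire, x_last_sent, eta
```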

3.
IEEE Trans Neural Netw Learn Syst ; 33(2): 879-892, 2022 Feb.
Article in English | MEDLINE | ID: mdl-33108297

ABSTRACT

In this article, an online adaptive optimal control algorithm based on adaptive dynamic programming is developed to solve the multiplayer nonzero-sum game (MP-NZSG) for discrete-time unknown nonlinear systems. First, a model-free coupled globalized dual-heuristic dynamic programming (GDHP) structure is designed to solve the MP-NZSG problem, in which there is no model network or identifier. Second, to relax the requirement of system dynamics, an online adaptive learning algorithm is developed to solve the Hamilton-Jacobi equation using the system states at two adjacent time steps. Third, a series of critic networks and action networks are used to approximate the value functions and optimal policies of all players. All neural network (NN) weights are updated online based on real-time system states. Fourth, the uniform ultimate boundedness of the NN approximation errors is proved using the Lyapunov approach. Finally, simulation results are given to demonstrate the effectiveness of the developed scheme.
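
The idea of learning from the states at two adjacent time steps without a model can be illustrated, in a simplified single-player form, by a plain temporal-difference update of a linear-in-features critic. The function names and discount factor below are assumptions for illustration; the coupled multi-player GDHP structure is omitted.

```python
import numpy as np

def td_critic_step(w, phi, x_k, x_k1, utility_k, gamma=0.95, lr=0.1):
    """One model-free critic update from states at two adjacent time steps.
    phi(x) are critic features; utility_k is the observed one-step cost."""
    v_k, v_k1 = phi(x_k) @ w, phi(x_k1) @ w
    delta = utility_k + gamma * v_k1 - v_k     # Bellman (temporal-difference) error
    return w + lr * delta * phi(x_k)           # gradient step toward the fixed point
```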

4.
IEEE Trans Cybern ; 51(6): 2929-2943, 2021 Jun.
Article in English | MEDLINE | ID: mdl-31902792

ABSTRACT

In this article, off-policy reinforcement learning (RL) algorithm is established to solve the discrete-time N -player nonzero-sum (NZS) games with completely unknown dynamics. The N -coupled generalized algebraic Riccati equations (GARE) are derived, and then policy iteration (PI) algorithm is used to obtain the N -tuple of iterative control and iterative value function. As the system dynamics is necessary in PI algorithm, off-policy RL method is developed for discrete-time N -player NZS games. The off-policy N -coupled Hamilton-Jacobi (HJ) equation is derived based on quadratic value functions. According to the Kronecker product, the N -coupled HJ equation is decomposed into unknown parameter part and the system operation data part, which makes the N -coupled HJ equation solved independent of system dynamics. The least square is used to calculate the iterative value function and N -tuple of iterative control. The existence of Nash equilibrium is proved. The result of the proposed method for discrete-time unknown dynamics NZS games is indicated by the simulation examples.
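
The Kronecker-product separation of unknown parameters from measured data can be made concrete in the single-player quadratic case. In the sketch below (plant, policy, and sample count are illustrative assumptions), each data point contributes one equation (x_k ⊗ x_k − x_{k+1} ⊗ x_{k+1})ᵀ vec(P) = r_k, and vec(P) is recovered by least squares without the solver using A or B. The article's N-player case couples several such equations.

```python
import numpy as np

# Illustrative linear plant and stabilizing policy (assumed, not from the article)
A = np.array([[0.9, 0.2], [0.0, 0.8]])
B = np.array([[0.0], [0.1]])
Q, R = np.eye(2), np.eye(1)
K = np.array([[0.5, 1.0]])                     # evaluated policy u = -K x

rng = np.random.default_rng(0)
rows, rhs = [], []
for _ in range(30):                            # collect on-trajectory data
    x = rng.standard_normal(2)
    u = -K @ x
    x_next = A @ x + B @ u                     # in practice x_next is measured, not computed
    r = x @ Q @ x + u @ R @ u                  # observed stage cost
    # V(x) = x'Px = kron(x, x)' vec(P): the data multiplies the unknown vec(P)
    rows.append(np.kron(x, x) - np.kron(x_next, x_next))
    rhs.append(r)

vec_p, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
P = vec_p.reshape(2, 2)
P = 0.5 * (P + P.T)                            # symmetrize the recovered value matrix
print(P)
```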

5.
IEEE Trans Cybern ; 50(10): 4293-4306, 2020 Oct.
Article in English | MEDLINE | ID: mdl-30990209

ABSTRACT

In this paper, a new iterative adaptive dynamic programming (ADP) algorithm is developed to solve optimal impulsive control problems for infinite-horizon discrete-time nonlinear systems. Considering the constraint on the impulsive interval, in each iteration the iterative impulsive value function under each possible impulsive interval is obtained, and then the iterative value function and iterative control law are derived. A new convergence analysis method is developed, which proves that the iterative value function converges to the optimum as the iteration index increases to infinity. The properties of the iterative control law are analyzed, and the detailed implementation of the optimal impulsive control law is presented. Finally, two simulation examples with comparisons are given to show the effectiveness of the developed method.

6.
IEEE Trans Neural Netw Learn Syst ; 31(10): 4185-4195, 2020 Oct.
Article in English | MEDLINE | ID: mdl-31831451

ABSTRACT

In this article, the optimal solution to the leader-follower bipartite output synchronization problem is proposed for heterogeneous multiagent systems (MASs) over signed digraphs in the presence of adversarial inputs. In these MASs, the dynamics and dimensions of the followers differ. Distributed observers are first designed to estimate the leader's two-way state and output over the signed digraph. Then, the leader-follower bipartite output synchronization problem on signed graphs is translated into a conventional distributed leader-follower output problem over nonnegative graphs by a state transformation that uses the information of the followers and observers. The effect of adversarial inputs in the sensors or actuators of the agents is mitigated by designing a resilient H∞ controller. A data-based reinforcement learning (RL) algorithm is proposed to obtain the optimal control law, which implies that the dynamics of the followers are not required. Finally, a simulation example is given to verify the effectiveness of the proposed algorithm.
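
The translation from a signed graph to a nonnegative one is, for structurally balanced graphs, the standard gauge (signature) transformation, which a short numerical check makes concrete. The graph and the group assignment below are illustrative assumptions, not the article's example.

```python
import numpy as np

# Signed adjacency of a structurally balanced graph (illustrative):
# group V1 = {0, 1}, group V2 = {2, 3}; cooperative within a group, antagonistic across.
A_signed = np.array([
    [ 0.0,  1.0, -1.0,  0.0],
    [ 1.0,  0.0,  0.0, -1.0],
    [-1.0,  0.0,  0.0,  1.0],
    [ 0.0, -1.0,  1.0,  0.0],
])

# Gauge (signature) matrix: +1 for nodes in V1, -1 for nodes in V2.
D = np.diag([1.0, 1.0, -1.0, -1.0])

A_nonneg = D @ A_signed @ D      # after the transformation all weights are nonnegative
assert (A_nonneg >= 0).all()
print(A_nonneg)
```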

7.
IEEE Trans Neural Netw Learn Syst ; 29(4): 1226-1238, 2018 04.
Article in English | MEDLINE | ID: mdl-28362617

ABSTRACT

In this paper, a generalized policy iteration (GPI) algorithm with approximation errors is developed for solving infinite-horizon optimal control problems for nonlinear systems. The developed stable GPI algorithm provides a general structure for discrete-time iterative adaptive dynamic programming algorithms, by which most discrete-time reinforcement learning algorithms can be described. This is the first time that approximation errors are explicitly considered in a GPI algorithm. The properties of the stable GPI algorithm with approximation errors are analyzed. The admissibility of the approximate iterative control law is guaranteed if the approximation errors satisfy the admissibility criteria. The convergence of the developed algorithm is established, showing that the iterative value function converges to a finite neighborhood of the optimal performance index function if the approximation errors satisfy the convergence criterion. Finally, numerical examples and comparisons are presented.
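
The GPI structure itself, a finite number of evaluation sweeps between greedy improvements, is easy to state in tabular form. The sketch below is a generic tabular rendering under the assumption of a small finite MDP, with approximation errors not modeled: eval_sweeps = 1 reduces to value iteration and a large eval_sweeps approaches policy iteration, which is the sense in which GPI covers the discrete-time iterative ADP family.

```python
import numpy as np

def generalized_policy_iteration(P, c, gamma=0.9, eval_sweeps=3, iters=50):
    """Tabular GPI. P[a] is the |S|x|S| transition matrix of action a, c[s, a] the stage cost.
    eval_sweeps = 1 recovers value iteration; eval_sweeps -> infinity recovers policy iteration."""
    n_states, n_actions = c.shape
    V = np.zeros(n_states)
    policy = np.zeros(n_states, dtype=int)
    for _ in range(iters):
        # policy evaluation: a fixed, finite number of backups under the current policy
        for _ in range(eval_sweeps):
            V = np.array([c[s, policy[s]] + gamma * P[policy[s]][s] @ V
                          for s in range(n_states)])
        # policy improvement: greedy with respect to the current value estimate
        Qsa = np.stack([c[:, a] + gamma * P[a] @ V for a in range(n_actions)], axis=1)
        policy = Qsa.argmin(axis=1)
    return V, policy
```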

8.
IEEE Trans Neural Netw Learn Syst ; 29(4): 957-969, 2018 04.
Article in English | MEDLINE | ID: mdl-28141530

ABSTRACT

In this paper, a novel adaptive dynamic programming (ADP) algorithm, called the "iterative zero-sum ADP algorithm," is developed to solve infinite-horizon discrete-time two-player zero-sum games of nonlinear systems. The present iterative zero-sum ADP algorithm permits arbitrary positive semidefinite functions to initialize the upper and lower iterations. A novel convergence analysis is developed to guarantee that the upper and lower iterative value functions converge to the upper and lower optimums, respectively. When a saddle-point equilibrium exists, both the upper and lower iterative value functions are proved to converge to the optimal solution of the zero-sum game, and existence criteria for the saddle-point equilibrium are not required. If a saddle-point equilibrium does not exist, the upper and lower optimal performance index functions are obtained, respectively, and they are proved not to be equivalent. Finally, simulation results and comparisons are shown to illustrate the performance of the present method.
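
The distinction between the upper and lower optimums is easiest to see on a static matrix game. The payoff below is an illustrative example in which no pure saddle point exists, so the two values differ, mirroring the non-equivalence discussed above.

```python
import numpy as np

# Cost matrix J[u, w]: the control player picks the row to minimize,
# the disturbance player picks the column to maximize (illustrative numbers).
J = np.array([[3.0, 1.0],
              [0.0, 2.0]])

upper_value = J.max(axis=1).min()   # min over u of max over w  (control commits first)
lower_value = J.min(axis=0).max()   # max over w of min over u  (disturbance commits first)
print(upper_value, lower_value)     # 2.0 and 1.0: no pure saddle point, upper > lower
```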

9.
IEEE Trans Cybern ; 47(5): 1224-1237, 2017 May.
Article in English | MEDLINE | ID: mdl-27093714

ABSTRACT

In this paper, a novel discrete-time deterministic Q-learning algorithm is developed. In each iteration of the developed Q-learning algorithm, the iterative Q function is updated over the entire state and control spaces, instead of for a single state and a single control as in the traditional Q-learning algorithm. A new convergence criterion is established to guarantee that the iterative Q function converges to the optimum, and the convergence criterion on the learning rates of traditional Q-learning algorithms is simplified. In the convergence analysis, upper and lower bounds of the iterative Q function are analyzed to obtain the convergence criterion, instead of analyzing the iterative Q function itself. For convenience of analysis, the convergence properties of the deterministic Q-learning algorithm are first developed for the undiscounted case. Then, considering the discount factor, the convergence criterion for the discounted case is established. Neural networks are used to approximate the iterative Q function and to compute the iterative control law, respectively, to facilitate the implementation of the deterministic Q-learning algorithm. Finally, simulation results and comparisons are given to illustrate the performance of the developed algorithm.
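
The full-sweep update that distinguishes this deterministic Q-learning from the traditional sample-by-sample rule can be sketched on discretized state and control grids. The toy scalar system, grid sizes, and discount factor below are illustrative assumptions, not the article's example.

```python
import numpy as np

def deterministic_q_iteration(f, utility, states, actions, gamma=0.95, iters=100):
    """Full-sweep Q update for a deterministic system x_{k+1} = f(x, u):
    Q_{i+1}(x, u) = U(x, u) + gamma * min_u' Q_i(f(x, u), u')."""
    ns, na = len(states), len(actions)
    # precompute the grid index of the successor state (nearest neighbor on the grid)
    succ = np.array([[np.abs(states - f(x, u)).argmin() for u in actions] for x in states])
    U = np.array([[utility(x, u) for u in actions] for x in states])
    Q = np.zeros((ns, na))
    for _ in range(iters):
        Q = U + gamma * Q.min(axis=1)[succ]    # backup over all state-action pairs at once
    return Q

# toy scalar system with quadratic utility (illustrative)
states = np.linspace(-1.0, 1.0, 41)
actions = np.linspace(-0.5, 0.5, 11)
Q = deterministic_q_iteration(lambda x, u: 0.8 * x + u, lambda x, u: x * x + u * u,
                              states, actions)
```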

10.
IEEE Trans Neural Netw Learn Syst ; 28(3): 704-713, 2017 03.
Article in English | MEDLINE | ID: mdl-27448374

ABSTRACT

This paper establishes an off-policy integral reinforcement learning (IRL) method to solve nonlinear continuous-time (CT) nonzero-sum (NZS) games with unknown system dynamics. The IRL algorithm is presented to obtain the iterative control, and off-policy learning is used to allow the dynamics to be completely unknown. Off-policy IRL performs both policy evaluation and policy improvement in the policy iteration algorithm. Critic and action networks are used to obtain the performance index and control for each player. A gradient descent algorithm updates the critic and action weights simultaneously. The convergence analysis of the weights is given. The asymptotic stability of the closed-loop system and the existence of a Nash equilibrium are proved. A simulation study demonstrates the effectiveness of the developed method for nonlinear CT NZS games with unknown system dynamics.

11.
IEEE Trans Cybern ; 47(10): 3367-3379, 2017 Oct.
Article in English | MEDLINE | ID: mdl-27448382

ABSTRACT

In this paper, a discrete-time optimal control scheme is developed via a novel local policy iteration adaptive dynamic programming algorithm. In the discrete-time local policy iteration algorithm, the iterative value function and iterative control law can be updated on a subset of the state space, which relaxes the computational burden compared with the traditional policy iteration algorithm. Convergence properties of the local policy iteration algorithm are presented to show that the iterative value function is monotonically nonincreasing and converges to the optimum under some mild conditions. The admissibility of the iterative control law is proven, which shows that the control system can be stabilized under any of the iterative control laws, even if the iterative control law is updated on only a subset of the state space. Finally, two simulation examples are given to illustrate the performance of the developed method.
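
A tabular caricature of the local-update idea is given below: at each iteration only a chosen subset of states has its value and control refreshed, which is where the per-iteration computational saving comes from. The MDP interface and the subset schedule are illustrative assumptions; the article's continuous-state algorithm and its convergence conditions are not reproduced here.

```python
import numpy as np

def local_policy_iteration(P, c, subsets, gamma=0.9):
    """Each element of `subsets` is the set of states updated at that iteration;
    states outside the subset keep their previous value and control.
    P[a] is the |S|x|S| transition matrix of action a, c[s, a] the stage cost."""
    n_states, n_actions = c.shape
    V = np.zeros(n_states)
    policy = np.zeros(n_states, dtype=int)
    for S in subsets:
        for s in S:                                  # local policy evaluation step
            V[s] = c[s, policy[s]] + gamma * P[policy[s]][s] @ V
        for s in S:                                  # local policy improvement step
            q = [c[s, a] + gamma * P[a][s] @ V for a in range(n_actions)]
            policy[s] = int(np.argmin(q))
    return V, policy
```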

12.
IEEE Trans Neural Netw Learn Syst ; 27(2): 444-58, 2016 Feb.
Article in English | MEDLINE | ID: mdl-26292346

ABSTRACT

This paper is concerned with a new data-driven zero-sum neuro-optimal control problem for continuous-time unknown nonlinear systems with disturbances. Based on the input-output data of the nonlinear system, an effective recurrent neural network is introduced to reconstruct the dynamics of the nonlinear system. Considering the system disturbance as a control input, a two-player zero-sum optimal control problem is established. Adaptive dynamic programming (ADP) is developed to obtain the optimal control under the worst-case disturbance. Three single-layer neural networks, including one critic network and two action networks, are employed to approximate the performance index function, the optimal control law, and the disturbance, respectively, to facilitate the implementation of the ADP method. Convergence properties of the ADP method are developed to show that the system state converges to a finite neighborhood of the equilibrium. The weight matrices of the critic and the two action networks also converge to finite neighborhoods of their optimal values. Finally, simulation results show the effectiveness of the developed data-driven ADP method.

13.
IEEE Trans Cybern ; 46(5): 1041-50, 2016 May.
Article in English | MEDLINE | ID: mdl-25935054

ABSTRACT

An optimal control method is developed in this paper for unknown continuous-time systems with unknown disturbances. The integral reinforcement learning (IRL) algorithm is presented to obtain the iterative control, and off-policy learning is used to allow the dynamics to be completely unknown. Neural networks are used to construct the critic and action networks. It is shown that, in the presence of unknown disturbances, off-policy IRL may not converge or may be biased. To reduce the influence of unknown disturbances, a disturbance compensation controller is added. It is proven that the weight errors are uniformly ultimately bounded based on Lyapunov techniques. Convergence of the Hamiltonian function is also proven. A simulation study demonstrates the effectiveness of the proposed optimal control method for unknown systems with disturbances.

14.
IEEE Trans Neural Netw Learn Syst ; 26(4): 851-65, 2015 Apr.
Article in English | MEDLINE | ID: mdl-25730830

ABSTRACT

In industrial process control, there may be multiple performance objectives, depending on salient features of the input-output data. Aiming at this situation, this paper proposes multiple actor-critic structures to obtain the optimal control from input-output data for unknown nonlinear systems. A shunting inhibitory artificial neural network (SIANN) is used to classify the input-output data into one of several categories, and different performance measure functions may be defined for the different categories. An approximate dynamic programming algorithm, which contains a model module, a critic network, and an action network, is used to establish the optimal control in each category. A recurrent neural network (RNN) model is used to reconstruct the unknown system dynamics from input-output data, and NNs are used to approximate the critic and action networks, respectively. It is proven that the model error and the closed-loop unknown system are uniformly ultimately bounded. Simulation results demonstrate the performance of the proposed optimal control scheme for unknown nonlinear systems.
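
The overall architecture, a classifier routing each input-output sample to its own actor-critic structure and performance measure, can be sketched structurally as below. The nearest-centroid classifier is only a placeholder for the SIANN, and all names, shapes, and the per-category cost weights are illustrative assumptions; the training of each category's networks is not shown.

```python
import numpy as np

class MultiActorCritic:
    """One (critic, actor) weight set and one performance-weight pair per category."""
    def __init__(self, centroids, n_features, n_controls, cost_weights):
        self.centroids = np.asarray(centroids)   # placeholder classifier: nearest centroid
        self.Wc = [np.zeros(n_features) for _ in centroids]                 # critic weights
        self.Wa = [np.zeros((n_controls, n_features)) for _ in centroids]   # actor weights
        self.cost_weights = cost_weights         # (Q, R) per category, used when training
                                                 # that category's critic (loop not shown)

    def category(self, z):
        """Assign an input-output sample z to the nearest category centroid."""
        return int(np.argmin(np.linalg.norm(self.centroids - z, axis=1)))

    def control(self, z, phi):
        """Pick the category of the sample z, then apply that category's actor."""
        k = self.category(z)
        return self.Wa[k] @ phi(z), k
```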


Subject(s)
Algorithms; Cognition/physiology; Neural Networks, Computer; Nonlinear Dynamics; Signal Processing, Computer-Assisted; Computer Simulation; Humans; Time Factors
15.
IEEE Trans Neural Netw ; 22(12): 1851-62, 2011 Dec.
Article in English | MEDLINE | ID: mdl-22057063

ABSTRACT

In this paper, a novel heuristic dynamic programming (HDP) iteration algorithm is proposed to solve the optimal tracking control problem for a class of nonlinear discrete-time systems with time delays. The algorithm consists of state updating, control policy iteration, and performance index iteration. To obtain the optimal states, the states are also updated, and a "backward iteration" is applied to the state updating. Two neural networks are used to approximate the performance index function and to compute the optimal control policy, respectively, to facilitate the implementation of the HDP iteration algorithm. Finally, we present two examples to demonstrate the effectiveness of the proposed HDP iteration algorithm.


Subject(s)
Algorithms; Artificial Intelligence; Nonlinear Dynamics; Pattern Recognition, Automated/methods; Linear Programming; Signal Processing, Computer-Assisted; Computer Simulation; Feedback