Results 1 - 20 of 20
1.
IEEE Trans Neural Netw Learn Syst ; 35(3): 3302-3311, 2024 Mar.
Article in English | MEDLINE | ID: mdl-37053065

ABSTRACT

This article presents a data-driven safe reinforcement learning (RL) algorithm for discrete-time nonlinear systems. A data-driven safety certifier is designed to intervene with the actions of the RL agent to ensure both safety and stability of its actions. This is in sharp contrast to existing model-based safety certifiers, which can result in convergence to an undesired equilibrium point or in conservative interventions that jeopardize the performance of the RL agent. To this end, the proposed method directly learns a robust safety certifier while completely bypassing the identification of the system model. The nonlinear system is modeled using linear parameter-varying (LPV) systems with polytopic disturbances. To obviate the need to learn an explicit model of the LPV system, data-based λ-contractivity conditions are first provided for the closed-loop system to enforce robust invariance of a prespecified polyhedral safe set and the system's asymptotic stability. These conditions are then leveraged to directly learn a robust data-based gain-scheduling controller by solving a convex program. A significant advantage of the proposed direct safe learning over model-based certifiers is that it completely resolves conflicts between safety and stability requirements while assuring convergence to the desired equilibrium point. Data-based safety certification conditions are then provided using Minkowski functions. They are then used to seamlessly integrate the learned backup safe gain-scheduling controller with the RL controller. Finally, we provide a simulation example to verify the effectiveness of the proposed approach.
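As a rough illustration of how such a certifier can gate the actions of an RL agent, the sketch below implements a generic safety filter for a polyhedral safe set {x : Fx <= g}: the RL action is kept whenever a one-step prediction keeps the Minkowski (gauge) function λ-contractive, and otherwise the backup gain-scheduled action is used. The nominal matrices A and B, the matrix backup_gain, and all function names are illustrative assumptions; the certifier in the article is learned directly from data without identifying a model.

import numpy as np

def minkowski_gauge(F, g, x):
    # Gauge (Minkowski) function of the polyhedron {x : F x <= g}; values <= 1 mean x is inside.
    return np.max(F @ x / g)

def safe_action(x, u_rl, backup_gain, A, B, F, g, lam=0.95):
    # Keep the RL action if the predicted next state stays lambda-contractive
    # in the safe set; otherwise fall back to the backup gain-scheduled controller.
    x_next = A @ x + B @ u_rl
    if minkowski_gauge(F, g, x_next) <= lam * minkowski_gauge(F, g, x):
        return u_rl
    return backup_gain @ x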

2.
IEEE Trans Cybern ; 54(2): 797-810, 2024 Feb.
Article in English | MEDLINE | ID: mdl-37256797

ABSTRACT

In this article, we propose a way to enhance the learning framework for zero-sum games with dynamics evolving in continuous time. In contrast to the conventional centralized actor-critic learning, a novel cooperative finitely excited learning approach is developed to combine the online recorded data with instantaneous data for efficiency. By using an experience replay technique for each agent and distributed interaction amongst agents, we are able to replace the classical persistent excitation condition with an easy-to-check cooperative excitation condition. This approach also guarantees the consensus of the distributed actor-critic learning on the solution to the Hamilton-Jacobi-Isaacs (HJI) equation. It is shown that both the closed-loop stability of the equilibrium point and convergence to the Nash equilibrium can be guaranteed. Simulation results demonstrate the efficacy of this approach compared to previous methods.

3.
IEEE Trans Neural Netw Learn Syst ; 34(8): 4892-4902, 2023 Aug.
Article in English | MEDLINE | ID: mdl-34780339

ABSTRACT

This article presents a fixed-time (FxT) system identifier for continuous-time nonlinear systems. A novel adaptive update law with discontinuous gradient flows of the identification errors is presented, which leverages concurrent learning (CL) to guarantee that the uncertain nonlinear dynamics are learned in a fixed time, as opposed to asymptotically or exponentially. More specifically, the CL approach retrieves a batch of samples stored in a memory, and the update law simultaneously minimizes the identification error for the current stream of samples and for the past memory samples. Rigorous analyses based on FxT Lyapunov stability certify FxT convergence to the stable equilibria of the gradient-descent flow of the system identification error under easy-to-verify rank conditions. The performance of the proposed method in comparison with existing methods is illustrated in simulation results.
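A minimal sketch of a concurrent-learning update of this flavor is given below for a model y ≈ Wᵀφ(x): the instantaneous gradient is combined with gradients evaluated on a stored memory stack, and the two normalized terms with powers p < 1 < q imitate the fractional-power flows used to obtain fixed-time rather than asymptotic convergence. The update law, gains, and powers here are illustrative assumptions, not the article's exact law.

import numpy as np

def cl_fxt_step(W, phi, y, memory, k1=1.0, k2=1.0, p=0.5, q=1.5, step=0.01):
    # Identification error on the current sample plus errors on recorded samples.
    grad = np.outer(phi, W.T @ phi - y)
    for phi_j, y_j in memory:
        grad += np.outer(phi_j, W.T @ phi_j - y_j)
    n = np.linalg.norm(grad) + 1e-12
    # Fractional-power terms: n**(p-1) dominates near the solution, n**(q-1) far from it.
    return W - step * (k1 * grad * n**(p - 1) + k2 * grad * n**(q - 1))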

4.
IEEE Trans Neural Netw Learn Syst ; 34(2): 635-649, 2023 Feb.
Article in English | MEDLINE | ID: mdl-34379597

ABSTRACT

This article presents a model-free λ-policy iteration (λ-PI) algorithm for the discrete-time linear quadratic regulation (LQR) problem. To solve the algebraic Riccati equation arising from the LQR problem in an iterative manner, we define two novel matrix operators, named the weighted Bellman operator and the composite Bellman operator. The λ-PI algorithm is first designed as a recursion with the weighted Bellman operator, and its equivalent formulation as a fixed-point iteration with the composite Bellman operator is then shown. The contraction and monotonicity properties of the composite Bellman operator guarantee the convergence of the λ-PI algorithm. In contrast to the PI algorithm, λ-PI does not require an admissible initial policy, and its convergence rate outperforms that of the value iteration (VI) algorithm. A model-free extension of the λ-PI algorithm is developed using the off-policy reinforcement learning technique. It is also shown that the off-policy variants of the λ-PI algorithm are robust against probing noise. Finally, simulation examples are conducted to validate the efficacy of the λ-PI algorithm.
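Under a known model and quadratic value functions, a recursion of this kind can be illustrated as below: each iterate applies a geometrically weighted sum (1-λ) Σ_i λ^i T^{i+1}, truncated at a finite horizon, where T is the Riccati (Bellman optimality) operator. With λ = 0 the recursion reduces to value iteration, and as λ → 1 it approaches policy iteration. This is only a model-based sketch of the operator the abstract describes; the algorithm in the article is model-free and uses off-policy data.

import numpy as np

def bellman_T(P, A, B, Q, R):
    # Bellman optimality (Riccati) operator on quadratic value matrices.
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)   # greedy gain for value P
    Acl = A - B @ K
    return Q + K.T @ R @ K + Acl.T @ P @ Acl

def lambda_pi(A, B, Q, R, lam=0.5, iters=50, horizon=30):
    # Truncated weighted Bellman operator; the geometric weights sum to
    # 1 - lam**horizon, so the truncation error vanishes as horizon grows.
    P = np.zeros_like(Q, dtype=float)
    for _ in range(iters):
        acc, TP = np.zeros_like(Q, dtype=float), P
        for i in range(horizon):
            TP = bellman_T(TP, A, B, Q, R)
            acc += (1 - lam) * lam**i * TP
        P = acc
    return P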

5.
IEEE Trans Neural Netw Learn Syst ; 33(11): 6183-6193, 2022 Nov.
Article in English | MEDLINE | ID: mdl-33886483

ABSTRACT

This article presents an iterative data-driven algorithm for solving dynamic multiobjective (MO) optimal control problems arising in the control of nonlinear continuous-time systems. It is first shown that the Hamiltonian functional corresponding to each objective can be leveraged to compare the performance of admissible policies. Hamiltonian inequalities are then introduced whose satisfaction guarantees that the objectives' aspirations are met. Relaxed Hamilton-Jacobi-Bellman (HJB) equations, in the form of HJB inequalities, are then solved in a dynamic constrained MO framework to find Pareto optimal solutions. The relation to the satisficing (good-enough) decision-making framework is shown. A sum-of-squares (SOS)-based iterative algorithm is developed to solve the formulated aspiration-satisfying MO optimization. To obviate the requirement of complete knowledge of the system dynamics, a data-driven satisficing reinforcement learning approach is proposed to solve the SOS optimization problem in real time using only information from system trajectories measured over a time interval. Finally, two simulation examples are utilized to verify the analytical results of the proposed algorithm.

6.
IEEE Trans Cybern ; 52(12): 13762-13773, 2022 Dec.
Article in English | MEDLINE | ID: mdl-34495864

ABSTRACT

In this article, we consider an iterative adaptive dynamic programming (ADP) algorithm within the Hamiltonian-driven framework to solve the Hamilton-Jacobi-Bellman (HJB) equation for the infinite-horizon optimal control problem of continuous-time nonlinear systems. First, a novel function, the "min-Hamiltonian," is defined to capture the fundamental properties of the classical Hamiltonian. It is shown that both the HJB equation and the policy iteration (PI) algorithm can be formulated in terms of the min-Hamiltonian within the Hamiltonian-driven framework. Moreover, we develop an iterative ADP algorithm that takes into consideration the approximation errors during the policy evaluation step. We then derive a sufficient condition on the iterative value gradient to guarantee closed-loop stability of the equilibrium point as well as convergence to the optimal value. A model-free extension based on an off-policy reinforcement learning (RL) technique is also provided. Finally, numerical results illustrate the efficacy of the proposed framework.
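For the common input-affine setting ẋ = f(x) + g(x)u with running cost Q(x) + uᵀRu (an assumption made here for illustration; the article's notation may differ), the min-Hamiltonian takes the form

M(x, \nabla V) = \min_{u} H(x, u, \nabla V) = \nabla V^{\top} f(x) + Q(x) - \tfrac{1}{4}\,\nabla V^{\top} g(x) R^{-1} g(x)^{\top} \nabla V,

with minimizing control u^{*} = -\tfrac{1}{2} R^{-1} g(x)^{\top} \nabla V, so the HJB equation can be written compactly as M(x, \nabla V^{*}) = 0.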

7.
IEEE Trans Cybern ; 52(3): 1936-1946, 2022 Mar.
Article in English | MEDLINE | ID: mdl-32639933

ABSTRACT

In this article, we introduce a novel approximate optimal decentralized control scheme for uncertain input-affine nonlinear interconnected systems. In the proposed scheme, we design a controller and an event-triggering mechanism (ETM) at each subsystem to optimize a local performance index and reduce redundant control updates, respectively. To this end, we formulate a noncooperative dynamic game at every subsystem in which we collectively model the interconnection inputs and the event-triggering error as adversarial players that deteriorate the subsystem performance, and model the control policy as the performance optimizer competing against these adversarial players. To obtain a solution to this game, one has to solve the associated Hamilton-Jacobi-Isaacs (HJI) equation, which does not have a closed-form solution even when the subsystem dynamics are accurately known. In this context, we introduce an event-driven off-policy integral reinforcement learning (OIRL) approach to learn an approximate solution to this HJI equation using artificial neural networks (NNs). We then use this NN-approximated solution to design the control policy and the event-triggering threshold at each subsystem. In the learning framework, we guarantee Zeno-free behavior of the ETMs at each subsystem using exploration policies. Finally, we derive sufficient conditions to guarantee uniform ultimate bounded regulation of the controlled system states and demonstrate the efficacy of the proposed framework with numerical examples.


Subjects
Neural Networks (Computer), Nonlinear Dynamics, Feedback, Learning, Policies
8.
IEEE Trans Neural Netw Learn Syst ; 31(12): 5441-5455, 2020 Dec.
Article in English | MEDLINE | ID: mdl-32054590

ABSTRACT

In this article, we present an intermittent framework for safe reinforcement learning (RL) algorithms. First, we develop a barrier function-based system transformation to impose state constraints while converting the original problem into an unconstrained optimization problem. Second, based on the derived optimal policies, two types of intermittent-feedback RL algorithms are presented, namely, a static one and a dynamic one. We finally leverage an actor/critic structure to solve the problem online while guaranteeing optimality, stability, and safety. Simulation results show the efficacy of the proposed approach.
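A commonly used barrier function for a scalar state constrained to an interval (a, A) with a < 0 < A is (an illustrative choice; the article may use a different form)

b(x) = \log\frac{A\,(a - x)}{a\,(A - x)}, \qquad b^{-1}(s) = \frac{aA\,(e^{s} - 1)}{a\,e^{s} - A},

which maps (a, A) one-to-one onto the real line, so the dynamics of the transformed state s = b(x) are unconstrained and keeping s bounded automatically keeps x inside the constrained interval.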

9.
IEEE Trans Cybern ; 50(3): 1240-1250, 2020 Mar.
Article in English | MEDLINE | ID: mdl-30908252

ABSTRACT

Resilient and robust distributed control protocols for multiagent systems under attacks on sensors and actuators are designed. A distributed H∞ control protocol is designed to attenuate the disturbance or attack effects. However, the H∞ controller is too conservative in the presence of attacks. Therefore, it is augmented with a distributed adaptive compensator to mitigate the adverse effects of attacks. The proposed controller can make the synchronization error arbitrarily small in the presence of faulty attacks, and satisfy global L2-gain performance in the presence of malicious attacks or disturbances. A significant advantage of the proposed method is that it requires no restriction on the number of agents or agents' neighbors under attacks on sensors and/or actuators, and it recovers even compromised agents under attacks on actuators. Simulation examples verify the effectiveness of the proposed method.

10.
IEEE Trans Cybern ; 50(8): 3752-3765, 2020 Aug.
Article in English | MEDLINE | ID: mdl-31478887

ABSTRACT

This article develops a novel distributed intermittent control framework with the ultimate goal of reducing the communication burden in containment control of multiagent systems communicating over a directed graph. Agents are assumed to be subject to disturbances. Both static and dynamic intermittent protocols are proposed. Intermittent H∞ containment control design is considered to attenuate the effect of the disturbances, and the game algebraic Riccati equation (GARE) is employed to design the coupling and feedback gains for both static and dynamic intermittent feedback. A novel scheme is then used to unify continuous, static, and dynamic intermittent containment protocols. Finally, simulation results verify the efficacy of the proposed approach.

11.
IEEE Trans Cybern ; 49(11): 3957-3967, 2019 Nov.
Article in English | MEDLINE | ID: mdl-30130241

ABSTRACT

An autonomous and resilient controller is proposed for leader-follower multiagent systems under uncertainties and cyber-physical attacks. The leader is assumed nonautonomous with a nonzero control input, which allows changing the team behavior or mission in response to environmental changes. A resilient learning-based control protocol is presented to find optimal solutions to the synchronization problem in the presence of attacks and system dynamic uncertainties. An observer-based distributed H∞ controller is first designed to prevent the effects of attacks on sensors and actuators from propagating throughout the network, as well as to attenuate the effect of these attacks on the compromised agent itself. Nonhomogeneous game algebraic Riccati equations are derived to solve the H∞ optimal synchronization problem, and off-policy reinforcement learning (RL) is utilized to learn their solution without requiring any knowledge of the agent's dynamics. A trust-confidence-based distributed control protocol is then proposed to mitigate attacks that hijack the entire node and attacks on communication links. A confidence value is defined for each agent based solely on its local evidence. The proposed resilient RL algorithm employs the confidence value of each agent to indicate the trustworthiness of its own information and broadcasts it to its neighbors to put weights on the data they receive from it during and after learning. If the confidence value of an agent is low, it employs a trust mechanism to identify compromised agents and remove the data received from them from the learning process. Simulation results are provided to show the effectiveness of the proposed approach.
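A hypothetical sketch of a confidence/trust-weighted update for one agent is shown below: data from neighbor j is weighted by the product of the confidence value that j broadcasts and the local trust placed in j, and neighbors whose weight falls below a cutoff are dropped. The names, the cutoff, and the simple consensus form are assumptions for illustration; the article embeds such weights inside an off-policy RL synchronization protocol.

import numpy as np

def trust_weighted_update(x_i, neighbor_states, confidence, trust, cutoff=0.05):
    # Weighted disagreement with neighbors, discarding data from untrusted sources.
    update = np.zeros_like(x_i, dtype=float)
    for j, x_j in neighbor_states.items():
        w = confidence[j] * trust[j]
        if w > cutoff:
            update += w * (x_j - x_i)
    return update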

12.
IEEE Trans Neural Netw Learn Syst ; 29(6): 2042-2062, 2018 06.
Article in English | MEDLINE | ID: mdl-29771662

ABSTRACT

This paper reviews the current state of the art in reinforcement learning (RL)-based feedback control solutions to optimal regulation and tracking of single-agent and multiagent systems. Existing RL solutions to both optimal regulation and tracking problems, as well as to graphical games, are reviewed. RL methods learn the solution to optimal control and game problems online, using measured data along the system trajectories. We discuss Q-learning and the integral RL algorithm as core algorithms for discrete-time (DT) and continuous-time (CT) systems, respectively. Moreover, we discuss a new direction of off-policy RL for both CT and DT systems. Finally, we review several applications.
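As a concrete instance of the DT Q-learning idea discussed here, the sketch below performs one least-squares policy-evaluation and policy-improvement step for the LQR Q-function Q(x, u) = zᵀHz with z = [x; u], using transition tuples (x, u, x') only and no model. The batch formulation and function names are illustrative assumptions.

import numpy as np

def quad_features(x, u):
    # Upper-triangular quadratic basis for Q(x,u) = z' H z, z = [x; u].
    z = np.concatenate([x, u])
    outer = np.outer(z, z).astype(float)
    outer[np.triu_indices(len(z), 1)] *= 2.0        # off-diagonal terms appear twice
    return outer[np.triu_indices(len(z))]

def q_learning_step(data, K, Q, R):
    # Policy evaluation: theta' (phi(x,u) - phi(x', -K x')) = x'Qx + u'Ru, solved by least squares.
    Phi, cost = [], []
    for x, u, x_next in data:
        Phi.append(quad_features(x, u) - quad_features(x_next, -K @ x_next))
        cost.append(x @ Q @ x + u @ R @ u)
    theta, *_ = np.linalg.lstsq(np.array(Phi), np.array(cost), rcond=None)
    m, n = K.shape
    H = np.zeros((n + m, n + m))
    H[np.triu_indices(n + m)] = theta
    H = H + np.triu(H, 1).T                          # symmetrize
    K_new = np.linalg.solve(H[n:, n:], H[n:, :n])    # policy improvement: u = -K_new x
    return H, K_new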

13.
IEEE Trans Neural Netw Learn Syst ; 29(6): 2139-2153, 2018 06.
Article in English | MEDLINE | ID: mdl-29771667

ABSTRACT

This paper develops optimal control protocols for the distributed output synchronization problem of leader-follower multiagent systems with an active leader. Agents are assumed to be heterogeneous with different dynamics and dimensions. The desired trajectory is assumed to be preplanned and is generated by the leader. The follower agents autonomously synchronize to the leader by interacting with each other using a communication network. The leader is assumed to be active in the sense that it has a nonzero control input so that it can act independently and update its control to keep the followers away from possible danger. A distributed observer is first designed to estimate the leader's state and generate the reference signal for each follower. Then, the output synchronization of leader-follower systems with an active leader is formulated as a distributed optimal tracking problem, and inhomogeneous algebraic Riccati equations (AREs) are derived to solve it. The resulting distributed optimal control protocols not only minimize the steady-state error but also optimize the transient response of the agents. An off-policy reinforcement learning algorithm is developed to solve the inhomogeneous AREs online in real time and without requiring any knowledge of the agents' dynamics. Finally, two simulation examples are conducted to illustrate the effectiveness of the proposed algorithm.

14.
IEEE Trans Neural Netw Learn Syst ; 28(10): 2434-2445, 2017 10.
Article in English | MEDLINE | ID: mdl-28436891

ABSTRACT

This paper develops an off-policy reinforcement learning (RL) algorithm to solve the optimal synchronization of multiagent systems. This is accomplished using the framework of graphical games. In contrast to traditional control protocols, which require complete knowledge of the agent dynamics, the proposed off-policy RL algorithm is a model-free approach, in that it solves the optimal synchronization problem without requiring any knowledge of the agent dynamics. A prescribed control policy, called the behavior policy, is applied to each agent to generate and collect data for learning. An off-policy Bellman equation is derived for each agent to learn the value function for the policy under evaluation, called the target policy, and to simultaneously find an improved policy. Actor and critic neural networks, along with a least-squares approach, are employed to approximate the target control policies and value functions using the data generated by applying the prescribed behavior policies. Finally, an off-policy RL algorithm is presented that is implemented in real time and gives the approximate optimal control policy for each agent using only measured data. It is shown that the optimal distributed policies found by the proposed algorithm satisfy the global Nash equilibrium and synchronize all agents to the leader. Simulation results illustrate the effectiveness of the proposed method.

15.
IEEE Trans Cybern ; 46(3): 655-67, 2016 Mar.
Article in English | MEDLINE | ID: mdl-25823055

ABSTRACT

An intelligent human-robot interaction (HRI) system with adjustable robot behavior is presented. The proposed HRI system assists the human operator to perform a given task with minimum workload demands and optimizes the overall human-robot system performance. Motivated by human factor studies, the presented control structure consists of two control loops. First, a robot-specific neuro-adaptive controller is designed in the inner loop to make the unknown nonlinear robot behave like a prescribed robot impedance model as perceived by a human operator. In contrast to existing neural network and adaptive impedance-based control methods, no information of the task performance or the prescribed robot impedance model parameters is required in the inner loop. Then, a task-specific outer-loop controller is designed to find the optimal parameters of the prescribed robot impedance model to adjust the robot's dynamics to the operator skills and minimize the tracking error. The outer loop includes the human operator, the robot, and the task performance details. The problem of finding the optimal parameters of the prescribed robot impedance model is transformed into a linear quadratic regulator (LQR) problem which minimizes the human effort and optimizes the closed-loop behavior of the HRI system for a given task. To obviate the requirement of the knowledge of the human model, integral reinforcement learning is used to solve the given LQR problem. Simulation results on an x-y table and a robot arm, and experimental implementation results on a PR2 robot confirm the suitability of the proposed method.
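The prescribed robot impedance model referenced here is typically a mass-damper-spring relation between the measured human force and the robot motion; written for a single degree of freedom (an illustrative assumption), it reads

M_m \ddot{q}_m + B_m \dot{q}_m + K_m q_m = f_h,

where f_h is the human force and (M_m, B_m, K_m) are the impedance parameters that the outer LQR loop selects to trade off human effort against tracking error.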


Subjects
Artificial Intelligence, Cybernetics/methods, Robotics/methods, Computer Simulation, Humans, Models (Neurological), Task Performance and Analysis
16.
IEEE Trans Cybern ; 46(11): 2401-2410, 2016 Nov.
Article in English | MEDLINE | ID: mdl-28113995

ABSTRACT

A model-free off-policy reinforcement learning algorithm is developed to learn the optimal output-feedback (OPFB) solution for linear continuous-time systems. The proposed algorithm has the important feature of being applicable to the design of optimal OPFB controllers for both regulation and tracking problems. To provide a unified framework for both optimal regulation and tracking, a discounted performance function is employed and a discounted algebraic Riccati equation (ARE) is derived which gives the solution to the problem. Conditions on the existence of a solution to the discounted ARE are provided, and an upper bound for the discount factor is found to assure the stability of the optimal control solution. To develop an optimal OPFB controller, it is first shown that the system state can be constructed from a limited history of observations of the system output. A Bellman equation is then developed to evaluate a control policy and find an improved policy simultaneously using only these limited output observations. Then, using this Bellman equation, a model-free off-policy RL-based OPFB controller is developed without requiring knowledge of the system state or the system dynamics. It is shown that the proposed OPFB method is more powerful than static OPFB, as it is equivalent to a state-feedback control policy. The proposed method is successfully used to solve a regulation and a tracking problem.

17.
IEEE Trans Neural Netw Learn Syst ; 26(10): 2550-62, 2015 Oct.
Article in English | MEDLINE | ID: mdl-26111401

ABSTRACT

This paper deals with the design of an H∞ tracking controller for nonlinear continuous-time systems with completely unknown dynamics. A general bounded L2-gain tracking problem with a discounted performance function is introduced for H∞ tracking. A tracking Hamilton-Jacobi-Isaacs (HJI) equation is then developed that gives a Nash equilibrium solution to the associated min-max optimization problem. A rigorous analysis of the bounded L2-gain and stability of the control solution obtained by solving the tracking HJI equation is provided. An upper bound is found for the discount factor to assure local asymptotic stability of the tracking error dynamics. An off-policy reinforcement learning algorithm is used to learn the solution to the tracking HJI equation online without requiring any knowledge of the system dynamics. Convergence of the proposed algorithm to the solution to the tracking HJI equation is shown. Simulation examples are provided to verify the effectiveness of the proposed method.
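The discounted performance function in this kind of H∞ tracking formulation typically takes the form (the notation here is an assumption for illustration, not a quote of the paper)

J(u, d) = \int_{t}^{\infty} e^{-\alpha(\tau - t)} \left( e^{\top} Q\, e + u^{\top} R\, u - \gamma^{2} d^{\top} d \right) \mathrm{d}\tau,

where e is the tracking error, d the disturbance, γ the prescribed L2-gain bound, and α > 0 the discount factor whose admissible upper bound is characterized to preserve local asymptotic stability of the error dynamics.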

18.
IEEE Trans Cybern ; 45(2): 165-76, 2015 Feb.
Article in English | MEDLINE | ID: mdl-24879648

ABSTRACT

This paper presents a Q-learning method to solve the discounted linear quadratic regulator (LQR) problem for continuous-time (CT) continuous-state systems. Most methods available in the existing literature for solving the LQR problem for CT systems need partial or complete knowledge of the system dynamics. Q-learning is effective for unknown dynamical systems but has generally been well understood only for discrete-time systems. The contribution of this paper is to present a Q-learning methodology for CT systems that solves the LQR problem without any knowledge of the system dynamics. A natural and rigorously justified parameterization of the Q-function is given in terms of the state, the control input, and its derivatives. This parameterization allows the implementation of an online Q-learning algorithm for CT systems. Simulation results supporting the theoretical development are also presented.

19.
ISA Trans ; 52(5): 611-21, 2013 Sep.
Article in English | MEDLINE | ID: mdl-23706414

ABSTRACT

This paper is an effort towards developing an online learning algorithm to find the optimal control solution for continuous-time (CT) systems subject to input constraints. The proposed method is based on the policy iteration (PI) technique which has recently evolved as a major technique for solving optimal control problems. Although a number of online PI algorithms have been developed for CT systems, none of them take into account the input constraints caused by actuator saturation. In practice, however, ignoring these constraints leads to performance degradation or even system instability. In this paper, to deal with the input constraints, a suitable nonquadratic functional is employed to encode the constraints into the optimization formulation. Then, the proposed PI algorithm is implemented on an actor-critic structure to solve the Hamilton-Jacobi-Bellman (HJB) equation associated with this nonquadratic cost functional in an online fashion. That is, two coupled neural network (NN) approximators, namely an actor and a critic, are tuned online and simultaneously for approximating the associated HJB solution and computing the optimal control policy. The critic is used to evaluate the cost associated with the current policy, while the actor is used to find an improved policy based on information provided by the critic. Convergence to a close approximation of the HJB solution as well as stability of the proposed feedback control law are shown. Simulation results of the proposed method on a nonlinear CT system illustrate the effectiveness of the proposed approach.
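A standard nonquadratic integrand used to encode a symmetric saturation |u| ≤ λ̄ is (written here as an illustrative choice consistent with the actor-critic HJB setting)

W(u) = 2 \int_{0}^{u} \left( \bar{\lambda} \tanh^{-1}(v/\bar{\lambda}) \right)^{\top} R \, \mathrm{d}v,

which is finite only for admissible inputs and, for input-affine dynamics ẋ = f(x) + g(x)u, yields a bounded optimal policy of the form u^{*}(x) = -\bar{\lambda} \tanh\!\left( \tfrac{1}{2\bar{\lambda}} R^{-1} g(x)^{\top} \nabla V^{*}(x) \right).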

20.
IEEE Trans Neural Netw Learn Syst ; 24(10): 1513-25, 2013 Oct.
Article in English | MEDLINE | ID: mdl-24808590

ABSTRACT

This paper presents an online policy iteration (PI) algorithm to learn the continuous-time optimal control solution for unknown constrained-input systems. The proposed PI algorithm is implemented on an actor-critic structure where two neural networks (NNs) are tuned online and simultaneously to generate the optimal bounded control policy. The requirement of complete knowledge of the system dynamics is obviated by employing a novel NN identifier in conjunction with the actor and critic NNs. It is shown how the identifier weight estimation error affects the convergence of the critic NN. A novel learning rule is developed to guarantee that the identifier weights converge to small neighborhoods of their ideal values exponentially fast. To provide an easy-to-check persistence of excitation condition, the experience replay technique is used; that is, recorded past experiences are used simultaneously with current data for the adaptation of the identifier weights. Stability of the whole system, consisting of the actor, critic, system state, and system identifier, is guaranteed while all three networks undergo adaptation. Convergence to a near-optimal control law is also shown. The effectiveness of the proposed method is illustrated with a simulation example.


Subjects
Algorithms, Computer Simulation, Learning, Models (Theoretical), Neural Networks (Computer), Nonlinear Dynamics, Artificial Intelligence, Feedback, Signal Processing (Computer-Assisted)