Results 1 - 14 of 14
1.
Entropy (Basel) ; 25(2)2023 Feb 05.
Article in English | MEDLINE | ID: mdl-36832665

ABSTRACT

This article proposes an optimal tracking control method that combines an event-triggered technique with the internal reinforcement Q-learning (IrQL) algorithm to address the tracking control problem of unknown nonlinear multi-agent systems (MASs). Relying on the internal reinforcement reward (IRR) formula, a Q-learning function is calculated, and the iterative IrQL method is then developed. In contrast to time-triggered mechanisms, the event-triggered algorithm reduces the transmission rate and computational load, since the controller is updated only when the predetermined triggering conditions are met. In addition, to implement the proposed scheme, a reinforce-critic-actor (RCA) neural network structure is created that can assess the performance indices and learn the event-triggering mechanism online. This strategy is data-driven and does not require in-depth knowledge of the system dynamics. An event-triggered weight-tuning rule is developed that modifies the parameters of the actor neural network (ANN) only in response to triggering events. In addition, a Lyapunov-based convergence analysis of the reinforce-critic-actor neural network (NN) is presented. Lastly, an example demonstrates the applicability and efficiency of the proposed approach.
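The event-triggering mechanism described above can be illustrated with a toy tabular sketch (the dynamics, threshold, reward weights, and discount below are invented for illustration and are not the paper's IrQL formulation): the applied control is refreshed from the learned Q-function only when the state has drifted sufficiently far from the last triggering state.

```python
import random

random.seed(0)
actions = [-1.0, 0.0, 1.0]
Q = {}                      # Q[(state_bin, action)] -> learned cost-to-go

def state_bin(x):
    return max(-5, min(5, round(x)))

def greedy(x):
    s = state_bin(x)
    return min(actions, key=lambda a: Q.get((s, a), 0.0))

x, ref = 4.0, 0.0           # state and tracking reference
u, x_trig = 0.0, x          # last applied control, state at last triggering event
updates = 0
for _ in range(200):
    if abs(x - x_trig) > 0.5:        # event-triggering condition
        u, x_trig = greedy(x), x     # controller is updated only on events
        updates += 1
    cost = (x - ref) ** 2 + 0.1 * u ** 2            # stage cost (tracking + effort)
    x_next = 0.9 * x + u + random.uniform(-0.05, 0.05)
    s = state_bin(x)
    target = cost + 0.95 * min(Q.get((state_bin(x_next), b), 0.0) for b in actions)
    old = Q.get((s, u), 0.0)
    Q[(s, u)] = old + 0.1 * (target - old)          # Q-learning update
    x = x_next
print("controller updates:", updates, "of 200 steps")
```

Because the policy lookup is gated by the triggering condition, the number of controller updates stays well below the number of time steps, which is the transmission and computation saving the abstract refers to.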

2.
ISA Trans ; 145: 298-314, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38057173

ABSTRACT

This paper addresses the optimal tracking control problem of an Autonomous Underwater Vehicle (AUV) using Pontryagin's Minimum Principle (PMP). The formulation chooses a Hamiltonian function built from the desired control objective, with the AUV dynamics acting as dynamic constraints. We first develop the PMP formulation based on the AUV kinematics and then extend it to the dynamics to arrive at optimal thrusts and moments. Necessary conditions for optimality are derived for both models using PMP, which results in optimal trajectories that simultaneously minimize the tracking error and the control cost, thereby achieving energy optimality. It is observed that the adjoint variables (costates) are in fact the momenta in the inertial and body-fixed frames. At the kinematic level, this yields a stable solution. The developed methodology is applied to both 2D and 3D AUV models with disturbances due to inputs and ocean currents. Numerical simulations are carried out with the derived control laws for a given trajectory-tracking target. Quantitative evaluation and comparison of the controller's performance are done using Mean Square Error (MSE) and Total Variation (TV) measures. The proposed control laws are found to achieve the desired control objectives.
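For intuition about PMP-based tracking (a hypothetical scalar kinematic model x' = u, not the AUV dynamics of the paper), the Hamiltonian H = 0.5((x - r)^2 + u^2) + lam*u gives the stationarity condition u* = -lam and the costate equation lam' = -(x - r); a simple shooting method then enforces the free-endpoint condition lam(T) = 0.

```python
# Toy PMP two-point boundary value problem, solved by bisection shooting.
def shoot(lam0, x0=1.0, r=0.0, T=2.0, n=2000):
    dt = T / n
    x, lam = x0, lam0
    for _ in range(n):
        u = -lam                    # stationarity condition dH/du = 0
        x += dt * u                 # state equation x' = u
        lam += dt * (-(x - r))      # costate (adjoint) equation
    return lam                      # want lam(T) = 0 (transversality)

lo, hi = 0.0, 2.0                   # bracket for the unknown initial costate
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if shoot(lo) * shoot(mid) <= 0:
        hi = mid
    else:
        lo = mid
lam0 = 0.5 * (lo + hi)
print("lam(0) =", lam0)
```

The analytic solution of this toy problem gives lam(0) = tanh(T), so the bisection result can be sanity-checked against tanh(2) ≈ 0.964.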

3.
ISA Trans ; 144: 228-244, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38030447

ABSTRACT

In this paper, a new off-policy two-dimensional (2D) reinforcement learning approach is proposed to deal with the optimal tracking control (OTC) issue of batch processes with network-induced dropout and disturbances. A dropout 2D augmented Smith predictor is first devised to estimate the present extended state utilizing past data along the time and batch orientations. The dropout 2D value function and Q-function are then defined, and their relation is analyzed to meet the optimal performance. On this basis, the dropout 2D Bellman equation is derived according to the principle of the Q-function. To address the dropout 2D OTC problem of batch processes, two algorithms are presented: an off-line 2D policy iteration algorithm and an off-policy 2D Q-learning algorithm. The latter is developed using only the input and the estimated state, without the underlying system information. Meanwhile, the unbiasedness of the solutions and the convergence are analyzed separately. The effectiveness of the provided methodologies is finally validated on a simulated filling-process case.

4.
Neural Netw ; 175: 106274, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38583264

ABSTRACT

In this paper, an adjustable Q-learning scheme is developed to solve the discrete-time nonlinear zero-sum game problem, which can accelerate the convergence rate of the iterative Q-function sequence. First, the monotonicity and convergence of the iterative Q-function sequence are analyzed under some conditions. Moreover, by employing neural networks, the model-free tracking control problem can be overcome for zero-sum games. Second, two practical algorithms are designed to guarantee convergence with accelerated learning. In one algorithm, an adjustable acceleration phase is added to the Q-learning iteration process, which can be adaptively terminated with a convergence guarantee. In the other algorithm, a novel acceleration function is developed, which adjusts the relaxation factor to ensure convergence. Finally, through a simulation example with a practical physical background, the excellent performance of the developed algorithm is demonstrated with neural networks.


Subjects
Algorithms, Neural Networks (Computer), Nonlinear Dynamics, Computer Simulation, Humans, Machine Learning
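The relaxation-factor idea can be seen on a scalar fixed-point caricature of value iteration (the map and the numbers below are illustrative, not the paper's algorithm): over-relaxing the Bellman update V ← (1 - w)V + wT(V) with w > 1 can dramatically reduce the number of iterations while leaving the fixed point unchanged.

```python
gamma, stage_cost = 0.95, 1.0
V_star = stage_cost / (1 - gamma)        # fixed point of the scalar Bellman map

def bellman(V):
    return stage_cost + gamma * V        # scalar stand-in for the Bellman operator

def iterate(w, tol=1e-8, max_iter=100000):
    V, k = 0.0, 0
    while abs(V - V_star) > tol and k < max_iter:
        V = (1 - w) * V + w * bellman(V)  # relaxed fixed-point update
        k += 1
    return k

plain = iterate(1.0)                # ordinary value iteration, rate gamma
fast = iterate(1.0 / (1 - gamma))   # over-relaxed update with a larger step
print("iterations:", plain, "vs", fast)
```

For this linear map the contraction rate of the relaxed update is |1 - w(1 - gamma)|, so w = 1/(1 - gamma) is the ideal choice; an adaptive scheme such as the one in the abstract would tune w online while guarding convergence.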
5.
Neural Netw ; 177: 106388, 2024 Sep.
Article in English | MEDLINE | ID: mdl-38776760

ABSTRACT

This paper investigates the optimal tracking issue for continuous-time (CT) nonlinear asymmetric-constrained zero-sum games (ZSGs) by exploiting the neural critic technique. Initially, an improved algorithm is constructed to tackle the tracking control problem of nonlinear CT multiplayer ZSGs. We also give a novel nonquadratic function to handle the asymmetric constraints. Notably, the method used in this paper to handle asymmetric constraints eliminates the strict restriction on the control matrix imposed by previous approaches. Further, the optimal controls, the worst disturbances, and the tracking Hamilton-Jacobi-Isaacs equation are derived. Next, a single critic neural network is built to estimate the optimal cost function, thus obtaining approximations of the optimal controls and the worst disturbances. The critic network weight is updated by the normalized steepest descent algorithm. Additionally, based on the Lyapunov method, the stability of the tracking error and of the weight estimation error of the critic network is analyzed. Finally, two examples are offered to validate the theoretical results.


Subjects
Algorithms, Neural Networks (Computer), Nonlinear Dynamics, Game Theory, Humans, Computer Simulation
6.
ISA Trans ; 148: 1-11, 2024 May.
Article in English | MEDLINE | ID: mdl-38429141

ABSTRACT

In this paper, the robust adaptive optimal tracking control problem is addressed for a disturbed unmanned helicopter based on the time-varying gain extended state observer (TVGESO) and adaptive dynamic programming (ADP) methods. Firstly, a novel TVGESO is developed to tackle the unknown disturbance, which overcomes the drawback of the initial peaking phenomenon in the traditional linear ESO method. Meanwhile, compared with the nonlinear ESO, the proposed TVGESO admits a simpler and more rigorous stability analysis. Subsequently, the optimal tracking control issue for the original unmanned helicopter system is transformed into an optimization stabilization problem. By means of ADP and neural network techniques, the feedforward controller and the optimal feedback controller are skillfully designed. Compared with the conventional backstepping approach, the designed anti-disturbance optimal controller enables the unmanned helicopter to accomplish the tracking task with less energy. Finally, simulation comparisons demonstrate the validity of the developed control scheme.
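A rough sketch of a time-varying-gain ESO on a first-order toy plant (the helicopter model, gain schedule, feedback law, and bandwidth values here are all invented for illustration): the observer bandwidth is ramped up smoothly from a small value, so the disturbance estimate converges without the large initial transient a constant high-gain ESO would exhibit.

```python
import math

dt, T = 1e-3, 5.0
x, xh, dh = 1.0, 0.0, 0.0      # plant state, state estimate, disturbance estimate
t = 0.0
while t < T:
    d = 0.5 * math.sin(t)      # unknown disturbance (hidden from the observer)
    u = -2.0 * xh - dh         # illustrative disturbance-compensating feedback
    w = 2.0 + 50.0 * min(1.0, t / 0.5) ** 2   # time-varying observer bandwidth
    e = x - xh                 # estimation error drives both observer channels
    x += dt * (u + d)          # plant:    x' = u + d
    xh += dt * (u + dh + 2 * w * e)   # observer, state channel (gain 2w)
    dh += dt * (w * w * e)            # observer, extended channel (gain w^2)
    t += dt
print("disturbance estimate:", dh, " true value:", 0.5 * math.sin(T))
```

The gains (2w, w^2) place both observer poles at -w; ramping w over the first half second keeps the early correction gentle while the final bandwidth still gives an accurate disturbance estimate.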

7.
Biomed Phys Eng Express ; 9(4)2023 07 03.
Article in English | MEDLINE | ID: mdl-37348467

ABSTRACT

The ability to finely manipulate spatiotemporal patterns displayed in neuronal populations is critical for understanding and influencing brain functions, sleep cycles, and neurological pathologies. However, such control tasks are challenged not only by the immense scale but also by the lack of real-time state measurements of neurons in the population, which deteriorates the control performance. In this paper, we formulate the control of dynamic structures in an ensemble of neuron oscillators as a tracking problem and propose a principled control technique for designing optimal stimuli that produce desired spatiotemporal patterns in a network of interacting neurons without requiring feedback information. We further reveal an interesting presentation of information encoding and processing in a neuron ensemble in terms of its controllability property. The performance of the presented technique in creating complex spatiotemporal spiking patterns is demonstrated on neural populations described by mathematically ideal and biophysical models, including the Kuramoto and Hodgkin-Huxley models, as well as real-time experiments on Wein bridge oscillators.


Subjects
Neurological Models, Neurons, Neurons/physiology, Action Potentials/physiology, Biophysics, Feedback
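The Kuramoto model mentioned in the abstract can be simulated in a few lines (the parameters below are arbitrary, and this plain synchronization demo is far simpler than the paper's controlled spatiotemporal patterns): with coupling above the critical value, the order parameter r grows from near zero toward one.

```python
import math, random

random.seed(1)
N, K, dt = 100, 1.5, 0.01
theta = [random.uniform(0.0, 2.0 * math.pi) for _ in range(N)]  # random phases
omega = [1.0 + random.gauss(0.0, 0.1) for _ in range(N)]        # natural frequencies

def order_parameter(th):
    re = sum(math.cos(v) for v in th) / len(th)
    im = sum(math.sin(v) for v in th) / len(th)
    return math.hypot(re, im)   # r = 0 incoherent, r = 1 fully synchronized

r0 = order_parameter(theta)
for _ in range(2000):
    re = sum(math.cos(v) for v in theta) / N
    im = sum(math.sin(v) for v in theta) / N
    r, psi = math.hypot(re, im), math.atan2(im, re)
    # mean-field Kuramoto update: theta_i' = omega_i + K * r * sin(psi - theta_i)
    theta = [v + dt * (w + K * r * math.sin(psi - v)) for v, w in zip(theta, omega)]
r_final = order_parameter(theta)
print("order parameter:", r0, "->", r_final)
```

In the control setting of the paper, a common stimulus would be added to each oscillator's update to steer the ensemble toward a desired pattern rather than letting the coupling alone synchronize it.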
8.
Neural Netw ; 164: 105-114, 2023 Jul.
Article in English | MEDLINE | ID: mdl-37148606

ABSTRACT

In this paper, a novel adaptive critic control method is designed to solve the optimal H∞ tracking control problem for continuous nonlinear systems with nonzero equilibrium based on adaptive dynamic programming (ADP). To guarantee the finiteness of the cost function, traditional methods generally assume that the controlled system has a zero equilibrium point, which is not true of practical systems. To overcome this obstacle and realize H∞ optimal tracking control, this paper proposes a novel cost function design with respect to the disturbance, the tracking error, and the derivative of the tracking error. Based on the designed cost function, the H∞ control problem is formulated as a two-player zero-sum differential game, and a policy iteration (PI) algorithm is then proposed to solve the corresponding Hamilton-Jacobi-Isaacs (HJI) equation. To obtain the online solution to the HJI equation, a single-critic neural network structure based on the PI algorithm is established to learn the optimal control policy and the worst-case disturbance law. It is worth mentioning that the proposed adaptive critic control method simplifies the controller design process when the equilibrium of the system is not zero. Finally, simulations are conducted to evaluate the tracking performance of the proposed control methods.


Subjects
Neural Networks (Computer), Nonlinear Dynamics, Feedback, Algorithms, Learning
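For intuition, policy iteration for a zero-sum game can be written out in the scalar linear-quadratic case (a textbook-style caricature with made-up numbers, not the paper's single-critic neural implementation): each iteration evaluates the current control/disturbance pair through a scalar Lyapunov equation and then improves both policies, converging to the stabilizing root of the scalar H∞-type Riccati (HJI) equation.

```python
# Plant x' = a*x + b*u + k*w, cost integrand q*x^2 + r*u^2 - gamma^2*w^2,
# quadratic value V(x) = p*x^2, control u = -K*x, worst disturbance w = L*x.
a, b, k = 1.0, 1.0, 1.0
q, r, gamma = 1.0, 1.0, 2.0

K, L = 3.0, 0.0                    # initial stabilizing control policy
for _ in range(20):
    a_c = a - b * K + k * L        # closed-loop drift under current policies
    p = (q + r * K**2 - gamma**2 * L**2) / (-2 * a_c)  # policy evaluation
    K = b * p / r                  # policy improvement: control player
    L = k * p / gamma**2           # policy improvement: disturbance player

# Residual of the scalar HJI equation 0 = 2ap + q - p^2 b^2/r + p^2 k^2/gamma^2
residual = 2 * a * p + q - (b * p) ** 2 / r + (k * p) ** 2 / gamma ** 2
print("p =", p, " HJI residual =", residual)
```

In the scalar case the iteration is exactly a Newton step on the HJI residual, which is why a handful of iterations reaches machine precision; the paper's contribution is doing the evaluation step with a critic network when no closed form exists.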
9.
ISA Trans ; 141: 212-222, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37451921

ABSTRACT

This paper is devoted to solving the optimal tracking control (OTC) problem of singular perturbation systems in industrial processes under the framework of reinforcement learning (RL) technology. The encountered challenges include the different time scales in system operations and an unknown slow process. The immeasurability of the slow-process states especially increases the difficulty of finding the optimal tracking controller. To overcome these challenges, a novel off-policy ridge RL method is developed after decomposing the singularly perturbed systems using singular perturbation (SP) theory and replacing the unmeasured states through appropriate mathematical manipulations. A theoretical analysis of the approximate equivalence between the sum of the solutions of the subproblems and the solution of the OTC problem is presented. Finally, a mixed separation thickening process (MSTP) and a numerical example are used to verify the effectiveness of the method.

10.
ISA Trans ; 125: 10-21, 2022 Jun.
Article in English | MEDLINE | ID: mdl-34130858

ABSTRACT

Because previous control methods usually rely heavily on models of the batch process and struggle with practical batch processes whose dynamics are unknown, a novel data-driven two-dimensional (2D) off-policy Q-learning approach for optimal tracking control (OTC) is proposed to give the batch process a model-free control law. Firstly, an extended state-space equation composed of the state and the output error is established to ensure the tracking performance of the designed controller. Secondly, the behavior policy that generates data and the target policy that is optimized and learned are introduced based on this extended system. Then, a Bellman equation independent of the model parameters is derived by analyzing the relation between the 2D value function and the 2D Q-function. Only measured data along the batch and time directions of the batch process are needed to carry out the policy iteration, which solves the optimal control problem despite the lack of system dynamics information. The unbiasedness and convergence of the designed 2D off-policy Q-learning algorithm are proved. Finally, a simulation case on an injection molding process shows that the control and tracking performance gradually improve as the number of batches increases.
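The model-free idea can be sketched for scalar linear-quadratic regulation (a deliberate simplification: no batch/2D structure, and the plant numbers a, b below are invented and hidden from the learner): a quadratic Q-function is fitted by least squares to (state, input, cost, next-state) data generated under a noisy behavior policy, while the Bellman equation evaluates the target policy.

```python
import random

random.seed(0)
a, b = 0.9, 1.0            # hidden plant x' = a*x + b*u (never used by the learner)
q, r = 1.0, 1.0

def solve3(A, y):
    # Tiny Gaussian elimination for the 3x3 normal equations.
    M = [row[:] + [v] for row, v in zip(A, y)]
    for i in range(3):
        piv = max(range(i, 3), key=lambda n: abs(M[n][i]))
        M[i], M[piv] = M[piv], M[i]
        for n in range(i + 1, 3):
            f = M[n][i] / M[i][i]
            M[n] = [mn - f * mi for mn, mi in zip(M[n], M[i])]
    th = [0.0] * 3
    for i in (2, 1, 0):
        th[i] = (M[i][3] - sum(M[i][j] * th[j] for j in range(i + 1, 3))) / M[i][i]
    return th

K = 0.0                    # initial stabilizing target policy u = -K*x
for _ in range(15):
    AtA = [[0.0] * 3 for _ in range(3)]
    Atb = [0.0] * 3
    for _ in range(60):    # behavior policy = target policy + exploration noise
        x = random.uniform(-1, 1)
        u = -K * x + random.uniform(-0.5, 0.5)
        xn = a * x + b * u
        un = -K * xn       # target-policy action at the next state (off-policy)
        # Bellman identity Q(x,u) - Q(xn,un) = cost, with Q = H11 x^2 + 2 H12 xu + H22 u^2
        phi = [x * x - xn * xn, 2 * (x * u - xn * un), u * u - un * un]
        c = q * x * x + r * u * u
        for i in range(3):
            Atb[i] += phi[i] * c
            for j in range(3):
                AtA[i][j] += phi[i] * phi[j]
    H11, H12, H22 = solve3(AtA, Atb)
    K = H12 / H22          # greedy policy from the learned Q-function
print("learned gain K =", K)
```

Each outer pass is one policy-evaluation/policy-improvement step done purely from data; the learned gain converges to the LQR-optimal feedback even though a and b never enter the algorithm.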

11.
Neural Netw ; 154: 131-140, 2022 Oct.
Article in English | MEDLINE | ID: mdl-35882081

ABSTRACT

In this paper, a critic learning structure based on a novel utility function is developed to solve the discounted optimal tracking control problem of affine nonlinear systems. The utility function is defined as the quadratic form of the error at the next moment, which can not only avoid solving for the steady-state control input but also effectively eliminate the tracking error. Next, the theoretical derivation of the method under value iteration is given in detail, with convergence and stability analysis. Then, the dual heuristic dynamic programming (DHP) algorithm via a single neural network is introduced to reduce the amount of computation. A polynomial is used to approximate the costate function during the DHP implementation, and the weighted residual method is used to update the weight matrix. In simulation, the convergence speed of the given strategy is compared with the heuristic dynamic programming (HDP) algorithm, and the results show that the proposed method converges faster. Besides, the proposed method is compared with the traditional tracking control approach to verify its tracking performance. The results show that the proposed method avoids solving for the steady-state control input and drives the tracking error closer to zero than the traditional strategy.


Subjects
Neural Networks (Computer), Nonlinear Dynamics, Algorithms, Computer Simulation, Learning
12.
ISA Trans ; 128(Pt A): 123-132, 2022 Sep.
Article in English | MEDLINE | ID: mdl-34756757

ABSTRACT

To handle the tracking control problem of the magnetic wheeled mobile robot (MWMR), this paper develops an online robust tracking control scheme via adaptive dynamic programming (ADP). The problem of achieving optimal tracking control of the continuous-time (CT) MWMR system with time-varying unknown uncertainty can be solved indirectly by matching it to the optimal tracking control of the associated nominal system. A single-critic NN-based actor-critic structure is tailored for a simpler controller architecture. By minimizing the Bellman error with gradient-descent and least-squares updating laws, the critic NN weights can be optimized online, so the optimal cost function and the optimal control signal can be approximated with high precision. Using the Lyapunov stability theorem, the convergence of the critic NN weights and the stability of the closed-loop system are established. Simulations, in comparison with robust PD control and adaptive control, are presented to illustrate the effectiveness of the proposed tracking control method for the MWMR.

13.
Neural Netw ; 143: 121-132, 2021 Nov.
Article in English | MEDLINE | ID: mdl-34118779

ABSTRACT

In this paper, we aim to solve the optimal tracking control problem for a class of nonaffine discrete-time systems with actuator saturation. First, a data-based neural identifier is constructed to learn the unknown system dynamics. Then, according to the expression of the trained neural identifier, we can obtain the steady control corresponding to the reference trajectory. Next, by employing the iterative dual heuristic dynamic programming algorithm, the new costate function and the tracking control law are developed. Two other neural networks are used to estimate the costate function and approximate the tracking control law. Considering the approximation errors of the neural networks, a stability analysis of the proposed algorithm for the specific systems is provided using the Lyapunov approach. Finally, through simulations and comparisons, the superiority of the developed optimal tracking method is confirmed. Moreover, the trajectory tracking performance in a wastewater treatment application is also included to further verify the proposed approach.


Subjects
Nonlinear Dynamics, Water Purification, Algorithms, Feedback, Neural Networks (Computer)
14.
ISA Trans ; 98: 251-262, 2020 Mar.
Article in English | MEDLINE | ID: mdl-31439393

ABSTRACT

Although the optimal tracking control problem (OTCP) has been addressed recently, only single-input systems are considered in the recent literature. In this paper, the OTCP of unknown multi-motor driven load systems (MMDLS) is addressed based on a simplified reinforcement learning (RL) structure, where all the motor inputs with different dynamics are obtained as a Nash equilibrium. Thus, the performance indexes associated with each input can be optimized as the outcome of a Nash equilibrium. Firstly, we use an identifier to reconstruct the MMDLS dynamics, so that the accurate model required in a general control design is avoided. We use the identified dynamics to derive the Nash-optimal inputs, which comprise the steady-state controls and the RL-based controls. The steady-state controls are designed with the identified system model, while the RL-based controls are designed using the optimization method with the simplified RL-based critic NN schemes. We use the simplified RL structures to approximate the cost function of each motor input in the optimal control design. The NN weights of both the identifier algorithm and the simplified RL-based structure are updated using a novel adaptation algorithm, in which the learning gains can be optimized adaptively. The convergence of the weights and the stability of the Nash-optimal MMDLS are proved. Finally, numerical MMDLS simulations are implemented to show the correctness and the improved performance of the proposed methods.
