Results 1 - 20 of 150
1.
IEEE Trans Neural Netw Learn Syst ; 35(3): 3191-3201, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38379236

ABSTRACT

In this article, a model-free Q-learning algorithm is proposed to solve the tracking problem of linear discrete-time systems with completely unknown system dynamics. To eliminate tracking errors, a performance index of the Q-learning approach is formulated, which can transform the tracking problem into a regulation one. Compared with the existing adaptive dynamic programming (ADP) methods and Q-learning approaches, the proposed performance index adds a product term composed of a gain matrix and the reference tracking trajectory to the control input quadratic form. In addition, without requiring any prior knowledge of the dynamics of the original controlled system and command generator, the control policy obtained by the proposed approach can be deduced by an iterative technique relying on the online information of the system state, the control input, and the reference tracking trajectory. In each iteration of the proposed method, the desired control input can be updated by the iterative criteria derived from a precondition of the controlled system and the reference tracking trajectory, which ensures that the obtained control policy can eliminate tracking errors in theory. Moreover, to effectively use less data to obtain the optimal control policy, the off-policy approach is introduced into the proposed algorithm. Finally, the effectiveness of the proposed algorithm is verified by a numerical simulation.
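As a rough illustration of the kind of model-free Q-learning iteration described above (not the authors' algorithm or their modified performance index), the sketch below estimates a quadratic Q-function for a linear-quadratic tracking problem from simulated data by least squares and then improves the tracking gain. The system matrices, command generator, weights, discount factor, and probing-noise level are all assumed for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[0.9, 0.1], [0.0, 0.8]])      # assumed plant x_{k+1} = A x_k + B u_k
B = np.array([[0.0], [1.0]])
F = np.array([[1.0]])                        # assumed command generator r_{k+1} = F r_k
Qw, R, gamma = np.eye(2), np.eye(1), 0.9     # assumed weights and discount factor
n, m, p = 2, 1, 1
nz = n + p                                   # augmented state z = [x; r]

def stage_cost(x, r, u):
    e = x - np.array([r[0], 0.0])            # only the first state tracks r
    return e @ Qw @ e + u @ R @ u

def features(z, u):
    v = np.concatenate([z, u])
    return np.outer(v, v)[np.triu_indices(nz + m)]   # quadratic basis for Q(z, u)

K = np.zeros((m, nz))                        # initial admissible gain
for it in range(20):
    # Policy evaluation: fit Q(z,u) = cost + gamma * Q(z', -K z') by least squares
    Phi, y = [], []
    for _ in range(300):
        x, r = rng.standard_normal(n), rng.standard_normal(p)
        z = np.concatenate([x, r])
        u = -K @ z + 0.1 * rng.standard_normal(m)      # probing noise for excitation
        xn, rn = A @ x + B @ u, F @ r
        zn = np.concatenate([xn, rn])
        Phi.append(features(z, u) - gamma * features(zn, -K @ zn))
        y.append(stage_cost(x, r, u))
    theta, *_ = np.linalg.lstsq(np.array(Phi), np.array(y), rcond=None)
    H = np.zeros((nz + m, nz + m))
    H[np.triu_indices(nz + m)] = theta
    H = (H + H.T) / 2                        # off-diagonal entries were doubled in theta
    # Policy improvement: u = -H_uu^{-1} H_uz z
    K = np.linalg.solve(H[nz:, nz:], H[nz:, :nz])

print("learned tracking gain K =", K.round(3))
```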

2.
IEEE Trans Cybern ; 54(2): 797-810, 2024 Feb.
Article in English | MEDLINE | ID: mdl-37256797

ABSTRACT

In this article, we propose a way to enhance the learning framework for zero-sum games with dynamics evolving in continuous time. In contrast to the conventional centralized actor-critic learning, a novel cooperative finitely excited learning approach is developed to combine the online recorded data with instantaneous data for efficiency. By using an experience replay technique for each agent and distributed interaction amongst agents, we are able to replace the classical persistent excitation condition with an easy-to-check cooperative excitation condition. This approach also guarantees the consensus of the distributed actor-critic learning on the solution to the Hamilton-Jacobi-Isaacs (HJI) equation. It is shown that both the closed-loop stability of the equilibrium point and convergence to the Nash equilibrium can be guaranteed. Simulation results demonstrate the efficacy of this approach compared to previous methods.
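The experience-replay idea mentioned above can be illustrated with a toy parameter-estimation sketch (assumptions only, not the paper's actor-critic laws): the estimate is driven by the instantaneous regression error plus replayed errors from a small stack of recorded data, so a rank condition on the recorded stack, rather than persistent excitation of the instantaneous signal, drives convergence.

```python
import numpy as np

theta_true = np.array([2.0, -1.0, 0.5])      # unknown parameters to identify (assumed)
theta_hat = np.zeros(3)
stack = []                                    # recorded (regressor, measurement) pairs
lr = 0.02                                     # gradient step size

for k in range(4000):
    t = 0.01 * k
    phi = np.array([np.sin(t), np.cos(t), 1.0])   # instantaneous regressor
    y = phi @ theta_true                          # measured output
    if len(stack) < 20 and k % 25 == 0:           # record data occasionally
        stack.append((phi, y))
    # Gradient step on the instantaneous error plus replayed errors from the stack
    grad = (phi @ theta_hat - y) * phi
    for phi_j, y_j in stack:
        grad += (phi_j @ theta_hat - y_j) * phi_j
    theta_hat -= lr * grad

print("parameter estimation error:", np.linalg.norm(theta_hat - theta_true))
```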

3.
IEEE Trans Cybern ; 54(3): 1695-1707, 2024 Mar.
Article in English | MEDLINE | ID: mdl-37027769

ABSTRACT

This article studies the trajectory imitation control problem of linear systems subject to external disturbances and develops a data-driven static output feedback (OPFB) control-based inverse reinforcement learning (RL) approach. An Expert-Learner structure is considered in which the learner aims to imitate the expert's trajectory. Using only the measured input and output data of the expert and of the learner itself, the learner computes the policy of the expert by reconstructing its unknown value function weights and thus imitates its optimally operating trajectory. Three static OPFB inverse RL algorithms are proposed. The first algorithm is a model-based scheme and serves as a basis. The second algorithm is a data-driven method using input-state data. The third algorithm is a data-driven method using only input-output data. Stability, convergence, optimality, and robustness are analyzed. Finally, simulation experiments are conducted to verify the proposed algorithms.

4.
IEEE Trans Cybern ; 54(3): 1960-1971, 2024 Mar.
Article in English | MEDLINE | ID: mdl-37703146

ABSTRACT

This article addresses the synchronization tracking problem for high-order uncertain nonlinear multiagent systems via intermittent feedback under a directed graph. By resorting to a novel storer-based triggering transmission strategy in the state channels, we propose an event-triggered neuroadaptive control method with quantitative state feedback that exhibits several salient features: 1) avoiding continuous control updates by updating the parameter estimates intermittently at the trigger instants; 2) achieving lower-frequency triggering transmissions by using one event detector to monitor the triggering condition, so that each agent only needs to broadcast information at its own trigger times; and 3) saving communication and computation resources by updating the neural network weights intermittently with a dual-phase technique during the triggering period. Besides, it is shown that the proposed scheme is capable of steering the tracking/disagreement errors into an adjustable neighborhood of the origin, and the existence of a strictly positive dwell time is proved to rule out Zeno behavior. Both theoretical analysis and numerical simulation validate the effectiveness of the proposed protocols.
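A bare-bones sketch of the event-triggered transmission idea follows (a single linear agent with a static threshold, all numbers assumed; the neuroadaptive, quantized, and multiagent machinery of the article is not reproduced): the state is rebroadcast to the controller only when it drifts sufficiently far from the last transmitted value.

```python
import numpy as np

A = np.array([[0.0, 1.0], [-2.0, -0.5]])     # assumed plant dynamics
B = np.array([[0.0], [1.0]])
K = np.array([[2.0, 1.5]])                   # assumed stabilizing feedback gain
dt, steps, threshold = 0.001, 10000, 0.05

x = np.array([1.0, 0.0])
x_held = x.copy()                            # last transmitted (held) state
events = 0
for _ in range(steps):
    if np.linalg.norm(x - x_held) > threshold:   # triggering condition
        x_held = x.copy()                        # transmit and hold the new value
        events += 1
    u = -K @ x_held                          # controller only sees the held state
    x = x + dt * (A @ x + B @ u)             # forward-Euler plant integration

print(f"transmissions: {events} of {steps} samples, final state: {x.round(3)}")
```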

5.
IEEE Trans Cybern ; 54(3): 1391-1402, 2024 Mar.
Article in English | MEDLINE | ID: mdl-37906478

ABSTRACT

This article proposes a data-efficient model-free reinforcement learning (RL) algorithm using Koopman operators for complex nonlinear systems. A high-dimensional data-driven optimal control of the nonlinear system is developed by lifting it into a linear system model. We use a data-driven model-based RL framework to derive an off-policy Bellman equation. Building upon this equation, we deduce the data-efficient RL algorithm, which does not need a Koopman-built linear system model. This algorithm preserves dynamic information while reducing the data required for optimal control learning. Numerical and theoretical analyses of the Koopman eigenfunctions for dataset truncation are discussed for the proposed model-free data-efficient RL algorithm. We validate our framework on the excitation control of the power system.
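The lifting step behind such Koopman-based designs can be sketched with a plain EDMD-style least-squares fit (an illustrative assumption, not the paper's algorithm): nonlinear pendulum states are mapped through a small dictionary of observables, and a matrix is fit so the lifted state propagates approximately linearly.

```python
import numpy as np

rng = np.random.default_rng(1)

def step(x, dt=0.05):
    """Assumed nonlinear dynamics: an undamped pendulum, integrated by forward Euler."""
    th, om = x
    return np.array([th + dt * om, om - dt * 9.81 * np.sin(th)])

def lift(x):
    """Assumed dictionary of observables defining the lifted (Koopman) state."""
    th, om = x
    return np.array([1.0, th, om, np.sin(th), np.cos(th), th * om])

# Sampled state pairs (x, x') and their lifted versions
X = [rng.uniform(-1.0, 1.0, 2) for _ in range(2000)]
Z = np.array([lift(x) for x in X])
Zp = np.array([lift(step(x)) for x in X])

# Least-squares Koopman matrix: Zp ~ Z @ Kmat.T, i.e. z' ~ Kmat z
Kmat, *_ = np.linalg.lstsq(Z, Zp, rcond=None)
Kmat = Kmat.T

# One-step prediction check at a point not used for fitting
x0 = np.array([0.3, -0.2])
err = np.linalg.norm(Kmat @ lift(x0) - lift(step(x0)))
print("lifted one-step prediction error:", err)
```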

6.
IEEE Trans Cybern ; 54(2): 728-738, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38133983

ABSTRACT

This article addresses the problem of learning the objective function of linear discrete-time systems that use static output-feedback (OPFB) control by designing inverse reinforcement learning (RL) algorithms. Most of the existing inverse RL methods require the availability of states and state-feedback control from the expert or demonstrated system. In contrast, this article considers inverse RL in a more general case where the demonstrated system uses static OPFB control with only input-output measurements available. We first develop a model-based inverse RL algorithm to reconstruct an input-output objective function of a demonstrated discrete-time system using its system dynamics and the OPFB gain. This objective function infers the demonstrations and OPFB gain of the demonstrated system. Then, an input-output Q-function is built for the inverse RL problem upon the state reconstruction technique. Given demonstrated inputs and outputs, a data-driven inverse Q-learning algorithm reconstructs the objective function without knowledge of the demonstrated system dynamics or the OPFB gain. This algorithm yields unbiased solutions even in the presence of exploration noise. Convergence properties and the nonunique-solution nature of the proposed algorithms are studied. Numerical simulation examples verify the effectiveness of the proposed methods.

7.
IEEE Trans Cybern ; PP, 2023 Oct 11.
Article in English | MEDLINE | ID: mdl-37819824

ABSTRACT

In this article, we investigate the distributed tracking control problem for networked uncertain nonlinear strict-feedback systems with unknown time-varying gains under a directed interaction topology. A dual-phase performance-guaranteed approach is established. In the first phase, a fully distributed robust filter is constructed for each agent to estimate the desired trajectory with prescribed performance, such that the control directions of all agents are allowed to be nonidentical. In the second phase, by establishing a novel lemma regarding the Nussbaum function, a new adaptive control protocol is developed for each agent based on the backstepping technique, which not only steers the output to track the corresponding estimated signal asymptotically with an arbitrarily prescribed transient response but also greatly extends the application scope of the proposed control scheme, since the unknown control gains are allowed to be time-varying and even state-dependent. In this way, the underlying problem is tackled with the output tracking error converging into an arbitrarily preassigned residual set at an arbitrarily predefined convergence rate. Besides, all internal signals are ensured to be semi-globally uniformly ultimately bounded (SGUUB). Finally, two examples are provided to illustrate the effectiveness of the co-designed scheme.

8.
Article in English | MEDLINE | ID: mdl-37463077

ABSTRACT

This article studies the optimal synchronization of linear heterogeneous multiagent systems (MASs) with partially unknown system dynamics. The objective is to realize system synchronization while minimizing the performance index of each agent. A framework of heterogeneous multiagent graphical games is formulated first. In the graphical games, it is proved that the optimal control policy relying on the solution of the Hamilton-Jacobi-Bellman (HJB) equation is not only in Nash equilibrium but also the best response to the fixed control policies of its neighbors. To obtain the optimal control policy and the minimum value of the performance index, a model-based policy iteration (PI) algorithm is proposed. Then, based on the model-based algorithm, a data-based off-policy integral reinforcement learning (IRL) algorithm is put forward to handle the partially unknown system dynamics. Furthermore, a single-critic neural network (NN) structure is used to implement the data-based algorithm. Based on the data collected by the behavior policy of the data-based off-policy algorithm, the gradient descent method is used to train the NNs to approach the ideal weights. In addition, it is proved that all the proposed algorithms are convergent and that the weight-tuning law of the single-critic NNs promotes optimal synchronization. Finally, a numerical example is provided to show the effectiveness of the theoretical analysis.
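For a single agent, the model-based policy-iteration loop that such graphical-game algorithms build on can be sketched as the classical Kleinman iteration for continuous-time LQR (a minimal sketch under assumed A, B, Q, R; the multiagent coupling, the off-policy IRL step, and the neural-network critic are not reproduced).

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

A = np.array([[0.0, 1.0], [-1.0, -0.5]])     # assumed (stable) agent dynamics
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.eye(1)                  # assumed performance-index weights

K = np.zeros((1, 2))                         # initial stabilizing gain (A is Hurwitz here)
for i in range(30):
    Acl = A - B @ K
    # Policy evaluation: solve Acl^T P + P Acl + Q + K^T R K = 0 for P
    P = solve_continuous_lyapunov(Acl.T, -(Q + K.T @ R @ K))
    # Policy improvement: K <- R^{-1} B^T P
    K_new = np.linalg.solve(R, B.T @ P)
    if np.linalg.norm(K_new - K) < 1e-10:
        break
    K = K_new

print("converged gain K =", K.round(4))
print("value matrix P =\n", P.round(4))
```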

9.
NeuroRehabilitation ; 52(3): 425-433, 2023.
Article in English | MEDLINE | ID: mdl-36806521

ABSTRACT

BACKGROUND: With the effectiveness of post-hospital brain injury rehabilitation clearly demonstrated, research focus has shifted to the durability of treatment gains over time. OBJECTIVE: Study objectives were threefold: (1) examine the stability of outcomes following post-hospital rehabilitation for persons with acquired brain injury, (2) compare differences in short- and long-term outcomes for TBI and CVA groups, and (3) identify predictors of long-term outcomes. METHODS: Subjects (n = 108) were selected from 2,177 neurologically impaired adults with consecutive discharges from 18 post-hospital programs in 12 states from 2011 through 2019. The study sample included TBI, CVA, and Mixed neurological groups. All persons were evaluated using the Mayo-Portland Adaptability Inventory-4 Participation Index at four assessment intervals: admission, discharge, and 3- and 12-month follow-up. Additional analyses included a repeated-measures 2x4 design addressing TBI and CVA across the four measurement periods, and hierarchical multiple regression to identify outcome predictors. RESULTS: The total sample demonstrated a reduction in Participation T-scores (indicating less disability) from admission to discharge. Reductions in disability were maintained at the 3- and 12-month follow-up assessments (Greenhouse-Geisser F (2.37) = 76.87, p < 0.001, partial eta2 = 0.418, power to detect = 0.99). The CVA group demonstrated greater disability at each assessment interval; however, these differences were not statistically significant. Significant predictors of outcome at 12 months post-discharge were length of stay in program and type of injury. TBIs with longer lengths of stay experienced better outcomes at 12 months than non-TBIs with shorter lengths of stay (hierarchical multiple regression adjusted R2 = 0.085, p < 0.05). CONCLUSION: Post-hospital residential neurorehabilitation programs provide a return on investment. Gains are realized from admission to discharge and maintained one year following discharge from rehabilitation.


Subjects
Aftercare, Traumatic Brain Injuries, Neurological Rehabilitation, Humans, Adult, Brain Injuries, Traumatic Brain Injuries/rehabilitation, Patient Discharge, Treatment Outcome, Hospitals, Male, Female, Adolescent, Middle Aged, Aged
10.
IEEE Trans Neural Netw Learn Syst ; 34(8): 4596-4609, 2023 Aug.
Article in English | MEDLINE | ID: mdl-34623278

ABSTRACT

This article proposes new inverse reinforcement learning (RL) algorithms to solve our defined Adversarial Apprentice Games for nonlinear learner and expert systems. The games are solved by having a learner extract the unknown cost function of an expert from the expert's demonstrated behaviors. We first develop a model-based inverse RL algorithm that consists of two learning stages: an optimal control learning stage and a second stage based on inverse optimal control. This algorithm also clarifies the relationships between inverse RL and inverse optimal control. Then, we propose a new model-free integral inverse RL algorithm to reconstruct the unknown expert cost function. The model-free algorithm only needs online demonstration of the expert's and learner's trajectory data, without knowing the system dynamics of either the learner or the expert. These two algorithms are further implemented using neural networks (NNs). In Adversarial Apprentice Games, the learner and the expert are allowed to suffer from different adversarial attacks in the learning process. A two-player zero-sum game is formulated for each of these two agents and is solved as a subproblem for the learner in inverse RL. Furthermore, it is shown that the cost functions that the learner learns to mimic the expert's behavior are stabilizing and not unique. Finally, simulations and comparisons show the effectiveness and superiority of the proposed algorithms.

11.
IEEE Trans Cybern ; 53(4): 2275-2287, 2023 Apr.
Article in English | MEDLINE | ID: mdl-34623292

ABSTRACT

This article investigates differential graphical games for linear multiagent systems with a leader on fixed communication graphs. The objective is to make each agent synchronize to the leader and, meanwhile, optimize a performance index, which depends on the control policies of its own and its neighbors. To this end, a distributed adaptive Nash equilibrium solution is proposed for the differential graphical games. This solution, in contrast to the existing ones, is not only Nash but also fully distributed in the sense that each agent only uses local information of its own and its immediate neighbors without using any global information of the communication graph. Moreover, the asymptotic stability and global Nash equilibrium properties are analyzed for the proposed distributed adaptive Nash equilibrium solution. As an illustrative example, the differential graphical game solution is applied to the microgrid secondary control problem to achieve fully distributed voltage synchronization with optimized performance.
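A stripped-down sketch of the leader-synchronization setup behind such graphical games is given below (a fixed gain and a simple directed chain, all values assumed; the adaptive, fully distributed Nash-equilibrium weights of the article are not reproduced): each follower feeds back its local neighborhood tracking error.

```python
import numpy as np

A = np.array([[0.0, 1.0], [0.0, 0.0]])       # assumed agent dynamics (double integrator)
B = np.array([[0.0], [1.0]])
K = np.array([[1.0, 1.7]])                   # assumed stabilizing feedback gain
adj = np.array([[0, 0, 0],                   # a_ij: follower 1 hears 0, follower 2 hears 1
                [1, 0, 0],
                [0, 1, 0]])
pin = np.array([1, 0, 0])                    # g_i: only follower 0 observes the leader

dt, steps = 0.01, 2000
x_leader = np.zeros(2)                       # leader at rest at the origin
X = np.ones((3, 2))                          # follower states

for _ in range(steps):
    E = np.zeros_like(X)
    for i in range(3):
        # Local neighborhood error e_i = sum_j a_ij (x_j - x_i) + g_i (x_0 - x_i)
        E[i] = sum(adj[i, j] * (X[j] - X[i]) for j in range(3)) + pin[i] * (x_leader - X[i])
    U = E @ K.T                              # u_i = K e_i
    X = X + dt * (X @ A.T + U @ B.T)         # forward-Euler integration of all followers

print("disagreement with leader per follower:", np.linalg.norm(X - x_leader, axis=1).round(4))
```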

12.
IEEE Trans Neural Netw Learn Syst ; 34(7): 3553-3567, 2023 Jul.
Article in English | MEDLINE | ID: mdl-34662280

ABSTRACT

This article develops two novel output feedback (OPFB) Q-learning algorithms, on-policy Q-learning and off-policy Q-learning, to solve the H∞ static OPFB control problem of linear discrete-time (DT) systems. The primary contribution of the proposed algorithms lies in a newly developed OPFB control design for completely unknown systems. Under the premise that the disturbance attenuation condition is satisfied, conditions for the existence of the optimal OPFB solution are given. The convergence of the proposed Q-learning methods, and the difference and equivalence of the two algorithms, are rigorously proven. Moreover, considering the effects of the probing noise added for the persistence of excitation (PE), the proposed off-policy Q-learning method has the advantage of being immune to probing noise and avoiding bias in the solution. Simulation results are presented to verify the effectiveness of the proposed approaches.


Subjects
Neural Networks (Computer), Nonlinear Dynamics, Feedback, Algorithms, Computer Simulation
13.
IEEE Trans Neural Netw Learn Syst ; 34(5): 2386-2399, 2023 May.
Article in English | MEDLINE | ID: mdl-34520364

ABSTRACT

In inverse reinforcement learning (RL), there are two agents. An expert target agent has a performance cost function and exhibits control and state behaviors to a learner. The learner agent does not know the expert's performance cost function but seeks to reconstruct it by observing the expert's behaviors and tries to imitate these behaviors optimally by its own response. In this article, we formulate an imitation problem where the optimal performance intent of a discrete-time (DT) expert target agent is unknown to a DT Learner agent. Using only the observed expert's behavior trajectory, the learner seeks to determine a cost function that yields the same optimal feedback gain as the expert's, and thus, imitates the optimal response of the expert. We develop an inverse RL approach with a new scheme to solve the behavior imitation problem. The approach consists of a cost function update based on an extension of RL policy iteration and inverse optimal control, and a control policy update based on optimal control. Then, under this scheme, we develop an inverse reinforcement Q-learning algorithm, which is an extension of RL Q-learning. This algorithm does not require any knowledge of agent dynamics. Proofs of stability, convergence, and optimality are given. A key property about the nonunique solution is also shown. Finally, simulation experiments are presented to show the effectiveness of the new approach.

14.
IEEE Trans Neural Netw Learn Syst ; 34(9): 5354-5365, 2023 Sep.
Article in English | MEDLINE | ID: mdl-35500078

ABSTRACT

Trajectory planning is one of the indispensable and critical components in robotics and autonomous systems. As an efficient indirect method for dealing with nonlinear system dynamics in trajectory planning tasks over unconstrained state and control spaces, the iterative linear quadratic regulator (iLQR) has demonstrated noteworthy outcomes. In this article, a local-learning-enabled constrained iLQR algorithm is presented for trajectory planning based on hybrid dynamic optimization and machine learning. Importantly, this algorithm circumvents the requirement of system identification, and the trajectory planning task is achieved with a simultaneous refinement of the optimal policy and the neural network system model in an iterative framework. The neural network can be designed to represent the local system model with a simple architecture, which leads to a sample-efficient training pipeline. In addition, in this learning paradigm, the constraints of the general form that are typically encountered in trajectory planning tasks are preserved. Several illustrative trajectory planning examples are presented to demonstrate the effectiveness and significance of this work.
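The core recursion inside any iLQR-style planner, including the one described above, is the backward pass over local linearizations; a minimal sketch under assumed dynamics and weights follows (the learned neural-network local model and the constraint handling of the article are not reproduced).

```python
import numpy as np

def ilqr_backward(A_list, B_list, Q, R, Qf):
    """Backward pass of iLQR for a locally linear-quadratic problem with no linear
    cost terms: returns time-varying feedforward terms k_t and feedback gains K_t."""
    P, p = Qf, np.zeros(Qf.shape[0])
    gains = []
    for A, B in zip(reversed(A_list), reversed(B_list)):
        # Quadratic expansion of the Q-function around the nominal trajectory
        Qxx = Q + A.T @ P @ A
        Quu = R + B.T @ P @ B
        Qux = B.T @ P @ A
        qu = B.T @ p
        K = -np.linalg.solve(Quu, Qux)       # feedback gain
        k = -np.linalg.solve(Quu, qu)        # feedforward correction
        # Value-function recursion
        P = Qxx + K.T @ Quu @ K + K.T @ Qux + Qux.T @ K
        p = (A + B @ K).T @ p
        gains.append((k, K))
    return list(reversed(gains))

# Illustrative use on a double-integrator nominal trajectory (all numbers assumed)
dt, T = 0.1, 50
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.0], [dt]])
gains = ilqr_backward([A] * T, [B] * T, Q=np.eye(2), R=0.1 * np.eye(1), Qf=10.0 * np.eye(2))
print("first-step feedback gain K_0 =", gains[0][1].round(3))
```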

15.
IEEE Trans Cybern ; 53(4): 2454-2466, 2023 Apr.
Article in English | MEDLINE | ID: mdl-34731084

ABSTRACT

This article investigates neuroadaptive optimal fixed-time synchronization, together with its circuit realization and dynamical analysis, for unidirectionally coupled fractional-order (FO) self-sustained electromechanical seismograph systems under subharmonic and superharmonic oscillations. The synchronization model of the coupled FO seismograph system is established based on drive and response seismic detectors. The dynamical analysis reveals that this coupled system generates transient chaos and homoclinic/heteroclinic oscillations. Test results from the constructed equivalent analog circuit further confirm its complex nonlinear dynamics. Then, a neuroadaptive optimal fixed-time synchronization controller, integrating a FO hyperbolic tangent tracking differentiator (HTTD), an interval type-2 fuzzy neural network (IT2FNN) with transformation, and a prescribed performance function (PPF) together with the constraint condition, is developed in the backstepping recursive design. Furthermore, it is proved that all signals of this closed-loop system are bounded and that the tracking errors remain within the prescribed constraint while the cost function is minimized. Extensive studies confirm the effectiveness of the proposed scheme.

16.
IEEE Trans Cybern ; 53(3): 1432-1446, 2023 Mar.
Article in English | MEDLINE | ID: mdl-34570712

ABSTRACT

This article suggests a collection of model-based and model-free output-feedback optimal solutions to a general H∞ control design criterion of a continuous-time linear system. The goal is to obtain a static output-feedback controller while the design criterion is formulated with an exponential term, divergent or convergent, depending on the designer's choice. Two offline policy-iteration algorithms are presented first, which form the foundations for a family of online off-policy designs. These algorithms cover all different cases of partial or complete model knowledge and provide the designer with a collection of design alternatives. It is shown that such a design for partial model knowledge can reduce the number of unknown matrices to be solved online. In particular, if the disturbance input matrix of the model is given, off-policy learning can be done with no disturbance excitation. This alternative is useful in situations where a measurable disturbance is not available in the learning phase. The utility of these design procedures is demonstrated for the case of an optimal lane tracking controller of an automated car.

17.
IEEE Trans Cybern ; 53(7): 4555-4566, 2023 Jul.
Article in English | MEDLINE | ID: mdl-36264741

ABSTRACT

This article considers autonomous systems whose behaviors seek to optimize an objective function. This goes beyond standard applications of condition-based maintenance, which seeks to detect faults or failures in nonoptimizing systems. Normal agents optimize a known accepted objective function, whereas abnormal or misbehaving agents may optimize a renegade objective that does not conform to the accepted one. We provide a unified framework for anomaly detection and correction in optimizing autonomous systems described by differential equations using inverse reinforcement learning (RL). We first define several types of anomalies and false alarms, including noise anomaly, objective function anomaly, intention (control gain) anomaly, abnormal behaviors, noise-anomaly false alarms, and objective false alarms. We then propose model-free inverse RL algorithms to reconstruct the objective functions and intentions for given system behaviors. The inverse RL procedure for anomaly detection and correction consists of a training phase, a detection phase, and a correction phase. First, inverse RL in the training phase infers the objective function and intention of the normal-behavior system using offline stored data. Second, in the detection phase, inverse RL infers the objective function and intention for online observed test system behaviors using online observation data. These are then compared with those of the nominal system to identify anomalies. Third, correction is executed so that the anomalous system learns the normal objective and intention. Simulations and experiments on a quadrotor unmanned aerial vehicle (UAV) verify the proposed methods.


Subjects
Learning, Reinforcement (Psychology), Algorithms
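As a rough, simplified sketch of the "intention (control gain) anomaly" idea described in the abstract above (not the article's inverse-RL algorithms), the snippet below estimates the feedback gain implied by observed state and input data via least squares and flags the system when the estimate deviates from the nominal gain; every number here is an assumption.

```python
import numpy as np

rng = np.random.default_rng(2)
K_nominal = np.array([[1.0, 0.5]])            # accepted (nominal) feedback intention

def observe(K_true, samples=200, noise=0.01):
    """Simulate observations u = -K_true x + noise for random states."""
    X = rng.standard_normal((samples, 2))
    U = -X @ K_true.T + noise * rng.standard_normal((samples, 1))
    return X, U

def estimate_gain(X, U):
    """Least-squares estimate of K from u ~ -K x."""
    K_hat, *_ = np.linalg.lstsq(X, -U, rcond=None)
    return K_hat.T

for label, K_true in [("normal", K_nominal), ("anomalous", np.array([[1.0, -0.8]]))]:
    X, U = observe(K_true)
    deviation = np.linalg.norm(estimate_gain(X, U) - K_nominal)
    verdict = "ANOMALY" if deviation > 0.1 else "OK"
    print(f"{label}: gain deviation {deviation:.3f} -> {verdict}")
```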
18.
J Orthop ; 34: 339-343, 2022.
Article in English | MEDLINE | ID: mdl-36210958

ABSTRACT

Introduction: As its indications expand, reverse total shoulder arthroplasty (rTSA) utilization continues to increase. Though relatively uncommon, instability following rTSA can be associated with significant morbidity and the need for subsequent revision and treatment. This case-control study aims to characterize factors leading to instability after rTSA, especially in patients with no previous shoulder surgery. Methods: 194 rTSAs performed within the study period with appropriate operative indications and follow-up were included. Risk factors used in the analysis included age, gender, BMI, ASA class, Charlson comorbidity index (CCI), and glenosphere, tray, and liner size. Data were analyzed using hierarchical binary logistic regression to create a predictive model for instability. Results: Seven patients sustained a post-operative dislocation. Mean time to dislocation was 60.4 weeks. Five required open reduction with placement of either a larger humeral tray or polyethylene spacer. One required open reduction with osteophyte removal, and one was converted to a resection arthroplasty. Dislocators were more likely to have a larger BMI (p = 0.002), higher ASA classification (p = 0.09), and larger liner size (p = 0.01). Conclusion: This study presents a large series of patients successfully treated with rTSA. Dislocations were an uncommon complication, but were clearly associated with higher patient BMI, ASA classification, and increased liner size.

19.
Article in English | MEDLINE | ID: mdl-36315539

ABSTRACT

This article studies a distributed minmax strategy for multiplayer games and develops reinforcement learning (RL) algorithms to solve it. The proposed minmax strategy is distributed, in the sense that it finds each player's optimal control policy without knowing all the other players' policies. Each player obtains its distributed control policy by solving a distributed algebraic Riccati equation in a multiplayer noncooperative game. This policy is found against the worst policies of all the other players. We guarantee the existence of distributed minmax solutions and study their L2 and asymptotic stabilities. Under mild conditions, the resulting minmax control policies are shown to improve robust gain and phase margins of multiplayer systems compared to the standard linear-quadratic regulator controller. Distributed minmax solutions are found using both model-based policy iteration and data-driven off-policy RL algorithms. Simulation examples verify the proposed formulation and its computational efficiency over the nondistributed Nash solutions.

20.
Chem Commun (Camb) ; 58(76): 10667-10670, 2022 Sep 22.
Article in English | MEDLINE | ID: mdl-36063119

ABSTRACT

A new general synthetic route to selective actinide-extracting ligands for spent nuclear fuel reprocessing has been established. The amide-functionalized ligands separate Am(III) and Cm(III) from the lanthanides with high selectivities and show rapid rates of metal extraction. The ligands retain the advantages of the analogous unfunctionalized ligands derived from camphorquinone, whilst also negating their main drawback: precipitate formation when in contact with nitric acid. These studies could enable the design of improved solvent extraction processes for closing the nuclear fuel cycle.
