Results 1 - 20 of 72
1.
Evol Comput ; 31(4): 433-458, 2023 Dec 01.
Article in English | MEDLINE | ID: mdl-37155647

ABSTRACT

Existing work on offline data-driven optimization mainly focuses on problems in static environments, and little attention has been paid to problems in dynamic environments. Offline data-driven optimization in dynamic environments is a challenging problem because the distribution of the collected data varies over time, requiring both the surrogate models and the optimal solutions to be tracked over time. This paper proposes a knowledge-transfer-based data-driven optimization algorithm to address these issues. First, an ensemble learning method is adopted to train surrogate models to leverage the knowledge of data in historical environments as well as adapt to new environments. Specifically, given data in a new environment, a model is constructed with the new data, and the preserved models of historical environments are further trained with the new data. Then, these models are considered to be base learners and combined as an ensemble surrogate model. After that, all base learners and the ensemble surrogate model are simultaneously optimized in a multitask environment to find optimal solutions of the real fitness functions. In this way, the optimization tasks in the previous environments can be used to accelerate the tracking of the optimum in the current environment. Since the ensemble model is the most accurate surrogate, we assign more individuals to the ensemble surrogate than to its base learners. Empirical results on six dynamic optimization benchmark problems demonstrate the effectiveness of the proposed algorithm compared with four state-of-the-art offline data-driven optimization algorithms. Code is available at https://github.com/Peacefulyang/DSE_MFS.git.
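As an illustration only (not the authors' code, which is at the repository above), a minimal numpy sketch of the ensemble step might look like the following; the quadratic ridge surrogates and the inverse-error weighting are assumptions made for the sketch:

import numpy as np

class RidgeSurrogate:
    """Quadratic-feature ridge model with incremental (further-training) updates."""
    def __init__(self, lam=1e-3):
        self.lam, self.A, self.b, self.w = lam, None, None, None
    def _phi(self, X):
        X = np.atleast_2d(X)
        return np.hstack([np.ones((len(X), 1)), X, X ** 2])
    def partial_fit(self, X, y):
        P = self._phi(X)
        if self.A is None:
            self.A = self.lam * np.eye(P.shape[1])
            self.b = np.zeros(P.shape[1])
        self.A += P.T @ P          # accumulate normal equations, so old data is kept
        self.b += P.T @ y
        self.w = np.linalg.solve(self.A, self.b)
        return self
    def predict(self, X):
        return self._phi(X) @ self.w

def new_environment(preserved, X_new, y_new):
    """Base learners = fresh model on the new data + preserved models updated with it;
    ensemble = inverse-MSE weighted average (the weighting rule is an assumption)."""
    base = [RidgeSurrogate().partial_fit(X_new, y_new)]
    base += [m.partial_fit(X_new, y_new) for m in preserved]
    mse = np.array([np.mean((m.predict(X_new) - y_new) ** 2) for m in base])
    w = 1.0 / (mse + 1e-12)
    w /= w.sum()
    ensemble = lambda X: sum(wi * m.predict(X) for wi, m in zip(w, base))
    return base, ensemble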


Subjects
Algorithms, Biological Evolution, Humans, Knowledge Bases, Benchmarking
2.
IEEE Trans Cybern ; 54(8): 4724-4737, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38198260

ABSTRACT

Aiming at the operation optimization of the wastewater treatment process (WWTP) with nonstationary time-varying dynamics and complex multiple constraints, this article proposes a novel adaptive constraint penalty decomposed multiobjective evolutionary algorithm with synthetical distance (SD)-based cross-generation crossover. First, the concept of spatial SD is presented to comprehensively evaluate the similarity of individual solutions from the two aspects of distance and angle, and the individual information between two adjacent generations is used to enhance the diversity of individuals and accelerate the convergence of the algorithm. Second, aiming at the complex multiple constraints in the operation optimization of the WWTP, an adaptive penalty algorithm is further adopted to penalize the individual solutions that violate the constraints, so as to improve the efficiency and success rate of constraint handling. Furthermore, in view of the time-varying dynamics of the actual WWTP, a recursive bilinear subspace identification method based on a sliding window is adopted to establish the optimization models as well as the constraint models with self-learning parameters, which provides an accurate model guarantee for high-performance multiobjective operation optimization. Finally, the effectiveness, superiority, and practicability of the proposed method are verified through test-function experiments as well as operation optimization control experiments on the WWTP.
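The abstract does not give the exact formulas, but a minimal sketch of the two generic ingredients, a distance-plus-angle similarity and an adaptive constraint penalty, could look like this (the 50/50 weighting and the linearly growing penalty coefficient are illustrative assumptions):

import numpy as np

def synthetical_distance(a, b, w_dist=0.5, w_angle=0.5):
    """Combine normalized Euclidean distance and angular separation of two solution
    vectors; the equal weighting is an illustrative assumption."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    dist = np.linalg.norm(a - b) / (np.linalg.norm(a) + np.linalg.norm(b) + 1e-12)
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
    angle = np.arccos(np.clip(cos, -1.0, 1.0)) / np.pi   # normalized to [0, 1]
    return w_dist * dist + w_angle * angle

def adaptive_penalty(objective, violation, generation, max_gen, base=10.0):
    """Penalize constraint violation with a coefficient that grows over the run,
    so early exploration is tolerant and late search enforces feasibility."""
    coeff = base * (1.0 + generation / max_gen)
    return objective + coeff * violation

print(synthetical_distance([1.0, 2.0], [1.5, 1.0]))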

3.
Article in English | MEDLINE | ID: mdl-39042548

ABSTRACT

In mineral processing, the dynamic nature of industrial data poses challenges for decision-makers in accurately assessing current production statuses. To enhance the decision-making process, it is crucial to predict comprehensive production indices (CPIs), which are influenced by both human operators and industrial processes, and demonstrate a strong dual-scale property. To improve the accuracy of CPIs' prediction, we introduce the high-frequency (HF) unit and low-frequency (LF) unit within our proposed dual-scale deep learning (DL) network. This architecture enables the exploration of nonlinear dynamic mapping in dual-scale industrial data. By integrating the Cloud-Edge collaboration mechanism with DL, our training strategy mitigates the dominance of HF data and guides networks to prioritize different frequency information. Through self-tuning training via Cloud-Edge collaboration, the optimal model structure and parameters on the cloud server are adjusted, with the edge model self-updating accordingly. Validated through online industrial experiments, our method significantly enhances CPIs' prediction accuracy compared to the baseline approaches.
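As a rough illustration of the dual-scale idea, a simple moving-average split into low-frequency and high-frequency components, each of which could feed its own model branch, is sketched below; the paper's HF and LF units are learned networks, not this fixed filter:

import numpy as np

def split_dual_scale(series, window=24):
    """Separate a process series into a low-frequency trend (moving average) and a
    high-frequency residual; an illustrative stand-in for the paper's LF/HF units."""
    series = np.asarray(series, float)
    kernel = np.ones(window) / window
    lf = np.convolve(series, kernel, mode="same")   # slow, operator/plan scale
    hf = series - lf                                # fast, process scale
    return lf, hf

t = np.linspace(0, 10, 500)
lf, hf = split_dual_scale(np.sin(t) + 0.2 * np.random.randn(500))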

4.
IEEE Trans Cybern ; PP, 2024 May 22.
Article in English | MEDLINE | ID: mdl-38776191

ABSTRACT

This article concerns nonlinear model predictive control (MPC) with guaranteed feasibility of inequality path constraints (PCs). For MPC with PCs, the existing methods, such as direct multiple shooting, cannot guarantee feasibility of the PCs because the PCs are enforced at finitely many time points only. Therefore, this article presents a novel MPC framework that is capable of not only achieving stability control but also guaranteeing feasibility of the PCs during the rolling optimization stages of MPC. Under the above MPC framework, an algorithm is first proposed by applying the semi-infinite programming technique to the rolling optimization of MPC. However, it incurs a heavy computational cost to achieve guaranteed feasibility of the PCs. Therefore, to guarantee feasibility of the PCs while effectively reducing the computational burden of the closed-loop system, an event-triggered sampling mechanism is constructed for the above path-constrained MPC algorithm. Moreover, sufficient conditions are given for asymptotic convergence of the closed-loop system. Finally, the effectiveness of the proposed results is illustrated via a cart-damper-spring system.

5.
IEEE Trans Neural Netw Learn Syst ; 35(3): 3191-3201, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38379236

ABSTRACT

In this article, a model-free Q-learning algorithm is proposed to solve the tracking problem of linear discrete-time systems with completely unknown system dynamics. To eliminate tracking errors, a performance index of the Q-learning approach is formulated, which can transform the tracking problem into a regulation one. Compared with the existing adaptive dynamic programming (ADP) methods and Q-learning approaches, the proposed performance index adds a product term composed of a gain matrix and the reference tracking trajectory to the control input quadratic form. In addition, without requiring any prior knowledge of the dynamics of the original controlled system and command generator, the control policy obtained by the proposed approach can be deduced by an iterative technique relying on the online information of the system state, the control input, and the reference tracking trajectory. In each iteration of the proposed method, the desired control input can be updated by the iterative criteria derived from a precondition of the controlled system and the reference tracking trajectory, which ensures that the obtained control policy can eliminate tracking errors in theory. Moreover, to effectively use less data to obtain the optimal control policy, the off-policy approach is introduced into the proposed algorithm. Finally, the effectiveness of the proposed algorithm is verified by a numerical simulation.
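For readers unfamiliar with the underlying machinery, the following numpy sketch shows generic least-squares policy-iteration Q-learning for a linear discrete-time regulator with unknown dynamics; it does not reproduce the paper's augmented tracking index with the gain-matrix product term, and the plant below is a made-up example used only to generate data:

import numpy as np

np.random.seed(0)
A = np.array([[0.95, 0.10],
              [0.00, 0.90]])          # plant used only as a data generator
B = np.array([[0.0],
              [0.1]])
Qc, Rc = np.eye(2), np.eye(1)
n, m = 2, 1
d = n + m

def quad_features(z):
    """Upper-triangular entries of z z^T, so that Q(z) = z^T H z is linear in them."""
    outer = np.outer(z, z)
    i, j = np.triu_indices(d)
    return np.where(i == j, 1.0, 2.0) * outer[i, j]

def theta_to_H(theta):
    H = np.zeros((d, d))
    H[np.triu_indices(d)] = theta
    H = H + H.T
    H[np.diag_indices(d)] /= 2.0
    return H

K = np.zeros((m, n))                  # initial stabilizing gain (A is Schur stable)
for it in range(8):
    Phi, target = [], []
    for _ in range(300):              # data: random states, exploratory inputs
        x = np.random.randn(n)
        u = -K @ x + 0.1 * np.random.randn(m)
        x_next = A @ x + B @ u
        z, z_next = np.concatenate([x, u]), np.concatenate([x_next, -K @ x_next])
        Phi.append(quad_features(z) - quad_features(z_next))   # Bellman difference
        target.append(x @ Qc @ x + u @ Rc @ u)                 # one-step cost
    theta, *_ = np.linalg.lstsq(np.array(Phi), np.array(target), rcond=None)
    H = theta_to_H(theta)
    K = np.linalg.solve(H[n:, n:], H[n:, :n])                  # policy improvement
print("learned feedback gain K =", K)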

6.
IEEE Trans Cybern ; 54(3): 1695-1707, 2024 Mar.
Article in English | MEDLINE | ID: mdl-37027769

ABSTRACT

This article studies the trajectory imitation control problem of linear systems subject to external disturbances and develops a data-driven static output feedback (OPFB) control-based inverse reinforcement learning (RL) approach. An Expert-Learner structure is considered, where the learner aims to imitate the expert's trajectory. Using only the measured input and output data of the expert and of the learner itself, the learner computes the policy of the expert by reconstructing its unknown value function weights and thus imitates its optimally operating trajectory. Three static OPFB inverse RL algorithms are proposed. The first algorithm is a model-based scheme and serves as the basis. The second algorithm is a data-driven method using input-state data. The third algorithm is a data-driven method using only input-output data. Stability, convergence, optimality, and robustness are analyzed in detail. Finally, simulation experiments are conducted to verify the proposed algorithms.

7.
IEEE Trans Cybern ; 54(7): 4177-4189, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38602848

ABSTRACT

Bilevel optimization is a special type of optimization in which one problem is embedded within another. A bilevel optimization problem (BLOP) in which both levels are multiobjective functions is usually called a multiobjective BLOP (MBLOP). Its expensive computation and nested structure make it challenging to solve. Most existing studies look for complete lower-level solutions for every upper-level variable. However, not every lower-level solution will participate in the bilevel Pareto-optimal front. Under a limited computational budget, instead of wasting resources to find complete lower-level solutions that may not lie in the feasible or inducible region of the MBLOP, it is better to concentrate on finding the solutions with better performance. Bearing these considerations in mind, we propose a multiobjective bilevel optimization solving routine combined with a knee-point-driven algorithm. Specifically, the proposed algorithm aims to quickly find feasible solutions considering the lower-level constraints in the first stage and then concentrates the computational resources on finding solutions with better performance. In addition, we develop several multiobjective bilevel test problems with different properties, such as scalability, deception, convexity, and (dis)continuity. Finally, the performance of the algorithm is validated on a practical petroleum refining bilevel problem, which involves a multiobjective environmental regulation problem and a petroleum refining operational problem. Comprehensive experiments fully demonstrate the effectiveness of the presented algorithm in solving MBLOPs.
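A common way to pick a knee point from a two-dimensional nondominated front is to take the solution farthest from the line through the two extreme points; the sketch below uses that criterion, which may differ from the paper's exact knee definition:

import numpy as np

def knee_point(front):
    """Return the knee of a 2-D nondominated front: the point farthest from the
    straight line through the two extreme solutions."""
    F = np.asarray(front, float)
    F = F[np.argsort(F[:, 0])]
    p1, p2 = F[0], F[-1]                     # extreme points
    line = p2 - p1
    line = line / (np.linalg.norm(line) + 1e-12)
    rel = F - p1
    # perpendicular distance of each point to the extreme-point line
    dist = np.abs(rel[:, 0] * line[1] - rel[:, 1] * line[0])
    return F[np.argmax(dist)]

# Example: a convex front where the knee sits near the bend.
front = [(0.0, 1.0), (0.1, 0.55), (0.25, 0.3), (0.5, 0.15), (1.0, 0.0)]
print(knee_point(front))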

8.
IEEE Trans Cybern ; PP, 2024 Apr 17.
Article in English | MEDLINE | ID: mdl-38630570

ABSTRACT

This article focuses on distributed nonconvex optimization, in which agents exchange information to minimize the average of local nonconvex cost functions. The communication channel between agents is normally constrained by limited bandwidth, and the gradient information is typically unavailable. To overcome these limitations, we propose a quantized distributed zeroth-order algorithm, which integrates a deterministic gradient estimator, a standard uniform quantizer, and the distributed gradient tracking algorithm. We establish linear convergence to a global optimal point for the proposed algorithm by assuming the Polyak-Łojasiewicz (PL) condition for the global cost function and a smoothness condition for the local cost functions. Moreover, the proposed algorithm maintains linear convergence at low data rates with a proper selection of algorithm parameters. Numerical simulations validate the theoretical results.
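The three ingredients named above can be sketched as follows; the uniform quantization of the exchanged iterates and the ring network are illustrative simplifications, and the paper's encoding scheme that preserves exact linear convergence is not reproduced:

import numpy as np

def quantize(x, delta=0.05):
    """Standard uniform quantizer with step delta (finite-bandwidth channel model)."""
    return delta * np.round(x / delta)

def grad_estimate(f, x, h=1e-3):
    """Deterministic two-point (central-difference) zeroth-order gradient estimator."""
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

# Ring network of 4 agents with a doubly stochastic mixing matrix W.
W = np.array([[0.5, 0.25, 0.0, 0.25],
              [0.25, 0.5, 0.25, 0.0],
              [0.0, 0.25, 0.5, 0.25],
              [0.25, 0.0, 0.25, 0.5]])
targets = [np.array([1.0, -1.0]), np.array([2.0, 0.0]),
           np.array([-1.0, 1.0]), np.array([0.0, 2.0])]
fs = [lambda x, t=t: np.sum((x - t) ** 2) for t in targets]   # local costs

n_agents, dim, alpha = 4, 2, 0.1
x = np.zeros((n_agents, dim))
y = np.array([grad_estimate(fs[i], x[i]) for i in range(n_agents)])  # trackers

for k in range(200):
    xq, yq = quantize(x), quantize(y)            # neighbours only see quantized values
    x_new = W @ xq - alpha * y
    g_old = np.array([grad_estimate(fs[i], x[i]) for i in range(n_agents)])
    g_new = np.array([grad_estimate(fs[i], x_new[i]) for i in range(n_agents)])
    y = W @ yq + g_new - g_old                   # gradient-tracking update
    x = x_new

print(x.mean(axis=0))   # all agents approach the minimizer of the average cost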

9.
IEEE Trans Cybern ; PP2024 May 31.
Artigo em Inglês | MEDLINE | ID: mdl-38819970

RESUMO

In this article, a method for dynamic performance monitoring and adaptive self-tuning of the parameters of actual PID control systems of industrial processes in virtual reality scenes is proposed. This method combines a digital twin model of the PID control process, based on system identification and adaptive deep learning, and an intelligent PID tuning algorithm based on reinforcement learning, with the virtual reality and immersive interaction of the industrial metaverse. An industrial metaverse-based intelligent PID tuning system is proposed by combining the above method with the end-edge-cloud collaboration technology of the Industrial Internet. This solves the challenging problem that actually operating PID control systems in complex industrial processes cannot be optimized online. Using an energy-intensive piece of equipment, the fused magnesium furnace, as the industrial object, we conducted comparative simulation experiments between the proposed control method and several advanced control methods, as well as industrial experiments for the proposed intelligent system. The simulation experiments demonstrate the effectiveness of the proposed control method. The industrial experimental results indicate that performance monitoring and adaptive self-tuning of parameters for actual PID control systems of industrial processes can be realized in virtual reality scenes, achieving excellent control effects.
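As a much-simplified illustration of twin-in-the-loop tuning, the sketch below evaluates candidate PID gains against a hypothetical first-order digital twin and picks the best; a plain grid search stands in for the paper's reinforcement-learning tuner:

import numpy as np

def simulate_twin(Kp, Ki, Kd, steps=300, dt=0.1):
    """Step response of a hypothetical first-order twin under PID control; the cost
    combines tracking error and control effort."""
    y, integ, prev_err, cost = 0.0, 0.0, 1.0, 0.0
    for _ in range(steps):
        err = 1.0 - y                      # unit setpoint
        integ += err * dt
        deriv = (err - prev_err) / dt
        u = Kp * err + Ki * integ + Kd * deriv
        y += dt * (-0.5 * y + 0.5 * u)     # twin dynamics: dy/dt = -0.5 y + 0.5 u
        prev_err = err
        cost += dt * (err ** 2 + 0.01 * u ** 2)
    return cost

# Tune offline against the twin before pushing gains to the real loop.
candidates = [(kp, ki, kd) for kp in (1, 2, 4) for ki in (0.1, 0.5, 1.0) for kd in (0.0, 0.2)]
best = min(candidates, key=lambda g: simulate_twin(*g))
print("selected (Kp, Ki, Kd):", best)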

10.
IEEE Trans Cybern ; 54(7): 3864-3877, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38713573

ABSTRACT

Efficient monitoring of production performance is crucial for ensuring safe operations and enhancing the economic benefits of an iron and steel corporation. Although basic modeling algorithms and visualization diagrams are available in many scientific platforms and industrial applications, there is still a lack of customized research on production performance monitoring. Therefore, this article proposes an interactive visual analytics approach for monitoring the heavy-plate production process (iHPPPVis). Specifically, a multicategory aggregated monitoring framework is proposed to facilitate production performance monitoring under varying working conditions. In addition, a set of visualizations and interactions is designed to support analysts' analysis, identification, and perception of abnormal production performance in heavy-plate production data. Ultimately, the efficacy and practicality of iHPPPVis are demonstrated through multiple evaluations.

11.
Article in English | MEDLINE | ID: mdl-37418409

ABSTRACT

During the fused magnesia production process (FMPP), there is a demand-peak phenomenon in which the demand first rises and then falls. Once the demand exceeds its limit value, the power will be cut off. To avoid mistaken power cut-offs caused by demand peaks, the demand peak needs to be forecast, so multistep demand forecasting is required. In this article, we develop a dynamic model of demand based on the closed-loop control system of the smelting current in the FMPP. Using the model prediction method, we develop a multistep demand forecasting model consisting of a linear model and an unknown nonlinear dynamic system. Combining system identification with adaptive deep learning, an intelligent forecasting method for furnace group demand peaks based on end-edge-cloud collaboration is proposed. It is verified that the proposed forecasting method can accurately forecast demand peaks by utilizing industrial big data and end-edge-cloud collaboration technology.
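A minimal sketch of the linear part of such a multistep forecaster is given below: an autoregressive model fitted by least squares and applied recursively over the horizon; the paper's nonlinear adaptive deep-learning correction and end-edge-cloud deployment are omitted:

import numpy as np

def fit_ar(series, order=5, ridge=1e-6):
    """Least-squares AR(order) model of the demand series (the linear part only)."""
    y = np.asarray(series, float)
    X = np.column_stack([y[i:len(y) - order + i] for i in range(order)])
    t = y[order:]
    return np.linalg.solve(X.T @ X + ridge * np.eye(order), X.T @ t)

def forecast(series, w, steps=10):
    """Recursive multistep forecast: feed each prediction back as an input."""
    hist = list(np.asarray(series, float)[-len(w):])
    out = []
    for _ in range(steps):
        nxt = float(np.dot(w, hist[-len(w):]))
        out.append(nxt)
        hist.append(nxt)
    return np.array(out)

demand = 50 + 10 * np.sin(np.linspace(0, 12, 240)) + np.random.randn(240)  # toy data
w = fit_ar(demand)
peak_window = forecast(demand, w, steps=15)
print("predicted peak over the horizon:", peak_window.max())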

12.
IEEE Trans Cybern ; 53(11): 6896-6909, 2023 Nov.
Article in English | MEDLINE | ID: mdl-35500080

ABSTRACT

This article proposes a multiobjective operation optimization method based on reinforcement self-learning and knowledge guidance for quality assurance and consumption reduction in the wastewater treatment process (WWTP) with nonstationary time-varying dynamics. First, operation optimization models are developed by an online sequential random vector functional-link (OS-RVFL) neural network, which can realize online sequential learning of the model parameters. Then, a knowledge base is established to store typical optimization cases for knowledge guidance of subsequent optimizations. Based on it, a reinforcement self-learning-based multiobjective particle swarm optimization (RSL-MOPSO) algorithm is proposed to perform the optimization calculation. In this algorithm, reinforcement self-learning is used for interaction learning between environment and action in the optimization, and the particle motion trend of the algorithm is adjusted according to the feedback information of the optimization process. The effects of the wastewater state parameters on the particles are recorded and reused to improve the solution quality and calculation efficiency of the optimization. Moreover, to make good use of the information from previous optimizations and to balance global search in the early stage with local search in the later stage, a selective information feedback mechanism is further proposed to ensure the diversity and convergence of the algorithm. Finally, prediction-based intelligent decision making is performed to select, from the Pareto frontier, the final optimization solution as the setpoints for the lower-level controllers while considering specific technical requirements. Data experiments show that the proposed method can effectively reduce energy consumption and ensure effluent quality.

13.
IEEE Trans Neural Netw Learn Syst ; 34(8): 4596-4609, 2023 Aug.
Article in English | MEDLINE | ID: mdl-34623278

ABSTRACT

This article proposes new inverse reinforcement learning (RL) algorithms to solve our defined Adversarial Apprentice Games for nonlinear learner and expert systems. The games are solved by having a learner extract the unknown cost function of an expert from the expert's demonstrated behaviors. We first develop a model-based inverse RL algorithm that consists of two learning stages: an optimal control learning stage and a second stage based on inverse optimal control. This algorithm also clarifies the relationships between inverse RL and inverse optimal control. Then, we propose a new model-free integral inverse RL algorithm to reconstruct the unknown expert cost function. The model-free algorithm only needs online demonstrations of the expert's and the learner's trajectory data, without knowing the system dynamics of either the learner or the expert. These two algorithms are further implemented using neural networks (NNs). In Adversarial Apprentice Games, the learner and the expert are allowed to suffer from different adversarial attacks in the learning process. A two-player zero-sum game is formulated for each of these two agents and is solved as a subproblem for the learner in inverse RL. Furthermore, it is shown that the cost functions that the learner learns to mimic the expert's behavior are stabilizing and not unique. Finally, simulations and comparisons show the effectiveness and the superiority of the proposed algorithms.

14.
IEEE Trans Cybern ; 53(9): 5741-5754, 2023 Sep.
Article in English | MEDLINE | ID: mdl-35560092

ABSTRACT

This article investigates the sensitivity analysis (SA) of high-dimensional data to identify the effects of process variables on output quantities of interest (QoIs) in industrial soft sensor modeling. The computational cost of analyzing the SA of high-dimensional data is high, and the models available for SA techniques usually have limited generalization capacity. Therefore, we propose a novel high-dimensional data global SA (GSA) approach based on a deep soft sensor model to address these issues. We first develop an approximately incremental grouping (AIG) algorithm and a region-based cooperative co-evolution (RBCC) algorithm to decompose the high-dimensional data into independent regions for the GSA. Subsequently, a multihead deep soft sensor model with good generalization performance is designed to determine the GSA indices of each decomposed region. Specifically, the region of interest (RoI) align algorithm provides the multiple heads with precisely located decomposed-region features. Finally, based on the uncertainty analysis of each model head, we present a joint loss function with the Monte Carlo dropout (MC-dropout) algorithm to measure the GSA indices of each decomposed region on the QoIs. Experimental evaluation results on a benchmark dataset and a real-world one demonstrate the effectiveness of the proposed approach in addressing the GSA of high-dimensional data in industrial processes.
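The MC-dropout mechanism mentioned above can be illustrated with a tiny untrained network: keeping dropout active at prediction time and repeating the forward pass yields a mean prediction and a spread that serves as the uncertainty estimate; the paper's aggregation of such uncertainties into GSA indices is not shown:

import numpy as np

rng = np.random.default_rng(0)

# Tiny one-hidden-layer regressor with random (untrained) weights, used only to
# demonstrate the MC-dropout mechanism.
W1, b1 = rng.standard_normal((8, 3)), np.zeros(8)
W2, b2 = rng.standard_normal((1, 8)), np.zeros(1)

def forward(x, drop_rate=0.2):
    h = np.maximum(0.0, W1 @ x + b1)                 # ReLU hidden layer
    mask = rng.random(h.shape) > drop_rate           # dropout kept active at test time
    h = h * mask / (1.0 - drop_rate)
    return (W2 @ h + b2).item()

def mc_dropout_predict(x, samples=200):
    preds = np.array([forward(x) for _ in range(samples)])
    return preds.mean(), preds.std()                 # mean and epistemic spread

mean, std = mc_dropout_predict(np.array([0.3, -1.2, 0.7]))
print(f"prediction {mean:.3f} +/- {std:.3f}")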

15.
IEEE Trans Cybern ; 53(11): 7150-7161, 2023 Nov.
Article in English | MEDLINE | ID: mdl-35797326

ABSTRACT

In this article, the event-triggered output regulation problem (EORP) under denial-of-service (DoS) attacks is considered for networked switched systems (NSSs) with unstable switching dynamics (USDs). The USDs here refer to the unsolvable output regulation of each subsystem and the destabilization at partial switching instants, which indicates that the Lyapunov function does not decrease monotonically over the activation intervals of each subsystem and increases at partial switching instants. First, long-duration DoS attacks (LDDAs) are considered, whose duration may be longer than the total dwell time (DT) of several adjacent activated subsystems. By imposing constraints at switching instants, consecutive asynchronous subsystem switching caused by LDDAs and USDs is allowed, that is, the subsystem switches several times but the controller switching is blocked by LDDAs and the controllers fail to switch correspondingly. Second, mixed event-triggered mechanisms (ETMs), combining event-triggered conditions and periodic sampling conditions, are designed to reduce the network burden under LDDAs and improve system performance subject to destabilizing switching. Then, an improved DT condition for the switching signal permits an irregular arrangement of destabilizing and stabilizing switching and is more suitable for NSSs subject to LDDAs. Moreover, sufficient conditions are given to ensure the solvability of the EORP for NSSs with USDs under LDDAs, network-induced delays, random packet losses, and packet disorders. Finally, a switched RLC circuit shows the feasibility of the proposed method.

16.
IEEE Trans Cybern ; PP, 2023 Sep 15.
Article in English | MEDLINE | ID: mdl-37713226

ABSTRACT

Electric-powered wheelchairs play a vital role in ensuring accessibility for individuals with mobility impairments. The design of controllers for tracking tasks must prioritize the safety of wheelchair operation across various scenarios and for a diverse range of users. In this study, we propose a safety-oriented speed tracking control algorithm for wheelchair systems that accounts for external disturbances and uncertain parameters at the dynamic level. We employ a set-membership approach to estimate uncertain parameters online in deterministic sets. Additionally, we present a model predictive control scheme with real-time adaptation of the system model and controller parameters to ensure safety-related constraint satisfaction during the tracking process. This proposed controller effectively guides the wheelchair speed toward the desired reference while maintaining safety constraints. In cases where the reference is inadmissible and violates constraints, the controller can navigate the system to the vicinity of the nearest admissible reference. The efficiency of the proposed control scheme is demonstrated through high-fidelity speed tracking results from two tasks involving both admissible and inadmissible references.
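An interval version of set-membership estimation for a single uncertain parameter is sketched below; the regressor, noise bound, and friction-coefficient example are hypothetical, and the paper embeds such deterministic sets in a full MPC scheme:

import numpy as np

def set_membership_update(bounds, phi, y, noise_bound):
    """Intersect the current feasible interval for an unknown scalar parameter theta
    with the set implied by one measurement y = phi*theta + w, |w| <= noise_bound."""
    lo, hi = bounds
    if abs(phi) < 1e-12:
        return bounds                               # measurement carries no information
    a, b = (y - noise_bound) / phi, (y + noise_bound) / phi
    return max(lo, min(a, b)), min(hi, max(a, b))

# Example: unknown friction coefficient theta* = 0.7, prior interval [0, 2].
rng = np.random.default_rng(1)
theta_true, bounds = 0.7, (0.0, 2.0)
for _ in range(50):
    phi = rng.uniform(-1, 1)                        # regressor (e.g. measured speed)
    y = phi * theta_true + rng.uniform(-0.05, 0.05) # bounded disturbance
    bounds = set_membership_update(bounds, phi, y, noise_bound=0.05)
print("feasible interval for theta:", bounds)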

17.
IEEE Trans Neural Netw Learn Syst ; 34(7): 3553-3567, 2023 Jul.
Article in English | MEDLINE | ID: mdl-34662280

ABSTRACT

This article develops two novel output feedback (OPFB) Q-learning algorithms, on-policy Q-learning and off-policy Q-learning, to solve the H∞ static OPFB control problem of linear discrete-time (DT) systems. The primary contribution of the proposed algorithms lies in a newly developed OPFB control algorithm form for completely unknown systems. Under the premise of satisfying the disturbance attenuation conditions, the conditions for the existence of the optimal OPFB solution are given. The convergence of the proposed Q-learning methods, and the difference and equivalence of the two algorithms, are rigorously proven. Moreover, considering the effects of the probing noise required for the persistence of excitation (PE), the proposed off-policy Q-learning method has the advantage of being immune to probing noise and avoiding bias in the solution. Simulation results are presented to verify the effectiveness of the proposed approaches.


Subjects
Neural Networks (Computer), Nonlinear Dynamics, Feedback, Algorithms, Computer Simulation
18.
IEEE Trans Neural Netw Learn Syst ; 34(5): 2386-2399, 2023 May.
Article in English | MEDLINE | ID: mdl-34520364

ABSTRACT

In inverse reinforcement learning (RL), there are two agents. An expert target agent has a performance cost function and exhibits control and state behaviors to a learner. The learner agent does not know the expert's performance cost function but seeks to reconstruct it by observing the expert's behaviors and tries to imitate these behaviors optimally with its own response. In this article, we formulate an imitation problem where the optimal performance intent of a discrete-time (DT) expert target agent is unknown to a DT learner agent. Using only the observed expert's behavior trajectory, the learner seeks to determine a cost function that yields the same optimal feedback gain as the expert's, and thus imitates the optimal response of the expert. We develop an inverse RL approach with a new scheme to solve the behavior imitation problem. The approach consists of a cost function update based on an extension of RL policy iteration and inverse optimal control, and a control policy update based on optimal control. Then, under this scheme, we develop an inverse reinforcement Q-learning algorithm, which is an extension of RL Q-learning. This algorithm does not require any knowledge of the agent dynamics. Proofs of stability, convergence, and optimality are given. A key property of the nonunique solution is also shown. Finally, simulation experiments are presented to show the effectiveness of the new approach.

19.
Article in English | MEDLINE | ID: mdl-37015439

ABSTRACT

This article is concerned with the fast and accurate trajectory tracking control problem for a class of underactuated surface vehicles under model uncertainties and environmental disturbances. A novel neural network (NN)-based prescribed performance control strategy is proposed to solve the problem. In the control design, a new type of performance function is constructed which provides a straightforward way to predefine the settling time and accuracy. Then, a pair of barrier functions are employed to constrain not only the position error but also the virtual control input, which avoids possible singularity or discontinuity of the control solution. Next, an initialization technique is exploited, removing the requirement on the initial condition of the control system. Finally, two NNs are employed to deal with the unknown ship nonlinearities. The performance analysis not only demonstrates the effectiveness of the proposed approach but also reveals its robustness against disturbances and unknown reference trajectory derivatives; there is thus no need to acquire such knowledge or employ specialized tools to handle disturbances. The theoretical findings are illustrated by a simulation study.
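One common construction of a performance envelope with a user-chosen settling time and accuracy, together with the barrier-type error transformation it is usually paired with, is sketched below; the paper's specific performance function and controller are not reproduced:

import numpy as np

def performance_bound(t, rho0=1.0, rho_inf=0.05, T=5.0):
    """Finite-time envelope: decays from rho0 to the prescribed accuracy rho_inf
    exactly at the chosen settling time T and stays there (one common construction)."""
    if t >= T:
        return rho_inf
    return (rho0 - rho_inf) * ((T - t) / T) ** 2 + rho_inf

def barrier_transform(error, rho):
    """Map the constrained error e in (-rho, rho) to an unconstrained variable;
    the transformed error blowing up near the envelope is what the controller
    reacts to."""
    z = np.clip(error / rho, -0.999, 0.999)
    return np.log((1 + z) / (1 - z))

# A tracking error that respects the envelope keeps the transformed error bounded.
for t in (0.0, 1.0, 3.0, 5.0, 8.0):
    rho = performance_bound(t)
    e = 0.8 * rho * np.exp(-t)          # illustrative decaying error inside the funnel
    print(f"t={t:4.1f}  rho={rho:5.3f}  transformed={barrier_transform(e, rho):6.3f}")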

20.
IEEE Trans Cybern ; 52(6): 4441-4450, 2022 Jun.
Article in English | MEDLINE | ID: mdl-33141675

ABSTRACT

In this article, for a class of continuous-time nonlinear nonaffine systems with unknown dynamics, a robust approximate optimal tracking controller (RAOTC) is proposed in the framework of adaptive dynamic programming (ADP). The distinguishing contribution of this article is that a new Lyapunov function is constructed, by using which the derivative information of the tracking errors is not required when computing its time derivative along the solution of the closed-loop system. Thus, the proposed method can make the system states follow nondifferentiable reference signals, which removes the common assumption in the literature that the reference signals have to be differentiable for tracking control of continuous-time nonlinear systems. The theoretical analysis, simulation, and application results clearly illustrate the effectiveness and superiority of the proposed method.
