Novel data-driven two-dimensional Q-learning for optimal tracking control of batch process with unknown dynamics.

Wen, Xin; Shi, Huiyuan; Su, Chengli; Jiang, Xueying; Li, Ping; Yu, Jingxian

Affiliation

Wen X; School of Information and Control Engineering, Liaoning Petrochemical University, China.
Shi H; School of Information and Control Engineering, Liaoning Petrochemical University, China; School of Automation, Northwestern Polytechnical University, China; State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University, Shenyang, China. Electronic address: huiyuan_sh
Su C; School of Information and Control Engineering, Liaoning Petrochemical University, China; School of Electronic and Information Engineering, University of Science and Technology Liaoning, China. Electronic address: sclwind@sina.com.
Jiang X; School of Information Science and Engineering, Northeastern University, China.
Li P; School of Information and Control Engineering, Liaoning Petrochemical University, China; School of Electronic and Information Engineering, University of Science and Technology Liaoning, China.
Yu J; School of Sciences, Liaoning Petrochemical University, China.

ISA Trans; 125: 10-21, 2022 Jun.

Article in English | MEDLINE | ID: mdl-34130858

ABSTRACT

Existing control methods usually rely heavily on a model of the batch process and are therefore difficult to apply to a practical batch process with unknown dynamics. To address this, a novel data-driven two-dimensional (2D) off-policy Q-learning approach for optimal tracking control (OTC) is proposed, which yields a model-free control law for the batch process. First, an extended state-space equation composed of the state and output error is established to ensure the tracking performance of the designed controller. Second, a behavior policy for generating data and a target policy for optimization and learning are introduced based on this extended system. Then, a Bellman equation independent of the model parameters is derived by analyzing the relation between the 2D value function and the 2D Q-function. Only data measured along the batch and time directions of the batch process are used to carry out the policy iteration, which solves the optimal control problem without knowledge of the system dynamics. The unbiasedness and convergence of the designed 2D off-policy Q-learning algorithm are proved. Finally, a simulation case for an injection molding process shows that the control and tracking performance gradually improve as the number of batches increases.
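
The model-free policy iteration described in the abstract can be illustrated with a much simpler case. The following is a minimal sketch, not the authors' 2D algorithm: it assumes a scalar, time-direction-only linear system x_next = a*x + b*u with a quadratic regulation cost q*x^2 + r*u^2 and no batch direction or tracking error. All names (`collect_data`, `q_learning_gain`, `riccati_gain`) and the numerical values are hypothetical. The learner only sees measured triples (x, u, x_next) produced by a random behavior policy, evaluates the target policy u = -k*x by least squares on the Bellman equation of the Q-function, and then improves the gain.

```python
import random

def solve_linear(A, y):
    """Solve A z = y by Gaussian elimination with partial pivoting."""
    n = len(y)
    M = [row[:] + [y[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda row: abs(M[row][c]))
        M[c], M[p] = M[p], M[c]
        for row in range(c + 1, n):
            f = M[row][c] / M[c][c]
            for j in range(c, n + 1):
                M[row][j] -= f * M[c][j]
    z = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (M[i][n] - s) / M[i][i]
    return z

def features(x, u):
    # Quadratic Q-function: Q(x, u) = h11*x^2 + 2*h12*x*u + h22*u^2
    return [x * x, 2.0 * x * u, u * u]

def collect_data(a, b, n=50, seed=0):
    """Behavior policy: random exploration. The plant (a, b) is used only
    here, as the simulator that generates the measurements."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        u = rng.uniform(-1.0, 1.0)
        data.append((x, u, a * x + b * u))
    return data

def q_learning_gain(data, q, r, k0=0.0, iterations=20):
    """Off-policy policy iteration using measured data only.
    k0 must be a stabilizing initial gain (assumption)."""
    k = k0
    for _ in range(iterations):
        # Policy evaluation: least squares on the Bellman equation
        # Q(x, u) - Q(x_next, -k*x_next) = q*x^2 + r*u^2
        AtA = [[0.0] * 3 for _ in range(3)]
        Aty = [0.0] * 3
        for x, u, xn in data:
            f_now = features(x, u)
            f_next = features(xn, -k * xn)  # target policy at next state
            phi = [fa - fb for fa, fb in zip(f_now, f_next)]
            cost = q * x * x + r * u * u
            for i in range(3):
                Aty[i] += phi[i] * cost
                for j in range(3):
                    AtA[i][j] += phi[i] * phi[j]
        h11, h12, h22 = solve_linear(AtA, Aty)
        # Policy improvement: minimizing Q over u gives u = -(h12/h22)*x
        k = h12 / h22
    return k

def riccati_gain(a, b, q, r, iterations=500):
    """Model-based reference gain, used only to check the learned gain."""
    p = q
    for _ in range(iterations):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    return a * b * p / (r + b * b * p)

if __name__ == "__main__":
    data = collect_data(a=0.9, b=0.5)
    print("learned gain:", q_learning_gain(data, q=1.0, r=1.0))
    print("Riccati gain:", riccati_gain(0.9, 0.5, 1.0, 1.0))
```

Because the action stored in each triple comes from the behavior policy while the action at the next state comes from the target policy, the same data set can be reused at every iteration, which is the sense in which this sketch is off-policy. In this noiseless toy setting the learned gain should agree with the model-based Riccati gain.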

Keywords

2D off-policy Q-learning; Batch process; Data-driven; Injection molding; Optimal tracking control

Full text: 1 Collection: 01-internacional Database: MEDLINE Language: English Journal: ISA Trans Year: 2022 Document type: Article Affiliation country: China Publication country: United States