Off-policy two-dimensional reinforcement learning for optimal tracking control of batch processes with network-induced dropout and disturbances.

Jiang, Xueying; Huang, Min; Shi, Huiyuan; Wang, Xingwei; Zhang, Yanfeng

Affiliation

Jiang X; College of Information Science and Engineering, Northeastern University, China; State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University, China.
Huang M; College of Information Science and Engineering, Northeastern University, China; State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University, China. Electronic address: mhuang@mail.neu.edu.cn.
Shi H; State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University, China; School of Information and Control Engineering, Liaoning Petrochemical University, China.
Wang X; College of Computer Science and Engineering, Northeastern University, China.
Zhang Y; College of Computer Science and Engineering, Northeastern University, China.

ISA Trans ; 144: 228-244, 2024 Jan.

Article in En | MEDLINE | ID: mdl-38030447

ABSTRACT

This paper proposes a new off-policy two-dimensional (2D) reinforcement learning approach to the optimal tracking control (OTC) problem of batch processes subject to network-induced dropout and disturbances. A dropout 2D augmented Smith predictor is first devised to estimate the current extended state from past data along both the time and batch directions. The dropout 2D value function and Q-function are then defined, and the relation between them is analyzed so that optimal performance is achieved. On this basis, the dropout 2D Bellman equation is derived from the Q-function. To solve the dropout 2D OTC problem of batch processes, two algorithms are presented: an off-line 2D policy iteration algorithm and an off-policy 2D Q-learning algorithm. The latter uses only the input and the estimated state and does not require the underlying system information. The unbiasedness of the solutions and the convergence of the algorithms are analyzed separately. The effectiveness of the proposed methods is validated on a simulated case study of the filling process.
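
For orientation, the relation between a value function and a Q-function that the abstract refers to can be sketched in its generic one-dimensional, dropout-free form. This is a standard textbook form and not the paper's dropout 2D formulation; the deterministic dynamics f, stage cost r, discount factor \gamma, and policy \pi are assumptions introduced here for illustration only.

```latex
% Generic sketch (assumed notation, not taken from the paper):
% deterministic system x_{k+1} = f(x_k, u_k), stage cost r(x_k, u_k),
% discount factor 0 < \gamma \le 1, fixed policy u_k = \pi(x_k).
\begin{align}
  V^{\pi}(x_k) &= \sum_{i=k}^{\infty} \gamma^{\,i-k}\, r\bigl(x_i, \pi(x_i)\bigr), \\
  Q^{\pi}(x_k, u_k) &= r(x_k, u_k) + \gamma\, V^{\pi}(x_{k+1}), \\
  V^{\pi}(x_k) &= Q^{\pi}\bigl(x_k, \pi(x_k)\bigr), \\
  Q^{\pi}(x_k, u_k) &= r(x_k, u_k) + \gamma\, Q^{\pi}\bigl(x_{k+1}, \pi(x_{k+1})\bigr), \\
  Q^{*}(x_k, u_k) &= r(x_k, u_k) + \gamma \min_{u} Q^{*}(x_{k+1}, u).
\end{align}
```

The fourth line is the Bellman equation written in terms of the Q-function. It follows from substituting the third line into the second, and it is the kind of identity that a Q-learning scheme can evaluate from measured data without a system model.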

Key words

Batch processes; Injection velocity; Network-induced dropout; Optimal tracking control; Reinforcement learning; Two-dimensional (2D)

Full text: 1 Collection: 01-internacional Database: MEDLINE Language: En Journal: ISA Trans Year: 2024 Type: Article Affiliation country: China
