Off-policy two-dimensional reinforcement learning for optimal tracking control of batch processes with network-induced dropout and disturbances.
ISA Trans 2024 Jan; 144: 228-244. Article in En | MEDLINE | ID: mdl-38030447
In this paper, a new off-policy two-dimensional (2D) reinforcement learning approach is proposed to address the optimal tracking control (OTC) problem for batch processes subject to network-induced dropout and disturbances. A dropout 2D augmented Smith predictor is first devised to estimate the current extended state from past data along both the time and batch directions. The dropout 2D value function and Q-function are then defined, and their relationship is analyzed to characterize optimal performance. On this basis, the dropout 2D Bellman equation is derived from the Q-function. To solve the dropout 2D OTC problem of batch processes, two algorithms are presented: an off-line 2D policy iteration algorithm and an off-policy 2D Q-learning algorithm. The latter uses only the input and the estimated state, without requiring knowledge of the underlying system dynamics. The unbiasedness of the solutions and the convergence of both algorithms are analyzed separately. The effectiveness of the proposed methods is finally validated on a simulated filling-process case study.
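As a concrete illustration of the off-policy Q-learning idea summarized above (a minimal sketch, not the paper's 2D dropout formulation with the augmented Smith predictor), the following Python snippet applies model-free policy iteration to a standard one-dimensional-in-time linear quadratic surrogate: a quadratic Q-function kernel H is estimated by least squares from data collected under an exploratory behavior policy, and the target feedback gain K is then improved greedily from H. All matrices, dimensions, and noise levels here are illustrative assumptions; the system matrices A and B are used only to generate data, never by the learner.

import numpy as np

np.random.seed(0)
A = np.array([[0.9, 0.1], [0.0, 0.8]])   # illustrative stable dynamics (data generation only)
B = np.array([[0.0], [1.0]])
Qc, Rc = np.eye(2), np.eye(1)            # assumed quadratic stage-cost weights
n, m = 2, 1

K = np.zeros((m, n))                     # initial gain; stabilizing since A is stable
for it in range(20):                     # policy iteration: evaluate Q, then improve K
    rows, rhs = [], []
    x = np.random.randn(n)
    for k in range(200):
        u = -K @ x + 0.5 * np.random.randn(m)        # behavior policy = target + probing noise
        xn = A @ x + B @ u
        z = np.concatenate([x, u])                   # regressor for the actual action taken
        zn = np.concatenate([xn, -K @ xn])           # successor evaluated under the target policy
        # Bellman equation in the quadratic kernel H: z'Hz - zn'Hzn = x'Qx + u'Ru
        rows.append(np.outer(z, z).ravel() - np.outer(zn, zn).ravel())
        rhs.append(x @ Qc @ x + u @ Rc @ u)
        x = xn if np.linalg.norm(xn) < 1e3 else np.random.randn(n)
    H = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)[0].reshape(n + m, n + m)
    H = 0.5 * (H + H.T)                  # enforce symmetry of the estimated kernel
    K = np.linalg.solve(H[n:, n:], H[n:, :n])        # greedy improvement: u = -H_uu^{-1} H_ux x

print("learned feedback gain K:", K)

In the paper's setting, the state x above would be replaced by the dropout 2D augmented state produced by the Smith predictor, and the recursion would run along both the time and batch indices rather than time alone.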
Full text: 1
Collection: 01-internacional
Database: MEDLINE
Language: En
Journal: ISA Trans
Year: 2024
Type: Article
Affiliation country: China