Nearly-optimal Explicit MPC-based Reference Governors with Long Prediction Horizons Generated with Machine Learning

The paper presents a procedure for constructing an approximate explicit form of an MPC-based reference governor. MPC-based reference governors are often set up with long prediction horizons and a significant number of constraints, which rules out conventional parametric optimisation as a means of obtaining the explicit solution. This paper explores the approach of mimicking the behaviour of the MPC-based reference governor with a neural network and presents methods that ensure point-wise satisfaction of process constraints during neural network training. A demonstration on a well-known MIMO process is provided to evaluate the control performance.


I. INTRODUCTION
Reference governors based on model predictive control (MPCRG) play a vital role in many industries. In fact, the origins of model predictive control stem from such a setup, where dynamic matrix controllers modulate setpoints for local controllers, each responsible for a single process [1], [2]. MPCRG strategies are usually cast as very complex optimisation problems, not only because they cover the dynamics of the individual closed loops with local controllers [3], but also because of the length of their prediction horizons.
Even though MPCRG strategies, as advanced supervisory controllers, are often deployed on platforms with sufficient computational power, many applications still exist where a supervisory controller must run with very tight sampling intervals [4]. Such a requirement calls for constructing an explicit form of the MPC-based reference governor, which is not possible by traditional means via the parametric programming approach [5], [6]. On the other hand, machine learning allows us to synthesise a control law in the form of a closed explicit function, which can be evaluated even faster than conventional explicit MPC, since no point-location problem needs to be solved.
This paper exploits the benefits of imitation learning, where the coefficients of a neural network of a given structure are trained to mimic the behaviour of the MPC strategy. Several works have been published in this domain; we refer especially to [7], [8]. However, only a few works in the current literature focus on the applicability of the trained neural network on platforms with limited hardware resources. Our approach expands the idea of [9], where a neural network was used to imitate a mixed-integer MPC strategy but without guarantees of constraint satisfaction. Furthermore, our approach allows us to deploy the explicit version of the MPC-based reference governor strategy on PLC-like devices, which significantly broadens the possibilities of improving the control performance of existing closed loops without an invasive upgrade.
The novelties of this paper are summarised as follows:
• synthesis of an MPC-based reference governor with a long prediction horizon in explicit form as a neural network,
• extension of the training method to account for system constraints on manipulated and process variables,
• demonstration of this control scheme on a multiple-input multiple-output controlled process with local PI controllers.
This paper briefly reviews the basics of MPC-based reference governors and then presents the procedure for training constraint-aware neural networks, which yields a nearly-optimal explicit control law with almost identical performance to the long-horizon MPC-based reference governor.

II. MPC-BASED REFERENCE GOVERNORS
The main objective of MPC-based reference governors (MPCRG) is to modulate a user-defined reference r(t) in such a way that
• the state, input, and output constraints are enforced,
• the overall quality of the control performance is increased compared to the original strategy.
The MPCRG then provides an optimal reference $w_0^\star$ as the setpoint for the local controllers. Without loss of generality, we will consider the local controllers to be PI controllers for the rest of the paper. The reference governor control strategy for our case is visualised in Fig. 1.

MPC-based reference governors adopt a classical model predictive controller with a quadratic cost function and a linearised model of the controlled process. For our particular case, where the reference governor handles setpoints for a set of PID controllers, the following formulation, inspired by [3, Ch. 4.3], is considered:

$$\min_{w_0,\dots,w_{N-1}} \ \sum_{k=0}^{N-1} \left( \|y_k - r_k\|_{Q_y}^2 + \|\Delta u_k\|_{Q_u}^2 + \|\Delta w_k\|_{Q_w}^2 \right), \tag{1a}$$

where $N$ denotes the prediction horizon of the quadratic cost function, in which we drive the predicted outputs $y_k$ to the user-defined reference $r_k$ and penalise the fluctuations of the actual control actions $\Delta u_k$ together with the changes in the calculated reference $\Delta w_k$. Constraints (1b)–(1h), comprising the prediction model of the closed loop and the bounds on states, inputs, and outputs, are enforced for $k \in \mathbb{N}_0^{N-1}$, while the entire optimal control problem is initialised as in (1i), where $w(t - T_s)$ denotes the value of the calculated setpoint at the previous sampling instant (the same holds for $u(t - T_s)$).
To efficiently construct the explicit solution, we convert problem (1) with straightforward matrix manipulations into a condensed quadratic programming problem of the form

$$\min_{W} \ \tfrac{1}{2} W^\top H W + \theta^\top F W \tag{2a}$$
$$\text{s.t.} \ \ \Gamma W \le \gamma, \tag{2b}$$
$$\theta \in \Theta, \tag{2c}$$

where $W = [w_0^\top, \dots, w_{N-1}^\top]^\top$ denotes the optimisation variable and $\theta$ collects the parameters (initial conditions and references) from the set $\Theta$. Problem (2) can then be solved parametrically [5] to obtain the explicit MPC law $w_0^\star = \kappa(\theta)$, however, only for a moderate length of the prediction horizon $N$ and a low dimension of the parameter space $\theta$. With the MPCRG strategy (2), we generate a set of open-loop trajectories for given user-defined references and multiple sets of initial conditions, which then serve as a training data set for the construction of the explicit controller in the form of a neural network.
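For illustration, a minimal sketch of this data-generation step is given below, using CVXPY as a stand-in QP solver. The condensed matrices $H$, $F$, $\Gamma$, $\gamma$ and the parameter sampler are assumed to be available (they are not given in the paper), and in general the bound $\gamma$ may also depend affinely on $\theta$.

```python
import numpy as np
import cvxpy as cp

def solve_mpcrg_qp(H, F, Gamma, gamma, theta, nw=2):
    """Solve the condensed QP (2) for one parameter vector theta."""
    W = cp.Variable(H.shape[0])
    cost = 0.5 * cp.quad_form(W, H) + (F.T @ theta) @ W   # cost (2a)
    constraints = [Gamma @ W <= gamma]                    # constraints (2b)
    cp.Problem(cp.Minimize(cost), constraints).solve()
    return W.value[:nw]                                   # first move w_0*

# Training set: pairs (theta, w_0*) over sampled initial conditions.
# thetas = [sample_theta() for _ in range(30_000)]        # assumed sampler
# T = [(th, solve_mpcrg_qp(H, F, Gamma, gamma, th)) for th in thetas]
```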

III. NEURAL NETWORK CONTROLLERS
One of the main advantages of neural networks is their ability to emulate, with minimal error, the behaviour of any mathematical function, given sufficiently many neurons in sufficiently many layers. Since MPC strategies can be expressed as closed explicit functions (as shown in [10]), a neural network can easily be used in this setup as well. Let us denote the neural network controller as

$$\tilde{w}_0 = C_{\mathrm{NN}}(\theta), \tag{3}$$

where $\tilde{w}_0$ is the mimicked value of the optimal input, which approximates its optimal counterpart $w_0^\star$. To obtain the coefficients of the neural network, we solve a regression problem given by

$$\min_{(\beta, b)} \ \frac{1}{M} \sum_{i=1}^{M} \left( \alpha \, \big\| \tilde{w}_0^{(i)} - w_0^{\star,(i)} \big\|_2^2 + (1 - \alpha) \, \big\| \mathrm{ReLU}\big(\Gamma W^{(i)} - \gamma\big) \big\|_2^2 \right), \tag{4}$$

where $M$ denotes the number of training data points from the set $\mathcal{T}$. The objective consists of a standard mean-square-error term weighted by the value $\alpha$ and an additional barrier-like term $\mathrm{ReLU}(\Gamma W - \gamma)$, which penalises the violation of the inequality constraints (2b). From numerous numerical trial-and-error experiments, we concluded that the best penalisation function for this purpose is the rectified linear unit $\mathrm{ReLU}(d) = \max(0, d)$, which is also the common activation function used in the structure of neural networks, where the argument $d$ is substituted with $d = \beta z + b$. Here, the tuple $(\beta, b)$ represents the weights of an individual neuron, while $z$ denotes a general input to that neuron.
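A minimal TensorFlow sketch of the penalised loss (4) is shown below. It assumes the rows of $(\Gamma, \gamma)$ are restricted to the constraints acting on the predicted variables; the function names are illustrative, not the authors' code.

```python
import tensorflow as tf

def make_penalised_loss(Gamma, gamma, alpha=0.7):
    """Loss (4): alpha-weighted MSE plus ReLU penalty on (2b) violations."""
    G = tf.constant(Gamma, dtype=tf.float32)
    g = tf.constant(gamma, dtype=tf.float32)

    def loss(w_star, w_pred):
        mse = tf.reduce_mean(tf.square(w_star - w_pred))        # imitation term
        residual = tf.matmul(w_pred, G, transpose_b=True) - g   # Gamma W - gamma
        penalty = tf.reduce_mean(tf.square(tf.nn.relu(residual)))
        return alpha * mse + (1.0 - alpha) * penalty

    return loss
```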
In order to obtain the parameters of the neural network, i.e., the tuples $(\beta, b)$ for each neuron, we fix the structure of the neural network $C_{\mathrm{NN}}$: we choose a specific number of hidden layers and a number of nodes in each layer. Then, we solve the regression problem (4). Note that, for convenience, we specify the arguments of the optimisation only as $(\beta, b)$, but we optimise over one such tuple for every node in the entire neural network.
The entire algorithm for obtaining an explicit version of the MPCRG strategy in the form of a neural network (NNRG) consists of the following steps:
1) Construct the set Θ of initial conditions θ as in (2c).
2) Solve the optimisation problem (1) for each value of θ from the set Θ.
3) Form the set T from the results of steps 1 and 2.
4) Fix the structure of C_NN, i.e., the number of layers and the number of activation functions in each layer.
5) Solve the regression problem (4).
6) Verify the quality of C_NN in a closed-loop simulation (a sketch of this step follows below).
If the quality of C_NN in a closed-loop simulation is not satisfactory, the training can be improved in two principal ways. First, a more comprehensive data set Θ can be generated, which increases the quality of the imitation learning. Second, a different structure of the neural network can be considered. Finally, once the quality of the closed-loop simulation with C_NN is satisfactory, we replace the MPCRG strategy with the NNRG controller given by (3).
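A sketch of the verification in step 6 might look as follows, with two hypothetical helpers: plant_step simulates the PI-controlled process for one sampling period, and C_NN evaluates the trained network.

```python
import numpy as np

def verify_closed_loop(C_NN, plant_step, theta0, n_steps, h_max):
    """Step 6: closed-loop simulation with the trained NNRG law."""
    theta, cost, violations = theta0, 0.0, 0
    for _ in range(n_steps):
        w = C_NN(theta)                       # explicit NNRG law, cf. (3)
        theta, y, r = plant_step(theta, w)    # PI loops + process, one sample
        cost += float(np.sum((y - r) ** 2))   # tracking-error contribution
        violations += int(np.any(y > h_max))  # count level-constraint breaches
    return cost, violations
```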

IV. GOVERNED CLOSED-LOOP PROCESS
To truly demonstrate the benefits of advanced optimisation-based process control, we choose the well-known quadruple-tank benchmark, consisting of 4 interconnected tanks. The arrangement of the tanks is visualised in Fig. 2, while the mathematical model was introduced in [11]. The model covers the dynamics of the liquid level in each tank, denoted by $h_{1,\dots,4}$. The measured variables are $h_{1,2}$, while the manipulated variables $v_{1,2}$ are the voltages related to the power of the pumps. To design the controller, we considered a state-space model $(A_c, B_c)$ in continuous time given by

$$A_c = \begin{bmatrix} -\frac{1}{T_1} & 0 & \frac{A_3}{A_1 T_3} & 0 \\ 0 & -\frac{1}{T_2} & 0 & \frac{A_4}{A_2 T_4} \\ 0 & 0 & -\frac{1}{T_3} & 0 \\ 0 & 0 & 0 & -\frac{1}{T_4} \end{bmatrix}, \qquad B_c = \begin{bmatrix} \frac{\gamma_1 k_1}{A_1} & 0 \\ 0 & \frac{\gamma_2 k_2}{A_2} \\ 0 & \frac{(1-\gamma_2) k_2}{A_3} \\ \frac{(1-\gamma_1) k_1}{A_4} & 0 \end{bmatrix}, \tag{5}$$

where $T_i$ denotes the time constant of an individual tank of the quadruple-tank system, the variables $A_{1,\dots,4}$ represent the cross-sectional areas of the tanks, $k_{1,2}$ are the pump gains, and the parameters $\gamma_{1,2}$ describe the openings of the three-way valves located after the pumps. The time constants are calculated as

$$T_i = \frac{A_i}{a_i} \sqrt{\frac{2 h_i^{\mathrm{s}}}{g}},$$

in which $a_i$ is the cross-section of the outlet hole of tank $i$ and $h_i^{\mathrm{s}}$ denotes the linearisation point, numerically chosen as $h^{\mathrm{s}} = [12.4\ 12.7\ 1.8\ 1.7]^\top$ cm, which corresponds to steady-state values of the manipulated variables. The continuous model was discretised using the zero-order hold method with the sampling time $T_s = 5$ s, yielding the discrete-time model (6). The local controllers were designed according to the well-known rules for PI control design for linear systems given in [12]. The primary controllers are PI controllers of the form

$$C_i(s) = K_{p,i}\left(1 + \frac{1}{T_{\mathrm{I},i}\, s}\right),$$

with $K_{p,1} = 0.385$, $T_{\mathrm{I},1} = 62$ s, $K_{p,2} = 0.357$, $T_{\mathrm{I},2} = 90$ s. The control objective is to follow the reference changes without offset while adhering to the level and voltage constraints given in (11). As we will show in the next section, this task can only be fulfilled with a significant de-tuning of the PI controllers, which clearly warrants the improvement with the MPC-based reference governor strategy, reported to enhance the performance of virtually any closed-loop system [4], [13].

Fig. 2. Scheme of the quadruple tank system (reproduced from [14]).
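For illustration, the sketch below builds the linearised model (5) and discretises it with zero-order hold at $T_s = 5$ s. The numerical parameter values ($A_i$, $a_i$, $k_i$, $\gamma_i$) are placeholders taken from the commonly used parametrisation of the model in [11], not the values used in the paper.

```python
import numpy as np
from scipy.signal import cont2discrete

g = 981.0                                    # gravity [cm/s^2]
A = np.array([28.0, 32.0, 28.0, 32.0])       # tank cross-sections [cm^2] (assumed)
a = np.array([0.071, 0.057, 0.071, 0.057])   # outlet cross-sections [cm^2] (assumed)
k = np.array([3.33, 3.35])                   # pump gains (assumed)
gam = np.array([0.7, 0.6])                   # three-way valve openings (assumed)
hs = np.array([12.4, 12.7, 1.8, 1.7])        # linearisation point [cm]

T = (A / a) * np.sqrt(2.0 * hs / g)          # time constants T_i

Ac = np.diag(-1.0 / T)                       # diagonal of (5)
Ac[0, 2] = A[2] / (A[0] * T[2])              # tank 3 feeds tank 1
Ac[1, 3] = A[3] / (A[1] * T[3])              # tank 4 feeds tank 2
Bc = np.array([[gam[0] * k[0] / A[0], 0.0],
               [0.0, gam[1] * k[1] / A[1]],
               [0.0, (1 - gam[1]) * k[1] / A[2]],
               [(1 - gam[0]) * k[0] / A[3], 0.0]])
Cc = np.eye(4)[:2]                           # measured levels h_1, h_2

# zero-order hold discretisation with Ts = 5 s, cf. (6)
Ad, Bd, Cd, _, _ = cont2discrete((Ac, Bc, Cc, np.zeros((2, 2))), dt=5.0)
```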

A. Synthesis of MPC-based Reference Governor
The optimal control problem representing the MPCRG strategy is constructed with respect to the discrete-time state-space system given by (6) and with the input and state constraints from (11). Next, we restrict the modulated setpoint for the local PI controllers with box limitations on the variable w. With such a constraint, we mitigate potentially huge control errors entering the PI controllers, which could result in oscillatory behaviour of the inner closed-loop system. Furthermore, the performance of the MPCRG strategy is determined by the diagonal weighting matrices of the cost function (1a). The prediction horizon was set to N = 20 samples. The MPCRG with these parameters serves as the baseline strategy against which we evaluate the quality of the trained neural network controller. The MPCRG strategy is formulated with the YALMIP toolbox [15] and numerically solved with the GUROBI solver. Note that the problem consists of 12 parameters, including the user-defined references. The generation of a truly optimal explicit solution to the MPCRG strategy via the parametric programming approach is unrealistic at best: for such a controller, even a drastic reduction of the prediction horizon results in an enormous number of regions and a huge memory footprint.
A notion of how the complexity rises even for short prediction horizons is given in Table I. The results presented here were calculated with the MPT3 toolbox [16] on a machine with 128 GB RAM and a 16-core 3.5 GHz CPU. For horizons greater than 4, we were unable to calculate the explicit solution due to the memory limitations of the computer. The table also reports the number of floating-point numbers needed to store the control law as well as the partition of the explicit controller. We will later show that the approximation via the neural network, which is also in explicit form, requires considerably less memory to store the control law.

B. Training of the $C_{\mathrm{NN}}$
First, we generate a data set consisting of the initial conditions θ for the MPCRG strategy, i.e., the set Θ as defined in (2c). For each initial condition from the set Θ, the corresponding optimal setpoint $w_0^\star$ is collected. These data tuples form the training data set $\mathcal{T}$, which is used to train the $C_{\mathrm{NN}}$ according to the regression problem defined in (4). The structure of the neural network is fixed with 12 input parameters and 4 hidden layers with ReLU activation functions, where each hidden layer features 20 neurons. The output layer consists of 2 neurons, since we provide 2 modulated setpoints for the local PI controllers. For our case, a set of 30 000 data tuples was prepared as the training set $\mathcal{T}$. These data were split in the ratio 7 : 3 into training and testing data, respectively, and were sampled randomly from the feasible parameter space of θ. Naturally, one could pay further attention to the data preparation; however, with 12 parameters in play, equidistant gridding [17] is not feasible. The $C_{\mathrm{NN}}$ was trained in Python using TensorFlow [18]. The loss function (4) was minimised by the gradient-based optimiser Adam [19] with a learning rate of 0.001, while we set α = 0.7. Recall that the α weighting factor was introduced to penalise the violation of the inequality constraints that represent the technological limitations of the controlled process.
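Under the stated settings, the architecture and training setup could be reproduced along the following lines; make_penalised_loss refers to the loss sketch from Section III, and the data arrays are assumed to be prepared as described above.

```python
import tensorflow as tf

# 12 inputs, four hidden layers of 20 ReLU neurons, 2 linear outputs
model = tf.keras.Sequential(
    [tf.keras.Input(shape=(12,))]
    + [tf.keras.layers.Dense(20, activation="relu") for _ in range(4)]
    + [tf.keras.layers.Dense(2)]   # two modulated setpoints
)
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
    loss=make_penalised_loss(Gamma, gamma, alpha=0.7),  # loss (4)
)
# 70/30 split of the 30 000 tuples, as stated in the text:
# model.fit(theta_train, w_train, validation_data=(theta_test, w_test), epochs=...)
```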
The resulting formulation of the control law in the form of a neural network with ReLU activation functions features 1562 floating-point numbers, which is in stark contrast to the memory footprint required by any explicit controller obtained via parametric optimisation. Also note that the number of floats of the neural network does not change with an increasing length of the prediction horizon.
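The figure of 1562 floats is consistent with the stated architecture, counting weights and biases layer by layer:

$$(12 \cdot 20 + 20) + 3\,(20 \cdot 20 + 20) + (20 \cdot 2 + 2) = 260 + 1260 + 42 = 1562.$$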

C. Performance evaluation
The simulation case study consists of three individual experiments, detailed in Fig. 3. First, we show the performance of the local controllers without any additional supervision from the advanced layer. The second set of results shows the performance of the baseline MPC-based reference governor strategy with the settings from Section IV-A. Let us point out that the constraints considered here are imposed on the liquid levels inside all four tanks, while we measure only the levels $h_{1,2}$ (cf. (6)). Note that the system, under the authority of the local PI controllers alone, fails to maintain the levels at times t = 3000 s and t > 4000 s. With the MPCRG strategy, however, constraint satisfaction is strictly enforced.
Finally, the performance of the $C_{\mathrm{NN}}$ was compared with both previously mentioned strategies; recall that the NNRG strategy was trained on data generated by the MPC-based reference governor. We conclude from the case study that the NNRG strategy obeys all technological constraints and follows the user-defined reference wherever possible. We can observe slight offsets between the NNRG and the MPCRG strategies on two occasions. This is caused primarily by the fact that, in those two situations, the local PI controllers violate the intended limitations on the liquid levels.
To provide a more telling quantitative criterion, we evaluated a general unweighted quadratic criterion, given as

$$J_{\mathrm{LQ}} = \sum_{t} \left( \|r(t) - y(t)\|_2^2 + \|\Delta u(t)\|_2^2 \right).$$

The values of the $J_{\mathrm{LQ}}$ criterion are reported in Table II, together with the number of constraint violations. In terms of reference tracking, the best performance is naturally provided by the MPC-based reference governor strategy, since its setpoints are computed by rigorous optimisation. On the other hand, the NNRG strategy introduces some sub-optimality into the closed-loop performance, but with the main benefit of its explicit form compared to the MPCRG. We also provide results without penalising constraint violations, obtained by setting α = 1 in (4) during the training. This neural network was trained in the same fashion, and from the same initial weights, as the network with α = 0.7. As can be seen in Fig. 4, the non-penalised training results in minor constraint violations at comparable performance.
V. CONCLUSION
In this paper, we have shown that neural networks can be used as reference governors for processes that are already equipped with local controllers whenever an explicit form of the reference governor control law is required. This approach was chosen as an explicit approximation method for the MPC-based reference governor. The main advantages are the significant reduction of the memory requirements compared to explicit MPC and the increased control performance compared to the original controllers.
Simulation results with the quadruple-tank system show that this method is viable for multivariable processes with multiple local controllers. Neural networks excel at approximating complex functions, and therefore we could imitate an MPC-based reference governor with a long prediction horizon and still obtain an explicit form of the control law. Training the neural network with the constraint-violation penalty improved the closed-loop performance beyond what the local PI controllers achieve alone. As we showed, the NNRG strategy upholds the constraints even in situations in which the original controllers alone are not suitable for use.