Learning Robustly Stabilizing Explicit Model Predictive Controllers: A Non-Regular Sampling Approach

—Off-line supervised learning from data of robustly-stabilizing nonlinear explicit model predictive controllers (EMPC) is dealt with in this letter. The learning procedure relies on the construction of a suitably large set of speciﬁcally chosen sampling points of the state space in which the values of the optimal EMPC control function have to be computed. When bounding the magnitude of approximation errors is important for stability or performance speciﬁcations, regular gridding techniques are not feasible due to the curse of dimensionality arising from the structural exponential growth of the number of points with the state dimension. In this note, we consider non-regular sampling techniques – namely, i.i.d. sampling with uniform distribution, low-discrepancy sequences and lattice point sets – that offer a good covering of the state space without suffering from an unfeasible growth of the number of points, while preserving at the same time the method guarantees in terms of robustness and stability. Some theoretical properties of the proposed sampling schemes are brieﬂy discussed, and their successful application is show-cased in a practically-relevant optimal heating problem involving a 21-dimensional state space that rules out the use of regular gridding techniques.


I. INTRODUCTION
T HE MODEL predictive control (MPC) scheme is the most successful multivariable control approach in industrial applications owing to its properties in terms of optimization of a complex cost functional for nonlinear and possibly large-scale systems, while simultaneously satisfying state and control constraints. The reader is referred to the vast literature on the subject (see, for instance the very recent book [1] and the well-known papers [2], [3], [4] as well as the references cited therein).
To cope with the high computational burden of online optimization to compute the MPC control action at each timestage, an approach known in the literature as explicit MPC (EMPC) has been devised, which shifts most of the required computation in a off-line phase (see, for instance the survey paper [5] as well as the references cited therein). More specifically, the basic idea of EMPC is the off-line construction of an explicit function μ that generates in a fast way the same control vector u t we would obtain applying the online MPC procedure, as a function of the current state x t , i.e., u t = μ(x t ). Unfortunately, as is well known, in the general nonlinear and/or constrained case, it is impossible to determine an analytical form for such MPC control function. Thus, the implementation of the EMPC approach must rely on an approximation μ * of the true MPC control function μ.
The procedure to obtain such approximate function can be stated in a typical context of supervised learning from data. According to this framework, we look for the approximate function within a class of parameterized models. The best element inside such class is chosen minimizing the difference with respect to the true MPC control computed over a finite set of elements of the state space (see the seminal paper [6] and the very recent book [7]). However, unlike the classic learning setting, in the EMPC framework considered here the above elements of the state space (the "patterns", in machine learning parlance) are not given by an external source, but yet they need to be carefully selected. Indeed, the cardinality of the set of these specific elements of the state space is also a key aspect for the EMPC procedure to be applicable in realistic settings.
In [8] it was proved that the Input-to-State Stability (ISS) of the controlled system can be guaranteed (under suitable regularity conditions) when the sampling points correspond to standard regular grids with even-spaced values for each state component. Unfortunately, this kind of sampling of the state space is affected by the curse of dimensionality, since the number of grid points is structurally bound to grow exponentially with the dimension of the state space. Hence, the practical feasibility of approximate robustly-stabilizing EMPC is limited to very low-dimensional control scenarios.
In this note, we analyze the use of non-regular sampling methodologies to learn approximate robustly-stabilizing EMPC schemes in high-dimensional settings that are not feasible for sampling algorithms based on regular gridding of the state space. We deal with a slightly different framework than the one presented in [9] where a specific kind of nonregular sampling technique has been used in a disturbance-free setting (see also [10] and the references therein for other learning and optimal control scenarios using non-regular sampling approaches).
More specifically, we investigate three different sampling algorithms, namely, the classic independent and identically distributed (i.i.d.) random sampling with uniform distribution, low-discrepancy sequences and the method of good lattice points [11], [12]. A key common feature of these nonregular sampling schemes is that they all guarantee that the stability properties of the approximate EMPC controller are preserved, while not being subject to the structural curse of dimensionality of regular grids. In particular, we show the convergence guarantees taking advantage of quantities typical of number-theoretic methods and numerical integration, namely, the dispersion and the discrepancy.
In order to show the effectiveness in practice of the proposed non-regular sampling learning procedure, all the three proposed sequences are implemented for the design of an EMPC law to impose a desired temperature profile to a copper plate. The spatial discretization of partial differential equations describing the heat transfer model leads to a 21-dimensional state space that rules out the use of regular gridding techniques. Our numerical results show that all the proposed non-regular grids yield EMPC controllers (with neural networks as learning models) able to approximate well the MPC solution, while the classic regular grid performs very poorly and cannot be improved by a more accurate sampling due to the exponential growth in the computational complexity.

II. FORMULATION OF THE LEARNING PROBLEM FOR EXPLICIT MODEL PREDICTIVE CONTROL
Consider the discrete-time nonlinear dynamical system where x t ∈ X ⊂ R n is the state vector, u t ∈ U ⊂ R m is the control vector and ϑ t ∈ ⊂ R r is an exogenous disturbance input. We assume that the sets X and U are compact Euclidean subsets of R n and R m , respectively, containing the origin as an interior point. The stabilizing control design is based on the nominal model denotes the discrete-time state transition uncertainty. We assume thatf (0, 0) = 0. Definition 1: Consider a positive integer N, a given initial state x 0 and a generic sequence of control vectors {u 0 , . . . , u N−1 }. The N-stage cost is defined as where h, h N are the stage and final stage costs, respectively, and, for every k = 0, . . . , N − 1, x k+1 =f (x k , u k ).
In the usual MPC framework [1], the control action at time stage t when the system is in the generic state x t is obtained by minimizing J FH with x 0 = x t . Then, only the first optimal control vector u * t is actually applied to drive the system to the next state x t+1 . The EMPC problem consists in constructing the control function given in the following definition.
Definition 2: For every x t ∈ X, consider the sequence In this note, we address the off-line construction of an empirical approximation μ * ∈ of the EMPC function given in Definition 2 by a learning from data approach, where is a suitably "rich" class of parametrized approximating models (for instance, feedforward neural networks, kernel-based approximating models, deep neural structures, etc.). The availability of the EMPC function μ * allows to obtain on-line the approximate control action u * t = μ * (x t ) for any observed state x t , without the need to perform at each time instant the minimization of J FH [6], [7].
To this end, the first step is to construct a suitable data set and the corresponding set of MPC control actions Considering a loss function as introduced in [7], the following supervised learning from data problem is stated.
Problem 1: Find On the Construction of X L : The off-line construction of the set U L involves a significant computational load since it requires the solution of an optimization problem for every state sample x (l) ∈ X L . Indeed, the choice of the cardinality L of X L and the selection of the specific values taken on by x (l) are crucial in order to ensure that the mismatch between the EMPC function μ • and its learned approximation μ * is not too large in order to provide the required robust stability properties (see Section III). In this respect, when the dimension n of the state space takes on large values, the construction of the set X L through a uniform gridding procedure may turn out to be unfeasible due to the well-know curse of dimensionality. In fact, the number of points of a regular grid with b discretization levels for each component of the state is equal to b n , which yields an unfeasible number L of samples when n is large, even for small values of b. This motivates the use of non-regular sampling approaches to construct X L (see Section IV). 1 The approximation properties of the function space in terms of the dimension n of the state space, the complexity p of the approximating models and the smoothness characteristics of the target function μ • is a crucial issue that is out of the scope of this letter. The reader is referred to [7] for an up-to-date comprehensive treatment of this topic.

III. THE SET X L AND ROBUST STABILITY PROPERTIES
As mentioned at the end of Section II, the construction of the sample set X L is key not only from a computational complexity perspective, but also to ensure that the learned EMPC control policy is equipped with suitable robust stability properties.
The following general result on the ISS robust stability properties of the closed-loop control system driven by the approximate EMPC control law μ * (x (l) , w) can be easily obtained from the specific results given in [7], [8] for a set X L constructed by a regular gridding procedure.
Proposition 1: Assume that the following conditions are satisfied.
1) The nominal functionf (x, u) is Lipschitz continuous with respect to x, with Lipschitz constant denoted as L x , and is uniformly continuous in u.
2) The exogenous disturbance ϑ t is bounded, and the uncer- where g is a continuous, strictly increasing function zero-valued in 0.

3) The approximate MPC control μ * (x, w) is locally
Lipschitz continuous with respect to x for any w, with Lipschitz constant denoted as L μ (x, w). Then, a sufficient condition for systemf driven by μ * to be ISS on X is: whereq is such that for any y ∈ X there exists x ∈ X L such that |y − x| <q, and ε * (x) is a suitable constant depending on L x and the bound on d t . The first term in the left hand-side of (2) is where the choice of the discretization set X L affects the stability. In particular, the parameterq represents the maximum distance that is allowed between any point in X and its closest point in the set X L , meaning that smaller values ofq correspond to higher denseness of the sampling grid.
The condition on having a smallq can be expressed in terms of the dispersion of the set of points X L , a quantity commonly employed in number-theoretic and numerical integration methods, defined (see [11]) as In fact, for any valueq that is suitable to satisfy (2), it is sufficient for the set X L to have δ * (X L ) ≤q. As a straightforward consequence, the ISS stability of the system is guaranteed if we generate X L through an algorithm that ensures the convergence to 0 of the dispersion as L grows. From now on, we assume, without loss of generality, that X = [0, 1] n , i.e., the closed unitary n-dimensional cube (the generic hyperrectangle X can be obtained from [0, 1] n by simple scaling).
It is easy to see that the regular uniform grid satisfies the convergence property. In fact, if we consider a discretization of each state component by b values, the dispersion is equal to half the diagonal of the elementary cube forming the grid, i.e., δ * (X L ) = √ n 2 L −1/n . Yet, as said, the number of points of such a regular grid is bound to be equal to b n , which makes it unfeasible when the state dimension n is large.

IV. NON-REGULAR SAMPLING GRIDS
We recall the definition of discrepancy, a quantity that is closely related to the dispersion. The discrepancy of a set X L in X is defined [11] as: where J is a family of Lebesgue-measurable subsets of X, λ denotes the Lebsesgue measure and A(X L , J) denotes the number of points of X L contained in J. Here in particular we consider the case in which the family J is the set of all the subintervals J ∈ X having form n i=1 [a i , b i ] ⊂ R n , where a i < b i . Dispersion and discrepancy are closely related. In fact, it can be easily proved [11] that δ * (X L ) ≤ D(X L ) 1/n , which implies that a generating algorithm ensuring convergence of the discrepancy also ensures also the convergence of the dispersion (notice that the converse is not true).
Remark 1: It is worth noting that, while there are several methods in the literature explicitly designed to provide sample sets characterized by favourable discrepancy rates, there are no principled solutions available for the dispersion. This is the main reason why we consider the former quantity for the rest of the analysis.
In the following, three different examples of uniform spacecovering sequences are given that can be used as feasible alternatives to the regular uniform grid.

A. Uniform Random i.i.d. Sequences
The simplest solution to generate uniform points in the n-dimensional unit hypercube is a sequence of L points drawn randomly with uniform distribution. From the point of view of the discrepancy, it can be shown [13] that an i.i.d. sequence of L points drawn with uniform distribution is characterized by a convergence rate for the discrepancy of order O((log log L) 1/2 / √ n) almost surely. Thus, a random i.i.d. uniform sequence satisfies the condition of δ * (X L ) → 0 concerning the convergence of the dispersion, with probability one. However this convergence guarantee is only probabilistic, which can be regarded as a drawback in our context of robust EMPC.
Another drawback is that the resulting sampling is often not well uniformly scattered in the state space, leading to the formation of clusters of points that may leave other zones of the state space undersampled. This means that the learning model has no information regarding the true EMPC function in those regions, and the resulting approximate control coming from the learning procedure becomes less accurate when the state visits those portions of the input space. Figure 1(a) illustrates an example of sampling of the 2-dimensional unit hypercube with L = 500 i.i.d. points, in which the clustering effect, together with the related under sampling of portions of the input space, is well noticeable.

B. Low-Discrepancy Sequences
The so-called "low-discrepancy sequences" are a family of algorithms for the generation of points born in the field of quasi-random numerical integration. Examples of such sequences are the Sobol', the Faure, the Niederreiter and the Halton sequence [11], [14]. The aim of these algorithms is to provide point sets that cover the unit hypercube uniformly, in the sense that any subset contains a share of points that is proportional to its volume (following the definition of discrepancy), in a deterministic way. Figure 1(b) illustrates an example, namely, 500 points from the 2-dimensional Sobol' sequence. The construction of LDSs can be generalized through the concept of "digital (T, n)-sequences" (see, e.g., [14] for a comprehensive introduction and treatment).
More in detail, the construction of a (T, n)-sequence in base b, for b prime, is based on a set of n (N × N) matrices C 1 , . . . , C n having elements in Z b = {0, . . . , b − 1}. Then, to generate the j-th point x j of the sequence, we write j in The p-th component of x j is obtained first by multiplying the p-th matrix C p by v, yielding the vector (y j,p,1 , y j,p,2 , . . . ) T = C p ·v where each element is again in Z b , and finally taking x j,p = y j,p,1 b −1 + y j,p,2 b −2 + · · · . Notice that, in practice, the infinite vector v can be truncated after a few elements.
The above-described procedure is not computationally intensive, and many of the aforementioned special instances, like the Sobol' sequence, can be found already implemented for the most common numerical software environments.
For the purpose of the EMPC method considered in this letter, the most noteworthy property of (T, n)-sequences is that they guarantee the convergence of the discrepancy to zero as L grows [11], with a rate of order O(L −1 (log L) n ). Furthermore, since the construction procedure is fixed, this convergence rate is deterministic. This eventually means that LDS sequences satisfy the condition stated in (2), thus ensuring the robustness of the procedure.

C. Rank-1 Lattice Sample Sets
This kind of sequence can be generically ascribed to the family of digital sequences seen in the previous section, even if they were proposed independently for numerical integration [12]. Geometrically, a lattice consists of the infinite repetition of elementary unit cells, defined by a linear combination of n n-dimensional vectors. Within the unitary hypercube (or a closed hypercube in general) the points of the lattice can be expressed conveniently based on a single n-dimensional integer vector. More specifically, consider an integer vector z ∈ Z n having no factor in common with L. Then, a lattice point set X L of L points can be obtained as where · denotes the component-wise fractional part. Such a formulation usually takes the name of Rank-1 Lattice Point Set (R1LPS) [12]. A standard choice for z, following Korobov's rule [15], is to take z = [1, w, w 2 mod L, . . . , w n−1 mod L] for an integer w ≤ L/2 .
In Figure 1(c) a R1LPS lattice is shown, with z = [1,131]. Although their construction is very easy and straightforward to implement in any numerical software environment, the variability due to the choice of z implies that not all R1LPSs necessarily offer an efficient covering of the state space. However, "good" lattice points satisfy the condition in (2). In fact, it can be proved [11] that the average rate of convergence of the discrepancy over z of a R1LPS is O(L −1 (log L) n ), and that there exists at least a z * that yields this rate exactly.
To help in the choice of good vectors, various quantities have been devised in the literature to evaluate a priori the uniformity of a lattice depending on z. For instance, the P α index (see, e.g., [11]) is related to the discrepancy in such a way that sets with low values of P α are expected to be good also from the discrepancy (and, as a consequence, dispersion) point of view. Furthermore, such index can be explicitly computed (see, e.g., [12,Ch. 10.5]), thus it can be used for a search over z to isolate "good" generating vectors for the R1LPS in the dimension n.
Remark 2: In general, some hints can be obtained on the most convenient non-regular scheme to be applied for the problem at hand, borrowing from the theoretical results available for the use of such methods in numerical integration. In particular, LDS points are expected to perform particularly well when the MPC function belongs to a weighted Sobolev space [14] (roughly speaking, when the involved functions depend strongly only on just a few of the n components), while the LPS sets are particularly effective when the involved function exhibits strong spatial regularity (in the extreme case, when it is periodic in space) [12].

V. SIMULATION RESULTS
In this section, we present an application of the proposed sampling schemes to a 21-dimensional case study, involving a rectangular plate made of some thermally conductive material (e.g., copper) and subject to 4 heat sources that we control in order to impose a desired temperature profile. This kind of system is very common in industry (inductive heating on steel plants) and in the oil/gas sector.
Specifically, the purpose of the control is to raise or lower the temperature of the plate initially assumed to be equal to the ambient temperature, using heat sources located at each side of the rectangle. Here we assume that each pair of opposite sides of the plate is controlled by the same source, eventually yielding two freely assignable control values. Let ϕ(t, x, y) be the temperature of the plate at time t > 0 and point (x, y), with 0 ≤ x ≤ L x and 0 ≤ y ≤ L y . The classic heat equation governing the system is given by a partial differential equation of the form ∂ t ϕ = α(∂ xx ϕ + ∂ yy ϕ). In order to apply the MPC scheme, we need to discretize the heat equation with respect to both time and space: we employ a forward time centered space approach, with a sample time t = 0.05 s and spatial stepsizes equal to x = y = 0.1 m. Therefore, the heat equation can be written in the form where ϕ(t, i, j) = ϕ(t · t, i · x, j · y), α = 1.11 · 10 −4 m 2 /s (the conductivity of copper), i = 1, . . . , n and j = 1, . . . , m.
In the simulations, we consider n = 8 and m = 5 whereby L x = 0.7 m and L y = 0.4 m. Boundary conditions must be added to make equation (6) defined for every temporal stage. More specifically, we assume ϕ(t + 1, 3, 1) = ϕ(t + 1, 3, 5) = u 1 (t) and ϕ(t + 1, 1, 3) = ϕ(t + 1, 8, 3) = u 2 (t), while all the other entries in the boundaries are kept fixed at the ambient temperature. The controls u 1 and u 2 are, therefore, able to set the temperature on two discretization points and they are constrained to assume values in the range |u 1 (t) − ϕ(t, 3, 1)| ≤ 0.1 • C and |u 2 (t) − ϕ(t, 1, 3)| ≤ 0.05 • C. This constraint reflects the requirement of not changing the plate temperature too abruptly. Furthermore, we want to limit the magnitude of the controls.
The aim of the control is to bring the temperature of the plate from an initial ambient temperature as close as possible to a target temperature using the four heat sources. Without loss of generality, we can always assume that the target temperature is zero, since the dynamics described by the heat equation are defined up to an additive constant. Therefore, we adopt the following functional cost over 5 temporal stages: such that  It is worth noting that the dimension of the state vector is 21: we have ϕ(t, i, j) for i = 2, . . . , n − 1 and j = 2, . . . , m − 1 corresponding to the 18 variables ruled by equation (6); on the boundary, we have that ϕ(t+1, 3, 1) = ϕ(t+1, 3,5) and ϕ(t+ 1, 1, 3) = ϕ(t + 1, 8, 3), which amounts to other 2 variables, while all the other states on the boundary collapse into the ambient temperature (represented by the last state variable).
As the baseline reference for the training set X L we have considered a regular grid of samples in the 21-dimensional hypercube with lower bounds −10 and upper bounds 10. Specifically, we have considered the regular grid consisting of two discretization levels (-3.5 and 3.5) for each dimension. Since we have 21 dimensions, the total number of points L of this grid is 2 21 = 2097152. Notice that this is the only regular grid we can employ here in practice, since using three discretization levels would already yield more than 10 billion points. To test the non-regular discretization schemes proposed in Section IV, we have considered sets having the same number L = 2 21 points as the regular grid. In particular, in the case of LDS sampling the Sobol' sequence was employed [16]. For the LPS case we have followed Korobov's rule to obtain the generator vector z, choosing as w the value that minimized the index P α (for α = 2) mentioned in Section IV over 100 trial values. In case of i.i.d. uniform random sequences, due to their intrinsically probabilistic nature, 10 different sequences were generated.
In order to implement the EMPC procedure described in Section II, the class of two-hidden layers sigmoidal neural networks with 10 hidden units in every layer has been chosen to learn the MPC control. For the implementation, we used the Python scikit-learn package. The employed performance criterion for the optimization of the model parameters is the mean squared error with L 2 regularization of the weights, in order to keep the Lipschitz constant of the network bounded.
To quantify the performance of the networks we evaluate the generalization ability of the prediction model to correctly approximate the true MPC control law in the points of a test set, defined as a collection of samples not belonging to the training set. To this purpose, a test set made of 10000 points randomly generated in the range [−10, 10] has been employed.
The mean absolute error (MAE) between the MPC control obtained minimizing J and the output of the various neural networks is displayed in Table I. Notice that in the case of the IID sequences the MAE reported is the average over the 10 realizations of the points of the various grids, while the standard deviation is reported in the parentheses. Figure 2 compares: (i) an example of the temporal evolution of the temperature over the plate (starting from the ambient temperature equal to −5 • C) obtained with the true MPC control, (ii) the approximation obtained with a non-regular sampling (specifically, the LDS one) and (iii) the approximation using the regular uniform grid.

VI. COMMENTS AND CONCLUDING REMARKS
The results from Table I and Figure 2 show how the approximation of the MPC control is very good with the three different non-regular sampling schemes, while the uniform regular grid is not able to provide a successful approximation. In fact, Table I shows how, despite the same number L of points, the error from the regular grid turns out to be 2 orders of magnitude larger than the error from the nonregular ones. Accordingly, the temporal evolution in Figure 2 is almost identical in case of the true MPC control and the LDS approximation, reaching the best possible achievable temperature profile, while the approximation from the uniform grid never converges to a satisfactory profile. Overall, the simulation results prove how an alternative to the regular grid, by means of an algorithm able to spread the points well over the input space, is mandatory in high-dimensional contexts.
Among the non-regular schemes, Table I shows how the i.i.d. sampling performs slightly better than the rest in average, but the reported standard deviation shows that some of the i.i.d. grids in the tests perform worse than those from the other two sampling schemes. Overall, all the proposed sampling schemes yield actually very close results in Table I, showing that they all are viable options.