An example of solving HJB equations using sparse grid for feedback control

It is well known that solving the Hamilton-Jacobi-Bellman (HJB) equation in moderate and high dimensions (d > 3) suffers the curse of dimensionality. In this paper, we introduce and demonstrate an example of solving the 6-D HJB equation for the optimal attitude control of a rigid body equipped with two pairs of momentum wheels. The system is uncontrollable. To mitigate the curse-of-dimensionality, a computational method based on sparse grids is introduced. The method is causality free, which enjoys the advantage of perfect parallelism. The problem is solved using several hundred CPU cores in parallel. In the simulations, the solution of the HJB equation is integrated into a model predictive control for optimal attitude stabilization.


I. INTRODUCTION
It is well known that a feedback control law of nonlinear systems can be constructed based on the solution of the Hamilton-Jacobi-Bellman (HJB) equation, which is a partial differential equation (PDE). This theoretically elegant approach suffers some difficulties in computation due to the curse of dimensionality, a term that was coined by Richard E. Bellman when considering problems in dynamic optimization, which relates to the fact that the size of the discretized problem in solving HJB equations increases exponentially with the dimension. Finding an approximate solution to the HJB-type of equations in a local neighborhood of a trajectory has been extensively studied [1], [6], [14], [20], [21]. Some of them can be applied to systems in high dimensions. However, finding semi-global solutions to HJB equations, i.e., solutions satisfying a required accuracy in a given domain, faces the curse of dimensionality. In fact, numerically solving the HJB equation for a nonlinear system with more than four state variables is already challenging, if not impossible. For instance, the HJB equations for rigid body systems controlled by momentum wheels have six state variables. To the best of our knowledge, no semi-global solutions have been found for this problem.
In [15], a computational method of solving HJB equations for nonlinear systems is developed based on sparse grids. The algorithm is causality free, i.e the value of the solution at a gridpoint in the state space is computed independently from the value at other gridpoints. A causality free method has a perfect parallelism, which is a significant advantage for high dimensional problems. This method has the potential This work was supported in part by AFOSR, NRL, and NPS CRUSER The authors are with Faculty of Applied Mathematics, Naval Postgraduate School, Monterey, CA 93943, USA. wkang@nps.edu,

lwilcox@nps.edu
The views expressed in this document are those of the author and do not reflect the official policy or position of the Department of Defense or the U.S. Government to mitigate the curse of dimensionality for system with a moderate dimension. The idea of sparse grids is based on Smolyak's work [22]. The reader is also referred to [4], [10], [25] for more details in different perspectives. Relative to a dense grid, the size of the sparse grid is significantly reduce. In the computation, the solution at each gridpoint is found using the Lobatto IIIa method that solves a two-point boundary problem (TPBVP). Different from some popular algorithms of solving PDEs, this approach is not based on spatial causality. A significant advantage of this causality free method lies in its perfectly parallelism, a desirable property for modern computation equipment with manycore clusters. In the example, all compuations are carried out in a parallel computer and 512 CPU cores are used.
In this paper, we exemplify the computational method by solving a problem of optimal attitude control of rigid body. The example in itself is interesting. Consider a rigid body controlled by two pairs of momentum wheels. Although satellite systems are quipped with at least three pairs of control momentum wheels, malfunction may occur to some wheels. It has been proved in [7] that this system is uncontrollable. How to stabilize the satellite around a desired attitude is a challenging problem. Related work can be found in [24], [11], [17], [19], [23], [12] and references therein. Different from exiting results, we do no assume zero angular momentum. In addition, the controller based on the solution of the associated HJB equation is smooth. The optimal control is able to smoothly stabilize the rigid body at an attitude that is closest to the desired orientation. The result in this paper is different from those in [15] where the stabilization does not guarantee an optimal attitude and the control is open-loop. In the present paper, we optimize a cost function that automatically stabilizes the system at an optimal attitude. In addition, the result is integrated with a model predictive control (MPC) to achieve feedback stabilization in the presence of noise.

II. A CAUSALITY FREE ALGORITHM
Consider a control systeṁ The cost functional to be minimized is Define the Hamiltonian x, )) The HJB equation with boundary condition is defined as follows It is a PDE in which is a column vector. The feedback control law is a function defined as follows x)) Partial differential equations like (3) are numerically solved using discretization methods that result in a finite dimensional problem. The dimension of the discretized space increases exponentially with the dimension of the PDE. If d is the dimension of a PDE, then the size of a dense grid is N d , where N is the number of gridpoints used to approximate a single variable in the PDE. For example, If = 32 gridpoints are used to approximate a single variable, which is quite small, the total number of gridpoints for a 6-D problem is over 10 9 . If 100 points are used for a single variable, then N d > 10 12 . The required computational time and memory size by such dense grids are simply too high for practical applications.
In [15], a causality free computational method consists of two components: (1) A solver that can find the value of V (t, x) at any gridpoint; and the computation is independent of V (t, x) at other points. (2) A set of gridpoints, such as a sparse grid, with a reduced size to make the problem tractable. It is a known fact that the size of sparse grids increases with the dimension, d, in the order of ), which is in sharp contrast to the size of the corresponding dense grid O(N d ). Obviously, the significantly reduced number of gridpoints has its impact on the accuracy. For a given function f : if f has bounded mixed derivatives up to the second order. Compared with the error bound using a dense grid, we pay a small price in terms of accuracy to achieve a significantly reduced size of the grid. For more details, the reader is referred to [4], [10] and references therein.
In this paper, we adopt the Chebyshev-Gauss-Lobatto (CGL) sparse grid [2]. Sparse grids have a hierarchical structure. For each variable, the set of gridpoints contains several layers of subsets, denoted by X i . The number of points in each subset satisfies The gridpoints are defined as follows 8 > > < > > : [22], [2], the sparse grid, denoted by G q sparse , is defined as follows,  Figure 1 shows two examples of G q sparse . For q = 8, G q sparse has 385 gridpoints whereas the corresponding dense grid has (2 6 + 1) 2 = 4225 points. The difference of grid size is increasingly significant for higher dimensions. The significantly reduced number of gridpoints makes it possible to discretize a PDE into a tractable numerical problem.
Some discretization methods, such as finite difference and central difference, are based on spatial causality. The value of V (t, x) at a gridpoint is approximated using the value of V (t, x) at some other gridpoints in its neighborhood. It is difficult to apply this type of methods on a sparse grid because the distance between adjacent points in a sparse grid varies in a large range due to the hierarchical structure. In contrast to this, the discretization technique in [15] does not discretize a HJB equations directly. Instead, it uses Pontryagin's minimum principle (PMP) to derive a set of necessary conditions in the form of a boundary value problem for each gridpoint. As a result, the computation of the solution at a point in space is independent of other points. This approach is also different from the semi-Lagrangian method on sparse grid in [3] in which the equation is integrated backward in time while the gridpoints are adaptively adjusted based on the value of the computed solution at neighboring points at a later time.
In the following, the value V (t 0 , x 0 ) for any point x 0 in the sparse grid is computed by solving the follosing two point boundary value problem (TPBVP) derived from PMP, with the boundary conditions The optimal control and the minimum costs are Numerical algorithms for boundary value problems similar to (6) have a sizable literature. In this paper, we adopt an algorithm based on the four-stage Lobatto IIIa formula. This is a collocation formula and the collocation polynomial provides a solution that is fifth-order accurate [16]. A nice feature of this algorithm is that the error of estimation can be numerically approximated. In the computation, the numerical solver is able to achieve approximate solutions with an error tolerance ✏ = 10 12 . For the reason of computational time, we set error tolerance at ✏ = 10 6 .
From a conventional viewpoint of computation, solving a TPBVP is not an efficient approach for PDEs. However, this method has perfect parallelism. Although not a preferred algorithm for serial computers, causality free algorithms can easily be implemented in massively parallel computational equipment. In this paper, all computations are carried out in parallel using 512 CPU cores. The combination of sparse grids, boundary value problem solvers, and parallel computation is the key to mitigate the curse of dimensionality effectively for problems in a moderate or high dimensional space. In this paper, the examples have d = 6.
Solving the HJB equation can be done off-line using massive computation. The result is then uploaded onto the system for real-time feedback control. The on-line computation must be simple and fast. For this purpose, an interpolations on a sparse grid is used to compute V (t, x) if x is not a gridpoint.
j (x) A few basis functions at CGL gridpoints are shown in Figure  2. They are defined using a polynomial interpolation The interpolation on a sparse grid does not need every basis The weights, w i j , are called hierarchical surpluses.

III. THE OPTIMAL CONTROL OF RIGID BODY -AN UNCONTROLLABLE CASE
Let {e 1 , e 2 , e 3 } be an inertial frame of orthonormal vectors and let {e 0 1 , e 0 2 , e 0 3 } be a body-fixed frame, or body frame. In this paper, the attitude of a satellite is represented by Euler angles (see [8] ) in which , ✓, and are the angles of rotation around e 0 1 , e 0 2 , and e 0 3 , respectively, in the order of (3, 2, 1). The angular velocity is a vector in the body frame, The control system using momentum wheels is defined by a set of differential equations [7] v where B 2 < 3⇥2 is a constant matrix, u is the control torque, J 2 < 3⇥3 is a combination of inertia matrices of the rigid body without wheels and the momentum wheels, H 2 < 3 is the total and constant angular momentum of the system, and E(v), S(!), R(v) are matrices. Details can be found in [15].
The system has only two control inputs. In [7], it is proved that (9) The optimal control is to minimize the following funcational The function v e (v, !) represents the optimal attitude reachable from (v, !). It is a known fact that this system is uncontrollable. The desired attitude, in our example v = 0, may not be reachable. In the case of H = 0, nonsmooth controllers are derived to stabilize the system [24], [11], [17], [19], [23], [12]. In the case of H 6 = 0, a manifold of reachable states (v, !) satisfies [7] The attitude v e (v, !) is a target attitude in this reachable manifold. A satellite system may have to meet multiple requirements of orientation, such as pointing sensors to the desired direction and at the same time keeping its solar panel facing the Sun. But the desired attitude, for instance v = 0, may not lie on the manifold of reachable states. We define the following optimal v e (v, !) as the target state for stabilization. For this purpose, we minimize the Frobenius distance between R(v) and I = R(0). Because both matrices are orthogonal, it is equivalent to v e (v, !) = arg max v tr(R(ṽ)) subject to It can be proved that v e (v(t), !(t)) is a constant along any controlled trajectory. Therefore, it can be treated as a constant in the derivation of (6). The solution of (13) can be found by means of numerical nonlinear programming.
The HJB equation (3) is solved at t = 0 on a sparse grid in the following doamin, D, D : The sparse grid of q = 13 is used. The number of gridpoints for each dimension is 2 q 6 + 1 = 129. The total number of gridpoints in the 6-D domain is |G q sparse | = 44, 689 which is small in comparison with the size of a dense grid, In D, the TPBVP (6) is solved at each gridpoint in G q sparse using a method based on a four-stage Lobatto IIIa formula [16]. The computation is carried out in Hamming, a parallel computer in Naval Postgraduate School. Although as many as |G q sparse | can be used, we limit the computation to 512 CPU cores.
For (v, !) not in G q sparse , the value of V (0, v, !) is approximated through interpolation. The hierarchical surpluses for interpolation are computed using either (8) or other more efficient algorithms. An advantage of a causality free method based on a Lobatto IIIa algorithm is that the error of V (0, v, !) can be approximated at an arbitrary point in D, without being contaminated by the computational error at the gridpoints. In fact, under some smoothness assumptions, the error in the solution of (6) can be numerically estimated [16]. To validate the accuracy of the numerical results, N = 1280 points are randomly generated in D. The value of V (0, v, !) is computed at these points using interpolation on G q sparse . The true value at the same point is approximated by solving (6). We use the difference between these two numbers as an approximate of the error. Because the error of the TPBVP solver is significantly smaller than the interpolation error, the later is ignored. The mean absolute error (MAE) is 8.5⇥10 3 and the root-mean-square error (RMSE) is 1.7 ⇥ 10 2 . The results are summarized in Table I  high as PDE solvers in some 2-D or 3-D cases. This is not surprising because we sacrifice the accuracy in exchange for a sparse grid that is tractable in 6-D. A typical trajectory is shown in Figure 3.

V. FEEDBACK CONTROL
In an optimal control design based on HJB equations, the most computationally intensive part, i.e. solving the HJB equation, is done off-line. Once the equation is solved, the computation of real-time feedback control at (v, !) is a straightforward interpolation with a minimum computational load. The solution of the HJB equation in the previous section is integrated with a MPC to stabilize the system at a desired optimal attitude. In the simulations we adopt a basic zero-order hold MPC controller. The sampling rate is 10 Hz. In each time interval, u optimal is computed using interpolation on the sparse grid. It is assumed that the measurement of v and ! is corrupted by a random noise of uniform distribution with a magnitude about 0.5% of the maximum state value. Several trajectories under the closedloop control are shown in Figures 4 -6.

A. Conclusions
The causality free method used in this paper has perfect parallelism. It can be easily integrated with any grid, specifically sparse grid in this paper to mitigate the curse of dimensionality. The example shows that the algorithm is tractable for a 6-D HJB equation. In contrast to dense grids, the size of sparse grids is significantly smaller. In the case of an uncontrollable rigid body with six state variables, a solution is achieved on a sparse grid with about 4.5 ⇥ 10 5 points, whereas the corresponding dense grid has more than 10 12 points. The solution of the HJB equation is integrated into a MPC controller. Simulations show that the uncontrollable system is stabilized at an optimal attitude in the manifold of reachable states.

B. Future Works
In exchange for a tractable algorithm of solving high dimensional HJB equations, the accuracy of the solution is reduced by a factor of (log N ) d 1 . In the future work, error analysis will be carried out to analyze and estimate the error in causality free methods. Also included in the future research is to improve the accuracy and to test the method using various different problems.