Inverse optimal control for a hybrid dynamical system with impacts

In this paper, we develop an approach to inverse optimal control for a class of hybrid dynamical systems with impacts. As it is usually posed, the problem of inverse optimal control is to find a cost function that is consistent with an observed sequence of decisions, under the assumption that these decisions are optimal. We assume instead that observed decisions are only approximately optimal, and we find a cost function that minimizes the extent to which these decisions violate first-order necessary conditions for optimality. For the hybrid dynamical system that we consider, with a cost function that is a linear combination of known basis functions, this minimization is a convex program. In fact, it reduces to a simple least-squares computation that, unlike most other forms of inverse optimal control, can be solved very efficiently. We apply our approach to a dynamic bipedal climbing robot in simulation, showing that we can recover cost functions from observed trajectories that are consistent with two different modes of locomotion.


I. INTRODUCTION
Inverse optimal control (IOC) is the problem of recovering a cost function that explains observations of optimal or "expert" trajectories [1]. In some applications, the goal might be to imitate the behavior of an expert. In others, such as the study of human motor control, finding the underlying cost function is precisely what we are interested in. IOC problems arise in a wide range of applications, from basic science [2], [3] and locomotion [4], [5] to optimal control of aircraft [6] and, more recently, aerobatic helicopter flight [7] within the robotics community.
The problem of IOC has been studied in many contexts. In the context of Markov decision processes, IOC has been studied in [8], [9]. IOC algorithms for non-linear continuous systems have been developed in the context of linearly-solvable Markov decision processes [10]. Also, in a previous paper, we developed a similar approach for IOC in non-linear continuous systems [11]. However, most of these algorithms require repeatedly solving optimal control problems, which introduces a significant computational bottleneck.
Recently, a new formulation of a problem closely related to IOC was proposed in [12], which completely eliminates the computational bottleneck of previous IOC algorithms. Unlike traditional approaches to IOC, where it is assumed that the choice of the decisions is perfect and their observations are noisy, this formulation assumes that observations of the decisions are perfect, while the choice of the decisions is only "approximately optimal". The goal of this approach is to minimize the extent to which the first-order necessary conditions of optimality are violated. Minimizing and evaluating these necessary conditions no longer requires solving an optimal control problem, and thus the IOC problem can be solved very efficiently. In fact, the problem reduces to a least-squares minimization.
The contribution of this paper is to extend this new formulation of IOC and the notion of "approximate optimality" to hybrid dynamical systems with impacts. Hybrid dynamical systems involve both continuous dynamics and discrete events [13]. For example, bipedal walking is often modeled as a hybrid system consisting of the continuous motion of the lower limbs and discrete impacts of the legs with the ground. Hybrid systems arise in many other fields, such as robotics [14], [15] and air traffic control [16]. In this paper, we use the first-order necessary conditions for optimality for a class of hybrid dynamical systems introduced in Long et al. [17], and we define the notion of approximate optimality for those systems. This notion then allows us to pose the IOC problem as a convex least-squares optimization. The performance of our IOC algorithm is tested on a simulated dynamic bipedal climbing robot.
Optimal control and inverse optimal control for hybrid systems are often difficult because of the mixture of discrete and continuous events in these systems. Examples of work incorporating optimal control for walking systems are [18], [19]. Long et al. [17] developed an optimal control approach in which the first derivative of the cost function is taken explicitly with respect to the controls. The application of IOC to one class of hybrid systems was explored in [20], where a cost function was estimated to model human yo-yo playing. The problem of IOC for hybrid systems is important because such an approach can have implications for lower-limb prosthetic design and for understanding human locomotion. Moreover, IOC plays an important complementary role to optimal control: while optimal control can produce desired controls for a system, the performance of the controller depends significantly on the cost function chosen to be optimized, and IOC approaches enable the estimation of appropriate cost functions.
The remainder of this paper is organized as follows. In Section II we address the problem of optimal control for a class of hybrid dynamical systems discussed in [17]. We then introduce the inverse optimal control problem for hybrid systems in Section III and discuss our proposed solution in Section IV. The proposed solution is evaluated on a simulated dynamic bipedal climbing robot in Section V. We conclude in Section VI with a discussion of possible future research directions.

II. OPTIMAL CONTROL
In this paper, we focus on the class of hybrid dynamical systems with impacts of the following form:

    ẋ(t) = f(x(t), t),    (x, ξ) ∉ S,
    x⁺ = ∆(x, ξ),         (x, ξ) ∈ S,    (1)

where x ∈ Rⁿ is the state of the system at time t, ξ ∈ Ξ are impact controls, f is the continuous dynamics, ∆ is an instantaneous impact map, S is the impact surface, and x⁺ is the post-impact state. Once the pair (x, ξ) enters the impact surface S, the state undergoes an instantaneous change to x⁺ governed by the impact map ∆. The impact controls ξ can influence both when the impact occurs and the result of the impact. Note that in this class of systems, the controls in the continuous dynamics are either zero or known. Therefore, this hybrid system can be characterized by a finite set of impact controls ξ = [ξ_1, ..., ξ_N]ᵀ, which produce post-impact states x⁺ = [x⁺_1, ..., x⁺_N]ᵀ and impact times τ = [τ_1, ..., τ_N]ᵀ. In this paper, we use the subscript k = 1, ..., N to denote the k-th impact.
Using the fundamental theorem of calculus along with equation (1), the pre-impact state, represented by x⁻_k, can be obtained by forward integration of the continuous dynamics:

    x⁻_k = x⁺_{k−1} + ∫_{τ_{k−1}}^{τ_k} f(x(t), t) dt.    (2)

The post-impact states are then obtained from

    x⁺_k = ∆(x⁻_k, ξ_k).    (3)

In this notation, τ⁻_k = τ⁺_k; however, we use the superscripts − and + to denote just before and just after the impact. Furthermore, we assume that the impact time τ_{k+1} is obtained as a function of the last impact time, the impact control input, and the previous state:

    τ_{k+1} = τ_{k+1}(τ_k, ξ_{k+1}, x⁺_k).    (4)

To understand how to perform inverse optimal control, we first need to know how optimal trajectories can be obtained by minimizing a cost function. The initial time τ_0 and the initial state of the system x(τ_0) = x_0 are known, and the optimization is performed with respect to the state variables and the impact control inputs:

    arg min_{x, τ, ξ} J(x, τ, ξ) = arg min_{x, τ, ξ} Σ_{k=1}^{N} L_k(x⁺_k, τ_k)  subject to (1)-(4),    (5)

where J is a cost function chosen to encode a desired trajectory for the N impacts, and the function L_k is the cost enforced at time step k. Note that in general, L_k can be any function of the state and impact time of the system. Although we have chosen L_k not to be a function of ξ, this can easily be incorporated as well. Examples of these cost functions can be found in Section V-B. The optimization in (5) can be posed in a different way. As can be seen from equations (1)-(4), a trajectory can be uniquely represented using the initial conditions x_0 and τ_0 and the impact control inputs ξ_k, for k = 1, ..., N, i.e.
x⁺_k = x⁺_k(x_0, τ_0, ξ_1, ..., ξ_k) and τ_k = τ_k(x_0, τ_0, ξ_1, ..., ξ_k). Therefore, by considering the dependence of the state variables on the control inputs, optimization (5) can alternatively be written only with respect to the impact controls. In this formulation, the equality constraints are absorbed into the objective, and the new optimization is performed with constraints on the inputs only:

    arg min_{ξ ∈ Ξ} J(x_0, τ_0, ξ) = arg min_{ξ ∈ Ξ} Σ_{k=1}^{N} L_k(x⁺_k(x_0, τ_0, ξ), τ_k(x_0, τ_0, ξ)).    (6)

In this paper, we assume that the cost function is a weighted combination of known basis functions,

    J(x_0, τ_0, ξ) = αᵀ J̃(x_0, τ_0, ξ),    (7)

where α ∈ Rᴹ is a vector of weights and J̃ = [J̃_1, ..., J̃_M]ᵀ collects the M basis functions. A technique proposed by [17] for solving this optimization problem is presented later in the paper. Next, we discuss the main focus of this paper, which is the problem of inverse optimal control for the described system.
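For concreteness, the rollout defined by (1)-(4) can be sketched in a few lines of Python. The callables f, impact_time, and impact_map, and the toy one-dimensional bouncing example, are our own illustrative placeholders, not part of the formulation above:

```python
import numpy as np

def simulate_hybrid(x0, t0, impact_controls, f, impact_time, impact_map):
    """Roll out the hybrid system: integrate the continuous dynamics
    between impacts (eq. (2)), then apply the impact map (eq. (3)) at
    each impact, whose time comes from eq. (4)."""
    xs, taus = [x0], [t0]
    x, tau = x0, t0
    for xi in impact_controls:
        tau_next = impact_time(tau, xi, x)           # eq. (4)
        # forward-integrate the flow with a simple Euler scheme
        n_steps = 100
        dt = (tau_next - tau) / n_steps
        t = tau
        for _ in range(n_steps):
            x = x + dt * f(x, t)
            t += dt
        x = impact_map(x, xi)                        # eq. (3)
        tau = tau_next
        xs.append(x)
        taus.append(tau)
    return np.array(xs), np.array(taus)

# toy instance: a 1-D ballistic "ball" whose impact control sets the
# rebound speed; purely illustrative
g = 1.0
f = lambda x, t: np.array([x[1], -g])                # x = [height, velocity]
impact_time = lambda tau, xi, x: tau + 2 * x[1] / g  # time to return to h = 0
impact_map = lambda x, xi: np.array([x[0], xi])      # rebound with speed xi

xs, taus = simulate_hybrid(np.array([0.0, 1.0]), 0.0, [1.0, 0.5],
                           f, impact_time, impact_map)
```

As in the text, the whole trajectory is determined by (x_0, τ_0) and the impact-control sequence, which is what allows the optimization to be posed over the impact controls alone.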

III. INVERSE OPTIMAL CONTROL
Inverse optimal control is the problem of recovering a cost function that would have resulted in an observed trajectory. Traditionally, the formulation of this problem assumes that the trajectories are optimal, but corrupted by noise. We consider a different formulation of the IOC problem, due to [12], in which we assume that observation of the trajectory is perfect, and that this trajectory is only "approximately optimal". The definition of "approximately optimal" is provided later in the paper.
In order for trajectories to be optimal, they have to satisfy the first-order necessary Karush-Kuhn-Tucker (KKT) conditions of optimality [21]. We assume that we can write the control input constraints ξ ∈ Ξ as a set of Q inequalities g_q(ξ) ≤ 0 for q = 1, ..., Q, collected in the function G = [g_1, ..., g_Q]ᵀ. For the optimization problem (7), the KKT conditions reduce to the existence of λ ∈ R^Q_+ satisfying the conditions

    αᵀ J̃′(ξ) + λᵀ G′(ξ) = 0,    λ_q g_q(ξ) = 0, q = 1, ..., Q,    (8)
    G(ξ) ≤ 0,    (9)

which we can express as the residuals

    r_1 = [α; λ]ᵀ [J̃′; G′],    r_2 = max(G(ξ), 0),    r_3 = [λ_1 g_1(ξ), ..., λ_Q g_Q(ξ)]ᵀ,    (10)

where J̃′ and G′ denote the Jacobian matrices of J̃ and G with respect to the impact controls ξ_1, ..., ξ_N, 0 denotes an appropriately-sized vector of all zeros, the maximum is taken elementwise, and the notation [ ; ] refers to vertical concatenation of two matrices. A trajectory is called "approximately optimal" if the conditions in (10) are satisfied approximately. More precisely, a trajectory is "approximately optimal" if r_2 is close to zero, and there exists λ ∈ R^Q_+ such that the norm of [α; λ]ᵀ [J̃′; G′] and r_3 is close to zero, for an appropriately chosen norm [12]. In this paper, we use the Euclidean 2-norm, denoted || · ||_2. Therefore, we pose the IOC problem as that of finding a set of weights α for an observed trajectory such that the trajectory is approximately optimal:

    minimize_{α, λ ∈ R^Q_+}  || [α; λ]ᵀ [J̃′; G′] ||_2² + || r_3 ||_2²    (11)
    subject to  αᵀ 1 = 1,    (12)

where 1 denotes an appropriately-sized vector of all ones.
Note that the constraint (12) is added to the least-squares minimization to avoid the trivial solution α = 0. The above optimization is convex in the parameters α and λ, making the proposed IOC problem easy to solve. Therefore, unlike traditional IOC algorithms, which often require solving multiple optimal control problems, the proposed method relies only on a convex optimization. Also, note that although we solve this problem with one observed trajectory, it can easily be extended to finding a cost function that explains multiple observed trajectories.
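In the special case of unconstrained inputs (no function G), the problem of (11)-(12) has a closed-form solution. The following Python sketch, with our own illustrative names and synthetic data, solves the resulting equality-constrained quadratic program via its KKT linear system:

```python
import numpy as np

def recover_weights(basis_jacobian):
    """Unconstrained-input special case of the IOC problem:
    minimize ||alpha^T Jtilde'||_2^2  subject to  1^T alpha = 1.
    basis_jacobian is the M x dim(xi) Jacobian of the basis functions
    evaluated at the observed trajectory. Solved in closed form via the
    KKT system of this equality-constrained quadratic program."""
    M = basis_jacobian.shape[0]
    H = 2.0 * basis_jacobian @ basis_jacobian.T       # Hessian of objective
    ones = np.ones((M, 1))
    K = np.block([[H, ones], [ones.T, np.zeros((1, 1))]])
    rhs = np.concatenate([np.zeros(M), [1.0]])
    sol = np.linalg.solve(K, rhs)
    return sol[:M]                                    # recovered weights

# sanity check on synthetic data: a trajectory that is exactly optimal
# for weights alpha* makes alpha*^T Jtilde' = 0, so alpha* is recovered
rng = np.random.default_rng(0)
Jt = rng.standard_normal((3, 8))
alpha_star = np.array([0.2, 0.3, 0.5])
Jt[2] = -(alpha_star[0] * Jt[0] + alpha_star[1] * Jt[1]) / alpha_star[2]
alpha_hat = recover_weights(Jt)
```

The single linear solve replaces the repeated optimal-control solves required by traditional IOC algorithms, which is the computational advantage emphasized in the text.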

IV. PROPOSED SOLUTION
The formulation we have chosen for the IOC problem reduces its solution to a simple and computationally efficient least-squares minimization. The solution to (11) relies on two components: first, computing the gradients of the basis functions with respect to the control inputs; second, solving a least-squares minimization to recover the estimated weight of each basis function, thus solving the IOC problem. We now discuss how to compute the derivatives of the basis functions, based on the results in [17].
We first introduce the operator notation Dg(x) • ∂x to represent derivatives. This notation reads: Dg(x) operates on ∂x, the perturbation of x. Moreover, the notation D_n g(arg_1, arg_2, ...) • ∂arg_n denotes that g(·) is differentiated with respect to its n-th argument, and is known as the slot derivative. For more details on this notation and on the analysis in this section, refer to [17].
The optimization (11) requires computing the Jacobian matrix J̃′, which in turn relies on computing D_{ξ_i} L̃_m(x_0, ξ) • ∂ξ_i for all i and m. Note that we write L̃_m(x_0, ξ) since, as we discussed earlier, this basis function can be written explicitly as a function of x_0 and ξ. Therefore, while we also use the notation L̃_m(x⁺_k, τ_k) for simplicity in the derivations below, the basis functions should still be thought of as functions of the initial state and the impact controls, and the derivatives with respect to the impact controls use the chain rule to differentiate the entire expression.
As presented in [17], we can derive an explicit expression for the derivative of the cost even when the differential equations governing the dynamics cannot be integrated analytically. By the chain rule, the derivative of the basis function L̃_m(x_0, ξ) with respect to the set of impact controls ξ_i at impact i is

    D_{ξ_i} L̃_m(x_0, ξ) • ∂ξ_i = Σ_{k=i}^{N} [ D_1 L̃_m(x⁺_k, τ_k) • (D_{ξ_i} x⁺_k • ∂ξ_i) + D_2 L̃_m(x⁺_k, τ_k) • (D_{ξ_i} τ_k • ∂ξ_i) ],    (13)

where the sum starts at k = i because the summands for k < i do not depend on ξ_i. By investigating the structure of (13), the derivative for each impact control can be concisely expressed. It can be shown that only a few quantities need to be computed for each impact in order to obtain all of the derivatives.
We need a way to compute the terms D_{ξ_i} x⁺_k • ∂ξ_i and D_{ξ_i} τ_k • ∂ξ_i on the right-hand side of (13). These terms involve the derivatives of the impact map and the impact time with respect to each impact control, which are derived in [17]. The resulting expressions are built from the state transition matrix Φ(t, τ) [22], calculated along the trajectory from

    Φ̇(t, τ) = A(t) Φ(t, τ),    Φ(τ, τ) = I,

with A(t) = D_1 f(x(t), t), where I is the identity matrix. Substituting these expressions into (13) yields the derivative of the m-th basis function. Thus, to compute the derivatives of the basis functions with respect to each impact control, we only need to compute a small set of quantities at each impact k = 1, ..., N; each derivative is then a combination of sums and products of these quantities [17].
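The state transition matrix can be computed numerically by integrating its variational equation along the trajectory. The sketch below is our own illustration (the callable A_of_t and the integration scheme are assumptions, not from [17]); it uses the ballistic flight phase of Section V, where A(t) is constant, as a check:

```python
import numpy as np

def state_transition(A_of_t, t0, t1, n, steps=200):
    """Integrate dPhi/dt = A(t) Phi with Phi(t0, t0) = I to obtain the
    state transition matrix Phi(t1, t0). A_of_t returns the Jacobian
    A(t) = D1 f(x(t), t) evaluated along the trajectory."""
    Phi = np.eye(n)
    dt = (t1 - t0) / steps
    t = t0
    for _ in range(steps):
        Phi = Phi + dt * (A_of_t(t) @ Phi)   # forward Euler step
        t += dt
    return Phi

# for ballistic flight, f(x) = [x3, x4, 0, -g]^T, so A is constant and
# nilpotent, and Phi(t1, t0) = I + (t1 - t0) * A exactly
A = np.array([[0., 0., 1., 0.],
              [0., 0., 0., 1.],
              [0., 0., 0., 0.],
              [0., 0., 0., 0.]])
Phi = state_transition(lambda t: A, 0.0, 0.5, 4)
```

In practice Φ would be integrated together with the state between impacts, so that A(t) can be evaluated at the current x(t).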

V. EVALUATION
In order to evaluate the performance of the proposed IOC approach for hybrid systems, we simulate the ParkourBot, a dynamic bipedal climbing robot. We simulate two different locomotion modes for the ParkourBot: bouncing in place and climbing. We then test whether our proposed algorithm can produce cost functions that are consistent with the locomotion mode, and whether the recovered trajectory resembles the observed trajectory.

A. The ParkourBot
The ParkourBot is a biped robot designed at Carnegie Mellon. The ParkourBot is equipped with two BowLegs and is similar to the one-legged BowLeg hopper in [23]. During flight, the ParkourBot compresses its spring-like legs, storing elastic energy. At impact, this stored energy is quickly converted to kinetic energy. By controlling the injected energy and the leg angles, the robot is capable of climbing up and down as well as bouncing in place. A cartoon of the robot climbing is seen in Fig. 1. (See videos at http://lims.mech.northwestern.edu/RESEARCH/current projects/Parkourbot/Parkourbot homepage.html) A simple model for the ParkourBot assumes a point mass for the body with two massless legs. The configuration of the robot is denoted by q(t) = [q_1(t), q_2(t)]ᵀ, representing the horizontal and vertical coordinates, respectively. The state is defined by x(t) = [q(t)ᵀ, q̇(t)ᵀ]ᵀ. The inertial reference frame is at the center of the chute, and the walls are a distance d away from the center. The continuous dynamics of the robot are those of ballistic flight (with unit mass):

    ẋ(t) = [q̇_1(t), q̇_2(t), 0, −g]ᵀ.

In our simulation, both legs have length l = 0.3, and g = 1.
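Because the horizontal velocity is constant in flight, the time of the next wall contact solves a linear equation. The Python sketch below encodes the flight dynamics and this crossing time; the contact condition used (point mass a horizontal distance l·cos θ from the wall) is our reading of the geometry, stated as an assumption:

```python
import numpy as np

l, g = 0.3, 1.0
d = 1.0 + l * np.cos(np.pi / 4)   # distance from chute center to a wall

def flight(x, t):
    """Ballistic flight dynamics of the point-mass model,
    x = [q1, q2, q1dot, q2dot]."""
    return np.array([x[2], x[3], 0.0, -g])

def wall_impact_time(tau, theta, x):
    """Time at which the leg at angle theta touches a wall. The sign of
    d depends on which wall is hit, here inferred from the direction of
    horizontal travel (an illustrative assumption)."""
    q1, q1dot = x[0], x[2]
    wall = d if q1dot > 0 else -d
    reach = wall - np.sign(wall) * l * np.cos(theta)   # mass position at contact
    return tau + (reach - q1) / q1dot
```

With the initial state x(τ_0) = [−1, −0.5, 1, 1]ᵀ used later and θ = π/4, this gives an impact after two time units, consistent with period-1 bouncing between the walls.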
The impact controls in this simulation are the leg angles, denoted by ξ = θ. The leg angles are measured positive clockwise from the positive horizontal axis on the right side, as shown in Fig. 2.

The distance from the center of the chute to either wall is d = 1 + l cos(π/4). The impact time of (4) can then be calculated in closed form: because the horizontal velocity is constant during flight, the leg at angle ξ_{k+1} reaches a wall at

    τ_{k+1} = τ_k + ( ±(d − l cos ξ_{k+1}) − q_1(τ⁺_k) ) / q̇_1(τ⁺_k),

where the sign of d depends on which wall is impacted; since the legs are identical, no labeling of the length and angle is required. In this setup, the impact impulse only influences the velocity in the direction along the leg. We denote this radial velocity by ṙ(t), and we refer to the velocity orthogonal to the leg as the normal velocity ṅ(t). These velocities can be calculated using a change of coordinates:

    ṙ = q̇_1 cos θ + q̇_2 sin θ,    ṅ = −q̇_1 sin θ + q̇_2 cos θ.

At the k-th impact, the radial velocity is reversed and its magnitude increases according to the net energy input E_k to the system, ṙ⁺² = ṙ⁻² + 2E_k (with unit mass), while the normal velocity is unchanged, ṅ⁺ = ṅ⁻. The impact map ∆ is then obtained by rotating these post-impact velocities back to the world frame; the position is unchanged at the impact. We do not enforce any constraints on the control inputs of the ParkourBot, and thus the IOC optimization (11) simplifies.
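The rotate-reverse-inject impact just described can be sketched compactly in Python. The sign conventions below are our assumptions, chosen to be consistent with bouncing, and are not taken verbatim from the paper:

```python
import numpy as np

def impact_map(x, theta, E):
    """Sketch of the BowLeg impact: rotate the velocity into
    radial/normal leg coordinates, reverse the radial component while
    injecting the net energy input E (unit mass), leave the normal
    component unchanged, and rotate back. Position is unchanged."""
    q1dot, q2dot = x[2], x[3]
    c, s = np.cos(theta), np.sin(theta)
    rdot = c * q1dot + s * q2dot            # velocity along the leg
    ndot = -s * q1dot + c * q2dot           # velocity normal to the leg
    rdot_plus = -np.sqrt(rdot**2 + 2.0 * E)  # reversed, energy injected
    # rotate back to world coordinates
    q1dot_plus = c * rdot_plus - s * ndot
    q2dot_plus = s * rdot_plus + c * ndot
    return np.array([x[0], x[1], q1dot_plus, q2dot_plus])
```

With E = 0 the map conserves kinetic energy and simply reflects the radial velocity; with E > 0 the post-impact speed squared grows by exactly 2E.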

B. Simulated Basis Functions
In our simulations, we choose a set of 11 basis functions. Two different weightings of these basis functions lead to two different cost functions, and hence to two different behaviors. The impact controls of the ParkourBot are optimized for 6 impacts using each cost function. The trajectory obtained from this optimization is then used with the IOC algorithm discussed in this paper to produce an estimated cost function. The estimated cost function is in turn used to produce a trajectory, and the resulting trajectory is compared with the observed optimal trajectory. The basis functions are

C. Bounce In Place Simulation
In this simulation, we obtain an optimal trajectory and control inputs by solving the optimal control problem with initial conditions τ_0 = 0 and x(τ_0) = [−1, −0.51, 1, 1]ᵀ, initial controls ξ_0 = {π/5, 4π/5, π/5, 4π/5, π/5, 4π/5}, and the cost function defined by α* = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0]ᵀ. The resulting trajectory can be seen in Fig. 3. We apply the proposed IOC algorithm using a Matlab package for least-squares optimization. Moreover, since the cost function is invariant to multiplication by a constant, we normalize the recovered weights to match the norm of α*. Note that the recovered weights are consistent with the optimal weights. We solved the optimal control problem with the recovered weights, and achieved a perfect recovery of the observed trajectory.
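The norm-matching step mentioned above is a one-liner; it is shown here as an illustrative Python sketch, noting that the reference weights α* are available only in simulation:

```python
import numpy as np

def normalize_weights(alpha_hat, alpha_ref):
    """Rescale recovered weights so that their 2-norm matches that of a
    reference weight vector, exploiting the scale invariance of the
    cost function noted above. alpha_ref is the ground-truth alpha*,
    known only in simulation."""
    return alpha_hat * (np.linalg.norm(alpha_ref) / np.linalg.norm(alpha_hat))
```

Outside simulation, where no α* is known, one would instead report the weights under the fixed normalization imposed by constraint (12).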
D. Climbing Simulation
We repeat the procedure for the climbing mode, using the second weighting of the basis functions to generate the observed trajectory. Note that again the recovered weights are consistent with the optimal weights. We solved the optimal control problem with the recovered weights, and achieved a perfect recovery of the observed trajectory (Fig. 4).

E. Discussion
In both of these simulations, the observed trajectory was perfectly recovered. The recovered weights for the basis functions are close to the optimal weights; however, the weights are not perfectly recovered. One reason is that the least-squares minimization does not have a unique solution, since the number of unknowns exceeds the number of equations. This problem may be solved by using additional trajectories from different initial conditions to ensure a unique solution to the optimization. This topic is very relevant to the problem of IOC and requires further investigation.

VI. CONCLUSION
In this paper, we have introduced an IOC algorithm for a class of hybrid dynamical systems. The approach is computationally efficient, unlike previous approaches to IOC. We applied it to a simulated dynamic bipedal climbing robot with different simulated locomotion modes. Our algorithm was able to successfully estimate an appropriate cost function for each mode, and also to reproduce a trajectory close to the observed trajectory.
Future extensions of this approach to more complex hybrid systems can prove useful in modeling and understanding human locomotion. This approach can potentially aid the development of real-time locomotion mode selection algorithms for prostheses [24]. Moreover, applying this method to observed human locomotion trajectories can help shed light on how humans control locomotion.