An algebraic approach to dynamic optimisation of nonlinear systems: a survey and some new results

Dynamic optimisation, with a particular focus on optimal control and nonzero-sum differential games, is considered. For nonlinear systems, solutions sought via the dynamic programming strategy are inevitably characterised by partial differential equations (PDEs) which are often difficult to solve. A detailed overview of a control design framework which enables the systematic construction of approximate solutions for optimal control problems and differential games, without requiring the explicit solution of any PDE, is provided, along with a novel design of a nonlinear control gain aimed at improving the 'level of approximation' achieved. Multi-agent systems are considered as a possible application of the theory.


Introduction
In this paper we comprehensively address the wide area of dynamic optimisation, encompassing single-objective problems (optimal control) as well as, potentially conflicting, multi-objective ones (differential games). The latter is instrumental also in providing a solution to the L_2-disturbance attenuation problem, which represents a nonlinear counterpart of the well-known linear H_∞ control problem (see, for instance, Doyle, Glover, Khargonekar, and Francis (1989); van der Schaft (1992, 2000); Zhou, Doyle, and Glover (1996)). Techniques available for studying these problems typically fall within two categories: those based on Pontryagin's minimum principle and those based on the dynamic programming (DP) method (see, e.g., Bertsekas (2005); Clarke and Vinter (1987)). Despite the fact that - differently from the former in general - the latter approach yields necessary and sufficient conditions for optimality, techniques based on the DP approach are seldom pursued in practical applications. This is due to a common feature shared by such techniques, namely the requirement of the explicit solution of partial differential equations (PDEs). In particular, solutions of the problems dealt with herein are related to the so-called Hamilton-Jacobi-Bellman (HJB) equation or the Hamilton-Jacobi-Isaacs (HJI) equation (see, e.g. Basar and Olsder (1999); Starr and Ho (1969b); Vinter (2000)). In practice, obtaining closed-form solutions to such PDEs can be a daunting - or even impossible - task. Motivated by this fact, in the following sections we discuss a method for systematically constructing (approximate) solutions to optimal control problems and differential games without involving the explicit solution of any PDE, thus overcoming this computational hurdle. The technique hinges upon the notion of algebraic P̄ solutions and the immersion of the underlying nonlinear dynamics into an extended state-space.
Algebraic P̄ solutions are defined as matrix-valued functions which satisfy certain conditions, vaguely reminiscent of equations encountered in approaches for solving optimal control problems based on state-dependent Riccati equations (SDRE), also known as frozen Riccati equations (FREs), as seen, for instance, in Çimen (2008, 2010); Elloumi, Sansa, and Braiek (2012); Huang and Lu (1996). In Huang and Lu (1996) the authors also note that, in the context of L_2-disturbance attenuation, the HJB inequality can be formulated (via Schur's complement) as a nonlinear matrix inequality, which can then be solved numerically. In Sakamoto and van der Schaft (2008) the authors propose two methods for approximating the stabilizing solution of the HJB equation using the framework of differential geometry and stable manifold theory. Further insights on HJB equations based on generalized differential Riccati equations and differential geometry are available in Kawano and Ohtsuka (2017); van der Schaft (2015). In the context of differential games, instead, the majority of methods available to study and solve such problems rely on numerical methods for solving PDEs (see, for instance, Botkin, Hoffmann, and Turova (2011) and references therein).
The control design methodology surveyed in this paper has been introduced for optimal control and L_2-disturbance attenuation in Sassano and Astolfi (2012) and for nonzero-sum differential games in Mylvaganam, Sassano, and Astolfi (2015). The constructive approach yields a systematic method for obtaining approximate solutions for optimal control problems and, differently from the SDRE approach (see, e.g. Çimen (2008)), the level of approximation is exactly quantifiable and can, in principle, be minimised. Thus, the proposed methodology has certain benefits with respect to both SDRE-based approaches and the linear quadratic (LQ) approximations of the nonlinear problems. The latter observation is addressed in Mylvaganam et al. (2015); Sassano and Astolfi (2012), where the superior performance of the proposed method with respect to LQ approximations is demonstrated by means of numerical examples. In this paper we provide a comprehensive overview of the approach, along with a novel design of a nonlinear control gain which enables us to achieve a 'tighter' approximation of solutions for optimal control problems or differential games with respect to the results available in Mylvaganam et al. (2015); Sassano and Astolfi (2012). In addition to infinite-horizon optimal control and nonzero-sum differential games, the machinery presented herein has proved useful for a range of control problems, such as observer and adaptive control design for nonlinear systems (Karagiannis, Sassano, and Astolfi (2009)), finite-horizon optimal control (Sassano and Astolfi (2013a)), constrained optimal control (Scarciotti and Astolfi (2014)), L_2-disturbance attenuation and optimal control via output feedback, optimal control for stochastic systems (Mylvaganam (2018a, 2018b)) and passivity-based control for port-controlled Hamiltonian systems (Nunna, Sassano, and Astolfi (2015)).
To provide a detailed overview of the main ideas behind the control design we focus on (infinite-horizon) optimal control and (infinite-horizon) differential games.
The remainder of this paper is organised as follows. The notion of algebraic P̄ solution and its application to Lyapunov stability analysis - crucial to the constructive control design methodology - are introduced in Section 2. The two control problems considered in the paper, namely optimal control and differential games, are then considered independently in Section 3 and Section 4, respectively. In the latter we also provide some insights into the L_2-disturbance attenuation problem, which can be formulated as a two-player, zero-sum differential game. In Section 5 we present the application of the developed control design framework to the so-called multi-agent collision avoidance problem, before some concluding remarks are given in Section 6.
Notation: Standard notation is adopted in this paper. The sets of real and complex numbers are denoted by R and C, respectively. Z_{>0} denotes the set of positive natural numbers. Given a function V : R^n → R and a vector x ∈ R^n, V_x denotes the partial derivative of V with respect to x, provided it exists. Given a mapping V : R^n → R^m and a vector x ∈ R^n, ∇_x V denotes the Jacobian matrix of V with respect to x, provided it exists. A mapping f : R^m → R^n is said to be smooth if it is of class C^∞, i.e. if it has derivatives of all orders. Given a matrix M ∈ R^{n×n}, σ(M) denotes its spectrum and M^T denotes its transpose. I and 0 denote the identity and zero matrices, respectively. Given a vector x ∈ R^n, ‖x‖_R^2 = x^T R x, where R = R^T > 0.

The Notion of Algebraic P̄ Solution
In this section we introduce the notion of algebraic P̄ solution along with the concept of dynamic Lyapunov function. These two notions are extensively utilised throughout the remainder of the paper to construct solutions to the more complex control problems addressed herein. For clarity of presentation, single partial differential equations and systems of coupled partial differential equations are considered separately in the following subsections.
2.1. Algebraic P̄ solutions for a single partial differential equation

Consider a dynamical system described by the equation

ẋ = f̃(x, u),    (1)

where x(t) ∈ R^n denotes the state of the system, u(t) ∈ R^m denotes the control input and f̃ : R^n × R^m → R^n denotes a smooth mapping such that x = 0 is an equilibrium of the unforced system, namely f̃(0, 0) = 0. Moreover, let q : R^n → R denote a given smooth function. In a neighbourhood of the origin one can associate to the system (1) the linear dynamics

ẋ = Ax + Bu,    (2)

with A = ∂f̃/∂x(0, 0) and B = ∂f̃/∂u(0, 0), and the quadratic approximation of the mapping q, namely q̄(x) = x^T Q̄ x. As is common, the solution to several nonlinear control problems - such as stabilisation, regulation and optimal control, to mention just a few - is provided in terms of the solution of certain PDEs. To deal with these problems in a unified way, consider a generic first-order PDE in the unknown function V : R^n → R, involving the nonlinear dynamics (1) and the function q, given by

D(f̃, q, V_x) = 0,    (3)

where D denotes a continuous operator, the structure of which is instantiated by the specific problem among those mentioned above. We associate to the PDE (3) the notion of algebraic P̄ solution defined in the following statement.
Definition 2.1. The matrix-valued function P : R^n → R^{n×n} is said to be an X-algebraic P̄ solution of (3) if it satisfies the following conditions.
(i) P(x) = P(x)^T satisfies the equation

D(f̃, q, x^T P(x)) + σ(x) = 0,    (4)

for all x ∈ X ⊆ R^n, where σ(x) = x^T Σ(x) x, for some matrix-valued function Σ : R^n → R^{n×n} such that Σ(x) = Σ(x)^T and Σ(0) = Σ̄ > 0.
(ii) P(x) is tangent at the origin to the solution P̄ of the PDE corresponding to the linear approximation (2) of the system dynamics and the quadratic approximations of the mappings q and σ, i.e.

P(0) = P̄.    (5)
If X = R^n, then P is said to be an algebraic P̄ solution for the partial differential equation (3). •
Note that a solution to (3) clearly yields a solution to (4), characterised by σ(x) = 0 for all x ∈ X, while the converse does not hold in general: the Jacobian matrix of the mapping x ↦ P(x)x is not required to be symmetric, hence the latter condition is significantly milder than the former. Moreover, although the equation (4) may be reminiscent of the approaches based on the so-called state-dependent linearization technique, here the algebraic solution is combined with a dynamic extension that yields a systematic control design strategy with guaranteed asymptotic stability and performance. For clarity, the main ideas are first illustrated for the case of stability analysis in Section 2.3 and subsequently extended to dynamic optimisation problems.
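As an illustration of condition (i), for a Lyapunov-type operator D(f, q, V_x) = V_x f(x) + q(x) (the operator used later in Section 2.3), an algebraic solution can be computed pointwise by solving a state-dependent Lyapunov equation once a factorisation f(x) = F(x)x is chosen. The sketch below is purely illustrative: the example dynamics, the choice of Σ, and the (non-unique) factorisation are assumptions of this snippet, not taken from the paper.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

# Hypothetical 2-D example: f(x) = (-x1 + x2^2, -x2 - x1*x2),
# factored as f(x) = F(x) x (this factorisation is not unique).
def F(x):
    return np.array([[-1.0, x[1]],
                     [ 0.0, -1.0 - x[0]]])

Q = np.eye(2)            # q(x) = x^T Q x
Sigma = 0.1 * np.eye(2)  # sigma(x) = x^T Sigma x, with Sigma(0) > 0

def P_alg(x):
    # Pointwise state-dependent Lyapunov equation
    #   F(x)^T P + P F(x) = -2 (Q + Sigma),
    # which implies x^T P(x) f(x) + q(x) + sigma(x) = 0.
    return solve_continuous_lyapunov(F(x).T, -2.0 * (Q + Sigma))

x = np.array([0.3, -0.2])
P = P_alg(x)
residual = x @ P @ F(x) @ x + x @ Q @ x + x @ Sigma @ x
```

Note that P(x) obtained this way is symmetric by construction, whereas no integrability of x ↦ x^T P(x) is imposed, in line with the remark above.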

2.2. Algebraic P̄ solutions for coupled partial differential equations
The notion of algebraic P̄ solution can also be defined for systems of coupled PDEs, which arise, for instance, in the context of differential games, considered in Section 4. Consider a dynamical system described by the equation

ẋ = f̃(x, u_1, ..., u_N),    (6)

where N ∈ Z_{>0}, x(t) ∈ R^n denotes the state of the system, u_i(t) ∈ R^{m_i}, i = 1, ..., N, denote N independent control inputs and f̃ : R^n × R^m → R^n, m = ∑_{i=1}^N m_i, denotes a smooth mapping such that x = 0 is an equilibrium of the unforced system, namely f̃(0, 0, ..., 0) = 0. Moreover, let q_i : R^n → R, i = 1, ..., N, denote given smooth functions. Similarly to the previous section, in a neighbourhood of the origin one can associate to the system (6) the linear dynamics

ẋ = Ax + ∑_{i=1}^N B_i u_i,    (7)

where A = ∂f̃/∂x(0, 0, ..., 0) and B_i = ∂f̃/∂u_i(0, 0, ..., 0), for i = 1, ..., N, and the quadratic approximations of the mappings q_i, namely q̄_i(x) = x^T Q̄_i x, i = 1, ..., N. Consider a system of coupled PDEs in the unknown functions V_i : R^n → R, i = 1, ..., N, given by

D_i(f̃, q_i, V_{1,x}, ..., V_{N,x}) = 0, i = 1, ..., N,    (9)

where D_i, i = 1, ..., N, denote continuous operators. We extend the notion of algebraic P̄ solution for a single PDE to the system of coupled PDEs (9) as detailed in the following statement.
Definition 2.2. The matrix-valued functions P_i : R^n → R^{n×n}, i = 1, ..., N, are said to constitute an X-algebraic P̄ solution of (9) if they satisfy the following conditions.
(i) P_i(x) = P_i(x)^T, i = 1, ..., N, satisfy the equations

D_i(f̃, q_i, x^T P_1(x), ..., x^T P_N(x)) + σ_i(x) = 0,    (10)

for i = 1, ..., N and for all x ∈ X ⊆ R^n, where σ_i(x) = x^T Σ_i(x) x for some matrix-valued mappings Σ_i : R^n → R^{n×n} such that Σ_i(x) = Σ_i(x)^T and Σ_i(0) = Σ̄_i > 0.
(ii) P_i(x) is tangent at the origin to the solution P̄_i of the PDEs corresponding to the linear approximation (7) of the system dynamics and the quadratic approximations of the mappings q_i and σ_i, i = 1, ..., N, i.e.

P_i(0) = P̄_i,    (11)
for i = 1, ..., N, such that P̄_i = P̄_i^T and ∑_{i=1}^N P̄_i > 0. If X = R^n, then the P_i, i = 1, ..., N, are said to constitute an algebraic P̄ solution for the system of coupled partial differential equations (9). • In the remainder of this paper we assume for simplicity that X = R^n. However, all the statements can be modified accordingly if X ⊂ R^n.

2.3. Algebraic P̄ solutions and Lyapunov stability
As a preliminary - albeit illustrative - application of the notion of algebraic P̄ solution, we consider the problem of stability analysis via Lyapunov theory, thus providing the intuition behind the key ingredients employed in the study of dynamic optimisation in the subsequent sections. To this end, consider a nonlinear autonomous system described by equations of the form

ẋ = f(x),    (12)

where x(t) ∈ R^n denotes the state of the system and f : U ⊆ R^n → R^n is a smooth mapping such that f(0) = 0, with U containing the origin of R^n. By relying on classical results (see, e.g. Khalil (1996)), the stability analysis of the zero equilibrium can be carried out as formulated in (3) by defining the operator D(f, q, V_x) = V_x f(x) + q(x), for any positive definite function q. More precisely, a positive definite function V : R^n → R_{≥0} is a Lyapunov function for system (12) provided D(f, q, V_x) = 0, whereas the linearized problem yields the standard Lyapunov equation A^T P̄ + P̄ A + Q̄ = 0, in the unknown P̄ = P̄^T > 0. The existence of such a Lyapunov function V implies asymptotic stability of the equilibrium (Khalil (1996)), while its explicit knowledge may be of interest for additional control tasks, e.g. the estimation of the basin of attraction (Chesi (2007, 2013); Vannelli and Vidyasagar (1985)) or the design of stabilizing control laws via back-stepping (Sepulchre, Jankovic, and Kokotovic (2012)); nonetheless the direct computation of the solution to (3) may not be straightforward. The following statement suggests a strategy to explicitly construct a Lyapunov function by relying on the notion of algebraic P̄ solution, the computation of which involves the solution of algebraic equations only, thus circumventing the need for the solution of any PDE.
Proposition 2.3. Consider system (12) and define the operator D(f, q, V_x) = V_x f(x) + q(x), with q : R^n → R_{≥0} positive definite. Suppose that there exists an algebraic P̄ solution P : R^n → R^{n×n} of (3) with D as above. Then the function V : R^n × R^n → R defined as

V(x, ξ) = (1/2) x^T P(ξ) x + (1/2) ‖x − ξ‖_R^2,    (13)

with ξ ∈ R^n, is positive definite in a neighbourhood U of the origin of R^n × R^n for any R = R^T > 0, and there exists κ* > 0 such that V is a Lyapunov function for the immersed dynamics

ẋ = f(x), ξ̇ = −κ V_ξ(x, ξ)^T,    (14)

for any κ ∈ [κ*, ∞) and for all (x, ξ) ∈ U. Moreover, x = 0 is a locally asymptotically stable equilibrium point of system (12).
Proof: To begin with, by the tangency condition in item (ii) of Definition 2.1, the quadratic approximation of the function V in (13) around the origin is V_q(x, ξ) = (1/2) x^T (P̄ + R) x − x^T R ξ + (1/2) ξ^T R ξ, which can be shown to be positive definite for any R > 0 by a Schur complement argument. The latter in turn implies the existence of a neighbourhood U_p of the origin of R^n × R^n in which the function (13) is positive definite. On the other hand, the time derivative of V along the trajectories of the augmented system (14) can be written as in (15), in terms of continuous matrix-valued functions F : R^n → R^{n×n} and Φ : R^n × R^n → R^{n×n}. The first inequality in (15) is obtained from item (i) of the definition of algebraic P̄ solution, namely x^T P(x) f(x) + q(x) + σ(x) = 0, and from positive definiteness of q. Finally, the matrix-valued functions describing the quadratic form in the last line of (15) are defined in (16). Since the kernel of the matrix C(x, ξ) is tangent at the origin to the subspace Z = Im [I 0]^T and since the matrix M(0, 0) restricted to Z is equal to −Σ̄ < 0, it follows from the main result of Anstreicher and Wright (2000) and by continuity of the involved functions that V̇ is negative definite in a neighbourhood U' of the origin for sufficiently large κ*. Thus the first claim of the statement holds by considering U defined as any sub-level set of the function V contained in U_p ∩ U'. Moreover, by Lemma 4.5 in Khalil (1996), (local) asymptotic stability of the zero equilibrium of system (14) is equivalent to the existence of a class KL function β such that ‖(x(t), ξ(t))‖ ≤ β(‖(x(0), ξ(0))‖, t) for all t ≥ 0 and for any (x(0), ξ(0)) ∈ U. Therefore, selecting ξ(0) = 0, ‖x(t)‖ ≤ β(‖(x(0), 0)‖, t) = β(‖x(0)‖, t), proving asymptotic stability of the origin of the system (12).
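The Schur complement step in the proof can be verified numerically: V_q is positive definite if and only if the block matrix [[P̄+R, −R], [−R, R]] is, and the Schur complement of that matrix with respect to its (2,2) block R equals P̄ > 0. The numerical values below are arbitrary illustrative choices.

```python
import numpy as np

# Arbitrary symmetric positive definite Pbar and R (illustration only).
Pbar = np.array([[2.0, 0.5], [0.5, 1.0]])
R = np.array([[1.0, 0.0], [0.0, 3.0]])

# Block matrix of the quadratic form
#   V_q(x, xi) = (1/2) [x; xi]^T M [x; xi].
M = np.block([[Pbar + R, -R], [-R, R]])

# Schur complement of M with respect to its (2,2) block R:
# (Pbar + R) - R R^{-1} R, which simplifies exactly to Pbar.
schur = (Pbar + R) - R @ np.linalg.inv(R) @ R
```

Since R > 0 and the Schur complement equals P̄ > 0, the block matrix M, and hence V_q, is positive definite.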
Remark 1. The previous statement contains in a nutshell all the main ingredients of the constructions carried out in the following sections concerning dynamic optimisation. Algebraic P̄ solutions are in general easier to compute since - as is evident from item (i) of Definition 2.1 - the property of integrability (and potentially of positivity) is relaxed with respect to the requirements on the solution of (3), which must be an exact differential. However, this aspect inevitably implies that the algebraic solution cannot be integrated in a straightforward manner to determine a generating scalar function. This crucial point is tackled here by considering the immersion of the original nonlinear system into an extended state-space in which the function V in (13) is constructed to retain the key feature that V_x = x^T P(x) + δ(x, ξ) - thus enjoying the property of the algebraic P̄ solution of zeroing the operator D - while δ : R^n × R^n → R^n, with δ(x, x) = 0 for any x ∈ R^n, is a mismatch term necessarily arising from the fact that the algebraic P̄ solution is not associated with a closed one-form. Finally, the mismatch δ is dynamically compensated by the selection of the time evolution of the dynamic extension ξ(t), in closed loop with the trajectories of the original system, as suggested by the last line of (15). The function (13) can be interpreted as a 'dynamic Lyapunov function' as introduced in Sassano and Astolfi (2013b). In fact, the main result of Proposition 2.3 is similar to the results stated in Theorem 1 and Lemma 2 of Sassano and Astolfi (2013b). However, differently from Sassano and Astolfi (2013b), where the algebraic solution is defined as a vector-valued mapping, here the algebraic P̄ solution is defined as a matrix-valued mapping.
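The interplay between the plant and the dynamic extension described in Remark 1 can be illustrated with a minimal scalar simulation. Everything below is an assumed toy instance of this sketch: the dynamics f, the weights q and σ, the resulting p(·), and a gradient-type extension ξ̇ = −κ V_ξ are choices made here for illustration, not data from the paper.

```python
import numpy as np

# Assumed scalar example: f(x) = -x + x^3 (f(0) = 0), q(x) = x^2,
# sigma(x) = 0.1 x^2; then p(x) = 1.1 / (1 - x^2) satisfies the scalar
# algebraic equation p(x) * f(x) * x + q(x) + sigma(x) = 0 for |x| < 1.
f = lambda x: -x + x**3
p = lambda xi: 1.1 / (1.0 - xi**2)
dp = lambda xi: 2.2 * xi / (1.0 - xi**2)**2   # derivative of p

r, kappa = 1.0, 10.0
V = lambda x, xi: 0.5 * p(xi) * x**2 + 0.5 * r * (x - xi)**2
V_xi = lambda x, xi: 0.5 * dp(xi) * x**2 - r * (x - xi)

# Euler integration of the immersed dynamics over t in [0, 20]:
# the plant evolves autonomously, the extension descends V in xi.
x, xi, dt = 0.5, 0.5, 1e-3
V0 = V(x, xi)
for _ in range(20000):
    x, xi = x + dt * f(x), xi - dt * kappa * V_xi(x, xi)
```

Both x and the extension ξ converge to the origin, and the candidate dynamic Lyapunov function decreases along the trajectory.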
In the following sections ideas and constructions inspired by the formal statement of Proposition 2.3 and by the intuition in Remark 1 are extended to the problems of optimal control and dynamic games.

Optimal Control
The goal of optimal control is to design a control law that achieves a certain objective in an optimal manner along the trajectories of the resulting closed-loop system. The objective is described by a given functional, hence solving the optimal control problem amounts to determining a control law which minimises a cost or maximises a pay-off, see e.g. Vinter (2000). The problem formulation and a solution that does not require the solution of any PDE are presented in this section. In the following we consider only the infinite-horizon problem, similarly to what has been pursued in Sassano and Astolfi (2012). Finite-horizon optimal control with and without input constraints has been considered in Scarciotti and Astolfi (2014) and Sassano and Astolfi (2013a), respectively, whereas optimal control via output feedback has also been considered in the literature.

Problem formulation
Consider a nonlinear, input-affine system described by the equation

ẋ = f(x) + g(x)u,    (17)

with state x(t) ∈ R^n, control input u(t) ∈ R^m and smooth mappings f : R^n → R^n and g : R^n → R^{n×m}. The (infinite-horizon) optimal control problem consists in determining a control input u that renders the zero equilibrium of the closed-loop system (locally) asymptotically stable and that minimises the cost functional

J(u) = ∫_0^∞ ( q(x(t)) + (1/2) ‖u(t)‖^2 ) dt,    (18)

where the first term in the integral is a running cost, with q : R^n → R_{≥0}, and the second term is a penalty on the control effort. Note that the cost is parameterised by x(0) = x_0.

Assumption 1. The smooth mapping f is such that f(0) = 0, i.e. x = 0 is an equilibrium point for the system (17) when u(t) = 0 for all t ≥ 0.
Assumption 2. The running cost q is such that q(x) > 0 for all x ∈ R^n \ {0} and q(0) = 0. Moreover, there exists a matrix-valued function Q : R^n → R^{n×n} such that q(x) = x^T Q(x) x, with Q(x) = Q(x)^T and Q(0) = Q̄ > 0.

The formal definition of the infinite-horizon optimal control problem is provided below, since it slightly deviates from the classical one: as the construction in Proposition 2.3 may intuitively suggest, here we allow for possibly dynamic, rather than static, control laws, in such a way that a control problem, e.g. the optimal control problem, similar and suitably related to the original one, though defined in an augmented state-space, may be systematically solved.
Problem 1. Consider the system (17) and the cost functional (18), satisfying Assumptions 1 and 2. The infinite-horizon (dynamic) optimal control problem with stability consists in determining an integer ν ≥ 0, a dynamic control law described by

ξ̇ = α(x, ξ),    (19a)
u = β(x, ξ),    (19b)

with ξ(t) ∈ R^ν, and an open set U ⊂ R^n × R^ν containing the origin such that: (i) the zero equilibrium of the interconnected system (17)-(19) is asymptotically stable with region of attraction containing U; (ii) for any ũ(x, ξ) and any (x_0, ξ_0) such that the trajectory of system (17), (19a) in closed loop with ũ remains in U, the inequality J(β) ≤ J(ũ) holds.
By relying on the principle of optimality (Bertsekas (2005)) and DP arguments, it has been shown that the static, namely with ν = 0, solution to Problem 1 is obtained, similarly to (3), by introducing the operator

D(f, q, V_x) = V_x f(x) + q(x) − (1/2) V_x g(x) g(x)^T V_x^T,    (20)

and determining a continuously differentiable, positive definite function V : R^n → R_{≥0}, V(0) = 0, such that D(f, q, V_x) = 0, leading to the so-called Hamilton-Jacobi-Bellman (HJB) equation. The optimal control law is then provided by the static state feedback u*(x) = −g(x)^T V_x^T (the interested reader is referred, e.g., to Bertsekas (2005); Bryson and Ho (1975); Vinter (2000)).
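In the LQ special case the HJB equation reduces to an algebraic Riccati equation, which gives a quick numerical sanity check of the operator above. The double-integrator data and the scaling conventions (quadratic running cost with a (1/2)‖u‖^2 penalty, so that V(x) = (1/2) x^T P x and u*(x) = −B^T P x) are assumptions of this sketch.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Arbitrary double-integrator example with q(x) = x^T Q x.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)

# With these conventions P solves A^T P + P A + 2Q - P B B^T P = 0.
P = solve_continuous_are(A, B, 2.0 * Q, np.eye(1))

# Pointwise HJB residual V_x f + q - (1/2) V_x g g^T V_x^T at a sample state.
x = np.array([0.7, -0.4])
Vx = x @ P
residual = Vx @ (A @ x) + x @ Q @ x - 0.5 * Vx @ B @ B.T @ Vx
```

The residual vanishes up to floating-point error, and the feedback u* = −B^T P x renders the closed loop Hurwitz.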
In the following statements we employ machinery similar to that discussed in Section 2.3 for stability analysis in order to compute -or approximate in a sense to be specified -the solution to Problem 1 by relying only on algebraic solutions to the partial differential operator D introduced in (20).
5 The requirement that q is positive definite can be replaced by positive semidefiniteness and zero-state detectability. Namely, Assumption 2 can be replaced by the properties that q is positive semidefinite and that the system (17) with output y = q(x) is zero-state detectable, i.e. u(t) = 0 and y(t) = 0 for all t ≥ 0 imply lim_{t→∞} x(t) = 0.

Design of optimal control laws via algebraic conditions
In this section the notion of algebraic P̄ solution is exploited to obtain an approximate solution to Problem 1 based on DP, without requiring the solution of the HJB PDE.
Theorem 3.1. Consider the system (17), the cost functional (18) and suppose Assumptions 1 and 2 are satisfied. Suppose that P : R^n → R^{n×n} is an algebraic P̄ solution of (20) and define the associated function V : R^n × R^n → R as in (13). Then there exist a constant κ* > 0 and an open set U such that V solves the partial differential inequality (21) for all κ ∈ [κ*, ∞) and all (x, ξ) ∈ U. Therefore, the dynamic control law

ξ̇ = −κ V_ξ(x, ξ)^T, u = −g(x)^T V_x(x, ξ)^T,    (22)

solves Problem 1 with respect to the modified running cost q̃(x, ξ) = q(x) − 2H_κ(x, ξ) ≤ q(x), with ν = n and for any initial condition (x_0, ξ_0) ∈ U.
Proof: The first part of the claim follows from reasoning similar to that of the proof of Proposition 2.3, by first noting that, by the definition of algebraic P̄ solution of (20), the matrix-valued function P satisfies (23) and P(0) = P̄, with P̄ denoting the maximal solution of the algebraic Riccati equation P̄A + A^T P̄ + Q̄ + Σ̄ − P̄ B B^T P̄ = 0. Following the same steps, (24) can be established, with C(x, ξ) as given in (16) and the remaining quantities as defined in Section 2.3. Thus, the function V solves - by construction - the inequality (21) and the equality (25). The last part of the claim follows directly from the observation that (25) is the HJB equation on the extended state-space, and thus V is the value function corresponding to the (classical) optimal control problem defined on the extended state-space (x, ξ) with running cost q̃(x, ξ) = q(x) − 2H_κ(x, ξ).
Remark 2. While Theorem 3.1 is similar to Theorem 3 in Sassano and Astolfi (2012), it differs in the definition of the algebraic P̄ solution, which is defined herein as a matrix-valued mapping.
The statement of Theorem 3.1 entails that computing an algebraic P̄ solution to (20) is enough to design a dynamic control law that locally approximates the underlying optimal solution, with guaranteed stability and performance. Moreover, as implicitly suggested by the proof of Theorem 3.1, there are two different sources of approximation in the design of the control law (22), namely the shape of the running cost that is minimised by such a control law, i.e. the function H_κ(x, ξ), and the value of the cost functional, i.e. J = V(x_0, ξ_0) by the definition of value function. As far as the latter is concerned, for a given initial condition x̄_0 ∈ R^n of the original plant (17), the value of the cost can be further minimised by suitably initialising the controller, namely by letting ξ_0 ∈ arg min_ξ V(x̄_0, ξ).
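The suggested initialisation ξ_0 ∈ arg min_ξ V(x̄_0, ξ) is a standard minimisation once P(·) is available. The following scalar sketch uses an assumed p(·) and R (the same illustrative choices as in the earlier scalar example); only the initialisation recipe itself comes from the text.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Assumed scalar data: p(xi) = 1.1 / (1 - xi^2), R = r = 1, and
# V(x, xi) of the form (1/2) p(xi) x^2 + (1/2) r (x - xi)^2.
p = lambda xi: 1.1 / (1.0 - xi**2)
r = 1.0
V = lambda x, xi: 0.5 * p(xi) * x**2 + 0.5 * r * (x - xi)**2

x0 = 0.5
# Minimise V(x0, .) over xi; bounds keep p(.) well defined.
res = minimize_scalar(lambda xi: V(x0, xi), bounds=(-0.99, 0.99),
                      method="bounded")
xi0 = res.x
```

The optimised initialisation achieves a strictly smaller value of the cost bound than the naive choice ξ_0 = x_0.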
The following theorem, instead, addresses the former source of approximation. To provide a concise statement of the result, define the sub-manifold M_ε ≜ {(x, ξ) ∈ R^n × R^n : ‖V_ξ(x, ξ)‖ < ε}, for a given constant ε > 0, which constitutes an ε-inflation of the set M_0 ≜ {(x, ξ) ∈ R^n × R^n : V_ξ(x, ξ) = 0}, denoted by M_ε = B_ε(M_0), for any arbitrarily small ε. Moreover, given I = [δ_l, δ_u], consider the continuous (asymmetric) saturation function sat_I(x) = max(δ_l, min(δ_u, x)), with x ∈ R, δ_u > 0 and δ_l ≤ 0. Let k_l be defined as in (26) and note that k_l is non-positive, since the argument of the minimum in (26) is non-positive, e.g. for any (x, ξ) ∈ M_0 ∩ U.
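The saturation function sat_I is elementary; a direct transcription of the definition above (with δ_l ≤ 0 < δ_u) reads:

```python
def sat(v, lo, hi):
    # Continuous (possibly asymmetric) saturation sat_I with I = [lo, hi],
    # lo <= 0 < hi, as used for the nonlinear gain in Theorem 3.2 below.
    return max(lo, min(hi, v))
```

Values inside I pass through unchanged, while values outside are clipped to the nearest endpoint.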
Theorem 3.2. Consider the system (17), the cost functional (18) and suppose Assumptions 1 and 2 hold. Suppose that P : R^n → R^{n×n} is an algebraic P̄ solution of (20) and define the associated function V : R^n × R^n → R as in (13) with R = R^T > 0. Let I = [k_l/ε^2, κ*] and consider the continuous function 7 k : R^n × R^n → R obtained by saturating the gain over I. Then the dynamic control law (27), namely (22) with the constant gain κ replaced by k(x, ξ), solves Problem 1 with respect to the modified running cost (28), for any ε > 0 and for any initial condition (x_0, ξ_0) ∈ U.
7 Note that Theorem 3.2 is stated under the same conditions as those of Theorem 3.1; hence the conclusions of Theorem 3.1 hold, including the existence and the role of the constant κ*.
Remark 3. The statement of Theorem 3.2 entails that, under the same hypotheses as Theorem 3.1, by allowing for a nonlinear gain in place of the constant gain κ in (22), the trajectories of the resulting closed-loop system (17)-(27) minimise a cost functional comprising a running cost that is point-wise identical to the original one apart from on a certain sub-manifold, which can be rendered arbitrarily small in the extended state-space and on which, due to the use of the saturation function, the running cost cannot deviate more than in the constant-gain case.

Differential Games
Whereas optimal control concerns the design of a single (although not necessarily scalar) control input to minimise a single cost functional, nonzero-sum differential games study systems influenced by several independent players, each seeking to minimise an individual, possibly conflicting, cost functional via its own control input (see, e.g. Basar and Olsder (1999) for a detailed introduction to differential games). The problem formulation, centred on the notion of Nash equilibrium strategies, and a solution that does not require the solution of any PDE are presented in this section. We consider the infinite-horizon problem, similarly to what has been done in Mylvaganam et al. (2015).

Problem formulation
Consider the nonlinear, input-affine system described by the equation

ẋ = f(x) + ∑_{i=1}^N g_i(x) u_i,    (30)

with state x(t) ∈ R^n, control inputs u_i(t) ∈ R^{m_i}, for i = 1, ..., N, N ∈ Z_{>0}, and smooth mappings f : R^n → R^n and g_i : R^n → R^{n×m_i}, i = 1, ..., N. The system (30) represents a plant influenced by N independent control inputs u_i, commonly referred to as strategies. Namely, u_i(t), t ≥ 0, is said to be the strategy of player i, for i = 1, ..., N. Each player i, i = 1, ..., N, seeks to minimise its individual cost functional (31), in which the first term in the integral is a running cost, with q_i : R^n → R_{≥0}, the second term is a penalty on the player's own control effort and the third term reflects the fact that, in a competitive setting, the i-th player benefits from the other players wasting their control effort. Note that the costs are parameterised by x(0) = x_0.
Remark 4. Different cost functionals can be considered, subject to the corresponding modifications of the HJI equations (33), i = 1, ..., N, introduced below. For instance, the cost functionals considered in Section 5 do not include the third 'competitive' term.

Assumption 3. The smooth mapping f is such that f(0) = 0, i.e. x = 0 is an equilibrium point for the system (30) when u_i(t) = 0, i = 1, ..., N, for all t ≥ 0.
As seen in the preceding sections, a consequence of Assumption 3 is that there exists a continuous matrix-valued function F : R^n → R^{n×n} such that f(x) = F(x)x. Since an optimal solution, i.e. a set of strategies (one for each player) that simultaneously minimises the cost functionals J_{i,x_0} for every i = 1, ..., N, may not exist, several solution concepts - differently from optimal control - have been proposed for differential games, such as Nash or Stackelberg equilibrium solutions (see, for instance, Basar and Olsder (1999); Simaan and Cruz (1973a, 1973b)). The formal definition of the infinite-horizon differential game in terms of feedback Nash equilibrium solutions 9 is provided below. As in the case of the optimal control problem considered in Section 3, the problem slightly differs from the classical differential game in the sense that we allow for possibly dynamic control strategies and, as a result, the problem is defined on an augmented state-space.
8 The requirement regarding positive definiteness can be replaced by positive semidefiniteness and zero-state detectability, similarly to what has been done for the optimal control problem. Namely, Assumption 4 can be replaced by the assumptions that ∑_{i=1}^N q_i is positive semidefinite and that the system (30) with output y = ∑_{i=1}^N q_i is zero-state detectable.
9 While we focus on the most common solution concept, namely the Nash equilibrium, the results can be applied to different solution concepts (see, e.g., for Stackelberg solutions).
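The factorisation f(x) = F(x)x invoked after Assumption 3 always exists for smooth f with f(0) = 0, e.g. via F(x) = ∫_0^1 ∇f(sx) ds, although it is not unique. The following is a numerical check on an assumed 2-D example (dynamics chosen here for illustration only):

```python
import numpy as np

# Assumed example: f(x) = (-x1 + x2^2, -x2 - x1*x2), with f(0) = 0.
f = lambda x: np.array([-x[0] + x[1]**2, -x[1] - x[0] * x[1]])
Jf = lambda x: np.array([[-1.0, 2.0 * x[1]],
                         [-x[1], -1.0 - x[0]]])

def F(x, n=201):
    # F(x) = \int_0^1 Jf(s x) ds, computed entrywise with the
    # composite trapezoidal rule on a uniform grid in s.
    s = np.linspace(0.0, 1.0, n)
    vals = np.array([Jf(si * x) for si in s])
    w = np.full(n, 1.0 / (n - 1))
    w[0] = w[-1] = 0.5 / (n - 1)
    return np.tensordot(w, vals, axes=1)

x = np.array([0.3, -0.2])
err = np.linalg.norm(F(x) @ x - f(x))
```

Since the Jacobian entries are affine in s for this example, the quadrature is exact up to rounding and F(x)x reproduces f(x).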
Problem 2 (Differential game). Consider the system (30) and the cost functionals (31), i = 1, ..., N, satisfying Assumptions 3 and 4. Solving the infinite-horizon (dynamic) differential game with stability consists in determining an integer ν ≥ 0, a set of dynamic control laws described by the equations

ξ̇ = α_i(x, ξ),    (32a)
u_i = β_i(x, ξ),    (32b)

for i = 1, ..., N, and an open set U ⊂ R^n × R^ν containing the origin of R^n × R^ν such that: (i) the zero equilibrium of the interconnected system (30)-(32), for i = 1, ..., N, is asymptotically stable with region of attraction containing U; (ii) for any ũ_i(x, ξ) and any (x_0, ξ_0) such that the trajectory of the system (30), (32a) in closed loop with ũ_i remains in U, the Nash equilibrium inequalities

J_{i,x_0}(u_1*, ..., u_i*, ..., u_N*) ≤ J_{i,x_0}(u_1*, ..., ũ_i, ..., u_N*), i = 1, ..., N,

hold. The strategy u_i* = β_i is said to be the Nash equilibrium strategy of player i, for i = 1, ..., N, and the set of strategies (u_1*, ..., u_N*) is said to be the Nash equilibrium solution of the differential game.
It has been shown (following DP arguments) that the static, i.e. with ν = 0, solution to Problem 2 is obtained by introducing the operators D_i, i = 1, ..., N, defined in (33), and determining continuously differentiable functions V_i : R^n → R_{≥0}, V_i(0) = 0, such that D_i(f, q_i, V_{1,x}, ..., V_{N,x}) = 0, for i = 1, ..., N, leading to the so-called Hamilton-Jacobi-Isaacs (HJI) equations. The (feedback) Nash equilibrium solution of Problem 2 is then given by the set of strategies u_i* = −g_i(x)^T V_{i,x}^T, i = 1, ..., N. The interested reader is referred to Basar and Olsder (1999); Starr and Ho (1969a, 1969b).
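In the LQ special case the HJI equations reduce to coupled algebraic Riccati equations, which can often be solved by iterating standard AREs with the other players' feedback frozen (a Gauss-Seidel scheme, whose convergence is not guaranteed in general). The scalar two-player data, the omission of the 'competitive' cross terms of (31), and the (1/2)‖u_i‖^2 cost scaling are all assumptions of this sketch.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Assumed scalar two-player LQ game: xdot = a x + b1 u1 + b2 u2,
# J_i = \int_0^\infty ( q_i x^2 + (1/2) u_i^2 ) dt, u_i* = -B_i^T P_i x.
A = np.array([[0.5]])
B = [np.array([[1.0]]), np.array([[1.0]])]
Q = [np.array([[1.0]]), np.array([[2.0]])]
S = [b @ b.T for b in B]

P = [np.zeros((1, 1)), np.zeros((1, 1))]
for _ in range(60):                    # Gauss-Seidel sweeps
    for i in range(2):
        j = 1 - i
        A_frozen = A - S[j] @ P[j]     # other player's feedback held fixed
        P[i] = solve_continuous_are(A_frozen, B[i], 2.0 * Q[i], np.eye(1))

A_cl = A - S[0] @ P[0] - S[1] @ P[1]
# Residuals of the coupled Riccati equations at the candidate Nash solution.
res = [P[i] @ A_cl + A_cl.T @ P[i] + 2.0 * Q[i] + P[i] @ S[i] @ P[i]
       for i in range(2)]
```

At convergence both residuals vanish and the closed loop under the candidate Nash feedback strategies is stable.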
In the following an approach similar to that presented for optimal control problems in Section 3.2 is provided to construct an approximate solution to Problem 2. The approach relies only on algebraic P̄ solutions to the partial differential operators D_i introduced in (33), i = 1, …, N.

Approximate solutions to differential games via algebraic conditions
In this section the notion of algebraic P̄ solution is exploited to obtain an approximate solution to Problem 2 based on DP without requiring the solution of the HJI PDEs.
Proof: The first part of the proof follows steps similar to those of the proofs of Proposition 2.3 and Theorem 3.1. Namely, it can be shown that the functions V_i satisfy the inequalities (36) in a neighbourhood U containing the origin. Asymptotic stability is proved by standard Lyapunov arguments with W(x, ξ) = Σ_{i=1}^N V_i(x, ξ) as Lyapunov function candidate and using (37). The last part of the claim follows from the observation that the V_i's are the value functions corresponding to the (classical) differential game defined on the extended state-space (x, ξ) and with running costs q̃_i.

Remark 6. The dynamic control laws (38), i = 1, …, N, constitute an approximate solution for the original differential game with running costs q_i(x, ξ), i = 1, …, N, in Problem 2. More precisely, it can be demonstrated that the dynamic control laws constitute a so-called α-Nash equilibrium, as introduced in Mylvaganam et al. (2015). Namely, there exists a non-negative constant ε_{x_0,α}, parameterised with respect to x(0) = x_0 and α > 0, such that the set of dynamic control laws {u_1, …, u_N} in (38) satisfies J_i(x_0, u_1, …, u_i, …, u_N) ≤ J_i(x_0, u_1, …, ũ_i, …, u_N) + ε_{x_0,α}, for all ũ_i ≠ u*_i such that σ(A_{cl,ũ_i} + αI) ⊂ C⁻, where A_{cl,ũ_i} is the matrix describing the linearisation at the origin of the system (30) in closed loop with (u_1, …, ũ_i, …, u_N), for i = 1, …, N. Intuitively, the conditions describe a classic ε-Nash equilibrium in which the Nash strategies are compared only with strategies that are sufficiently aggressive, namely strategies which assign closed-loop eigenvalues faster than −α.
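The eigenvalue condition in Remark 6 is straightforward to check numerically; the helper below (hypothetical, not from the paper) tests whether a deviation strategy is 'sufficiently aggressive', i.e. whether σ(A_cl + αI) ⊂ C⁻, which is equivalent to every eigenvalue of A_cl having real part strictly below −α.

```python
import numpy as np

def sufficiently_aggressive(A_cl, alpha):
    # sigma(A_cl + alpha*I) in the open left half-plane iff
    # every eigenvalue of A_cl has real part < -alpha
    return bool(np.all(np.linalg.eigvals(A_cl).real < -alpha))

A_cl = np.array([[-2.0, 1.0],
                 [0.0, -3.0]])                 # eigenvalues -2 and -3
assert sufficiently_aggressive(A_cl, 1.5)      # both real parts below -1.5
assert not sufficiently_aggressive(A_cl, 2.5)  # -2 is not below -2.5
```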
Remark 7. A nonlinear gain, similar to the one proposed in Theorem 3.2, can be used instead of the constant gain κ in the dynamic control laws (38), i = 1, . . . , N , to achieve a 'tighter' approximation of the solution of the original differential game.
Remark 8. Consider the nonlinear, input-affine system
ẋ = f(x) + g(x)u + p(x)d,    z = h(x) + l(x)u,
where the first equation describes the plant with state x(t) ∈ R^n, control input u(t) ∈ R^m and exogenous input d(t) ∈ R^p, the second equation describes a penalty variable z, and f : R^n → R^n, g : R^n → R^{n×m}, p : R^n → R^{n×p}, l : R^n → R^{q×m} and h : R^n → R^q denote smooth mappings. The L2-disturbance attenuation problem (the nonlinear counterpart of the H∞ control problem) consists in determining a (possibly dynamic) state feedback which ensures that the zero equilibrium of the resulting closed-loop system is asymptotically stable with region of attraction U and which is such that, for every d ∈ L2(0, T) such that the trajectories of the system remain in U, the L2-gain of the closed-loop system from d to z is less than or equal to γ, i.e.
∫_0^T ‖z(t)‖² dt ≤ γ² ∫_0^T ‖d(t)‖² dt.
It is well known that the L2-disturbance attenuation problem can be formulated as a two-player zero-sum differential game, the solution of which is characterised by a partial differential inequality, as seen, for instance, in van der Schaft (1992). The machinery presented for optimal control and differential games, in particular Theorem 3.1 (given that the solution is characterised by a single partial differential inequality), can be used to provide a solution to the L2-disturbance attenuation problem. Differently from the case of optimal control and nonzero-sum differential games, since the dynamic feedback is such that the function (13) satisfies the partial differential inequality (21) in a neighbourhood U by construction, it constitutes an exact solution of the L2-disturbance attenuation problem. A more detailed analysis of the L2-disturbance attenuation problem is provided in Sassano and Astolfi (2012).
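In the linear case the L2-gain bound can be checked without solving any PDE: for ẋ = Ax + Bd, z = Cx with A Hurwitz, the bounded real lemma reduces the test to an eigenvalue condition on a Hamiltonian matrix. The sketch below uses hypothetical data, a scalar system whose transfer function 1/(s + 1) has H∞ norm equal to 1.

```python
import numpy as np

def l2_gain_below(A, B, C, gamma, tol=1e-7):
    # Bounded real lemma (A Hurwitz): the L2-gain from d to z is below
    # gamma iff this Hamiltonian matrix has no purely imaginary eigenvalues.
    H = np.block([[A, (B @ B.T) / gamma**2],
                  [-C.T @ C, -A.T]])
    return bool(np.all(np.abs(np.linalg.eigvals(H).real) > tol))

A = np.array([[-1.0]])
B = np.array([[1.0]])
C = np.array([[1.0]])
assert l2_gain_below(A, B, C, 1.5)      # gain 1 is below 1.5
assert not l2_gain_below(A, B, C, 0.5)  # gain 1 is not below 0.5
```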
Remark 9. While the L2-disturbance attenuation problem can be formulated as a two-player zero-sum differential game, H2/H∞ control can be formulated as a two-player nonzero-sum differential game (see, e.g. Limebeer, Anderson, and Hendel (1994); Lin (1996)). This game-theoretic formulation essentially captures the trade-off between optimality and robustness (see, for instance, Limebeer et al. (1994); Zhou, Doyle, Glover, and Bodenheimer (1990)). For general nonlinear systems an approach similar to that of Theorem 4.1 can be applied to construct approximate solutions for such problems (Mylvaganam and Astolfi (2016)).

Application to multi-agent collision avoidance
The control design methodologies presented in this paper have been applied to a variety of problems, such as robotic systems in Sassano and Astolfi (2012), mechanical systems in Passenbrunner, Sassano, and del Re (2011); Sassano and Astolfi (2011), Lotka-Volterra models arising in biological systems in Mylvaganam et al. (2015) and power systems in Mylvaganam and Astolfi (2015a). Notably, the results presented in Sections 3 and 4 have been applied to control problems related to multi-agent systems (MAS), including coverage control in Astolfi (2012, 2014), collision avoidance in Mylvaganam and Sassano (2018); Mylvaganam, Sassano, and Astolfi (2017) and formation control in Mylvaganam and Astolfi (2015b). In this section we focus on MAS as one possible application of the control design machinery considered in this paper. In particular, as in Mylvaganam and Sassano (2018), we tackle the so-called multi-agent collision avoidance problem and its solution based on a game-theoretic framework. Exploiting Theorem 4.1, the multi-agent collision avoidance problem is solved with local performance guarantees, subject to simple and easily satisfied assumptions.

Problem formulation
Consider a system of N agents described by single-integrator dynamics, namely
ẋ_i = u_i,    (43)
where x_i(t) ∈ R² and u_i(t) ∈ R² are the position and control input of the i-th agent, for i = 1, …, N. Let x̃_i ∈ R² denote the error variable between the current position of the i-th agent and its corresponding target position x*_i ∈ R², i.e. x̃_i = x_i − x*_i. Suppose that there are m ≥ 0 static obstacles. Each obstacle j is defined by its centre of mass p^c_j ∈ R² and the region P_j ⊂ R² of the Euclidean plane which it occupies, for j = 1, …, m. We consider elliptical obstacles, i.e.
∂P_j = {x ∈ R² : ‖x − p^c_j‖²_{E_j} = ρ_j²},
where ∂P_j denotes the boundary of the region P_j, ρ_j > 0 and E_j = E_j^⊤ > 0. The multi-agent collision avoidance problem consists in steering each agent to its target while avoiding collisions with other agents and with static obstacles. To each agent i we associate a so-called safety radius r_i, i = 1, …, N, which could, for instance, account for the (possibly heterogeneous) physical sizes and shapes of the agents. Considering an agent i, i = 1, …, N, its obstacle avoidance region and agent avoidance region are defined as follows.
Definition 5.1. Consider the open sets S_j = {x ∈ R² : ‖x − p^c_j‖²_{E_j} < ρ_j²}. The obstacle avoidance region, denoted S, is defined as S = ∪_{j=1}^m S_j.
Definition 5.2. Given a time instant t̄ ≥ 0, consider the open sets D^t̄_{ij} = {x ∈ R² : ‖x − x_j(t̄)‖² < (r_i + r_j)²}, j = 1, …, N, j ≠ i. The agent avoidance region of the i-th agent at t̄, denoted D^t̄_i, is defined as D^t̄_i = ∪_{j=1, j≠i}^N D^t̄_{ij}.
Definition 5.3. A collision between the i-th agent and a static obstacle is said to occur if there exists a time instant t̄ ≥ 0 such that x_i(t̄) ∈ S. The i-th agent is said to collide with the j-th obstacle if there exists a time instant t̄ ≥ 0 such that ‖x_i(t̄) − p^c_j‖² ≤ (r_i + ρ̄_j(φ(t̄)))², where ρ̄_j(φ) denotes the radius of the ellipse P_j in polar coordinates as a function of the angle φ of the segment connecting x_i(t̄) and p^c_j, relative to the polar description of p^c_j, i.e. (p^c_{0,j}, φ_0).
Let D̄^t̄_i and S̄ denote the complements of the sets D^t̄_i and S, respectively. Then a collision-free trajectory for the i-th agent is defined as follows.
Definition 5.4. The i-th agent is said to be collision-free if x_i(t̄) ∉ D^t̄_i ∪ S for all t̄ ≥ 0 or, equivalently, x_i(t̄) ∈ D̄^t̄_i ∩ S̄ for all t̄ ≥ 0.
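Definitions 5.1-5.4 translate directly into membership tests. The sketch below (with illustrative obstacle data, and the weighted norm ‖x − p‖²_E read as (x − p)^⊤E(x − p)) checks whether a sampled configuration is collision-free.

```python
import numpy as np

def in_S(x, obstacles):
    # obstacles: list of (p_c, E, rho) with E symmetric positive definite;
    # True if x lies in the obstacle avoidance region S (Definition 5.1)
    return any((x - p) @ E @ (x - p) < rho**2 for p, E, rho in obstacles)

def collision_free_sample(x_i, others_with_radii, r_i, obstacles):
    # True if x_i avoids S and every agent avoidance disc at this instant
    # (Definition 5.4, evaluated at a single time sample)
    if in_S(x_i, obstacles):
        return False
    return all((x_i - x_j) @ (x_i - x_j) >= (r_i + r_j)**2
               for x_j, r_j in others_with_radii)

# circular obstacle (E = I) as used later in the simulation subsection
circle = (np.array([6.0, 4.0]), np.eye(2), 2.0)
assert in_S(np.array([6.5, 4.5]), [circle])
assert not in_S(np.array([0.0, 0.0]), [circle])
assert collision_free_sample(np.array([0.0, 0.0]),
                             [(np.array([20.0, 20.0]), 1.0)], 1.0, [circle])
```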
The collision avoidance problem is defined in the following statement.
Problem 3. Consider a multi-agent system consisting of N > 1 agents with dynamics (43), for i = 1, …, N. The multi-agent collision avoidance problem consists in determining feedback control strategies u_i, i = 1, …, N, that steer each agent from its initial position to a predefined target while avoiding collisions between agents and with static obstacles.
Consider the following standing assumptions (collectively, Assumption 5), which ensure feasibility of Problem 3.
(1) Obstacle collision-free initial deployment: the initial positions of the agents satisfy ‖x_i(0) − p^c_j‖² > (r_i + ρ̄_j(φ))², with φ the angle of the segment connecting x_i(0) and p^c_j, for all i = 1, …, N and j = 1, …, m.
(2) Agent collision-free initial deployment: the initial positions of the agents satisfy ‖x_i(0) − x_j(0)‖² > (r_i + r_j)², for all i = 1, …, N and j = 1, …, N, j ≠ i.
(3) Obstacle collision-free desired deployment: the target positions of the agents satisfy ‖x*_i − p^c_j‖² > (r_i + ρ̄_j(φ))², with φ the angle of the segment connecting x*_i and p^c_j, for all i = 1, …, N and j = 1, …, m.
(4) Agent collision-free desired deployment: the target positions of the agents satisfy ‖x*_i − x*_j‖² > (r_i + r_j)², for all i = 1, …, N and j = 1, …, N, j ≠ i.
(5) Configuration feasibility: the static obstacles do not form an impermeable boundary about the targets of one or more of the agents. Namely, for each i = 1, …, N there exists a continuous path l_i connecting the initial condition x_i(0) and the target x*_i along which no collision with the static obstacles occurs.

11. Given an ellipse P_j, the function ρ̄_j(φ) can be computed by straightforward computations, yielding the standard polar equation of the ellipse about its centre, ρ̄_j(φ) = ab/√(b² cos²(φ − φ_a) + a² sin²(φ − φ_a)), where a and b denote the major and minor semiaxes of the ellipse, respectively, and φ_a is the rotation of the major semiaxis relative to φ_0.
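The polar radius ρ̄_j(φ) in footnote 11 admits the standard closed form for an ellipse expressed in polar coordinates about its centre. The sketch below evaluates it, with a, b and φ_a as in the footnote (the paper's exact notation may differ).

```python
import numpy as np

def ellipse_radius(phi, a, b, phi_a=0.0):
    # polar radius of an ellipse with semiaxes a >= b > 0, major axis
    # rotated by phi_a with respect to the reference direction
    t = phi - phi_a
    return a * b / np.sqrt((b * np.cos(t))**2 + (a * np.sin(t))**2)

assert np.isclose(ellipse_radius(0.0, 3.0, 2.0), 3.0)        # along major axis
assert np.isclose(ellipse_radius(np.pi / 2, 3.0, 2.0), 2.0)  # along minor axis
```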

Solution via a differential game formulation
As detailed in the references cited above, Problem 3 can be reformulated as a differential game of the form of Problem 2.
Problem 4. Consider a multi-agent system consisting of N agents with dynamics (43), for i = 1, …, N, and let x̃ = col(x̃_1, …, x̃_N), that is
x̃̇ = Σ_{i=1}^N B_i u_i,    (50)
where B_1 = [I, 0, …, 0]^⊤, …, B_N = [0, …, 0, I]^⊤ and x̃(0) = x̃_0. Problem 3 can be recast as the differential game in Problem 2 with the system dynamics (50) and the individual cost functionals J_i, i = 1, …, N, where q_i : R^{2N} → R, with q_i(x̃) > 0 for x̃ ≠ 0 and q_i(0) = 0, are running costs defined in terms of constants α_i > 0, β^s_i > 0, β^d_i > 0 and of continuously differentiable mappings g^s_i(x̃) ≥ 0 and g^d_i(x̃) ≥ 0 such that lim_{x̃+x*→∂S} g^s_i(x̃) = +∞, with g^d_i growing unbounded, analogously, as the i-th agent approaches another agent. The mappings g^s_i and g^d_i are the so-called (static and dynamic, respectively) collision avoidance functions. In the following we consider inverse barrier functions.

It follows by DP arguments that the solution to Problem 4 is obtained by introducing the operators (54), i = 1, …, N. It has been demonstrated that the matrix-valued functions P_i(x̃) ∈ R^{2N×2N} given by (55)-(56), where P^i_{kj} ∈ R^{2×2}, k = 1, …, N, j = 1, …, N, and γ_i > 0 are constant parameters, with P^i_{kj} = 0 for k ≠ i and j ≠ i, for i = 1, …, N, constitute an algebraic P̄ solution for the coupled PDEs (54), i = 1, …, N. The machinery of Theorem 4.1 can be exploited (taking into consideration that the PDEs (54) are different from (33), i = 1, …, N) to solve Problem 4. To this end, consider the functions (34), i = 1, …, N, defined on the augmented state (x̃, ξ), where ξ ∈ R^{2N}, with P_i given by (55)-(56) and R_i = R_i^⊤ > 0.
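Inverse-barrier collision avoidance functions of the kind described above can be sketched as follows; the exact expressions in the paper may differ, and the obstacle data here is illustrative. The static barrier g_static blows up as agent i approaches the obstacle boundary, the dynamic barrier g_dynamic as it approaches agent j.

```python
import numpy as np

def g_static(x_i, p_c, E, rho):
    # inverse barrier for one elliptical obstacle: finite outside the
    # obstacle, growing unbounded as the boundary is approached
    d = x_i - p_c
    return 1.0 / (d @ E @ d - rho**2)

def g_dynamic(x_i, x_j, r_i, r_j):
    # inverse barrier for the agent pair (i, j)
    d = x_i - x_j
    return 1.0 / (d @ d - (r_i + r_j)**2)

p_c, E, rho = np.array([6.0, 4.0]), np.eye(2), 2.0
far = g_static(np.array([20.0, 20.0]), p_c, E, rho)
near = g_static(np.array([6.0, 6.1]), p_c, E, rho)
assert near > far > 0.0   # the barrier grows as the boundary is approached
```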
Theorem 5.5. Consider the dynamics (43) and the algebraic P̄ solution (55)-(56) and suppose that Assumption 5 holds. Then there exist κ* > 0, matrices R_i, i = 1, …, N, and a neighbourhood U ⊆ R^{2N} × R^{2N} containing the origin such that, for all κ ∈ [κ*, ∞), the dynamic control laws (57), i = 1, …, N, are such that (i) the inequalities hold for all (x̃, ξ) ∈ U ∩ (R^{2N} × M); (ii) all trajectories of the interconnected closed-loop system (43), (57) originating in U are collision-free and such that each agent is steered towards its target; (iii) ξ(t) ∈ M for all t ≥ 0, provided ξ(0) ∈ M.

Proof: While only a sketch of the proof is provided here, the interested reader is referred to the cited literature for a more detailed analysis. The proof of the first two claims is similar to the proof of Theorem 4.1. The last claim can be demonstrated by noting that the selection ξ(0) ∈ M is such that P_i(ξ(0)) < ∞, which in turn ensures that V_i(x(0), ξ(0)) is bounded for i = 1, …, N, for all x(0) satisfying Assumption 5. Noting that V_i(x(0), ξ(0)) = J̃_i(x(0), ξ(0), u_1, …, u_N), where J̃_i = J_i + ∫_0^∞ −2H_{J_i,κ} dt, it is clear that if at an instant t̄ > 0 the trajectory ξ(t) were to leave the set M, then J̃_i(x(0), ξ(0), u_1, …, u_N) would become unbounded. However, since V_i(t) < ∞ for all t ≥ 0, this cannot occur and it follows that ξ(t) ∈ M for all t > 0.
Theorem 5.5 provides, via the differential game formulation in Problem 4, a local solution to the multi-agent collision avoidance problem, namely Problem 3. The dynamic control laws (57), i = 1, . . . , N , are such that deadlocks are (provably) avoided in a neighbourhood of the origin of the augmented state-space.
Remark 10. While Theorem 5.5 concerns agents described by single-integrator dynamics, the resulting trajectories can be interpreted as a trajectory plan for agents described by more complex dynamics as in Mylvaganam and Sassano (2018).

Simulation
An example consisting of two agents (described by the dynamics (43), i = 1, 2) is provided to illustrate the theory developed in the previous subsection. Let ξ = col(ξ_1, ξ_2), where ξ_i ∈ R², i = 1, 2. Consider the case in which the initial and target positions of the agents are x_1(0) = x*_2 = [0, 0]^⊤ and x_2(0) = x*_1 = [20, 20]^⊤, both agents have the same safety radius r_i = 1, i = 1, 2, and there is a circular obstacle centred at p = [6, 4]^⊤ with radius ρ = 2. The dynamic control laws (57), i = 1, 2, are applied to the agents with κ = 1, ξ(0) = [20, −40, −5, 15]^⊤ and R_1 = R_2 = I. The resulting trajectories of the first (solid, black line) and second (solid, grey line) agent are shown in Figure 1, where the square markers indicate the initial and final positions of the agents and the black shaded area indicates the circular obstacle centred at [6, 4]^⊤ and of radius 2. The dotted lines indicate the safety radii associated with the two agents at certain time instants at which the agents are close to the obstacle, and star-shaped markers indicate the positions of the agents at these times. The arrows indicate the direction of travel. The time history of the distance between the two agents, i.e. d_{12}(t) = ‖x_1(t) − x_2(t)‖, is displayed in the top plot of Figure 2, where the red dotted line indicates the value of r_1 + r_2, namely the distance below which inter-agent collisions occur. The time histories of the distances between the agents and the obstacle, namely d_{io}(t) = ‖x_i(t) − p‖ − ρ, are displayed in the bottom plot of Figure 2, for i = 1 (black line) and i = 2 (grey line). The red dashed line indicates the value of r_i, i = 1, 2 (the distance below which collisions with the obstacle occur). The time histories of the first (solid line) and second (dotted line) components of ξ_1 (top) and ξ_2 (bottom) are displayed in Figure 3.
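For intuition only, a drastically simplified surrogate of this scenario can be reproduced with a single agent and a potential-field controller; this is not the dynamic control law (57), and all gains below are ad hoc. The agent combines a normalised attraction to its target with an inverse-barrier repulsion from the obstacle, integrated by forward Euler.

```python
import numpy as np

def potential_control(x, target, p_c, margin, k_rep=2.0):
    err = target - x
    u = err / max(1.0, np.linalg.norm(err))   # unit-speed attraction far away
    d = x - p_c
    dist = np.linalg.norm(d)
    if dist < margin + 2.0:                   # repulsion active when close
        u += k_rep / (dist - margin) * d / dist
    return u

target = np.array([20.0, 20.0])
p_c, rho, r = np.array([6.0, 4.0]), 2.0, 1.0  # obstacle and safety radius
margin = rho + r                              # obstacle radius + safety radius
x = np.array([0.0, 0.0])
dt, min_dist = 0.01, np.inf
for _ in range(10000):                        # forward-Euler integration
    x = x + dt * potential_control(x, target, p_c, margin)
    min_dist = min(min_dist, np.linalg.norm(x - p_c))

assert np.linalg.norm(x - target) < 0.2   # target reached
assert min_dist > margin                  # no collision with the obstacle
```

The repulsion term dominates the (bounded) attraction whenever the agent is close to the obstacle, so the trajectory slides around it; unlike the game-theoretic controller of Theorem 5.5, this surrogate offers no optimality or deadlock-avoidance guarantees.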

Conclusion
Dynamic optimisation is considered in this paper. A control design framework based on the notion of algebraic P̄ solution and on the immersion of the nonlinear system into an extended state-space is proposed. The problem of achieving asymptotic stability of an equilibrium is considered before focusing on optimal control problems and differential games. The method presented in this paper allows for the systematic construction of approximate solutions for optimal control problems and differential games and, notably, the control design relies only on the solution of algebraic matrix equations in place of the 'traditional' PDEs. Moreover, the level of approximation can be quantified in terms of an additional cost, and a novel approach to minimise this additional cost is proposed. The control design framework is demonstrated on the multi-agent collision avoidance problem.