Singularities and Moments of Nonlinear Systems

The notions of eigenvalue, pole and moment at a pole of a continuous-time, nonlinear, time-invariant system are studied. Eigenvalues and poles are first characterized in terms of invariant subspaces. Tools from geometric control theory are used to define nonlinear enhancements of these notions and to study their relationship with the solution of certain partial differential equations, cascade decompositions and steady-state impulse responses. The theory is illustrated by means of worked-out examples and its significance is demonstrated by solving the model reduction problem by moment matching at poles for nonlinear systems.


I. INTRODUCTION
The goal of this article is to develop a nonlinear enhancement of the notions of eigenvalue, pole and moment at a pole of a continuous-time, linear, time-invariant system. The interest in these notions is motivated by the following observation. The paper [1] has shown that the moments of a system at an ordinary point¹ uniquely determine the steady-state response of the system under the effect of specific inputs, provided that some assumptions hold. Similarly, the paper [3] has shown that the moments at a pole of a linear system uniquely specify the steady-state impulse response of the system (or, equivalently, the free response of the system for a specific initial condition). For linear systems, reduced-order models computed by matching the steady-state impulse response can therefore be used to approximate resonant behaviors, as they approximate the behavior of a system near a singularity of the transfer function. For nonlinear systems, the approximation of resonant behaviors by moment matching thus hinges upon finding appropriate nonlinear counterparts of the notions of eigenvalue, pole and moment at a pole.
The literature abounds with nonlinear counterparts of the notions of eigenvalue and pole. For linear systems, these notions are fundamental tools for analysis and design [4]-[9] and for the solution of a wide range of linear and nonlinear control-theoretic problems [4]-[12]. While it is out of the scope of this article to provide a comprehensive overview, we emphasize that several notions of eigenvalues and poles for nonlinear systems have appeared in the literature recently, e.g., in [13]-[21]. For example, an algebraic approach has been used to define a notion of eigenvalue and to study the pole-zero structure of a nonlinear system in [13] and [14]. For linear time-varying systems, eigenvalues and poles have been defined, e.g., in [15] and [16]. In [17] and [18], the Koopman operator has been exploited to analyze spectral properties of nonlinear dynamical systems. Motivated by applications to stability, observability, and contraction analysis, a different notion of eigenvalue for nonlinear systems has been recently investigated in [19]-[21].

A. Astolfi is with the Department of Electrical and Electronic Engineering, Imperial College London, London SW7 2AZ, U.K., and with the Dipartimento di Ingegneria Civile e Ingegneria Informatica, Università di Roma "Tor Vergata", 00133, Rome, Italy (e-mail: a.astolfi@imperial.ac.uk).
Color versions of one or more of the figures in this article are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TAC.2019.2951297
¹The terminology is borrowed from [2, p. 861], where a point s ∈ C is said to be an ordinary point of a transfer function if the transfer function is analytic in a neighborhood of s.
This article builds on preliminary results first obtained in [3], [22], and [23]. It reorganizes our preliminary findings in a systematic fashion and provides detailed proofs of previous results. The starting point of our analysis is the notion of moment at a pole of a linear system introduced in [3], which has been shown to be in one-to-one correspondence with the solution of a "reduced" Sylvester equation [22]. Analogous results are obtained in this article for nonlinear systems by defining nonlinear counterparts of the notions of eigenvalue, pole and moment at a pole. The key ingredients of our analysis are the novel notions of f-reducing manifold and of f-reducing distribution, which together provide a nonlinear counterpart of the notion of A-reducing subspace [24]. The significance of these notions is demonstrated by a model reduction framework, which unifies the approach of moment matching and that of modal approximation [25, Ch. 9]. The theory is illustrated by means of worked-out examples and by approximating forced oscillations in a system.
The rest of this article is organized as follows. Section II provides basic definitions and preliminary results. Section III contains the main results of this article. Section IV develops a model reduction method by moment matching at poles for nonlinear systems. Section V concludes this article with a summary of our results and an outlook to future research.

II. PRELIMINARIES
Consider a continuous-time, single-input, single-output, linear, time-invariant system described by the equations

ẋ = Ax + Bu, y = Cx   (1)

in which x(t) ∈ C^n, u(t) ∈ C, y(t) ∈ C, and A ∈ C^{n×n}, B ∈ C^{n×1}, and C ∈ C^{1×n} are constant matrices. Throughout the paper, we assume that the system (1) is minimal, i.e. reachable and observable, and define its transfer function as W(s) = C(sI − A)^{−1}B. This section recalls basic properties of A-reducing subspaces and their relationship with generalized eigenspaces, cascade decompositions, and moments at a pole of a system [3], [22], [23].
Lemma 1: [24, p. 198] Let A ∈ C^{n×n}. The subspaces S and T are A-reducing if and only if AΠ = ΠA, with Π the projector onto S along T.
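The projector characterization in Lemma 1 can be checked numerically. The following sketch (our illustration, not taken from the paper) constructs a matrix A with two prescribed complementary A-invariant subspaces S and T and verifies that the projector onto S along T commutes with A.

```python
import numpy as np

# Numerical illustration of Lemma 1: if C^n splits into two A-invariant
# subspaces S and T, the projector onto S along T commutes with A.
rng = np.random.default_rng(0)
n, m = 5, 2
J1 = rng.standard_normal((n - m, n - m))   # dynamics on T (A-invariant)
J2 = rng.standard_normal((m, m))           # dynamics on S (A-invariant)
V = rng.standard_normal((n, n))            # columns: a basis of T, then of S
A = V @ np.block([[J1, np.zeros((n - m, m))],
                  [np.zeros((m, n - m)), J2]]) @ np.linalg.inv(V)

# Projector onto S along T, expressed in the same basis.
P = V @ np.diag([0] * (n - m) + [1] * m) @ np.linalg.inv(V)

assert np.allclose(P @ P, P)        # P is a projector
assert np.allclose(A @ P, P @ A)    # A Pi = Pi A, as in Lemma 1
```

The construction works for any choice of the blocks J1 and J2, since the invariant-subspace structure, not their spectra, is what makes the projector commute with A.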
Lemma 2: [23] Consider system (1) and let s* ∈ σ(A). Then the generalized eigenspace E_{s*} is the largest A-reducing subspace S of C^n such that σ(A|_S) = {s*}, with A|_S the restriction of A to S.

Lemma 3: [23] System (1) has a pole at s* ∈ C of order m < r, with r the relative degree of the transfer function of system (1), if and only if dim E_{s*} = m and there exist Θ ∈ C^{n×1} and Γ ∈ C^{1×n} such that (i) B is the projection of Θ onto E_{s*}, (ii) C is the projection of Γ onto E_{s*}^⊥, and (iii) E_{s*} is (A − ΘΓ)-invariant.

Definition 1: [3] Consider system (1) and let s* ∈ C be a pole of order m ∈ Z+ of its transfer function. The moment of order −k of system (1) at s* is defined as the complex number

η_{−k}(s*) = lim_{s→s*} (1/(m−k)!) (d^{m−k}/ds^{m−k}) [(s − s*)^m W(s)]   (2)

for every integer 1 ≤ k ≤ m.

Theorem 1: [22] Consider system (1) and assume its transfer function has a pole of order m ∈ Z+ at s* ∈ C. Suppose

ẋ1 = A11 x1 + A12 x2, ẋ2 = A22 x2 + B2 u, y = C1 x1   (3)

in which A11 ∈ C^{(n−m)×(n−m)} is such that s* ∉ σ(A11), A12 ∈ C^{(n−m)×m} is such that rank A12 = 1, and A22 ∈ C^{m×m} is a nonderogatory matrix with characteristic polynomial

p(s) = (s − s*)^m.   (4)

Then there exists a one-to-one correspondence between the moments η_{−1}(s*), η_{−2}(s*), ..., η_{−m}(s*) and the matrix C1 Π, where Π ∈ C^{(n−m)×m} is the unique solution of the reduced Sylvester equation

Π A22 = A11 Π + A12.   (5)

Theorem 2: [22] Consider system (1) and assume its transfer function has a pole of order m ∈ Z+ at s* ∈ C+. Suppose (3) holds and let x = col(x1, x2) with x1 ∈ C^{n−m} and x2 ∈ C^m. Assume A11 ∈ C^{(n−m)×(n−m)} is such that σ(A11) ⊂ C−, A12 ∈ C^{(n−m)×m} is such that rank A12 = 1, and A22 ∈ C^{m×m} is a nonderogatory matrix with characteristic polynomial (4) such that the pair (A22, B2) is reachable. Then there exists a one-to-one correspondence between the moments η_{−1}(s*), η_{−2}(s*), ..., η_{−m}(s*) and the steady-state impulse response of the output of system (1).
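Theorem 1 reduces the computation of the data C1Π, and hence of the moments at a pole, to a reduced Sylvester equation. The sketch below assumes the equation takes the form A11Π + A12 = ΠA22, with σ(A11) and σ(A22) disjoint so that the solution is unique; all matrices are illustrative and chosen by us, not taken from the paper.

```python
import numpy as np
from scipy.linalg import solve_sylvester

# Sketch of Theorem 1 under the assumed reduced Sylvester equation
# A11 Pi + A12 = Pi A22 (unique solution: sigma(A11), sigma(A22) disjoint).
s_star = 2.0                                     # pole, multiplicity m = 2
A11 = np.array([[-1.0, 0.5], [0.0, -3.0]])       # s_star not in sigma(A11)
A22 = np.array([[s_star, 1.0], [0.0, s_star]])   # nonderogatory, sigma = {s_star}
A12 = np.outer([1.0, 2.0], [1.0, 0.0])           # rank-one coupling
C1 = np.array([[1.0, 1.0]])

# Rewrite A11 Pi - Pi A22 = -A12 in SciPy's AX + XB = Q form.
Pi = solve_sylvester(A11, -A22, -A12)
assert np.allclose(A11 @ Pi + A12, Pi @ A22)

moments_data = C1 @ Pi   # in one-to-one correspondence with the moments at s_star
```

The call `solve_sylvester(A, B, Q)` solves AX + XB = Q, which is why the second and third arguments are negated.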

III. MAIN RESULTS
This section extends the results of the previous section to continuous-time, single-input, single-output, nonlinear, time-invariant systems described by the equations

ẋ = f(x) + g(x)u, y = h(x)   (6)

in which x(t) ∈ R^n, u(t) ∈ R, y(t) ∈ R, and the mappings f : R^n → R^n, g : R^n → R^n, h : R^n → R are such that f(0) = 0 and h(0) = 0. All mappings are assumed to be smooth, i.e. infinitely many times differentiable, and system (6) is assumed to be minimal, i.e. strongly observable and accessible [11, Ch. 3].

A. Eigenvalues and Invariance Notions
Definition 2: Consider system (6). The system

ω̇ = s(ω)   (7)

with ω(t) ∈ R^m and s : R^m → R^m such that s(0) = 0, is said to be immersed into system (6) if there exists a mapping π : R^m → R^n, locally defined in a neighborhood of the origin and such that π(0) = 0, which solves the partial differential equation

f(π(ω)) = (∂π/∂ω) s(ω).   (8)

Definition 2 is a modification of the notion of immersion given in [10]. This property can be intuitively interpreted as a notion of "projection" of a system onto another system and has been used for diverse applications in control theory [10], [11], [28]. Note that Definition 2 formalizes a property of systems described only by an ordinary differential equation, while in [10] the notion of immersion is given for systems with outputs.
Definition 3: Consider system (6). The manifold

M = { x ∈ R^n : x = π(ω), ω ∈ R^m }   (9)

with π : R^m → R^n such that π(0) = 0, is said to be an f-reducing manifold if f is tangent to M at the origin and the following hold.
(i) There exists a nonsingular, involutive, f-invariant distribution Δ defined in a neighborhood U of the origin.
(ii) For every x ∈ U, the condition Δ(x) ⊕ T_x M = T_x R^n holds.
(iii) The restriction of system (6) to the manifold M is immersed into system (6), i.e. (8) holds for some vector field s.

Definition 3 can be seen as a nonlinear counterpart of the notion of A-reducing pair of subspaces and can be interpreted geometrically as follows. Condition (i) guarantees the existence of a partition of every sufficiently small neighborhood of the origin into leaves of the foliation induced by the distribution Δ. Condition (ii) is a transversality condition, which ensures that the tangent space at the origin of the ambient manifold R^n is decomposed into the tangent space of the manifold M at the origin and the tangent space of the maximal integral submanifold N of Δ passing through the origin (see Fig. 1). Finally, condition (iii) is an immersion condition, which ensures that M is (locally) an invariant manifold.
Remark 1: While the notion of A-reducing subspace is inherently symmetric, the roles of the manifold M and of the distribution Δ in Definition 3 are not interchangeable. This is because the notions of invariance of a manifold and of a distribution do not coincide for nonlinear systems.
Any f-reducing manifold of system (6) can also be characterized as follows. Let U be an open subset of R^n containing the origin and let Δ be a nonsingular, involutive, f-invariant distribution on U. Then there exist (local) coordinates x = col(x1, x2), with x1 ∈ R^{n−m}, x2 ∈ R^m, and n − m the dimension of Δ on U, in which the vector field f reads as

f(x) = col(f1(x1, x2), f2(x2))   (10)

and every vector field θ ∈ Δ is of the form θ(x) = col(θ1(x), 0). Let the manifold M defined in (9) be described in these coordinates by the graph of a mapping π : R^m → R^{n−m}, i.e. M = { x ∈ R^n : x1 = π(x2) }, and let f be tangent to M at the origin. In the above coordinates, the transversality condition (ii) in Definition 3 is satisfied if and only if the (well-defined) restriction of the vector field f to the manifold M coincides with the vector field f2. With these premises, the manifold M is f-reducing if and only if the mapping π solves the partial differential equation

f1(π(x2), x2) = (∂π/∂x2) f2(x2).   (11)

We are now in the position to give the following definition.

Definition 4: Consider system (6), system (7) and the manifold M defined in (9). The vector field s : R^m → R^m is said to be an eigenvalue of system (6) (with respect to M) if M is the largest f-reducing manifold such that the restriction of f to M coincides with s.

Definition 4 is motivated by Lemma 2 and can be seen as a nonlinear counterpart of the notion of eigenvalue of a linear system. It formalizes the idea that an eigenvalue corresponds to the largest f-reducing manifold M and that the (well-defined) restriction of system (6) to the manifold M is a copy of the dynamics of system (7). Similarly, the manifold M can be interpreted as the "generalized eigenmanifold" associated with the eigenvalue s.
Remark 2: Definition 4 implies that the vector field f is an eigenvalue of system (6) (take M = R n and Δ = {0}). For this reason, f is referred to as a trivial eigenvalue, while all other eigenvalues are referred to as nontrivial.
For a linear system the existence of an eigenvalue corresponds to the solvability of a "reduced" Sylvester equation [7, p. 21]. Similarly, every nontrivial eigenvalue of a nonlinear system implies the solvability of a "reduced" partial differential equation.
Theorem 3: Consider system (6). The vector field s : R^m → R^m is a nontrivial eigenvalue of system (6) if and only if, with respect to the decomposition (10), there exists a mapping π : R^m → R^{n−m}, locally defined in a neighborhood of the origin and such that π(0) = 0, which solves the partial differential equation

f1(π(x2), x2) = (∂π/∂x2) s(x2).   (12)

To illustrate the ideas introduced above, we consider a simple example first studied in [29, p. 14].

Example 1: Consider the planar system

ẋ1 = −x1 + x2², ẋ2 = x2.   (13)

The system possesses a unique hyperbolic equilibrium point at the origin. The (global) unstable manifold passing through the origin of the system (13) is M = { x ∈ R² : x1 = x2²/3 }. We show that M is an f-reducing manifold. Let Δ = span{e1} and note that Δ is nonsingular, involutive and f-invariant. Direct inspection shows that the integral submanifolds of the distribution Δ are lines parallel to the x1 axis. In particular, the maximal integral submanifold N of Δ passing through the origin coincides with the (global) stable manifold passing through the origin, i.e. N = { x ∈ R² : x2 = 0 }. Fig. 2 displays the phase portrait of the system (13), with the stable manifold N (dotted) and the unstable manifold M (dashed). The tangent spaces of the manifolds M and N at the origin are T0M = span{e2} and T0N = span{e1}, respectively. In accordance with the stable manifold theorem [29, p. 14], these are complementary subspaces of the tangent space of the ambient manifold. Note also that the manifold M can be described as the graph of the mapping π : R → R, defined as π(ω) = ω²/3 for every ω ∈ R, which can be found by eliminating time in (13) and solving the ordinary differential equation dx1/dx2 = (−x1 + x2²)/x2. Thus, the restriction of system (13) to M is immersed into system (13), as the partial differential equation −π + ω² = (∂π/∂ω) ω has a unique solution. This shows that M is an f-reducing manifold. We conclude that M is the largest f-reducing manifold with these properties.
By Definition 4, this implies that the vector field s : R → R, defined as s(ω) = ω for every ω ∈ R, is a nontrivial eigenvalue of the system (13).
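The computations of Example 1 can be reproduced symbolically. The sketch below assumes system (13) reads ẋ1 = −x1 + x2², ẋ2 = x2, which is consistent with the partial differential equation −π + ω² = (∂π/∂ω)ω stated above.

```python
import sympy as sp

# Symbolic check of Example 1 (assumed system (13): x1' = -x1 + x2**2,
# x2' = x2): pi(w) = w**2/3 solves the invariance PDE
# -pi + w**2 = (d pi / d w) * w, so M = {x1 = x2**2/3} is invariant.
w = sp.symbols('w')
pi = w**2 / 3
assert sp.simplify(-pi + w**2 - sp.diff(pi, w) * w) == 0

# The time derivative of x1 - pi(x2) along the flow vanishes on M,
# and the restriction of the dynamics to M is x2' = x2, i.e. s(w) = w.
x1, x2 = sp.symbols('x1 x2')
f1, f2 = -x1 + x2**2, x2
residual = f1 - sp.diff(pi.subs(w, x2), x2) * f2
assert sp.simplify(residual.subs(x1, pi.subs(w, x2))) == 0
```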
Example 1 raises a natural question: is the vector field f1 : R → R, defined as f1(ω) = −ω for every ω ∈ R, a nontrivial eigenvalue of the system (13)? The answer is affirmative and this can be verified as follows. With the aid of a symbolic software package (such as Maple or MuPAD), the partial differential equation

(∂φ1/∂x1)(−x1 + x2²) + (∂φ1/∂x2) x2 = −φ1(x)   (14)

can be seen to admit a solution φ1 : R² → R such that ∂φ1/∂x1 is not identically zero. This implies that the function φ : R² → R², defined as φ(x) = col(φ1(x), x2) for every x ∈ R², qualifies as a (local) change of coordinates around the origin. In these coordinates, the system (13) is described by the equations ξ̇1 = −ξ1 and ξ̇2 = ξ2, with ξ = col(ξ1, ξ2) ∈ R², from which one infers that the vector field f1 is a nontrivial eigenvalue of the system.
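A concrete candidate for φ1, not given explicitly in the text and offered here only as an illustration, is φ1(x) = x1 − x2²/3. The following sketch verifies that it satisfies L_f φ1 = −φ1 for the assumed form of system (13) and that ∂φ1/∂x1 is nonzero.

```python
import sympy as sp

# Candidate solution (our assumption) of the linearizing PDE
# L_f phi1 = -phi1 for the assumed system (13): x1' = -x1 + x2**2, x2' = x2.
x1, x2 = sp.symbols('x1 x2')
f = sp.Matrix([-x1 + x2**2, x2])
phi1 = x1 - x2**2 / 3

Lf_phi1 = sp.Matrix([phi1]).jacobian([x1, x2]) * f   # Lie derivative of phi1
assert sp.simplify(Lf_phi1[0] + phi1) == 0           # L_f phi1 = -phi1

# In coordinates xi1 = phi1(x), xi2 = x2 the system decouples into
# xi1' = -xi1 and xi2' = xi2, exhibiting both nontrivial eigenvalues.
assert sp.diff(phi1, x1) == 1                        # d phi1 / d x1 nonzero
```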
The aforementioned argument can be generalized to show that a system described by equations of the form

ẋ1 = f1(x1, x2), ẋ2 = f2(x2)   (15)

in which x = col(x1, x2), with x1 ∈ R^{n−m} and x2 ∈ R^m, and f2 is a nontrivial eigenvalue, also possesses the nontrivial eigenvalue f1 provided that the partial differential equation

(∂φ1/∂x1) f1(x1, x2) + (∂φ1/∂x2) f2(x2) = f1(φ1(x), 0)   (16)

admits a "full rank" solution, as detailed in the following statement.

Theorem 4: Consider system (15). Assume f2 is a nontrivial eigenvalue of system (15) and the partial differential equation (16) admits a solution φ1 such that the matrix ∂φ1/∂x1 is full rank in a neighborhood of the origin. Then f1 is a nontrivial eigenvalue of system (15).
Proof: Under the stated assumptions the mapping φ : R^n → R^n, defined as φ(x) = col(φ1(x), x2) for every x ∈ R^n, qualifies as a (local) change of coordinates around the origin. In the new coordinates, system (15) is described by the equations ξ̇1 = f1(ξ1) and ξ̇2 = f2(ξ2), which shows that f1 is a nontrivial eigenvalue of the system.
Remark 3: The existence of multiple nontrivial eigenvalues in a system can be also characterized in geometric terms using simultaneously integrable distributions [30]. For example, the existence of nonsingular, involutive, simultaneously integrable distributions Δ M and Δ N such that dim Δ M + dim Δ N = n and such that [θ 1 , θ 2 ] = 0 for every θ 1 ∈ Δ M and θ 2 ∈ Δ N implies the existence of a "full rank" solution of (16). By Theorem 4, this implies that if f 2 is a nontrivial eigenvalue of the system (15), then so is f 1 .
Example 2: Consider the classical SIR epidemic model described by the equations

Ṡ = −rSI, İ = rSI − aI, Ṙ = aI   (17)

in which S(t) ∈ R+ is the number of individuals susceptible to infection, I(t) ∈ R+ is the number of infected individuals able to transmit the disease to other individuals, R(t) ∈ R+ is the number of infected individuals who have successfully recovered, r ∈ R+ is the infection rate and a ∈ R+ is the removal rate of infectives. Note that the number of individuals is constant over time, since Ṡ + İ + Ṙ = 0. The number of infected individuals who have fully recovered can be studied setting x1 = R, x21 = S, and x22 = I, and rewriting (17) as

ẋ1 = a x22, ẋ21 = −r x21 x22, ẋ22 = r x21 x22 − a x22.   (18)

We show that the vector field s : R² → R², defined as

s(x2) = col(−r x21 x22, r x21 x22 − a x22)   (19)

for every x2 = col(x21, x22) ∈ R², is an eigenvalue of the system (18). Note that the system (18) admits an obvious cascade decomposition. As a result, we only need to verify that the partial differential equation

a x22 = (∂π/∂x2) s(x2)   (20)

admits a solution. Recall that the total number of individuals N0 ∈ Z+ is constant over time, i.e. x1(t) + x21(t) + x22(t) = N0 for every t ≥ 0. The set { x ∈ R³ : x1 + x21 + x22 = N0 } is therefore an invariant manifold for the system (18). This implies that the mapping π : R² → R, defined as π(x2) = N0 − x21 − x22 for every x2 ∈ R², solves (20). This proves that s is an eigenvalue of system (18).
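The verification in Example 2 can be carried out symbolically. The sketch below assumes the standard SIR equations, with the infection rate denoted r (the symbol is our choice, since it is lost in the source) and removal rate a.

```python
import sympy as sp

# Check of Example 2 under the assumed standard SIR form:
# R' = a*I, S' = -r*S*I, I' = r*S*I - a*I, with x1 = R, x2 = (S, I).
S, I, a, r, N0 = sp.symbols('S I a r N0', positive=True)
f1 = a * I                                            # dynamics of x1 = R
s_field = sp.Matrix([-r * S * I, r * S * I - a * I])  # candidate eigenvalue

pi = N0 - S - I                   # graph of the invariant manifold x1 = pi(x2)
lhs = sp.Matrix([pi]).jacobian([S, I]) * s_field      # (d pi / d x2) s(x2)
assert sp.simplify(lhs[0] - f1) == 0                  # pi solves the PDE
```

The check mirrors the argument in the text: conservation of the total population N0 makes the affine mapping π an exact solution of the invariance equation.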

B. Poles and Invariance Notions
We now provide a nonlinear enhancement of the notion of pole of a linear system. To this end, the notions of cascade decomposition and of f -reducing distribution are introduced.
Definition 5: The system (6) is said to admit a cascade decomposition if there exist (local) coordinates x = col(x1, x2), with x1 ∈ R^{n−m} and x2 ∈ R^m, in which the system (6) is described by the equations

ẋ1 = f1(x1, x2), ẋ2 = f2(x2) + g2(x2)u, y = h1(x1)   (21)

where f1 : R^n → R^{n−m}, f2 : R^m → R^m, g2 : R^m → R^m, and h1 : R^{n−m} → R are such that f1(0, 0) = 0, f2(0) = 0, and h1(0) = 0, respectively.

Definition 6: Consider system (6). A nonsingular, involutive distribution Δ_M locally defined around the origin is said to be f-reducing if its maximal integral submanifold M is f-reducing. The pair of distributions Δ_M and Δ_N is said to be f-reducing if Δ_M and Δ_N are f-reducing and complementary in a neighborhood U of the origin, i.e. such that Δ_M(x) ⊕ Δ_N(x) = T_x R^n for every x ∈ U.
Definition 6 formalizes the idea that if an f-reducing manifold M is the maximal integral submanifold of a distribution Δ_M, then the distribution should be called f-reducing too. Similarly, a pair of distributions Δ_M and Δ_N is referred to as f-reducing if Δ_M and Δ_N are both f-reducing and (locally) complementary, in analogy with the linear case. We now provide conditions for a nonlinear system to admit a cascade decomposition. As a direct generalization of the assumptions of the linear case [23, Th. 2], this requires the existence of two nonsingular, involutive, complementary distributions Δ_M and Δ_N defined around the origin, for which the following assumptions hold.
Assumption 1: There exist mappings θ : R^n → R^n and γ : R^n → R such that conditions (i)-(iii) hold.

Assumption 2: There exist mappings γ1 : R^{n−m} → R and γ2 : R^m → R such that conditions (iv)-(vi) hold.

Theorem 5: Consider system (6) and suppose Assumption 1 holds. Then system (6) admits a cascade decomposition of the form (21). Moreover, if Assumption 2 also holds, then system (6) admits a cascade decomposition of the form (22).

Proof: By condition (i) of Assumption 1, the distribution Δ_N is invariant under the dynamics of the system χ̇ = f(χ) + θ(χ)u, with χ(t) ∈ R^n and u(t) ∈ R. This implies that there exist (local) coordinates x = col(x1, x2), with x1 ∈ R^{n−m}, x2 ∈ R^m, and n − m the dimension of Δ_N in a neighborhood of the origin, in which the vector fields f and θ can be written as f(x) = col(f1(x1, x2), f2(x2)) and θ(x) = col(θ1(x), θ2(x)), respectively. Condition (ii) of Assumption 1 implies that, in the above coordinates, the vector field g can be written as g(x) = col(0, g2(x2)). By condition (iv) of Assumption 2 and by the minimality of system (6), this implies ∂θ1/∂x2 = 0. Thus there exists θ̃1 : R^{n−m} → R^{n−m} such that θ1(x) = θ̃1(x1) for every x = col(x1, x2) ∈ R^n locally around the origin. By conditions (v) and (vi) of Assumption 2, the distribution Δ_M is (f − θγ2)-invariant. Putting things together yields the decomposition (22), with h(x) = γ1(x1), from which the second claim follows.
We are now ready to give the following definition.

Definition 7: Consider system (6), system (7) and the manifold M defined in (9). Let Δ_M and Δ_N be f-reducing distributions that satisfy Assumption 1 and let M be the maximal integral submanifold of Δ_M. The vector field s : R^m → R^m is said to be a pole of system (6) if s is a nontrivial eigenvalue of system (6) (with respect to M).

Definition 7 is motivated by Lemma 3 and can be seen as a coordinate-free version of the notion of pole introduced in [23]. Note that, as a direct consequence of Definition 7, if s : R^m → R^m is a pole of system (6), then the system admits a cascade decomposition. Note also that if the vector field s : R^m → R^m is a nontrivial eigenvalue of system (6) and Δ_M and Δ_N are corresponding f-reducing distributions, then the distribution Δ_N is, by definition, f-invariant. Thus the condition [f, Δ_N] ⊂ Δ_N is always satisfied, and s is a pole of the system if and only if there exist a vector field θ : R^n → R^n and a function γ : R^n → R satisfying the remaining conditions of Assumption 1.

We now illustrate the notion of pole introduced in Definition 7 with a simple worked-out example.
Example 3: Consider the system (23), in which x(t) ∈ R², u(t) ∈ R, and y(t) ∈ R. Note that system (13) is the autonomous system associated with system (23). Thus, in view of Example 1, the vector field s : R → R, defined as s(ω) = ω for every ω ∈ R, is a nontrivial eigenvalue of system (23). The purpose of this example is to show that the vector field s is a pole of system (23). To this end, consider the distributions Δ_M = span{e2} and Δ_N = span{e1} and note that these are f-reducing at the origin, by Example 1. Thus we only need to show that Assumption 1 holds. Define θ : R² → R² as θ(x) = e1 + e2 and γ : R² → R as γ(x) = x1 + x2² for every x ∈ R², respectively. A direct computation gives [f, e1] = e1 and [θ, e1] = 0. This implies that condition (i) of Assumption 1 is satisfied. On the other hand, conditions (ii) and (iii) of Assumption 1 are seen to hold by direct inspection. We conclude that s is indeed a pole of system (23). Finally, we emphasize that Assumption 2 can also be seen to hold by direct computation and that, in agreement with Theorem 5, system (23) is described by equations of the form (22).
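The bracket computations of Example 3 can be checked symbolically, assuming the drift of system (23) coincides with the vector field of system (13), i.e. f = col(−x1 + x2², x2), with θ = e1 + e2 as in the example.

```python
import sympy as sp

# Verification of [f, e1] = e1 and [theta, e1] = 0 from Example 3,
# assuming f = (-x1 + x2**2, x2) (the drift of system (13)).
x1, x2 = sp.symbols('x1 x2')
f = sp.Matrix([-x1 + x2**2, x2])
theta = sp.Matrix([1, 1])     # theta = e1 + e2
e1 = sp.Matrix([1, 0])

def lie_bracket(g, h, v):
    # [g, h] = (dh/dx) g - (dg/dx) h
    return h.jacobian(v) * g - g.jacobian(v) * h

v = [x1, x2]
assert sp.simplify(lie_bracket(f, e1, v) - e1) == sp.zeros(2, 1)   # [f, e1] = e1
assert lie_bracket(theta, e1, v) == sp.zeros(2, 1)                 # [theta, e1] = 0
```

Since both brackets lie in Δ_N = span{e1}, condition (i) of Assumption 1 holds, as claimed in the example.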
The existence of multiple poles in a system can be characterized in geometric terms much in the same way as the existence of multiple nontrivial eigenvalues. For example, system (6) possesses two distinct poles if there exist simultaneously integrable, f-reducing distributions Δ_M and Δ_N such that the system (6) can be written, with respect to the decompositions induced by these distributions, both in the form (22) and in the form (24), in which the roles of the two subsystems are interchanged and the driving subsystem reads ż1 = f̃1(z1) + g̃1(z1)u. We emphasize that this is not always possible. For example, consider the system (23) and note that it is described by equations of the form (22). Moreover, on the basis of Example 3, the vector field s : R → R, defined as s(ω) = ω for every ω ∈ R, is a pole of system (23). However, the vector field τ : R → R, defined as τ(ω) = −ω for every ω ∈ R, is not a pole of the system, since (23) cannot be described by equations of the form (24). This can be seen as follows. If the system could be described by equations of the form (24), then there would exist a (local) change of coordinates z = φ(x) and functions ψf : R² → R and ψg : R² → R with the required properties, from which a contradiction immediately follows.

C. Moments at a Pole and Steady-State Impulse Responses
Definition 8: Consider system (6) and suppose (21) holds with respect to the (local) coordinates x = col(x1, x2), with x1 ∈ R^{n−m} and x2 ∈ R^m. Let s : R^m → R^m be a pole of system (6) described in (local) coordinates by f2. The moment of system (6) at s is defined as the function h1 ∘ π, with π the unique solution of (11).
In analogy with the linear case, the moment at a pole uniquely determines the steady-state impulse response of the output of a nonlinear system, provided that some assumptions hold.
Theorem 6: Consider system (6) and suppose (21) holds with respect to the (local) coordinates x = col(x1, x2), with x1 ∈ R^{n−m} and x2 ∈ R^m. Let s : R^m → R^m be a pole of system (6) described in (local) coordinates by f2. Assume the equilibrium at the origin of the system ẋ1 = f1(x1, 0) is locally exponentially stable, the system ẋ2 = f2(x2) is Poisson stable, and the pair (f2, g2(0)) is exciting. Then the moment of system (6) at s uniquely determines the steady-state impulse response of the output of system (6).
Proof: The pole s of system (6) is well-defined, since under the stated assumptions there exists a (well-defined) center manifold [34] described by the equation x1 = π(x2). Let ε = x1 − π(x2). Differentiating ε with respect to time and using (11) yields

ε̇ = f1(ε + π(x2), x2) − f1(π(x2), x2) − L_{g2} π(x2) u

which, owing to u = δ0 and the local exponential stability assumption, implies lim_{t→∞} ε(t) = 0. This shows that the (center) manifold M = { x ∈ R^n : x1 = π(x2), x2 ∈ R^m }, with π the unique solution of (11), is an attractive invariant manifold of the system (6). Thus the steady-state impulse response of system (6) is well-defined and the output of the system is y(t) = h1(π(x2(t)) + ε(t)), in which ε is an exponentially decaying function. Thus the steady-state impulse response of the output of system (6) is y_ir(t) = (h1 ∘ π)(x2(t)) for t > 0. Hence, in view of Definition 8, the claim is proved.
Remark 4: The proof of Theorem 6 assumes implicitly that the input u = δ 0 is such that the corresponding initial condition x 2 (0) belongs to the basin of attraction of the equilibrium at the origin. If that is not the case, it suffices to rescale the "amplitude" of the input.
Remark 5: Theorem 6 provides sufficient conditions for the moment of a system to uniquely determine the steady-state impulse response of the system. However, this situation can be encountered under different circumstances and, in fact, different definitions of moment can be given. This issue is extensively discussed in [35].
Remark 6: If the relative degree of system (6) is not sufficiently large, then it is impossible to realize the system as a cascade of two strictly proper systems. In this case, however, the moments at a pole can still be characterized using a cascade of singular systems and combining the results of this article with those of [36].

IV. MODEL REDUCTION BY MOMENT MATCHING AT POLES
We now provide a nonlinear counterpart of the results established for linear systems in [22].

Definition 9: Consider system (6) and suppose (21) holds with respect to the (local) coordinates x = col(x1, x2), with x1 ∈ R^{n−m} and x2 ∈ R^m. Let s : R^m → R^m be a pole of system (6) described in (local) coordinates by f2. Assume the equilibrium at the origin of the system ẋ1 = f1(x1, 0) is locally exponentially stable, the system ẋ2 = f2(x2) is Poisson stable, and the pair (f2, g2(0)) is exciting. The system

ξ̇ = φ(ξ) + γ(ξ)u, ψ = κ(ξ)   (26)

with ξ(t) ∈ R^m, u(t) ∈ R, and ψ(t) ∈ R, is said to be a reduced-order model at s of the system (6) if there exists an invertible mapping p : R^m → R^m, locally defined in a neighborhood of the origin, with p(0) = 0, such that

(∂p/∂ξ) φ(ξ) = f2(p(ξ)), (∂p/∂ξ) γ(ξ) = g2(p(ξ)), κ(ξ) = h1(π(p(ξ)))   (27)

where π is the unique solution of the partial differential equation (11).

Definition 9 is motivated by the following result.

Proposition 1: Consider system (6) and suppose (21) holds with respect to the (local) coordinates x = col(x1, x2), with x1 ∈ R^{n−m} and x2 ∈ R^m. Let s : R^m → R^m be a pole of system (6) described in (local) coordinates by f2. Suppose system (26) is a reduced-order model of the system (6) at s. Then the system (26) matches the moment of the system (6) at s.
Proof: By hypothesis, (26) is a reduced-order model of system (6) at s and thus, by definition, there exists an invertible mapping p : R^m → R^m, locally defined in a neighborhood of the origin, such that p(0) = 0 and (27) holds, where π is the unique solution of the partial differential equation (11). This implies that p qualifies as a (local) change of coordinates. Setting ζ = p(ξ), system (26) can be rewritten as ζ̇ = f2(ζ) + g2(ζ)u, ψ = h1 ∘ π(ζ). By Definition 7, this shows that the reduced-order model (26) has a pole at s. Moreover, the steady-state responses of the output of the system (6) and of the reduced-order model (26) with the input u = δ0 coincide. This, in view of Theorem 6, proves the claim.
The model reduction framework and the coordinate-free characterization of eigenvalues and poles presented in this article can be extended, e.g., to singular systems [37] using the notion of deflating subspace [38], a generalization of the notion of A-reducing subspace, which has been employed for model reduction purposes in [39]. Extensions to other classes of systems, including time-delay systems [40] and systems described by integro-differential equations [41], are also possible. A nonlinear counterpart of the notion of reduced-order model achieving moment matching simultaneously at ordinary points and at poles can be also defined.

A. A Family of Reduced-Order Models
Selecting p(ξ) = ξ in (27) yields a reduced-order model of system (6) at s described by equations of the form (26), with φ(ξ) = f2(ξ), γ(ξ) = g2(ξ), and κ(ξ) = h1 ∘ π(ξ). Observe that the reduced-order model so obtained has a pole at s and achieves moment matching at s. Pushing the analogy with the linear case, the approach described above can be considered a nonlinear counterpart of modal approximation [25, Ch. 9]. The following example illustrates the proposed model reduction procedure.
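In the linear case the construction above reduces to modal approximation. The following sketch (our illustration, with an assumed block-triangular structure x1' = A11 x1 + A12 x2, x2' = A22 x2 + B2 u, y = C1 x1, and matrices chosen by us) compares the impulse response of a full model with that of the reduced model ξ̇ = A22 ξ, ψ = C1Πξ after the stable transient has decayed.

```python
import numpy as np
from scipy.linalg import solve_sylvester, expm

# Linear sketch of the modal/moment-matching reduced-order model.
A11 = np.array([[-2.0, 1.0], [0.0, -3.0]])   # fast, stable part
A22 = np.array([[0.0, 1.0], [-1.0, 0.0]])    # oscillatory poles at +-i
A12 = np.outer([1.0, 1.0], [1.0, 0.0])       # rank-one coupling
B2 = np.array([1.0, 0.5])
C1 = np.array([1.0, -1.0])

Pi = solve_sylvester(A11, -A22, -A12)        # A11 Pi + A12 = Pi A22

# Impulse input: x1(0) = 0, x2(0) = B2; the deviation from the invariant
# manifold x1 = Pi x2 decays as exp(A11 t).
t = 20.0                                     # after transients have decayed
x1 = expm(A11 * t) @ (0 - Pi @ B2) + Pi @ (expm(A22 * t) @ B2)
y_full = C1 @ x1                             # output of the full model
psi = C1 @ Pi @ expm(A22 * t) @ B2           # output of the reduced model

assert abs(y_full - psi) < 1e-8              # steady-state responses match
```

The reduced model keeps only the pole dynamics A22 and replaces the output map by C1Π, which is exactly the data identified in Theorem 1.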
Example 4: Consider the system (28), in which x(t) ∈ R³, u(t) ∈ R, y(t) ∈ R, and Ω ∈ R+. The system possesses a unique equilibrium point at the origin and an inherent cascade decomposition. Moreover, the partial differential equation (29) admits the closed-form solution (30). Observe that the function π is not continuous on the set S = { ω ∈ R² : ω2 = 0 }. Note, however, that S is not an invariant set of system (28) and that setting u = δ0 yields x3(0) = 1. This shows that the vector field s : R² → R², defined as s(ω) = col(Ωω2, −Ωω1) for every ω ∈ R², is a pole of system (28). Note also that system (28) satisfies the assumptions of Theorem 6. The moment of the system at s is given by (31) and a reduced-order model achieving moment matching at s is given by (32). The steady-state impulse response of the reduced-order model (32) is well-defined, since along the solutions of the reduced-order model one has arctan(ξ1/ξ2) = Ωt for every t ∈ R+. Simulations have been run using an explicit Runge-Kutta (2,3)-order integration method and selecting the parameter Ω = 5. The initial conditions of system (28) and of the reduced-order model (32) have been selected as x(0) = [1 0 0] and ξ(0) = [0 0], respectively. Fig. 3 displays the output of system (28) and of the reduced-order model (32) with the input u = δ0 (top) and the absolute value of the approximation error (bottom). Note that the time histories of the output of system (28) and of the reduced-order model (32) are almost overlapped and that the approximation error converges to zero exponentially.

B. Forced Oscillations and Reduced-Order Models
The analysis of forced oscillations in a system is a classical topic in systems theory [29], [42]. The purpose of this section is to show that, borrowing ideas from [1], this phenomenon can be approximated with arbitrary accuracy and without any assumption on the solution of the system (such as periodicity) by computing an appropriate reduced-order model.
Consider system (6) and assume that its input is generated by the steady-state impulse response of a system of the form

ω̇ = Sω + Mv, ς = Lω

in which ω(t) ∈ R^m, v(t) ∈ R, ς(t) ∈ R, and S ∈ R^{m×m}, M ∈ R^m, and L ∈ R^{1×m} are constant matrices such that S + Sᵀ = 0 and system (38) satisfies the assumptions of Theorem 6. A reduced-order model of the system (38) achieving moment matching at s is given by (40), in which π is the unique solution of (39). Note that it is not possible to compute the solution of (39) explicitly. However, an approximate reduced-order model of the system (38) at s is defined by the equations (41), with ψ[j] the Taylor approximation of π of order j about the origin.

Fig. 4. Top: Time histories of the output of system (38) and of the approximate reduced-order model (41) with input u = δ0: y (solid), ψ[1] (dash-dotted) and ψ[3] (dotted). Bottom: Time histories of the absolute value of the approximation errors y − ψ[1] (dash-dotted) and y − ψ[3] (dotted).
Simulations have been run to assess the properties of the approximate reduced-order model (41). The system (38) and the approximate reduced-order model (41) have been numerically integrated using an explicit Runge-Kutta (2,3)-order integration method, selecting the parameters α = 1, β = 1, γ = 1, δ = 1, and Ω = 5. The initial conditions of system (38) and of the approximate reduced-order model (41) have been selected as x(0) = [3.5 0 0 0] and ξ(0) = [0 0], respectively. Fig. 4 (top) displays the steady-state impulse output response of system (38) and of the reduced-order model (41) obtained by truncating the Taylor series expansion defining ψ to the first and third order, respectively. Fig. 4 (bottom) displays the absolute value of the approximation errors y − ψ[1] and y − ψ[3], respectively. Note that, at steady state, max |y − ψ[1]| = 0.4287 > max |y − ψ[3]| = 0.2874 and that the approximation error decreases by adding terms in the Taylor series expansion defining the output of the approximate reduced-order model.
Remark 7: There exist several methods and tools to approximate the behavior of forced oscillations in a system, including the harmonic balance method [42], perturbation methods [42], and homotopy analysis [44]. In contrast with these methods, the use of (approximate) reduced-order models does not require any assumption on the structure of the solution of the system. Moreover, the behavior of forced oscillations can be determined up to arbitrary precision, much as with Fourier series expansions, which, however, require knowledge of the period of the solution (and the solution may not even be periodic). One limitation of our method, on the other hand, is that it can be expensive in terms of storage for large-scale systems.

V. CONCLUSION
Geometric control theory has been used to define a nonlinear enhancement of the notions of eigenvalue, pole and moment at a pole. These notions have been linked to solutions of "reduced" partial differential equations, cascade decompositions and steady-state impulse responses. The theory developed has proved instrumental in posing and solving the model reduction problem by moment matching at poles for nonlinear systems. Families of reduced-order models achieving moment matching at a pole have been obtained, thus providing a nonlinear counterpart of modal approximation.