Moments of Random Variables: A Systems-Theoretic Interpretation

Moments of continuous random variables admitting a probability density function are studied. We show that, under certain assumptions, the moments of a random variable can be characterized in terms of a Sylvester equation and of the steady-state output response of a specific interconnected system. This allows to interpret well-known notions and results of probability theory and statistics in the language of systems theory, including the sum of independent random variables, the notion of mixture distribution and results from renewal theory. The theory developed is based on tools from center manifold theory, the theory of the steady-state response of nonlinear systems, and the theory of output regulation. Our formalism is illustrated by means of several examples and can be easily adapted to the case of discrete and multivariate random variables.


I. INTRODUCTION
One of the most important and challenging problems in probability theory and statistics is that of determining the probability distribution of the random process that is thought to have produced a given set of data. Without making any assumptions on the underlying random process, this problem is extremely difficult in general. In parametric statistics [1 -4], where the probability distribution which generates the given data is assumed to belong to a family of probability distributions parameterized by a fixed set of parameters, the problem can be dealt with using different approaches. A classical way [1 -4] to solve this problem is to find mathematical objects that, under specific assumptions, uniquely identify a probability distribution in the family. In probability theory [5 -8], a similar necessity arises when one is confronted with the problem of specifying uniquely a probability measure through a sequence of numbers. The determination of simple, yet meaningful, objects with these features is therefore of paramount importance. The significance of the moments of a random variable in this context is comparable to that of the derivatives (at a point) for an analytic function. The representation of an analytic function through its Taylor series [9] allows to know the whole function once all the derivatives (at a point) are specified. Similarly, the set of all moments of a random variable uniquely identifies the probability distribution of the random variable, provided that certain conditions are satisfied. This means that the essential features of a random variable can be captured by its moments, which can be then used as an alternative description of the random variable.
Moments have been used in a number of different contexts in probability theory, including the Stieltjes moment problem [10], the Hamburger moment problem [11], the Hausdorff moment problem [12] and the Vorobyev moment problem [13]. Moments have proved instrumental in the first rigorous (yet incomplete) proof of the central limit theorem [14], but also in the proof of the Feynman-Kac formula [15], in the characterisation of the eigenvalues of random matrices [16], and in the study of birthdeath processes [17]. The correspondence between moments and probability distributions is also exploited in statistics to fit curves and to design parameter estimation procedures [1 -4]. For example, the method of moments introduced in [18] takes advantage of this correspondence to build consistent (usually biased) estimators. Approximations of probability density functions may be computed exploiting such correspondence through Pearson's and Johnson's curves [2]. A revisitation of this correspondence has also led to the exploratory projection pursuit [19 -21], a graphical visualisation method for the interpretation of high-dimensional data. While this list of examples is far from being exhaustive, it illustrates the central role of moments in a number of interesting problems. Further detail on theory and applications of moments can be found, e.g., in [22 -24].
This work represents a step towards bridging the gap between the notions of moment in probability theory and in systems theory. Connections are first established between moments of random variables and moments of systems [25]. These connections are then used to support the claim that a system can be seen as an alternative, equivalent description of the probabilistic structure of a random variable. This, in turn, indicates that systems-theoretic techniques and tools can be used to revisit and shed new light on classical results of probability theory and statistics.
The benefits of a dialogue between the two areas of research has been highlighted, e.g., in [26] and briefly mentioned in [27]. The literature on the analysis of results of probability theory in systems-theoretic terms, however, appears to be limited to specific topics and scattered across different research fields.
Linear systems theory has allowed to reinterpret basic results of probability theory for probability density functions with rational Laplace transform, e.g., in [28 , 29]. Concepts and tools of systems theory have been used to investigate phasetype distributions [30 , 31] and matrix exponential distributions [32]. The study of the properties of these distributions have grown into a well-established area of research which goes under the name of "matrix analytic methods" [33 , 34]. Another well-known class of models which has attracted the interest of systems theorists over the past century is that of Markov chains [5 , 6 , 8 , 35 -37]. One of the reasons for this interest is that these models can be regarded as a special class of positive systems [38], which can be studied exploiting the theory of nonnegative matrices [39 -42]. Queuing systems have been studied borrowing analysis and design tools from systems theory, e.g., in [43 , 44], whereas a systems-theoretic approach has been adopted to study risk theory, e.g., in [45 , 46]. In circuits theory, moments have been used for the approximation of the propagation delay in a RC tree network by interpreting the impulse response of a circuit as a probability density function [47]. Finally, we emphasise that a number of fundamental tools for modelling, estimation and prediction may not be sharply categorised as belonging to the sphere of probability theory or to that of systems theory, but rather lie at the intersection of these areas of research. The Kalman filter [48 , 49] is perhaps the most successful example of exchange of ideas between probability theory and systems theory, but several other examples can be found (see, e.g., [50 -56] and references therein).
The main objective of this work is to establish a one-to-one correspondence between moments of random variables and moments of systems [25]. This goal is readily achieved by interpreting probability density functions as impulse responses whenever these can be realized by linear time-invariant systems. The situation is more delicate when a probability density function can be only realized, e.g., by means of a linear time-varying system or a system in explicit form [57 , 58], for which the interpretation in terms of impulse responses does not possess a direct counterpart. This issue is overcome exploiting recent developments in systems theory [58 -63], where the moments of a linear system have been characterised as solutions of a Sylvester equation [59 , 60] and, under certain hypotheses, as steady-state responses of the output of particular interconnected systems [61 -63]. The characterisation of the moments of a system in terms of steady-state responses is based on tools arising in the center manifold theory [64], the theory of the steady-state response of a nonlinear system [65], and the output regulation theory for nonlinear systems [66]. Existing results on moments of linear and nonlinear systems [61 -63] are extended and the notion of moment for systems which only possess a representation in explicit form is revisited [58]. As a direct by-product of our approach, Sylvester equations and Sylvester-like differential equations may be interpreted as powerful computational tools which allow to calculate moments of random variables. Our approach allows to revisit and re-interpret well-known notions and results of probability theory and statistics using the language of systems theory, including the sum of independent random variables [6 , 8 , 67], the notion of mixture distribution [7 , 68], and results from renewal theory [6 , 7 , 69 , 70]. Given the conceptual nature of the paper, we focus on linear systems, linear time-delay systems and systems in explicit form to streamline the exposition, although generalisations to further classes of systems are possible. The formalism is developed for univariate continuous random variables, but can be easily extended to discrete and multivariate random variables.
Preliminary results have been presented in [71], where a first connection between moments of probability density functions and moments of linear time-invariant systems has been established. As a result, the moment generating function of a random variable and the solution of a Sylvester equation have been shown to be closely related. The present manuscript extends our results to probability density functions which may not be described by means of a linear time-invariant system. To this end, the notion of moment of a system is revisited and extended to systems the impulse response of which is defined on the whole real line or on a compact subset of the real line. The theory is supported by several worked-out examples and applications to identifiability, queueing theory and renewal theory.
The rest of the paper is organized as follows. Section II provides basic definitions and the necessary background concerning moments of linear systems, of time-delay systems, of systems in explicit form and of random variables. Section III contains our main results and includes a characterisation of the moments of a random variable admitting a (linear or explicit) realization in terms of the systems-theoretic notion of moment. The moments of probability density functions defined on the real axis as well as probability density functions with compact support are also characterised by means of Sylvester equations and steady-state responses. Section IV provides selected applications of the theoretical results developed, including the sum of independent random variables, the notion of mixture distribution, and results from renewal theory. Finally, conclusions and future research directions are outlined in Section V.
Notation: Z ≥0 and Z >0 denote the set of non-negative integer numbers and the set of positive integer numbers, respectively. R, R n and R p×m denote the set of real numbers, of n-dimensional vectors with real entries and of p × mdimensional matrices with real entries, respectively. R ≥0 and R >0 denote the set of non-negative real numbers and the set of positive real numbers, respectively. C denotes the set of complex numbers. C <0 denotes the set of complex numbers with negative real part. i denotes the imaginary unit. I denotes the identity matrix. e j denotes the vector with the j-th entry equal to one and all other entries equal to zero. σ(A) denotes the spectrum of the matrix A ∈ R n×n . M ′ denotes the transpose of the matrix M ∈ R p×m . x denotes the standard Euclidean norm of the vector x ∈ R n while, with some abuse of notation, A denotes the induced norm of the matrix A ∈ R n×n . ∆ k denotes, for all k ∈ Z ≥0 , the standard k-simplex, i.e. ∆ k = w ∈ R k+1 ≥0 : w 1 + . . . + w k+1 = 1 . diag(λ) denotes the diagonal matrix whose diagonal elements are the entries of the vector λ ∈ R n . J 0 denotes the matrix with ones on the superdiagonal and zeros elsewhere. J λ denotes the Jordan block associated with the eigenvalue λ ∈ C, i.e. J λ = λI + J 0 . k! and (k)!! denote the factorial and the double factorial of k ∈ Z ≥0 , respectively.ẋ denotes the time derivative of the function x, provided it exists. f 1 * f 2 denotes the convolution of the functions f 1 and f 2 . L{f } denotes the bilateral Laplace transform of the function f . δ 0 and δ −1 denote the Dirac δ-function and the right-continuous Heaviside unit step function, respectively. The time reversal of the function f : R → R is defined as t → f (−t). δ −1 denotes the right-continuous time reversal of δ −1 . 1 S denotes the indicator function of the subset S of a set X , i.e. the function 1 S : X → {0, 1} defined as 1 S (x) = 1 if x ∈ S and as 1 S (x) = 0 if x ∈ S. By a convenient abuse of notation, 1 also denotes the vector the elements of which are all equal to one. The translation of the function f : denotes the expectation of the random variable X.

II. PRELIMINARIES
This section contains a short digression on the notion of moment, which is used with different meanings in the literature and throughout the paper. To give a compact and self-contained exposition, we first provide some material on the notion of moment of a linear system [25 , 59 -62]. We then illustrate and extend notions and results concerning moments of a system in explicit form [58] and of a time-delay systems [72]. Finally, we recall the notion of moment of a random variable [73].

A. Moments of a linear system
Consider a single-input, single-output, continuous-time, linear, time-invariant system described by the equationṡ in which x(t) ∈ R n , u(t) ∈ R, y(t) ∈ R and A ∈ R n×n , B ∈ R n×1 and C ∈ R 1×n are constant matrices. Throughout the paper we assume that the system (1) is minimal, i.e. reachable and observable, and let W (s) = C(sI − A) −1 B be its transfer function.
The moment of order zero of system (1) at The moments of system (1) can be characterised in terms of the solution of a Sylvester equation [59 -62]. The following statements provide a version of the results presented in [61 , 62].

B. Moments of a time-delay system
Consider a single-input, single-output, continuous-time, linear, time-delay system with discrete constant delays described by equationṡ 1 A signature matrix is a diagonal matrix with ±1 on the diagonal [74]. 2 A matrix is non-derogatory if its characteristic and minimal polynomials coincide [41]. 3 The notion of steady-state response is taken from [75] (see also [65 , 66 , 76]). Note that if system (1) is stable and the signal u is bounded backward and forward in time, system (1) has a unique, well-defined steadystate response. 4 The terminology is borrowed from [61 , 62]. By one-to-one correspondence we mean that the steady-state response of the output d is uniquely determined by the moments and vice versa.
and A j ∈ R n×n , B j ∈ R n×1 and C j ∈ R 1×n are constant matrices for 0 ≤ j ≤ ς and ς + 1 ≤ j ≤ ̺, respectively. Throughout the paper the system (8) is assumed to be minimal, i.e. reachable and observable, and let W (s) = C(s)(sI − A(s)) −1 B(s) be its transfer function, with A(s) = ς j=0 A j e −τj s , B(s) = ̺ j=ς+1 B j e −τj s and C(s) = ς j=0 C j e −τj s . Definition 2. [72] The moment of order zero of system (8) at s ⋆ ∈ C \ σ(A(s)) is the complex number η 0 (s ⋆ ) = W (s ⋆ ). The moment of order k ∈ Z >0 of system (8) at s ⋆ ∈ C \ σ(A(s)) is the complex number A theory of moments can be also developed for time-delay systems exploiting a particular Sylvester-like equation [72]. For completeness, we briefly formulate the results of [72] in a form that is more convenient for our purposes.

C. Moments of a system in explicit form
Consider a single-output, continuous-time system in explicit form 5 described by equations in which x(t) ∈ R n , y(t) ∈ R, t 0 ∈ R ≥0 , x(t 0 ) = x 0 ∈ R n , C ∈ R 1×n and Λ : R × R → R n×n is piecewise continuously differentiable in the first argument and such that Λ(t 0 , t 0 ) = I.
In analogy with the theory developed for linear systems, a characterisation of the steady-state output response of the interconnection of system (6) and of system (12), with v = y, i.e. of the system can be given in terms of the solution of a matrix differential equation which plays the role of the Sylvester equation (4). 5 The terminology is borrowed from [57 , 58].
This, in turn, allows to define a notion of moment for systems in explicit form [77]. To this end, the following technical assumptions are needed.
Assumption 5. The entries of Λ have a strictly proper Laplace transform.
Assumption 6. The pair (x 0 , C) is such that the poles of the Laurent series associated with L{CΛx 0 } and L{Λ} coincide.
is an invariant integral manifold 6 of system (13) whenever the function Υ : R → R (k+1)×n satisfies (except at points of discontinuity) the ordinary differential equatioṅ the solution of which is with initial condition Υ(t 0 ) ∈ R (k+1)×n , for all t ≥ t 0 . △ The following result is instrumental to define the notion of moment for system (12) and is adapted to the purposes of this paper from a statement originally presented in [77].
Proof. To begin with note that by Assumption 1 the righthand side of (15) is well-defined. Let Υ 1 and Υ 2 be the solutions of (15) (except at point of discontinuity) corresponding to the initial conditions Υ 1 (t 0 ) ∈ R (k+1)×n and Υ 2 (t 0 ) ∈ R (k+1)×n . Defining the function E as the difference of Υ 1 and Υ 2 and differentiating with respect to the time argument yields the ordinary differential equatioṅ . By Assumptions 3 and 4, this implies lim t→∞ E(t) = 0. As a result, there exists a solution Υ ∞ to which every solution of (15) converges asymptotically, i.e. there exists Υ ∞ (t 0 ) ∈ R (k+1)×n such that lim t→∞ Υ(t) − Υ ∞ (t) = 0 for any Υ(t 0 ) ∈ R (k+1)×n , where Υ and Υ ∞ are the solutions of (15) with initial conditions Υ(t 0 ) and Υ ∞ (t 0 ), respectively. Moreover, by Assumption 2, Υ ∞ is unique. To complete the proof, note that by Remark 1 the set (14) is an invariant integral manifold of system (13) and that the set (14) is attractive since every solution of (15) converges asymptotically to Υ ∞ .
Theorem 2 allows to introduce the following definition, which is reminiscent of the notion of moment given in [58].
Proof. By Assumptions 1, 2, 3 and 4, in view of Theorem 2, the steady-state response of the output d is well-defined. By Assumption 5 and 6, the steady-state response of the output d coincides with Υ ∞ x, which by definition is the moment of system (12) at s ⋆ .
Remark 2. Equations (12) describe a considerably general class of continuous-time signals. In particular, under the stated assumptions, this class contains all (exponentially bounded) piecewise continuously differentiable functions that can be generated as the solution of a single-input, single-output, continuous-time, linear, time-varying system of the forṁ

D. Moments of a random variable
For completeness, we now recall the notion of moment of a random variable.
To simplify the exposition, in the sequel we consider exclusively continuous random variables admitting a probability density function with finite moments of all orders. We also ignore all measure-theoretic considerations as they are not essential for any of the arguments. A discussion on the extension of our results to more general situations is deferred to Section V.
To illustrate our results and to demonstrate our approach we use several worked-out examples throughout this work.
Example 1 (The exponential distribution). The probability density function of a random variable X having exponential distribution with parameter λ ∈ R >0 is defined as A direct computation shows that the moment of order k ∈ Z >0 of the random variable X satisfies the relation µ k = k λ µ k−1 , with µ 0 = 1, and hence µ k = k! λ k . Example 2 (The hyper-exponential distribution). The probability density function of a random variable X having hyperexponential distribution is defined as where n ∈ Z >0 is referred to as the number of phases of X and λ = (λ 1 , λ 2 , . . . , λ n ) ∈ R n >0 and p = (p 1 , p 2 , . . . , p n ) ∈ ∆ n−1 are given parameters. Observe that f X can be written as is the probability density function of a random variable X j having exponential distribution with parameter λ j . Thus, by linearity of the expectation operator, the moment of order k ∈ Z ≥0 of the random variable

III. MAIN RESULTS
This section contains the main results of the paper. The moments of a random variable are shown to be in one-to-one correspondence with the moments of a system at zero. This is established first for the special situation in which a given probability density functions can be identified with the impulse response of a linear system. The theory developed is then extended to the broader class of probability density functions which can be represented by a system in explicit form using the theory of moments presented in the previous section. Finally, the case of probability density functions defined on the whole real axis and on compact sets is considered.
A. Probability density functions realized by linear systems Definition 5. Consider system (1) and a random variable X with probability density function f X . The probability density function f X is realized by system (1) Necessary and sufficient conditions for a probability density function to be realized by a linear system can be established using well-known results of linear realization theory [79 -81]. Note that every linear realization of a probability function must be stable 7 , as detailed by the following statement.
Lemma 5. Consider system (1) and a random variable X with probability density function f X . If system (1) is a realization of f X , then σ(A) ⊂ C <0 .
As a direct application of Definitions 1, 4, 5 and of Lemma 5 the moments of a random variable can be characterised by means of moments of systems.
Theorem 4. Consider system (1) and a random variable X with probability density function f X . Assume system (1) is a realization of f X . Then the moments of the random variable X and the moments of system (1) at zero are such that for all k ∈ Z ≥0 . Corollary 1. Consider system (1) and a random variable X with probability density function f X . Assume system (1) is a realization of f X . Then the moments up to the order k ∈ Z ≥0 of the random variable X are given by the entries of Ψ k Υ k B, where Υ k is the unique solution of the Sylvester equation (3), Remark 3. The moments of a random variable can be determined by direct application of Definition 4 or by "pattern matching" using existing tables of moments. The one-to-one correspondence established in Corollary 1, on the other hand, indicates that a closed-form expression for the moments of a random variable can be computed from the solution of a Sylvester equation, which can be solved with numerically reliable techniques [25]. The computation of moments of random variables through Sylvester-like equations is one of the leitmotifs underlying our approach. △ Corollary 2. Consider system (1) and a random variable X with probability density function f X . Assume system (1) is a realization of f X . Then the moments up to the order k ∈ Z ≥0 of the random variable X are in one-to-one correspondence with the matrix Υ k B, where Υ k is the unique solution of the Sylvester equation (4), in which S is a non-derogatory matrix with characteristic polynomial (5) and M is such that the pair (S, M ) is reachable.
Corollary 3. Consider system (1), system (6), and the interconnected system (7). Let X be a random variable with probability density function f X . Assume system (1) is a realization of f X , s ⋆ = 0, x(0) = 0, ω(0) = 0 and u = δ 0 . Then there exists a one-to-one correspondence between the moments up to the order k ∈ Z ≥0 of the random variable X and the (well-defined) steady-state response of the output d of system (7).
Example 3 (The exponential distribution, continued). Consider a random variable X having exponential distribution with probability density function f X and parameter λ ∈ R >0 . A direct inspection shows that the probability density function f X is realized by the linear, time-invariant systeṁ i.e. by system (1) with A = −λ, B = λ and C = 1. Note that the only eigenvalue of system (21) has negative real part, which is consistent with Lemma 5. A direct application of Definition 1 yields η k (0) = 1 λ k , which, in view of Example 1 and in agreement with Theorem 4, shows that the moments of the random variable X are in one-to-one correspondence with the moments of system (21) at zero. In accordance with Corollary 1, ′ , in which Υ k is the unique solution of the Sylvester equation (3). By Corollary 2, a one-to-one correspondence can be also inferred between the moments of the random variable X and the Sylvester equation (4). Finally, note that the components of the (welldefined) steady-state response of the output d of system (7) can be written as d l (t) = j! , for all l ∈ {1, 2, . . . , k + 1}, and hence there is a one-to-one correspondence between the moments up to the order k of the random variable X and the steady-state response of the output d of system (7).

B. Probability density functions realized by systems in explicit form
We have seen that a systems-theoretic interpretation can be given to probability density function which can be realized by a linear system. However, the vast majority of probability density functions cannot be described by a linear time-invariant differential equation. To provide a generalisation of the results established which accounts for more general probability density functions, we develop a parallel of the formulation presented in the previous section using the theory of moments for systems in explicit form [58]. To begin with, we introduce the following definition.
Definition 6. Consider system (12) and a random variable X with probability density function f X . The probability density function f X is realized 8 (12) is referred to as a (explicit) realization of f X .
Theorem 5. Consider system (6), system (12) and the interconnected system (13). Suppose Assumptions 1, 2, 3 and 4 hold. Let X be a random variable with probability density function f X and assume system (12) is a realization of f X . Then the moments of the random variable X up to the order k ∈ Z ≥0 are in one-to-one correspondence with the moment of system (12) at zero.
Proof. To begin with, note that by Assumptions 1, 2, 3 and 4 the moment of system (12) at zero is well-defined. By Definition 3 and by (16), the moment of system (12) at zero reads as where the last identity holds since system (12) is a realization of f X . Define H + (t) = t t0 e −Sζ M f X (ζ)dζ. Since S is a nonderogatory matrix with characteristic polynomial (5), with s ⋆ = 0, and since the pair (S, M ) is reachable, there exists a nonsingular matrix T ∈ R (k+1)×(k+1) such that T −1 M = e k+1 and T −1 ST = J 0 . This implies 8 The impulse response may depend on the time of the impulse t 0 for time varying systems. Note, however, that our purpose is to model the probabilistic structure of a random variable representing its probability density function by means of a system and its impulse response. This means that t 0 can be considered as a parameter that can be assigned. and hence that the components of the moment of system (12) at zero grow polynomially as t → ∞, with coefficients uniquely determined by the moments µ 0 , µ 1 , . . . , µ k , which proves the claim.
Remark 4. Assumption 1 is violated by any explicit realization of the form (12) associated with a probability density function with compact support, i.e. zero everywhere except on a compact subset of the real line. A discussion on the extension of Theorem 5 to such class of probability density functions is deferred to Section III-D. △ Remark 5. While every linear realization of a probability density function must be internally stable (Lemma 5), it is not possible to prove that every explicit realization of a probability density function must satisfy Assumption 3. The reason is that there exist probability density functions with a "tail" that is "heavier" than the one of the exponential [69 , 82], including those of Pareto, Weibull, and Cauchy random variables. Assumption 3 is therefore a strong assumption which rules out important probability density functions. Note, however, that a generalisation of our results to probability density functions with a heavy tail can be established with more advanced measure-theoretic tools. △ Example 4 (The half-normal distribution). The probability density function of a random variable X having half-normal distribution with parameter σ ∈ R >0 is defined as A direct inspection shows that the probability density function f X is realized by the linear, time-varying systeṁ in which x(t) ∈ R, u(t) ∈ R, y(t) ∈ R. Consider now the interconnection of system (6) and of system (24), with v = y, set s ⋆ = 0 and note that Assumptions 1, 2, 3 and 4 hold. Equation (15) boils down tȯ and can be solved by direct application of formula (16), with Since x(t) = In accordance with Theorem 4, this shows that the moments of the random variable X up to the order k ∈ Z ≥0 uniquely specify the moment of system (24) at zero as t → ∞. Corollary 4. Consider system (6), system (12) and the interconnected system (13). Suppose system (12) is a realization of f X and Assumptions 1, 2, 3, 4, 5 and 6 hold. Then there exists a one-to-one correspondence between the moments up to the order k ∈ Z ≥0 of the random variable X and the (well-defined) steady-state response of the output d of system (13).

C. Probability density functions on the whole real axis
The results established so far characterise probability density functions with support on the non-negative real axis. These results are not satisfactory because most probability density functions are defined over the whole real line. This issue, however, can be easily resolved using the following approach.
Every probability density function f X can be decomposed as the sum of a function f c which vanishes on the non-positive real axis and of a function f ac which vanishes on the nonnegative real axis, for all t ∈ R. We call f c the causal part of f X and f ac the anticausal part of f X . Note that the function need not to be continuous, but only integrable.
With these premises, the following result holds.
Theorem 6. Consider a random variable X with probability density function f X . Let f c and f ac be the causal part and the anti-causal part of f X , respectively. Assume f c is realized by the minimal systeṁ and the time reversal of f ac is realized by the minimal systeṁ x ac = A ac x ac + B ac u ac , y ac = C ac x ac , with x j (t) ∈ R nj , u j (t) ∈ R, y j (t) ∈ R, and A j ∈ R nj ×nj , B j ∈ R nj ×1 and C j ∈ R 1×nj constant matrices for j ∈ {c, ac}. Then the moments of the random variable X and the moments of systems (27) and (28) at zero satisfy the identity for every k ∈ Z ≥0 .
Proof. Let k ∈ Z ≥0 and note that , which proves the claim.
Corollary 5. Consider system (1) and a random variable X with probability density function f X . Assume f X is even and the causal part of f X is realized by system (1). Then the moments of the random variable X and the moments of system (1) at zero satisfy the identity for all k ∈ Z ≥0 .
Proof. Let f c and f ac be the causal part and the anticausal part of f X , respectively. By hypothesis the iden- Example 5 (The Laplace distribution). The probability density function of a random variable X having a Laplace distribution with parameter λ ∈ R >0 is defined as The causal part of f X is f c : R → R, t → λ 2 e −λt δ −1 (t), while the anticausal part of f X is f ac : R → R, t → λ 2 e λt δ −1 (t). The causal part and the time reversal of the anti-causal part of f X are both realized by the minimal systeṁ in which x(t) ∈ R, u(t) ∈ R, y(t) ∈ R. Thus, by Theorem 6, the moment of order k ∈ Z ≥0 of the random variable X is In agreement with Corollary 5, since f X is even and the causal part of f X is realized by system (32), the moment of order k ∈ Z ≥0 of the random variable X can be written as µ k = (−1) k +1 2 k! λ k , which is consistent with formula (33). Finally, we emphasise that a simple exercise in integration shows that the moments of the random variable X are indeed given by (33). Remark 6. Theorem 6 and Corollary 5 allow to establish Corollaries 1, 2 and 3 for a random variable X with probability density function defined on the whole real axis, provided that its causal part and the time reversal of its anti-causal part are realized by systems of the form (27) and (28), respectively. This can be achieved noting that the moments up to the order k ∈ Z ≥0 of the random variable X are given by the entries of with Ψ k ∈ R (k+1)×(k+1) a signature matrix and D k = diag(k!, . . . , 1!, 1), and Υ T ∈ R (2k+2)×(na+nac) is the unique solution of the Sylvester equation with △ The arguments used to prove the results above extend immediately to the case in which the probability density function a given random variable is defined on the whole real axis and its causal and anticausal parts are realized by a system in explicit form. The key point is that one has to consider a signed sum of the moments of the systems which realize the causal part and the anticausal part of the probability density function of the random variable of interest. For brevity we do not repeat other versions of these results for probability density functions realized by systems in explicit form; instead, we consider the following important example. Example 6 (The normal distribution). The probability density function of a random variable X having a normal distribution with parameter σ ∈ R >0 is defined as The The causal part and the time reversal of the anti-causal part of f X are both realized by the linear, time-varying systeṁ in which x(t) ∈ R, u(t) ∈ R, y(t) ∈ R. Consider now the interconnection of system (6) and of system (37), with v = y, set s ⋆ = 0 and note that Assumptions 1, 2, 3 and 4 hold. In analogy with Example 4, noting that equation (15) boils down to (25) gives Υ ∞ (t) = e St H + (t), with H + (t) defined as in (26). For a suitable signature matrix Ψ k , defining Υ T,∞ (t) = Υ ∞ (t) + Ψ k Υ ∞ (t) and noting that by Definitions 3 and 4, allows to conclude (22) holds, with µ 0 = 1 and Generalising the results of Theorem 6, this shows that the moments up to the order k ∈ Z ≥0 of the random variable X uniquely specify the moment of system (24) at zero as t → ∞. Note also that Υ T,∞ can be written as Υ T,∞ = (I + Ψ k ) Υ ∞ , which, in a broad sense, is in agreement with Corollary 5.

D. Probability density functions with compact support
We now concentrate on probability density functions with compact support. To begin with a limitation of the characterisation of the moments of a random variables in terms of explicit systems is illustrated through a simple example. Example 7 (The uniform distribution). Suppose we wish to find a realization of the probability density function of a random variable X having a uniform distribution with parameters a, b ∈ R >0 , with a < b, defined as Clearly, any explicit realization of the form (12) necessarily violates Assumption 1, since f X is zero everywhere except on a compact subset of the real line. As a result, the theory developed in Section III-B does not apply. However, the probability density function (38) can be also interpreted as the impulse response of the linear, time-delay system with discrete constant delayṡ i.e. by system (8) with x(t) ∈ R, u(t) ∈ R, y(t) ∈ R, τ 1 = a, , C(s) = 1. Note that the moments of system (39) are not classically defined at zero, since 0 ∈ σ(A(s)). However, since zero is a removable singularity 9 of the transfer function of system (39), the moments of system (39) can be defined and characterised by means of Sylvester equations and impulse responses using the notions and results introduced in [85 -87]. In particular, the moments of system (39) at zero satisfy the identity for every k ∈ Z ≥0 , with Ψ k ∈ R (k+1)×(k+1) a signature matrix and Υ k ∈ R k+1 a solution of the (Sylvester) equation To see this, note that Exploiting the definition of matrix exponential, the identity (41) and the property allows to conclude which, in view of (42), proves the identity (40). We emphasise that, in line with the results developed for probability density functions realized by linear systems, the relation (20) between the moments of f X and the moments of the corresponding realization is satisfied: a one-to-one correspondence exists between the moments of the random variable X and the moments of system (39), . The main reason why it is possible to characterise in systemstheoretic terms the moments of a random variable having a uniform distribution is that zero is a removable singularity of the transfer function of the associated time-delay system. This observation allows to generalise the argument used in Example 7 to treat random variables the probability density function of which has compact support and is polynomial on the complement of its zero set. To see this, consider a random variable X having a probability density function of the form 9 See, e.g., [84].
in which a, b ∈ R >0 , with a < b, and q(t) = (43) can be written as L{f X } = Q1(s)e −sa −Q2(s)e −sb s ν and has a removable singularity at zero if and only if l k=0 (−1) k q k 1 a l−k − q k 2 b l−k = 0 for all l ∈ {0, 1, . . . , ν − 1}. Under these conditions, the probability density function (43) can be realised by system (8) setting, e.g., τ 1 = a, τ 2 = b, , and the moments of the random variable X can be shown to be in one-to-one correspondence with the moments at zero of such system. To illustrate this point, we consider a simple example.
Example 8 (The triangular distribution). The probability density function of a random variable X having a triangular distribution with parameter τ ∈ R >0 is defined as The causal part of f X is f c : The causal part and the time reversal of the anti-causal part of f X are both realized by the time-delay systeṁ i.e. by system (8) with x(t) ∈ R 4 , u(t) ∈ R, y(t) ∈ R, τ 1 = 0, The moment at zero of order k ∈ Z ≥0 of the system is which is consistent with the identity (20), since the moment of order k ∈ Z ≥0 of the random variable X reads as This also implies that a one-to-one correspondence exists between the moments of the random variable X and the moments of the system. We emphasise that exploiting the argument of Example 7 the moments of the system (and thus those of the random variable X) may be computed using the formula (11).

IV. APPLICATIONS
This section contains a series of applications of the proposed ideas. We first focus on the identifiability of probability density functions admitting a linear realization. Then, a systemstheoretic interpretation for sums of independent random variables, the notion of mixture distribution and basic results from renewal theory are provided. Finally, connections between the approximation of probability density functions and the model reduction problem are studied.

A. Identifiability of probability density functions with linear realizations
We begin this section considering the case in which the probability density function of a given random variable is parameterized by a fixed set of parameters. In other words, while in the previous sections the parameters of probability density functions have been assumed to be known, in this section parameters are constant unknown quantities, which in principle can (or must) be estimated. In particular, we study the identifiability of parametric families of probability density functions the elements of which admit a linear realization. This is important, for example, in the context of parametric estimation, where identifiability allows to avoid redundant parametrisations and to achieve consistency of estimates [50 , 51 , 88].
Let Θ be an open subset of R d representing the parameter set and let F X be a family of probability density functions defined on the real axis and associated with a random variable X. Every element f X ∈ F X is a probability density function t → f X (t; θ) which is known once the element θ ∈ Θ has been fixed.
The notion of observational equivalence induces an equivalence relation on the parameter set, defined as θ 1 ∼ θ 2 if θ 1 ∈ Θ and θ 2 ∈ Θ are observationally equivalent. The parameter set is therefore partitioned into equivalence classes the cardinality of which determines the identifiability of the family of probability density functions considered, as specified by the following definition.
A characterisation of the identifiability of a family of probability density functions admitting a linear realization can be given by means of the systems-theoretic notion of minimality. To this end, note that the description of probability density functions as impulse responses has an inherent non-uniqueness issue, since algebraically equivalent 11 linear 10 A property is satisfied for almost all t ∈ R if the set where the property does not hold has Lebesgue measure equal to zero. 11 The single-input, single-output, continuous-time, linear, time-invariant systems have the same impulse response. However, a oneto-one correspondence between impulse responses and their realizations can be established resorting to canonical forms [80], such as the observer canonical form, defined by constant matrices A ∈ R n×n , B ∈ R n and C ∈ R 1×n of the form with α = (α 1 , . . . , α n ) ∈ R n and β = (β 1 , . . . , β n ) ∈ R n . With these premises, we may recover the following wellknown result [51].
Lemma 6. Let Θ be an open subset of R d , representing the parameter set, and let F X be a family of probability density functions defined on the real axis and associated with a random variable X. Assume every f X ∈ F X is realized by system (1) with A ∈ R n×n , B ∈ R n and C ∈ R 1×n as in (48) and let θ = (α, β) ∈ Θ . Then the family of probability density functions F X is identifiable if and only if every pair (A, B) is reachable.
Proof. Note that the family of probability density functions F X is identifiable if and only if for every f X ∈ F X the map (α, β) → F X (s; α, β) = β n s n−1 + . . . + β 2 s + β 1 s n + α n s n−1 + . . . + α 2 s + α 1 with F X (s; α, β) = L{f X }, is injective. This, in turn, corresponds to the numerator and denominator of the rational function F X (s; α, β) being coprime. As a result, the identifiability of the family F X is equivalent to the minimality of system (1) and, by observability of the pair (A, C), to the reachability of the pair (A, B), which proves the claim.
Remark 7. A dual result can be proved using the controllability canonical form as long as observability, and hence minimality, is enforced. This suggests that the identifiability of a family of probability density functions admitting a linear realization is equivalent to the minimality of a given canonical realization, which can be thus taken as the definition of identifiability. △

B. Sums of independent random variables
A classical theorem of probability theory states that the probability density function of the sum of two jointly continuous, independent random variables is given by the convolution of their probability density functions (see, e.g., [73]). This result can be given a simple systems-theoretic interpretation.
Theorem 7. Let X 1 and X 2 be jointly continuous, independent random variables with probability density functions f X1 and f X2 realized by the minimal systeṁ and the minimal systeṁ with x j (t) ∈ R nj , u j (t) ∈ R, y j (t) ∈ R, and A j ∈ R nj ×nj , B j ∈ R nj ×1 and C j ∈ R 1×nj constant matrices for j ∈ {1, 2}, respectively. Then the probability density functions of the random variable Y = X 1 + X 2 is realized by the interconnection of system (49) and system (50) with u 1 = y 2 .
Proof. Recall that the probability density function of the sum of two jointly continuous, independent random variables is given by the convolution of their probability density functions [73, Taking the Laplace transform on both sides, this implies L{f Y } = L{f X1 }L{f X1 }. Thus, the probability density function f Y is realized by the interconnection of systems (49) and (50) with u 1 = y 2 , since the transfer function associated with the probability density function f Y is the product of the transfer functions associated with the probability density functions f X1 and f X2 .
The following result is an immediate extension of Theorem 7.
Corollary 6. Let X 1 , X 2 , . . . , X N be jointly continuous, independent random variables. Assume the probability density function f Xj is realized by the minimal systeṁ with x j (t) ∈ R nj , u j (t) ∈ R, y j (t) ∈ R, and A j ∈ R nj ×nj , B j ∈ R nj ×1 and C j ∈ R 1×nj constant matrices for j ∈ {1, 2, . . . , N }, respectively. Then the probability density functions of the random variable Y = X 1 + X 2 + . . . + X N is realized by the interconnection of the family of systems (51) with u 1 = y 2 , u 2 = y 3 , . . . , u N −1 = y N .
Example 9 (The Erlang distribution). Suppose we wish to show that the probability density function f Y of the random variable Y = X 1 + X 2 + . . . + X N , in which N ∈ Z >0 and X 1 , X 2 , . . . , X N are jointly continuous, independent random variables having exponential distribution with parameter λ ∈ R >0 , is that of an Erlang distribution with parameters λ and N , defined as Recall that a minimal realization of the probability density function f Xj of the random variable X j is described by system (21) for all j ∈ {1, 2, . . . , N }. Thus, by Corollary 6, a realization of the probability density function f Y is given by system (1), in which We conclude this example noting that a direct computation shows that the probability density function f Y is indeed given by (52).
, with N ≥ 2, and jointly continuous, independent random variables X 1 , X 2 , . . . , X N such that Y = X 1 + X 2 + . . . + X N [89]. In case the random variables X 1 , X 2 , . . . , X N are also identically distributed, then Y is said to be divisible [89]. The notions of decomposability and of divisibility play an important role in probability theory [90], particularly in the analysis of Lévy processes [7 , 67].
In light of Theorem 7 of Corollary 6, these notions can be characterised in systems-theoretic terms. In particular, decomposability (and divisibility) of a random variable are related to the possibility of describing the corresponding system as the series interconnection of finitely many (and possibly identical) systems. △ The following result provides a systems-theoretic necessary and sufficient condition which ensures the identifiability of a family of probability density function the elements of which can be represented as the sum of random variables with probability density functions admitting a linear realization.
Theorem 8. Let F X be a family of probability density functions the elements of which can be realized as the sum of the probability density functions f X1 and f X2 . Assume f X1 and f X2 are realized by the minimal systems (49) and (50), respectively. Then the family of probability density functions F X is identifiable if and only if (i) the polynomials C 1 adj(sI − A 1 )B 1 and det(sI − A 2 ) have no common roots and (ii) the polynomials C 2 adj(sI − A 2 )B 2 and det(sI − A 1 ) have no common roots.
Proof. The identifiability of the family of probability density functions F X is equivalent to the minimality of the series interconnection of the minimal systems (49) and (50), which, in turn, is equivalent to conditions (i) and (ii) [80 , 91].

C. Mixture distributions
We have seen that the sum of two jointly continuous independent random variables has a natural interpretation in terms of the series interconnection of the realizations of their probability density functions. To provide a probabilistic counterpart of the notion of parallel interconnection we recall the following definition, which is adapted from [92].
Definition 9. A random variable Z with probability density function f Z is said to arise from a finite mixture distribution if there exist N ∈ Z >0 and jointly continuous random variables X 1 , X 2 , . . . , X N with probability density functions f X1 , f X2 , . . . , f X N such that the probability density function f Z satisfies f Z = w 1 f X1 + w 2 f X2 + · · · + w N f X N , for some w = (w 1 , w 2 , . . . , w n ) ∈ ∆ n−1 . N is referred to as the number of components of f Z , f X1 , f X2 , . . . , f X N are referred to as the components of f Z , and w 1 , w 2 , . . . , w N are referred to as the weights of f Z . Theorem 9. Under the hypotheses of Theorem 7, if the random variable Z with probability density function f Z arises from a finite mixture distribution with components f X1 and f X2 and weights w 1 = 0 and w 2 = 0, then the probability density function f Z is realized by the interconnection of system (49) and system (50) with u = u 1 = u 2 and y = w 1 y 1 + w 2 y 2 .
Proof. By hypothesis, the probability density function f Z arises from a finite mixture distribution with components f X1 and f X2 and weights w 1 and w 2 , i.e. f Z = w 1 f X1 + w 2 f X2 . Taking the Laplace transform on both sides, by linearity, yields L{f Z } = w 1 L{f X1 }+w 2 L{f X2 }. Thus, the probability density function f Z is realized by the interconnection of system (49) and system (50) with u = u 1 = u 2 and y = w 1 y 1 + w 2 y 2 , as desired.
Example 10 (G/H 2 /1 queueing system). Consider the G/H 2 /1 queueing system in Fig. 1, in which the arrival process is a general random process and the service process is governed by a two-phase hyper-exponential random variable X. A customer accesses either the service offered by the first server at rate λ 1 ∈ R >0 with probability p 1 ∈ (0, 1) or the service offered by the second server at rate λ 2 ∈ R >0 with probability p 2 = 1 − p 1 . The probability density function of the random variable X which represents the service time, i.e. the time spent by an arbitrary customer in the service process, is f X (t) = p 1 λ 1 e −λ1t δ −1 (t) + p 2 λ 2 e −λ2t δ −1 (t) for all t ∈ R ≥0 . In view of Theorem 9, the probability density function of the service time is realized by system (1), with A straightforward generalisation of Theorem 9 is given by the following result.
Corollary 7. Under the hypotheses of Corollary 6, let the random variable Z with probability density function f Z arise from a finite mixture distribution with components f X1 , f X2 , . . . , f X N and weights w 1 , w 2 , . . . , w N = 0. Then the probability density function f Z is realized by the interconnection of systems (51) with u = u 1 = u 2 = . . . = u N and y = w 1 y 1 + w 2 y 2 + . . . + w N y N .
We conclude this section presenting a systems-theoretic necessary and sufficient condition which guarantees the identifiability of finite mixtures admitting a linear realization (see also [92]).
Theorem 10. Let F X be a family of probability density functions, the elements of which arise from a finite mixture distribution with components f X1 and f X2 and weights w 1 ∈ R >0 and w 2 ∈ R >0 . Assume f X1 and f X2 are realized by the minimal systems (49) and (50), respectively. Then the family of probability density functions F X is identifiable if and only if σ(A 1 ) ∪ σ(A 2 ) = ∅.
Proof. The family of probability density functions F X is equivalent to the minimality of the parallel interconnection with weights w 1 ∈ R >0 and w 2 ∈ R >0 of the minimal systems (49) and (50), which, in turn, is equivalent to σ(A 1 ) ∪ σ(A 2 ) = ∅ [80 , 91].
Example 11 (G/H 2 /1 queueing system, continued). Suppose we are interested in finding conditions under which the family of probability density functions which describes the service time of the G/H 2 /1 queueing system displayed in Fig. 1, is identifiable. Note that the probability density function of the service time is realized by system (1), with matrices defined as in (54). By Theorem 10, since p 1 , p 2 ∈ (0, 1), the family of probability density functions F X is identifiable if and only if λ ∈ R 2 does not lie on the bisector of the first quadrant, i.e. λ 1 = λ 2 . Note that in the case λ 1 = λ 2 = λ ∈ R >0 a customer accesses the service offered by a server at rate λ with probability one and hence the model is overparameterised. In other words, the queueing system in question may be equivalently described by the G/M/1 queueing system displayed in Fig. 2, in which the service process is governed by an exponential random variable X with parameter λ. As anticipated in Remark 7, this phenomenon is also captured by the systems-theoretic notion of minimality: in accordance with Theorem 10 for λ 1 = λ 2 the system (54) is not minimal.

D. Renewal processes
We complete this section showing that elementary results from renewal theory [6 , 7 , 69] can be translated in the language of systems theory using the notion of feedback interconnection. Definition 10.
[7] A sequence of random variables {S j } j∈Z ≥0 constitutes a renewal process if it is of the form 12 S j = T 1 + T 2 + . . . + T j , where {T j } j∈Z>0 is a sequence of mutually independent random variables with a common distribution F such that F (0) = 0.
The random variable S j in the above definition is often referred to as (the j-th) renewal, while the elements of the sequence {T j } j∈Z>0 are referred to as waiting times [7 , 69]. The common distribution of the waiting times of a renewal process is called the waiting-time distribution [69].
The probabilistic behaviour of a renewal process {S j } j∈Z ≥0 is closely related to the random variable N t , with t ∈ R >0 , defined as the largest j ∈ Z ≥0 for which S j ≤ t [93]. The random variable N t describes the number of renewals occurred by time t and its expected value H(t) = E[N t ], referred to as the renewal function, satisfies the integral equation of renewal theory [93]. Moreover, if the waiting-time distribution of the renewal process is absolutely continuous then the renewal density, defined as h(t) =Ḣ(t) for all t ∈ R >0 , satisfies the renewal density integral equation [93], i.e.
in which f is the derivative of the waiting-time distribution.
Theorem 11. Let {S j } j∈Z ≥0 be a renewal process. Assume the waiting-time distribution of the renewal process {S j } j∈Z ≥0 admits a probability density function f which is realized by the minimal system (1). Then the renewal density h of the renewal process {S j } j∈Z ≥0 is realized by the system obtained from system (1) with input v(t) ∈ R, output y(t) ∈ R, and interconnection equation u = v + y.
Proof. To begin with note that under the stated assumptions the renewal density h of the renewal process {S j } j∈Z ≥0 is well-defined [69,Proposition 2.7]. In addition, the renewal density h satisfies the renewal density integral equation (55). This implies that the Laplace transform of the renewal density  (1), the renewal density h is realized by the system obtained from system (1) with input v(t) ∈ R, output y(t) ∈ R, and interconnection equation u = v + y.
Example 12 (Poisson processes). Poisson processes are renewal processes in which the waiting times have an exponential distribution [70]. In other words, a renewal process is a Poisson process if there exists λ ∈ R >0 such that the probability density function of each waiting time is f (t) = λe −λt δ −1 (t), for all t ∈ R ≥0 . In view of Theorem 11, the renewal density of the process is realized by the system obtained from system (21) with input v(t) ∈ R, output y(t) ∈ R, and interconnection equation u = v + y, i.e. by system (1), with A = 0, B = λ, C = 1. Note that the impulse response of the system is given by h(t) = λδ −1 (t) for all t ∈ R, which is consistent with the fact that the renewal function of a Poisson process can be written as H(t) = λt for all t ∈ R ≥0 [94].

E. On the approximation of probability density functions
This section investigates some connections between the approximation of probability density functions and the model reduction problem [25]. In particular we show that these problems are essentially the same problem when probability density functions are regarded as impulse responses. As a guiding example, we consider phase-type distributions [30 , 31 , 33 , 34], which play an important role in the analysis of queuing networks [94] and can be represented by a random variable describing the time until absorption of a Markov chain with one absorbing state [95]. Note, however, that similar considerations can be performed for more general classes of probability density functions.
Consider a continuous-time Markov chain over the set S = {0, 1, . . . , n}, with n ∈ Z >0 , in which 0 is an absorbing state and 1, . . . , n are transient states. The random variable X which characterises the time until absorption is described by a continuous phase-type distribution represented by the Q-matrix [37] in which S ∈ R n×n is such that Q 0 = −Q1. Assuming that the initial probability of the Markov chain is with α ∈ R 1×n such that α1 = 1, the probability density function of X reads as for every t ∈ R. Note that (56) can be regarded as the impulse response of system (1), with A = Q ′ , B = α ′ and C = Q ′ 0 . This indicates that the problem of approximating the probability density function (56) can be regarded as the problem of approximating the impulse response of system (1) or, equivalently, the problem of constructing a reduced order model of system (1) [25]. In particular, approximating the probability density function (56) by another phase-type distribution boils down to constructing a systeṁ in which ξ(t) ∈ R ν , with ν < n, v(t) ∈ R, ψ(t) ∈ R and F ∈ R ν×ν , G ∈ R ν×1 and H ∈ R 1×ν are constant matrices such that 1 ′ G = 1 and H = −1 ′ F . To illustrate this point we consider a simple example and exploit the model reduction technique devised in [61 , 62] to obtain reduced order models (i.e. approximations of a given probability density function) which match a prescribed number of (systems-theoretic and, thus, probabilistic) moments. Note, however, that different model reduction techniques can be used to solve the problem in question (see [25] and references therein for an overview of available model reduction techniques). Example 13. Consider the Markov chain described by the diagram Let α = [ 0 0 1 ] be the the initial condition of the Markov chain and let X be the random variable which characterises the time until absorption. A realization of the probability density function of the random variable X is given by system (1), with , B = 0 1 , C = λ µ . Exponential Half-normal 2 πσ 2 e − t 2 Laplace λ 2 e −λ|t| λ ∈ R >0 (32) Erlang Following [62], to construct a reduced order model which matches the moment of order one of the system at zero one needs to solve the Sylvester equation (4) with ∆ ∈ (0, 1) a free parameter that can be assigned, yielding Note that the structure of the original system is preserved in the reduced order model, since 1 ′ G = 1 and H = −1 ′ F . Moreover, in agreement with the results of [62], the moment at zero of the reduced order model coincides with the moment at zero of the original system, regardless of the value of ∆. From a probabilistic point of view, the impulse response of the reduced order model corresponds to the probability density function of the random variable X which quantifies the time until absorption of the "reduced" Markov chain described by the diagram 1 0 ∆ and the Q-matrix It is interesting to note that the "reduced" Markov chain can be interpreted as Markov chain built from the original Markov chain by aggregation of the states 2 and 3, thus showing the connection between the model reduction problem and the use of the concept of aggregation to reduce the state space of a Markov chain [96 -98]. We conclude this example by emphasising that there is no natural choice of the parameter ∆. For example, one may select ∆ = λ to ensure that in the "reduced" Markov chain the transition rate from 1 to 0 matches that of the original Markov chain. Another sensible choice is ∆ = λ+µ 2 which guarantees that the moments of order two at zero of the reduced order model and of the original system coincide, so that the moments of order two of the random variables X and X also coincide.

V. CONCLUSION
Moments of continuous random variables admitting a probability density function have been studied. Under certain hypotheses, the moments of a random variable have been shown to be in one-to-one correspondence with the solutions of a Sylvester equation and with the steady-state output response of a specific interconnected system. The results established in this work have shown that, under certain assumptions, a system can be seen as an alternative, equivalent description of the probabilistic structure of a random variable. This, in turn, indicates that systems-theoretic techniques and tools can be used to revisit and shed new light on classical results of probability theory and statistics. Table I displays a short list of random variables along with their probability density functions, parameters and associated realization, while Table II summarises the correspondence between certain concepts of probability theory and their systems-theoretic counterpart.
The present work is a first step towards a unified understanding of the role of systems-theoretic concepts in probability theory and statistics. Several directions for interdisciplinary research are left open. For example, discrete-time systems [99] provide the right tool to carry out the analysis presented in this work for discrete random variables. Hybrid systems [100 , 101] may be used to deal with random variables the distribution function of which has both a discrete part and an absolutely continuous part. Moments of multivariate distributions may be studied resorting to systems described by PDEs [102 , 103], for continuous random variables, and nD systems [104], for discrete random variables. As a consequence, conditional probabilities may be characterised in systemstheoretic terms. Note, however, that modelling moments of multivariate distributions using PDEs might raise challenging computational issues. Further connections may be explored with positive systems [38] and, in particular, with Markov chains [5 , 6 , 8 , 35 -37]. The interplay between the role of Hankel matrices in realization theory [79 -81] and in probability theory [5 -8] may be studied. The notions of information and of entropy may be investigated using the proposed framework. The discussion on the approximation of probability density functions which arise from phase-type distributions can be generalised to probability density functions with a systemstheoretic representation in explicit form, for which model reduction techniques are available [58 , 77]. Finally, a systemstheoretic counterpart of the method of moments [18] may be developed modifying existing data-driven model reduction methods [63].