A Distributed Approach for the Detection of Covert Attacks in Interconnected Systems with Stochastic Uncertainties

The design of a distributed architecture for the detection of covert attacks in interconnected Cyber-Physical Systems is addressed in this paper, in the presence of stochastic uncertainties. By exploiting communication between neighbors, the proposed scheme allows for the detection of covert attacks that are locally stealthy. The proposed methodology adopts a decentralized filter, jointly estimating the local state and the aggregate effect of the physical interconnections, and uses the communicated estimates to obtain an attack-sensitive residual. We derive some theoretical detection properties for the proposed architecture, and present numerical simulations.


I. INTRODUCTION
Cyber-Physical Systems (CPSs) describe a class of large-scale systems, where the physical components are integrated with cyber resources, such as communication, control, and monitoring infrastructures. They are an ever more common class of systems, following the increased penetration of information technology (IT) for monitoring and coordination purposes in industrial plants and infrastructure systems.
Among the systems that can be described as CPSs, many are safety critical, as their inadequate provision of service may have severe consequences. This has led to a growing interest in the literature on the subject of secure control, as demonstrated by the recent special issue [1], as well as the surveys [2], [3], and the works cited therein.
It has been shown in [4], [10], [15] that malicious agents can covertly misappropriate control systems by
This work has been partially supported by the European Union's Horizon 2020 research and innovation programme under grant agreement No 739551 (KIOS CoE). This work has also been conducted thanks to the support of the EPSRC Centre for Doctoral Training in High Performance Embedded and Distributed Systems (HiPEDS, Grant Reference EP/L016796/1).
A. Barboni and A. J. Gallo are with the Department of Electrical and Electronic Engineering at Imperial College London, UK. Email:
• A discrete-time linear stochastic model is considered for each subsystem, instead of a continuous-time one.
• The proposed distributed detection architecture is based on different estimation models: a minimum-variance unbiased estimator jointly estimates the local states and the aggregate effect of the neighbors' interconnection.
• A detection method based on the statistical analysis of a properly designed residual signal is proposed, and its detectability properties are studied.
The distributed detection of attacks in stochastic systems is also considered in [8]. However, the authors do not focus on covert attacks, and do not build a distributed estimation architecture to achieve this, but rather perform hypothesis testing on appropriately processed output measurements.
The problem of unknown-input decoupling in the estimation of stochastic systems has drawn great attention in the past, and milestone contributions in the area include [16]. A more general problem is solved in [17], where unknown inputs also affect the measurement channel, while [18] improves on previous results by designing a two-step filter that also optimally estimates the input.
In this work, we adopt the filter presented in [18] to compute a distributed estimate of the local state, by decoupling it from the effect of the subsystem's neighbors.
The rest of the paper is structured as follows: in Section II, we formulate the considered problem. In Section III, the decoupled distributed filter is presented, and the properties of the state and unknown-input estimates are analyzed in Sections IV and V. Following this, a novel detection strategy is proposed in Section VI, where a suitable statistical test is defined, and some of its properties are provided. Finally, some numerical simulations are presented in Section VII.
Notation: For a vector $v \in \mathbb{R}^n$, $v_{[i]}$ denotes its $i$-th component. The identity matrix of dimension $n$ is denoted by $I_n$, and $0_{n \times m} \in \mathbb{R}^{n \times m}$ denotes a matrix of all zeros; when clear from the context, the indices will be omitted. We use the notation $\mathrm{col}_{j \in J}[x_j]$ and $\mathrm{row}_{j \in J}[x_j]$ for the column or row concatenation of vectors $x_j$, with $j$ belonging to a set of indices $J \subset \mathbb{N}$. The same notation is

A. Model of CPS
Consider a CPS composed of $N$ subsystems $\mathcal{S}_i$, which are interconnected through both physical and communication links. We consider the topology of the graphs defined by these links to be the same, i.e., a communication link between two subsystems is present if there is also a physical link. We define the set of neighbors of $\mathcal{S}_i$ as $\mathcal{N}_i \doteq \{ j \in \{1, \dots, N\} \mid \partial x_i / \partial x_j \neq 0 \}$. The dynamics of $\mathcal{S}_i$ is:
where $x_i \in \mathbb{R}^{n_i}$ is the state, $\tilde{u}_i \in \mathbb{R}^{m_i}$ is the control input, and $y_i \in \mathbb{R}^{p_i}$ is the output measurement; $w_i \sim \mathcal{N}(0, W_i)$ and $v_i \sim \mathcal{N}(0, V_i)$ are i.i.d. Gaussian process and measurement noise, with known variance matrices $W_i \geq 0$ and $V_i > 0$. Furthermore, we assume the initial condition and $C_i$ are supposed to be known by the local diagnoser. As shown in the schematic diagram in Figure 1, each subsystem is locally equipped with a controller $\mathcal{C}_i$ and a diagnoser $\mathcal{D}_i$, the latter of which exchanges information with its neighbors.

B. Attack model
From time $k = k_a$, we consider a covert attack on subsystem $\mathcal{S}_i$. The attack is modeled as (see [4]):
where $\eta_i$ is the attacker's control input. We stress that $\eta_i$ is unknown to $\mathcal{D}_i$; furthermore, it is arbitrarily defined by the attacker to steer $x_i$ away from its desired nominal trajectory. The signals $\eta_i$ and $\gamma_i$ are injected in the control and measurement channels, respectively, as follows:
where $u_i$ is the input as computed by the controller $\mathcal{C}_i$. By mimicking the dynamics of $\mathcal{S}_i$ in (2), the attacker can compensate for the effects of $\eta_i$ through $\gamma_i$, thus making their action undetectable from observation of the measurements alone.

Assumption 1: The attacker has perfect knowledge of the
The result of this attack is that the dynamics of $x^a_i$ is superimposed on that of $\mathcal{S}_i$. In the following result, a discretized version of Proposition 1 in [10], the state of $\mathcal{S}_i$ is decomposed into a healthy and an attacked component.
Proposition 1: Consider attack strategy (2), and let Assumption 1 hold. If the attacker's initial state is $x^a_i(k_a) = 0$, the output received by the diagnoser unit $\mathcal{D}_i$ is:
$$\tilde{y}_i(k) = y^h_i(k),$$
where $y^h_i$ is the output of the subsystem as if it were not affected by the attack.
Proof: Throughout this work, for reasons of space, proofs will be omitted.
Remark 1: Proposition 1 provides a sufficient condition for stealthiness of the covert attack, and implies that, whatever the local estimate of the state, the residual error based on local measurements is not affected by the attack, as has been shown in the literature [4], [10].
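The stealthiness property stated in Proposition 1 can be illustrated with a minimal numerical sketch. The scalar plant, its parameters, the feedback gain, and the sign conventions $\tilde{u}_i = u_i + \eta_i$ and $\tilde{y}_i = y_i - \gamma_i$ are all assumptions made for illustration; they are not taken from the models (1)-(3) of this paper.

```python
import random

def run(steps, attack_start, eta, seed=1):
    """Scalar plant under a covert attack (all values are hypothetical).

    Assumed plant: x+ = a*x + b*u_tilde + w,  y = c*x + v.
    Assumed attack: x_a+ = a*x_a + b*eta,  gamma = c*x_a,
    u_tilde = u + eta,  y_tilde = y - gamma.
    """
    a, b, c = 0.9, 0.5, 1.0
    rng = random.Random(seed)
    x, x_a = 1.0, 0.0            # plant state and attacker's model state
    received, states = [], []
    for k in range(steps):
        w = rng.gauss(0.0, 0.03)  # process noise sample
        v = rng.gauss(0.0, 0.03)  # measurement noise sample
        gamma = c * x_a           # attacker's prediction of its own effect
        y_tilde = c * x + v - gamma   # measurement seen by the diagnoser
        received.append(y_tilde)
        states.append(x)
        u = -0.2 * y_tilde        # controller acts on the received output
        e = eta if k >= attack_start else 0.0
        x = a * x + b * (u + e) + w   # plant driven by the corrupted input
        x_a = a * x_a + b * e         # attacker mimics the plant dynamics
    return received, states

# Same noise realization, with and without the attack.
healthy_y, healthy_x = run(60, 20, eta=0.0)
attacked_y, attacked_x = run(60, 20, eta=1.0)

max_output_gap = max(abs(p - q) for p, q in zip(healthy_y, attacked_y))
max_state_gap = max(abs(p - q) for p, q in zip(healthy_x, attacked_x))
print(max_output_gap, max_state_gap)
```

Under these assumptions the received output of the attacked run matches the healthy one up to rounding, while the true state is driven away from its nominal trajectory, which is why a purely local residual carries no information about the attack.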
In this paper, we address the following:
Problem 1: Consider a subsystem $\mathcal{S}_i$ with dynamics as in (1) and an attack as in (2) from time $k_a$, and let Assumption 1 hold. Design a diagnoser $\mathcal{D}_i$ to detect the attack.

III. DISTRIBUTED DETECTION FILTERS
Each local diagnosis unit $\mathcal{D}_i$ is equipped with a decentralized estimator, based on the filter proposed in [18] in the context of centralized estimation. Each diagnosis unit $\mathcal{D}_i$ then exchanges information with $\mathcal{D}_j$, $j \in \mathcal{N}_i$, in order to compute the detection residual introduced in Section V. The state estimator is unbiased and guarantees minimum variance of the estimation error regardless of the presence of an unknown input [18].
We design the diagnostic unit $\mathcal{D}_i$ such that the computed local estimates are decoupled from the neighbors of $\mathcal{S}_i$. To achieve this, the interconnection terms can be treated as unknown inputs, and rewritten as:
where $G_i$ has full column rank, and $\bar{E}_i$ is a matrix of weights that defines a vector of unknown inputs $\xi_i \in \mathbb{R}^{g_i}$, which can be interpreted as the aggregate effect of all neighbors' physical interconnection on dynamics (1). The following further assumptions are needed for all subsystems $\mathcal{S}_i$:
Given the structure of (1), the filter design in [18] can be exploited to obtain unbiased minimum-variance local state and disturbance estimates $\hat{x}_i$ and $\hat{\xi}_i$, respectively. We obtain the following estimates of the local state and of the aggregate interconnections (note the presence of $\tilde{y}_i$, and the use of $u_i$):
Note that these two matrices are related to each other as:
Remark 2: Note that (5b) depends on delayed information, as the estimate $\hat{\xi}_i(k-1|k)$ is only available at time $k$, once measurement $\tilde{y}_i(k)$ is available. This is to be expected, since $\xi_i$ dynamically affects the state, i.e., the effects of $\xi_i(k-1)$ can only be seen from $\tilde{y}_i(k)$.
We now repeat Theorem 12 in [18], which gives the theoretical properties of the estimates (5b) and (5a):
Lemma 1 ([18, Thm. 12]): Consider the joint input and state estimator in (5), where $M_i(k)$ satisfies:
If $M_i(k)$ and $K_i(k)$ are designed as in [18], (5b) and (5a) are unbiased estimates of $\xi_i(k-1)$ and $x_i(k)$, minimizing the mean square error over the class of all linear unbiased estimates based on $\hat{x}^0_i$ and $y_i(\kappa)$, $0 \leq \kappa \leq k$.
Remark 3: Assumption 2 is a sufficient condition for the existence of an estimate which is decoupled from an unknown input $\xi_i$, both in the stochastic [16] and in the deterministic case [19]. On the other hand, the decomposition $G_i \bar{E}_i$ is needed for the input estimation, as at most $\mathrm{rank}(E_i)$ components can be estimated. By means of the decomposition in (4), $\xi_i$ aggregates the independent components of the interconnection that influence $x_i$.
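To make the decoupling idea concrete, the sketch below checks numerically that an input-estimation gain removes the unknown-input direction from the state correction. It assumes that the constraint on $M_i(k)$ takes the form $M_i(k) C_i G_i = I$, as in the unbiasedness condition of the filter in [18]; this form is an assumption here, since it is not restated in the text. The values $C_i = I_2$ and $G_i = [0, 1]^\top$ are those used in Section VII, while the gain `M` is a hypothetical choice that satisfies the constraint (it is not the minimum-variance gain).

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

C = [[1.0, 0.0], [0.0, 1.0]]   # output matrix, C_i = I_2
G = [[0.0], [1.0]]             # unknown-input direction, G_i = [0, 1]^T
M = [[0.0, 1.0]]               # hypothetical gain chosen so that M C G = 1

# Assumed unbiasedness constraint: M C G should equal the identity.
MCG = matmul(matmul(M, C), G)

# Once the input estimate is fed back through G, an unknown input entering
# via G is mapped through (I - G M C) G, which should vanish.
GMC = matmul(matmul(G, M), C)
I_minus_GMC = [[(1.0 if r == c else 0.0) - GMC[r][c] for c in range(2)]
               for r in range(2)]
leak = matmul(I_minus_GMC, G)

print(MCG, leak)
```

With these values the constraint holds and the residual coupling `leak` is zero, so the contribution of $\xi_i$ does not propagate into the state estimation error.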
In the following, we analyze the specific properties of both the state and unknown-input estimates $\hat{x}_i$ and $\hat{\xi}_i$.

IV. LOCAL STATE ESTIMATION
Let us start by considering the system in healthy conditions, by analyzing the estimation and residual errors under the healthy mode of behavior:
where the superscript $h$ has been added to highlight that the estimation error is considered in nominal conditions. We then analyze the estimation error under attack.
Remark 4: Note that, since $\tilde{y}_i = y^h_i$ from Proposition 1, the estimates in (5) only use information from the state which is not affected by the attack. As such, it is unnecessary to include the superscript $h$ when analyzing $\hat{x}_i$ or the residual $r_i$. Conversely, distinguishing between healthy and attacked information is crucial for error analysis.
Hence, the estimation error dynamics can be derived as:
where the interconnection term $G_i \xi_i(k-1)$ is removed thanks to the definition of $M_i(k)$ satisfying Lemma 1, as
The influence of the physical interconnections of $\mathcal{S}_i$ is therefore decoupled from the estimation error $\epsilon^h_i(k)$. As the state $x_i$ is not directly available, the residual error $r_i$ must be used to analyze detection properties. By exploiting the decomposition of $\epsilon_i$ into healthy and attacked parts, and using the definition of the residual (6b) and the estimation error under nominal conditions as given in (7), we obtain:
Proposition 2: Let an attacker carry out a covert attack as defined in (2) for time $k \geq k_a$, with $x^a_i(k_a) = 0$, and let Assumption 1 hold. The residual $r_i(k)$ is not affected by the covert attack and hence cannot be used to detect it.
Let the estimation error be defined as $\epsilon_i \doteq x_i - \hat{x}_i$. Although a covert attack (2) on $\mathcal{S}_i$ does not influence the local residual $r_i$, the same cannot be said about the estimation error. This will be exploited further in Sections V and VI to define a residual and a suitable statistical test that enables the detection of covert attacks.

A. State estimation error statistics
We analyze the mean and variance of the residual terms, in order to define a suitable detection strategy. We initialize $\hat{x}_i(0) = \hat{x}^0_i$, $\forall i \in \mathcal{N}$, and we note that $\epsilon_i(k) = \epsilon^h_i(k)$, $\forall k \leq k_a$, holds in healthy conditions. Given the estimates' unbiasedness property defined in Lemma 1, the mean of the estimation error before the attack occurs is:
while $E[\epsilon^h_i(k)] = 0$, for all $k \geq 0$. Similarly, the expected value of the residual is $E[r_i(k)] = 0$, $\forall k \geq 0$.
We derive the variance matrix $\Pi_i(k) \doteq \mathrm{Var}(\epsilon_i(k))$ for the estimation error, initializing it as $\Pi_i(0) = \Pi^0_i$:
where the covariance terms $\mathrm{Cov}(\epsilon_i(k-1), w_i(k-1))$ and
For $k > k_a$, i.e., after the occurrence of the attack, the estimation error is $\epsilon_i = x^a_i + x^h_i - \hat{x}_i = x^a_i + \epsilon^h_i$, and as such its mean is given by:
$$E[\epsilon_i(k)] = x^a_i(k), \quad \forall k > k_a. \qquad (10)$$
As the attack strategy in (2) is considered to be deterministic, it will not affect the variance $\Pi_i(k)$. Furthermore, although the estimation error mean is affected by the attack, the expected value of $r_i$ does not change, in line with Proposition 2.

V. ESTIMATION OF COUPLING EFFECTS
As covert attacks cannot be detected using only local estimates, we exploit the communication between $\mathcal{D}_i$ and its neighbors to detect them in $\mathcal{S}_j$, $j \in \mathcal{N}_i$. Specifically, we analyze the error between the unknown-input estimate $\hat{\xi}_i(k-1|k)$ in (5b) and that computed from the estimates $\hat{x}_j(k)$ received from $\mathcal{D}_j$. The corresponding error is:
which holds by virtue of Lemma 1. This estimation error therefore depends only on local noise and uncertainties, as $\epsilon_i(k)$ is decoupled from the neighboring subsystems. Given Lemma 1, the estimate $\hat{x}_i(k)$ is unbiased by construction. Thus it is easy to see that
As far as the variance is concerned, from the definitions of the variance matrix (9), it follows that it is possible to evaluate $\mathrm{Var}(\rho_i(k-1|k))$ as:
As $\rho_i(k-1|k)$ is unavailable to $\mathcal{D}_i$, it cannot be used to detect an attack in $\mathcal{S}_j$, $j \in \mathcal{N}_i$. Instead, supposing that $\mathcal{D}_i$ receives the estimates $\hat{x}_j(k)$ from the neighbors' diagnosis units $\mathcal{D}_j$, $\forall j \in \mathcal{N}_i$, it is possible to locally define
that can be regarded as a distributed estimate of the unknown-input estimation error, which may be used for detection. From (14) and (11), we obtain:
Proposition 3: Let Lemma 1 hold. When there are no attacked subsystems $\mathcal{S}_j$, $j \in \mathcal{N}_i$, the residual $\hat{\rho}_i(k-1|k)$ follows a Gaussian distribution with mean and variance $\mu_i(k)$ and $\Sigma_i(k)$, respectively, where
Proof: Let us examine the expected value of $\hat{\rho}_i(k-1|k)$. Let subsystem $\mathcal{S}_l$, $l \in \mathcal{N}_i$, be under attack starting from $k = k_a > 0$; then, it follows from (10), (12), and (15) that the mean $\mu_i(k) \doteq E[\hat{\rho}_i(k-1|k)]$ is given by:
where $\zeta^a_{i,l}(k-1) \doteq \bar{E}_{i,[l]} x^a_l(k-1)$. Here, with some abuse of notation, $\bar{E}_{i,[l]} \in \mathbb{R}^{g_i \times n_l}$ defines the block of the row matrix $\bar{E}_i$ corresponding to $\mathcal{S}_l$.
As for the variance $\Sigma_i(k) \doteq \mathrm{Var}(\hat{\rho}_i(k-1|k))$, from the definition of $\mathrm{Var}(\rho_i)$ and (15) it follows that:
where the covariance terms satisfy $\mathrm{Cov}(\epsilon_j, \rho_i) = 0$, since the estimation error $\epsilon_i$ is independent of neighboring states by construction, for all subsystems $\mathcal{S}_i$, $i \in \mathcal{N}$.
It is important to recall that, since $x^a_i(k)$ is deterministic, it will not influence the variance of either the estimation error or the residual. Hence, we focus on the estimation error mean. Also note that the residual variance $\Sigma_i(k)$ can be computed locally at subsystem $\mathcal{S}_i$, provided that the neighbors' process and measurement covariance matrices $W_j$ and $V_j$, and models $(A_j, C_j, G_j)$ are known to $\mathcal{D}_i$.

VI. DETECTION STRATEGY
In this section we exploit the known statistical properties of the residual $\hat{\rho}_i$ to design a statistical test that raises an alarm when suitable conditions are satisfied.
We consider a residual sequence of finite length $\omega_i$, containing samples of $\hat{\rho}_i(k-1|k)$ from $k - \omega_i + 1$ to $k$:
The following composite hypothesis test can be formulated. The null hypothesis $H^0_i$ represents the healthy case, in which no subsystem $\mathcal{S}_j$, $j \in \mathcal{N}_i$, is under attack, whereas the alternative hypothesis $H^1_i$ holds otherwise.
Problem 2 (Covert Attack Detection): The detection logic in $\mathcal{D}_i$ accepts one of the following hypotheses:
given the estimation residual $\hat{\rho}_i(k-1|k)$ defined in (14). Again, the superscript $h$ denotes the component not affected by the attack, $\zeta^a_{i,l}(k)$ is considered to be unknown, and $\hat{\rho}^h_i$ follows the statistical properties in (16).
Proposition 4: If $M_i(k)$ is defined according to Lemma 1, and $V_i > 0$, then matrix $\Sigma_i(k)$ is invertible for all $k \geq 0$.
Problem 2 is equivalent to detecting an unknown signal embedded in white Gaussian noise, and as such a solution can be found by means of a Generalized Likelihood Ratio test (see for instance [20]). Hypothesis $H^1_i$ is accepted when
is greater than a threshold to be defined, where $\hat{\zeta}^a_{i,l}$ is a maximum likelihood estimate of $\zeta^a_{i,l}$. Because of whiteness, $\hat{\rho}_i(k-1|k)$ is such an estimate. Let us define the statistic $T(\hat{\rho}_i, k)$ as the logarithm of (19) and $\theta_i(k)$ as a detection threshold; then we obtain the following detection test:
where it is sufficient for any component of (20) to satisfy the inequality for detection to occur. The probabilities of false alarm and detection are defined as the following:
Since $\Sigma_i(k)$ is symmetric positive definite it is possible to
can be defined, where the components of $\hat{z}_i$ are mutually uncorrelated and each $q$-th component has variance $\lambda_{i[q]}(k)$. Therefore, for $q \in [1, g_i]$, we have that, for threshold $\hat{\theta}_{i[q]}(k)$:
Since $\hat{z}_i$ is linearly related to $\hat{\rho}_i$, and in light of (17), $T(\hat{z}_{i[q]}, k)$ follows the distribution:
where $\chi^2_k(\nu_q)$ is a chi-squared distribution with $\omega_i$ degrees of freedom and non-centrality parameter
where $U_{i[q]}(k)$ denotes the $q$-th row of matrix $U_i(k)$. Let us define the tail probability of the normalized $\chi^2$ distribution as $\Phi(u) \doteq 1 - \Pr\{T(\hat{z}_{i[q]}, k) < u\}$. Then, it is possible to compute the probabilities in (21) for each component $q$ as:
Remark 5: Note that $T(\hat{z}_{i[q]}, k)$ represents the energy of the attack received by $\mathcal{S}_i$. From (25b) it can be seen that the probability of detection decreases as the attack energy decreases. Furthermore, as $\nu_q \to 0$, the probability of detection approaches that of false alarm. More precisely, $\nu_q$ depends on the energy of the attacked state $x^a_l$ as scaled by the corresponding interconnection weight.
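A windowed energy test of the kind described above can be sketched as follows for a single decorrelated component. The statistic used here, the sum over the last $\omega_i$ samples of the squared component divided by its variance, is an assumed form chosen to be consistent with a chi-squared distribution with $\omega_i$ degrees of freedom; the class name, the constant variance, and the input values are hypothetical. The threshold 31.41 is approximately the 0.95 quantile of a central chi-squared distribution with 20 degrees of freedom.

```python
from collections import deque

class WindowedEnergyDetector:
    """Sliding-window energy test on one residual component (illustrative)."""

    def __init__(self, window, variance, threshold):
        self.window = window
        self.variance = variance
        self.threshold = threshold
        self.buffer = deque(maxlen=window)   # keeps only the last `window` terms

    def update(self, z):
        """Add one sample; return (statistic, alarm flag)."""
        self.buffer.append(z * z / self.variance)
        statistic = sum(self.buffer)
        # Only raise an alarm once the window is full.
        alarm = len(self.buffer) == self.window and statistic > self.threshold
        return statistic, alarm

# Hypothetical inputs: unit-variance component, window of 20 samples.
detector = WindowedEnergyDetector(window=20, variance=1.0, threshold=31.41)
quiet_alarms = [detector.update(1.0)[1] for _ in range(20)]   # statistic reaches 20
loud_results = [detector.update(2.0) for _ in range(20)]      # mean-shifted samples
loud_alarms = [alarm for _, alarm in loud_results]
final_statistic = loud_results[-1][0]
print(any(quiet_alarms), any(loud_alarms), final_statistic)
```

In this sketch the alarm is raised only when the shifted samples have displaced enough nominal ones in the window, which mirrors the dependence of the detection probability on the attack energy noted in Remark 5.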
Also, note that the presence of the input estimate variance $\lambda_{i[q]}(k)$ reduces the effect of the attack on $\nu_q$.
Eqs. (25a) and (25b) hold component-wise. It is possible to find an expression for the probability of false alarm $P^f_i(k)$ of detector $\mathcal{D}_i$ by observing that the probability of at least one false alarm is the complement of the probability of no false alarms. Thus, recalling that the components of $\hat{z}_i$ are independent by construction, we have that:
$$P^f_i(k) = 1 - \prod_{q=1}^{g_i} \left(1 - P^f_{i[q]}(k)\right). \qquad (26)$$
If we assume the same probability $P^f_{i[q]}(k)$ for each component $q$, it is possible to invert (26) and (25a). This allows individual thresholds $\hat{\theta}_{i[q]}$ to be computed, given a desired cumulative probability $P^f_i(k)$. The overall probability of detection can be found in the same way, although it depends on $\nu_q$.
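The inversion of the cumulative false-alarm probability under equal per-component probabilities can be sketched as below. The function names are hypothetical, and the step from the per-component probability to the threshold itself (the inversion of (25a)) is not shown, as it requires the chi-squared quantile function.

```python
def cumulative_false_alarm(per_component):
    """Probability of at least one false alarm among independent components."""
    no_alarm = 1.0
    for p in per_component:
        no_alarm *= (1.0 - p)     # probability that no component fires
    return 1.0 - no_alarm

def per_component_false_alarm(cumulative, n_components):
    """Equal per-component probability giving the desired cumulative value."""
    return 1.0 - (1.0 - cumulative) ** (1.0 / n_components)

# Hypothetical example: three components, desired cumulative probability 0.05.
p_q = per_component_false_alarm(0.05, 3)
recovered = cumulative_false_alarm([p_q] * 3)
print(p_q, recovered)
```

For a scalar unknown input, as in the simulation of Section VII where $\xi_i \in \mathbb{R}$, the per-component and cumulative probabilities coincide.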

VII. NUMERICAL SIMULATIONS
A. Simulation setup
We consider a CPS composed of $N = 4$ subsystems, interconnected as in Figure 1. We consider the linearized model of multiple pendula coupled through a spring, as presented in [21, Ex. 1.36], where each subsystem is described by:
where $\delta_i$, $m_i$, $l_i$ are respectively the displacement angle, mass, and length of the pendulum; $g$ is the gravitational constant; $k_{ij}$ is the spring coefficient, with $k_{ij} = k_{ji}$, and $a_i$ is the height at which the spring is attached to pendulum $i$.
The parameter values used in the numerical simulation can be found in Table I.
, and defining a decentralized state feedback control law $u_i \doteq K_i x_i$, we discretize the pendulum's dynamics subsystem-by-subsystem with Euler's approximation with sampling time $T_s = 0.01$ s, preserving the topology and the interconnection structure of the CPS. For all subsystems, we assume that all states are measurable, i.e., $C_i = I_2$. The process and measurement noise variance matrices are $W_i = 10^{-3} I$ and $V_i = 10^{-3} I$. We run the simulation for 100 s.
From (27), and considering the Euler approximation for the discretization of each subsystem, it is possible to choose $G_i = [0, 1]^\top$. Note that $\xi_i \in \mathbb{R}$, for all subsystems.
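The subsystem-by-subsystem Euler discretization mentioned above can be sketched as follows. The continuous-time matrices used in the example are hypothetical placeholders for a second-order subsystem; they are not the pendulum parameters of Table I.

```python
def euler_discretize(A, B, Ts):
    """Forward-Euler discretization: A_d = I + Ts*A, B_d = Ts*B."""
    n = len(A)
    A_d = [[(1.0 if r == c else 0.0) + Ts * A[r][c] for c in range(n)]
           for r in range(n)]
    B_d = [[Ts * b for b in row] for row in B]
    return A_d, B_d

# Hypothetical second-order subsystem with state (angle, angular velocity).
A_c = [[0.0, 1.0],
       [-2.0, -0.5]]
B_c = [[0.0],
       [1.0]]
T_s = 0.01   # sampling time used in the simulation, in seconds

A_d, B_d = euler_discretize(A_c, B_c, T_s)
print(A_d, B_d)
```

Because the approximation acts on each subsystem separately, a term that enters only the second state equation in continuous time still enters only the second state equation after discretization, which is consistent with an unknown-input direction proportional to $[0, 1]^\top$.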

B. Attack scenario and detection
Starting from time $k_a = 35$ s, an attacker is able to inject
In Figure 2, we show the effectiveness of our detection technique, by comparing the statistic $T(\hat{z}_i, k)$, computed by using a window of size $\omega_i = 20$, to the threshold $\hat{\theta}_i(k)$, defined for all subsystems such that the probability of false alarm is $P^f_i = 0.05$. At time $k = 36.78$ s, detector $\mathcal{D}_2$ detects the presence of an attack in $\mathcal{N}_2$, while $\mathcal{D}_4$ detects the attack in $\mathcal{N}_4$ at time $k = 37.16$ s. As expected, the diagnosers for subsystems $\mathcal{S}_1$ and $\mathcal{S}_3$ do not detect an attack.

VIII. CONCLUDING REMARKS
We have proposed a distributed method capable of detecting local covert attacks in interconnected CPSs with stochastic uncertainties. The proposed method is based on the joint estimation of local states and the neighbors' cumulative effect; communication among subsystems enables the definition of a suitable residual signal and a related statistical test.
Future work will include studying additional detectability properties of the proposed approach and comparison to other techniques for solving Problem 2, as well as investigation into the architecture's robustness to other types of attacks.