Differentiating Attacks and Faults in Energy Aware Smart Home System using Supervised Machine Learning

The topics of fault diagnosis and security attack diagnosis in Cyber Physical Systems (CPS) have been studied extensively in a stand-alone manner. However, considering the co-existence of both of these sources of abnormality in a system, and being able to distinguish among them, is an important and timely problem not currently addressed in the literature. In this paper, we study the internal communication environment of an Energy Aware Smart Home (EASH) system. We formally define the problem of differentiating component attacks from component failures in EASH and we provide a methodology based on supervised Machine Learning (ML) algorithms to differentiate between a set of common attacks and faults on the communication channel. To evaluate our approach, we provide experimental results obtained by a simulation framework as well as from a real-time testbed environment.


INTRODUCTION
Cyber Physical Systems (CPS) integrate physical processes with electronic computing devices and digital communication channels. Two main causes may affect the proper operation of a CPS: security attacks, and failures due to component malfunctions or deficiencies. As CPS play a vital role in the day-to-day operations of modern societies, they have become an appealing target for malicious actors [4], while their widespread use has led to a significant increase in the attack surface. At the same time, just like any other physical control system, different components of a CPS can suffer benign failures [8].
Both failures and attacks lead to abnormal behaviour of the system, but their implications may differ greatly: a component failure may be controlled, leading the system to a stable state until the component returns to a normal operating state, whereas a component attack by a malicious actor may be designed to trigger a chain of possibly unnecessary control actions, causing severe outcomes for the system. Having the means to differentiate between faults and attacks allows CPS operators to choose the proper recovery actions that minimize the negative effects of abnormal behaviour.
Specifying the parameters that may lead to such differentiation is not an easy task. In this work, we focus on an Energy Aware Smart Home (EASH) system. In an EASH system, sensor/computing nodes are connected and measure the (instantaneous) energy consumption of home electronic devices and appliances. Those measurements are transmitted, via common wireless or wired protocols, to a central entity (coordinator) that stores and analyses the overall energy consumption in the house. From the central coordinator, the energy consumption is then reported to the energy utility through the Advanced Metering Infrastructure (AMI), which enables two-way communication between utility consumers (Smart Homes) and utilities (Smart Grids). This allows consumers to be aware of their home's energy footprint (energy aware) and to make decisions on how to minimize their energy consumption. We examine how the network's traffic characteristics in an EASH may relate to a family of failures and attacks, and how those characteristics may be used by algorithms with small information requirements to distinguish between the different abnormalities.
Traditional solutions for fault diagnosis in CPS are based on the operator's experience [3], while more recent approaches [6] utilize sensor and alarm data, which characterize the new era of the Internet of Things (IoT). IoT solutions for fault diagnosis combine machine learning approaches with human expertise. For instance, fault diagnosis in power and smart grid systems utilizes artificial neural networks, which are adaptive systems inspired by biological systems. The most common techniques used in this setting are Radial Basis Function (RBF) networks [7] and Support Vector Machines (SVM) [11].
Apart from fault diagnosis, security concerns for CPS have been studied in the area of system fault detection, isolation, and recovery [5], where security attacks are modeled as extreme fault cases. Some unique characteristics and vulnerabilities of CPS create the need for new, more appropriate detection and identification techniques. Moreover, as mentioned in [2], traditional IT security methods are not applicable in the case of CPS security. More specifically, the reliance on communication protocols for the transmission of measurements and control packets increases the possibility of attacks that can affect the physical plants [4]. In addition, existing information security methods, such as authentication, access control, and information integrity, do not suffice for the protection of CPS.
Research in cyber security and fault diagnosis affecting EASH as part of the Smart Grid is mainly focused on detection mechanisms [9,12]. Such mechanisms are able to identify when a system's behaviour deviates from normal, but are unable to identify the source of the abnormality. The normal behaviour of the system can be described using either a state estimator, as in Anomaly Detection [12], or historical data, as in intrusion detection mechanisms. While existing studies deal with the detection and identification of attacks in the Smart Home [10], there is no work that takes into account both malicious attacks and faults on the communication channel.
In this paper we study whether it is possible to differentiate component faults that alter the system communication from network attacks by examining the traffic on the communication channel. We provide a formal definition of the problem under consideration and propose a general methodology that can be used to differentiate a common set of attacks and faults. The proposed methodology uses supervised Machine Learning (ML) algorithms, which utilize data generated during communication. Our approach has been evaluated through simulation and a real-time use case.

PROBLEM FORMULATION
We consider an EASH system that comprises multiple monitoring computing devices (nodes), each communicating directly with a central hub (coordinator node). Our goal is to design a methodology to differentiate faults from attacks in such a system. More formally, we model the system as a graph $G = (V, E)$ composed of a set $V$ of $m$ processing nodes $\{q_1, \ldots, q_m\}$ and a coordinator node $q_c$. Each processing node $q_i \in V \setminus \{q_c\}$ reports its messages directly to the coordinator node. Thus, a star topology exists where $(q_i, q_c) \in E$, $\forall q_i \in \{q_1, \ldots, q_m\}$.
Two nodes $q_i, q_j \in V$ communicate by exchanging messages via synchronous, reliable communication channels if $(q_i, q_j) \in E$. We use $e_{i,j}$ as shorthand for the channel between $q_i$ and $q_j$, given that $(q_i, q_j) \in E$.
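As an illustration of the system model, the star topology can be sketched in Python. The node names, the helper functions `build_star` and `is_star`, and the choice to treat channels as undirected are assumptions made for this sketch and are not part of the formal model.

```python
def build_star(m, coordinator="q_c"):
    """Return (V, E) for m processing nodes that each link to one coordinator."""
    processing = [f"q_{i}" for i in range(1, m + 1)]
    V = set(processing) | {coordinator}
    # Assumption: channels are undirected, so an edge is the frozenset {q_i, q_c}.
    E = {frozenset((q, coordinator)) for q in processing}
    return V, E


def is_star(V, E, coordinator="q_c"):
    """True if every processing node has exactly one channel, to the coordinator."""
    for q in V - {coordinator}:
        channels = [e for e in E if q in e]
        if channels != [frozenset((q, coordinator))]:
            return False
    return len(E) == len(V) - 1


V, E = build_star(3)
print(sorted(V))
print(is_star(V, E))
```

Adding a channel between two processing nodes, for example `E | {frozenset(("q_1", "q_2"))}`, makes `is_star` return `False`, since those nodes would then have more than one channel.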
For each $q_i \in \{q_1, \ldots, q_m\}$, we define a single binary connectivity variable $v$ to indicate the direct connection of $q_i$ to the coordinator: $q_i.v = 1$ if $q_i$ is directly connected to $q_c$, and $q_i.v = 0$ otherwise. The state of each node $q_i \in V$, denoted as $\sigma_i$, is defined over a set of local state variables $B_i$ and the binary connectivity variable $q_i.v$. Similarly, the state of each channel $e_{i,j}$ is defined over a set of channel measurements $C_{i,j}$ that describe the channel's conditions and performance metrics (e.g., delay). The state of the system $\sigma$ is a vector that contains the state of every node $q_i \in V$ and every channel $e_{i,j}$ for $(q_i, q_j) \in E$.
Each node $q_i \in V$ implements a set of actions. When an action $\alpha$ occurs at a node $q_i$, it causes the state of $q_i$, and thus the state of the system, to change. An execution $\xi$ of the system is an alternating sequence of states and actions, where actions are send and receive events that may change the state of a channel or the state of a node in the system. We assume that the system executes in discrete time units. Thus, $\sigma^0$ is the initial state of the system at time $t = 0$, and $\sigma^t$ denotes the state of the system at time $t$ in an execution $\xi$. We assume that each execution starts with the initial state and ends with a state. If $\sigma^t$ is the ending state of an execution $\xi$, then we say that $t$ is the length of $\xi$, denoted by $T(\xi) = t$.
Let $(q_i.b_j)^t$ denote the value of the local variable $b_j$ of node $q_i$ at time unit $t$. Similarly, we denote by $(q_i.v)^t$ the binary connectivity variable of node $q_i$ and by $(e_{i,j}.m_k)^t$ the value of the channel characteristic $m_k$ at time $t$. Given this notation, the state of each node, the state of each channel, and the state of the system as a whole at time $t$ are defined over these values. Each local variable $q_i.b_j$ has an expected lower and an expected upper bound at time unit $t$, determined with respect to the previous state of the system $\sigma^{t-1}$. Using these bounds, we can define normality and abnormality in the system as follows.
Definition 2.1. A state $\sigma_i^t$ of node $q_i$ in an execution $\xi$ is normal at time $t$ if all its state variables are within the expected bounds, and the node is connected to and communicating with the central node (i.e., $(q_i.v)^t = 1$). In contrast, in an abnormal state these conditions may not hold, as stated in Definition 2.2.
Definition 2.2. A state $\sigma_i^t$ of node $q_i$ in an execution $\xi$ is abnormal at time $t$ if at least one of the following holds:
• abnormal node behavior: $\exists\, q_i.b_j \in B_i$ such that $(q_i.b_j)^t$ lies outside its expected bounds;
• abnormal node connectivity: $(q_i.v)^t = 0$.
By Definition 2.2, an abnormal state may include variables with unexpected values, and/or the direct connection of the node with the coordinator may be interrupted (i.e., $(q_i.v)^t = 0$). We assume that a connectivity interruption may be caused by an external event (e.g., a man-in-the-middle attack or an interruption attack) without the sending node necessarily being aware of it.
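The two abnormality conditions can be illustrated with a short Python sketch. The function names, the variable name `power_w`, and the way bounds are supplied (as a ready-made dictionary rather than derived from the previous system state) are assumptions made for this example only.

```python
def node_abnormalities(values, bounds, connected):
    """Return which abnormality conditions hold for one node state.

    values:    {variable name: value at time t}
    bounds:    {variable name: (lower, upper)}; assumed to be computed
               elsewhere from the previous system state
    connected: the binary connectivity variable (q_i.v)^t, 0 or 1
    """
    found = set()
    for name, value in values.items():
        lower, upper = bounds[name]
        if not (lower <= value <= upper):
            found.add("behavior")  # at least one variable is out of bounds
            break
    if connected == 0:
        found.add("connectivity")  # direct link to the coordinator is lost
    return found


def is_normal(values, bounds, connected):
    """A node state is normal when no abnormality condition holds."""
    return not node_abnormalities(values, bounds, connected)


bounds = {"power_w": (0.0, 500.0)}  # hypothetical variable and bounds
print(is_normal({"power_w": 120.0}, bounds, connected=1))
print(node_abnormalities({"power_w": 900.0}, bounds, connected=0))
```

Returning a set rather than a single flag keeps the two conditions separate, which matters later because the fault and attack classes are distinguished by which conditions hold.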
Following Definitions 2.1 and 2.2, we say that the system state $\sigma^t$ is normal if $\forall q_i \in V$, $\sigma_i^t$ is normal. Similarly, we characterize a system state $\sigma^t$ as abnormal if $\exists q_i$ such that $\sigma_i^t$ exhibits abnormal node behavior or abnormal node connectivity. We define an execution $\xi$ as a Normal-execution if $\forall \sigma^t \in \xi$, $\sigma^t$ is normal, for $0 \le t \le T(\xi)$. Similarly, we say that an execution $\xi$ is a $\psi$-execution, for $\psi \in \{\text{Fault}, \text{Attack}\}$, if $\exists \sigma^t \in \xi$ such that $\sigma^t$ is an abnormal state resulting from a fault or an attack respectively, for $0 \le t \le T(\xi)$. Given an algorithm $A$, we can then define the differentiation operator, and thus the differentiation problem, as follows.

Problem 2.1 (Class Differentiation). Specify a differentiation operator using a (machine learning) algorithm $A$ such that, given two class state snapshot vectors $w^{\psi}_{t}$ and $w^{\psi^*}_{t^*}$ from executions $\xi$ and $\xi^*$ (not necessarily different):

PROPOSED METHODOLOGY
The proposed methodology is presented in Figure 1 and includes three distinct steps. First, the internal communication environment is modelled for the normal, faulty, and attack classes. Second, we define a set of execution scenarios that allow us to generate execution datasets describing the behaviour of the system under each of the three classes. Finally, in the third step, the generated datasets are used to evaluate different supervised machine learning algorithms for classification. Our methodology is general and accommodates different classes of faults and attacks. Without loss of generality, in Section 3.1 we discuss the specific classes we consider in this work.

Execution Classes
Here we define the three execution classes, based on the definitions given in Section 2, and provide the types of faults and attacks we consider in this work. Faults in an EASH system often appear in the sensing nodes rather than in the communication channels between the nodes and the coordinator. On the other hand, attacks are often launched against the communication channels, while also affecting the state variables (e.g., routing tables) of the communicating nodes. Therefore, the fault and attack types we choose (i) resemble common abnormalities in EASH, and (ii) have a similar effect, in order to challenge the differentiation task. Our differentiation mechanism examines the network's traffic characteristics in order to separate common sets of faults from attacks.
Normal Class: In normal behaviour we consider Normal-executions, where nodes capture measurements and transmit packets to the central node with no attacks or faults.
Faulty Class: We consider Fault-executions, where a system state $\sigma^t$ is a faulty state if $\exists q_i \in V$ with $\sigma_i^t$ being an abnormal node behaviour state, but not an abnormal node connectivity state (see Definition 2.2). In other words, a node may experience a fault that affects its local state variables (e.g., measurements, routing tables), but its connectivity to the central node is not interrupted. Such failures may also cause routing failures (due to routing table corruption) or packet drops (due to improper generation of the local packets). We use three types of faults in this class (forming set F): (F1) Low Energy Failure, (F2) Routing Failure, and (F3) Packet Dropped Failure.
Attack Class: In our experiments we assume attacks that affect the communication channels. In particular, we consider Attack-executions, where the system state $\sigma^t$ is an attack state if $\exists q_i \in V$ with $\sigma_i^t$ being both an abnormal node behaviour and an abnormal node connectivity state. Man-in-the-middle (MITM) attacks possess such characteristics, as the attacker tries to modify local variables of the communicating parties in order to gain access to the direct connectivity between them. Common attacks performed through MITM are message modification, replay, and sink hole. We therefore use the following types of attacks from this class (forming set A): (A1) Replay Attack, (A2) Sink Hole Attack, and (A3) Message Modification Attack.
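The class definitions above reduce to a simple mapping from the abnormality conditions of a node state to an execution class. A minimal Python sketch follows. The function name `state_class` and the string labels are hypothetical. The `Unassigned` branch is also an assumption of this sketch, since the text does not assign a class to a state that shows only a connectivity abnormality.

```python
FAULT_TYPES = {"F1": "Low Energy Failure",
               "F2": "Routing Failure",
               "F3": "Packet Dropped Failure"}
ATTACK_TYPES = {"A1": "Replay Attack",
                "A2": "Sink Hole Attack",
                "A3": "Message Modification Attack"}


def state_class(abnormalities):
    """Map the set of abnormality conditions of a node state to a class."""
    if not abnormalities:
        return "Normal"
    if abnormalities == {"behavior"}:
        return "Fault"       # local variables affected, link intact
    if abnormalities == {"behavior", "connectivity"}:
        return "Attack"      # local variables affected and link interrupted
    return "Unassigned"      # assumption: case not covered by the class definitions


print(state_class(set()))
print(state_class({"behavior"}))
print(state_class({"behavior", "connectivity"}))
```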

ML Classification Algorithms
We explore different ML classification algorithms, focusing on the ones considered most appropriate for the underlying problem. The execution classes are used to generate values for descriptive measurements of the network's behaviour. For the evaluation of the machine learning classification algorithms, the Waikato Environment for Knowledge Analysis (WEKA) tool is used [1]. Algorithm selection was based on the following characteristics of our collected dataset: it is (i) small, (ii) static, and (iii) contains nominal attributes. In particular, small datasets are prone to overfitting, and static datasets cannot be used with online learning algorithms. We use the same algorithms in both the testbed and the simulation experiments for comparison reasons, despite the fact that the dataset generated in our real-time testbed is larger. Given our dataset characteristics, we evaluate the following four supervised ML algorithms:
J48: A tree classification algorithm that uses a pruned or unpruned C4.5 decision tree for classification.
NaiveBayes (NB): One of the most common classification algorithms, which relies on Bayes' theorem and uses estimator classes.
Multilayer Perceptron (MLP): A multilayer perceptron from the neural network class.
Multinomial Logistic Regression (MLR): A classifier that builds a multinomial logistic regression model using a ridge estimator.
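To make the idea of classifying instances with nominal attributes concrete, the following is a minimal, self-contained Naive Bayes sketch in Python. It is not WEKA's implementation and was not used in the experiments. The class `NominalNaiveBayes`, the add-one smoothing, and the toy attribute values are assumptions chosen for illustration.

```python
import math
from collections import Counter, defaultdict


class NominalNaiveBayes:
    """Naive Bayes for nominal attributes with add-one (Laplace) smoothing."""

    def fit(self, rows, labels):
        self.class_counts = Counter(labels)
        n_attrs = len(rows[0])
        # value_counts[(label, attribute index)][value] = number of training rows
        self.value_counts = defaultdict(Counter)
        # distinct values seen per attribute, used in the smoothing denominator
        self.attr_values = [set() for _ in range(n_attrs)]
        for row, label in zip(rows, labels):
            for j, value in enumerate(row):
                self.value_counts[(label, j)][value] += 1
                self.attr_values[j].add(value)
        return self

    def _log_score(self, row, label):
        total = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / total)  # log prior
        for j, value in enumerate(row):
            count = self.value_counts[(label, j)][value]
            denom = self.class_counts[label] + len(self.attr_values[j])
            score += math.log((count + 1) / denom)           # smoothed likelihood
        return score

    def predict(self, row):
        return max(self.class_counts, key=lambda label: self._log_score(row, label))


# Toy instances: (protocol, delay level); values are illustrative only.
rows = [("tcp", "low"), ("tcp", "low"), ("tcp", "high"),
        ("arp", "high"), ("arp", "high"), ("arp", "low")]
labels = ["normal", "normal", "normal", "attack", "attack", "attack"]
model = NominalNaiveBayes().fit(rows, labels)
print(model.predict(("tcp", "low")))
print(model.predict(("arp", "high")))
```

Scores are accumulated in log space to avoid numerical underflow when the number of attributes grows, and smoothing keeps an unseen attribute value from forcing a class probability to zero.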

Evaluation Metrics
The evaluation metrics used for the ML classification algorithms are the observed accuracy ($p_o$) and the kappa statistic ($\kappa$). Observed accuracy was selected because it is the most commonly used metric in such evaluations; it is reported as a percentage (%). The kappa statistic was selected because it accounts for imbalance in the number of instances per class; it is a normalized value that is at most one, with values at or below zero indicating agreement no better than chance. In our case, imbalance is observed since the normal class appears more frequently in a monitoring system, and thus generates more instances in our collected dataset. The kappa statistic is computed from the observed accuracy ($p_o$) and the expected accuracy ($p_e$) by $\kappa = \frac{p_o - p_e}{1 - p_e}$. To explain how $p_o$ and $p_e$ are computed, we use a simple example with two classes, as given in Table 1. Here, values $a$ and $d$ are the correctly classified instances (the actual and predicted classes agree), and values $b$ and $c$ are the misclassified instances. The computation of $p_o$ and $p_e$ is given by: $p_o = \frac{a + d}{a + b + c + d}$ and $p_e = \frac{(a + b)(a + c) + (c + d)(b + d)}{(a + b + c + d)^2}$.
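The two-class computation can be sketched in Python using the standard definition of Cohen's kappa. The function name `kappa_2x2` and the example counts are illustrative. The sketch assumes that $a$ and $d$ lie on the diagonal of the confusion matrix and $b$ and $c$ off the diagonal; the result does not depend on which of $b$ and $c$ is placed in which off-diagonal cell.

```python
def kappa_2x2(a, b, c, d):
    """Return (p_o, p_e, kappa) for a two-class confusion matrix.

    a, d: correctly classified instances (diagonal)
    b, c: misclassified instances (off-diagonal)
    """
    n = a + b + c + d
    p_o = (a + d) / n
    # Chance agreement: product of row and column marginals, summed per class.
    p_e = ((a + b) * (a + c) + (c + d) * (b + d)) / (n * n)
    # Note: undefined when p_e == 1 (all instances in a single class).
    kappa = (p_o - p_e) / (1 - p_e)
    return p_o, p_e, kappa


# Illustrative counts, not taken from the experiments.
p_o, p_e, kappa = kappa_2x2(a=20, b=5, c=10, d=15)
print(round(p_o, 3), round(p_e, 3), round(kappa, 3))  # 0.7 0.5 0.4
```

In this example the classifier is correct on 70% of the instances, but half of that agreement would be expected by chance from the marginals alone, which is why the kappa value is considerably lower than the accuracy.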

SIMULATION IMPLEMENTATION AND RESULTS
The internal communication environment of EASH was simulated using the OPNET network simulator, considering the ZigBee communication protocol. Scenarios for normal, faulty, and attack executions, including all failures and attacks presented in Section 3.1, were simulated. In particular, F1 was simulated by lowering the energy of a single source node below the transmission threshold; F2 was simulated by changing the destination of each packet to a random value; and F3 was simulated by adding noise to the packets, rendering the data load invalid and causing the receiver to drop the packets. Attack classes were simulated by inserting a middle node that intercepts the communication between a single peripheral node and the central coordinator.
In A1 the attacker re-transmits (without modifying) the packets received from the transmitting node. In A2 the attacker simply drops the transmitted packets, and in A3 the attacker increases the payload of the packet before re-transmitting it to the central coordinator. From the simulated scenarios we extract a total of twenty-five network characteristics (such as throughput, packet delays, and data transmitted) to build comma-separated instances, each associated with the execution class (i.e., normal, fault, or attack) that was used to generate the particular instance. The set of those instances yields our complete dataset. A total of 144 instances were generated after the execution of the simulated scenarios. In our evaluation, a 75%-25% split of the entire dataset was used for training and testing, respectively, in both the simulation and the testbed experiments.
The results are reported in Table 2. For each ML algorithm considered, we list the observed accuracy and kappa statistic value as a pair ($p_o$, $\kappa$). As can be observed, the results are promising, as for all four algorithms the average accuracy is above 85%. Algorithm J48 slightly outperforms the other algorithms. Additional fine-tuning of the algorithm parameters and/or examination of additional features (a total of 26 features were examined in our simulations) could further improve the observed accuracy.
Based on Problem 2.1, we evaluate the class differentiation for each algorithm. Table 3 presents these results for the J48 algorithm for the particular case #1. A T appears in a cell if the algorithm can differentiate the two classes corresponding to the row and column of the cell, and an F if the algorithm failed to differentiate the two classes. Differentiation does not apply to states of the same class (on the diagonal, denoted by -).
For J48 and case #1, all instances were classified correctly except for the Low Energy Fault (F1) instances which were misclassified as Packet Dropped Failure (F3) instances, and vice-versa. In a similar manner, we have constructed tables for each of the algorithms and cases considered, which are omitted here due to space limitations.
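The T/F/- notation of the differentiation tables can be derived from a list of (actual, predicted) label pairs, as sketched below in Python. The function `differentiation_table` is hypothetical, and the rule it applies is one assumed reading of the notation: a pair of classes is marked F as soon as at least one instance of either class is predicted as the other. The example pairs are illustrative and are not the experimental data.

```python
def differentiation_table(classes, pairs):
    """Build a {(row class, column class): 'T' | 'F' | '-'} table.

    pairs: iterable of (actual label, predicted label) for the test instances.
    """
    confused = set()
    for actual, predicted in pairs:
        if actual != predicted:
            # Unordered pair, so confusion in either direction marks both cells.
            confused.add(frozenset((actual, predicted)))
    table = {}
    for row in classes:
        for col in classes:
            if row == col:
                table[(row, col)] = "-"   # same class: not applicable
            elif frozenset((row, col)) in confused:
                table[(row, col)] = "F"   # algorithm failed to differentiate
            else:
                table[(row, col)] = "T"   # algorithm differentiated the classes
    return table


classes = ["N", "F1", "F3"]
pairs = [("N", "N"), ("F1", "F3"), ("F3", "F1"), ("F3", "F3")]
table = differentiation_table(classes, pairs)
for row in classes:
    print(row, [table[(row, col)] for col in classes])
```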

TESTBED IMPLEMENTATION AND RESULTS
We ran experiments in a real-time testbed consisting of real sensing units. In particular, we used three Raspberry Pi 3 Model B+ boards connected via Bluetooth to three Sensor Tag CC2650 units for sensing humidity and temperature values (peripheral nodes), a MacBook Pro for collecting the transmitted values and running the Wireshark sniffing tool (central node), and an Ubuntu PC to issue the man-in-the-middle attacks (attacker). We formed a star topology, as introduced in Section 2.
In this real-time testbed we implemented the fault classes F1 and F3 and the attack classes A2 and A3. For the normal case, the peripheral nodes (formed by the Raspberry Pis) received humidity and temperature readings and generated TCP packets that were transmitted to the central node.
F1 was emulated by interrupting the communication of a single peripheral node by cutting off its power. F3 was emulated by setting a threshold in the packet generation procedure that indicates the quality of the packets being generated. If the value of a generated packet was above the threshold, the packet was transmitted; otherwise the packet was dropped.
To implement the attack classes, we initiated a man-in-the-middle (MITM) attack, where the attacker intercepts the communication between a peripheral node and the central node by receiving the packets they exchange. This attack was implemented using the Address Resolution Protocol (ARP) spoofing (poisoning) method, which allows the attacker to alter the ARP tables of the communicating nodes by sending spoofed ARP messages. The aim of this attack is to associate the attacker's MAC address with the IP address of another host. In our implementation the attacker associates its MAC address with the default gateway of the network. This allows the attacker to intercept the data frames on the network, to modify the traffic, or to stop all traffic entirely. By intercepting the channel between a peripheral node and the central node we can emulate the behaviour of A2, by stopping all the traffic in the network, and of A3, by modifying the measurements in the transmitted packets.
Using Wireshark, we collected a total of twenty-four characteristics (e.g., time sent, origin, destination, protocol) for each message sent, forming comma-separated instances for our dataset. Each instance was associated with the execution class emulated at the time the packet was captured. The total number of instances we collected for the real-time dataset was 292. Experiments were performed for the following cases:
Case #1: Normal vs. Every Fault (N vs. F1, F3)
Case #2: Normal vs. Every Attack (N vs. A2, A3)
Case #3: Normal vs. Every Attack vs. Every Fault (N vs. F1, F3, A2, A3)
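Selecting the instances that belong to each experimental case from a labelled, comma-separated dataset can be sketched as follows in Python. The column names, the label column `class`, the helper `select_case`, and the sample rows are hypothetical; only the class sets per case follow the text.

```python
import csv
import io

# Class labels included in each experimental case.
CASES = {
    "case1": {"N", "F1", "F3"},
    "case2": {"N", "A2", "A3"},
    "case3": {"N", "F1", "F3", "A2", "A3"},
}


def select_case(csv_text, case, label_column="class"):
    """Return the instances (as dicts) whose label belongs to the given case."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if row[label_column] in CASES[case]]


# Made-up instances with a small subset of per-packet characteristics.
SAMPLE = """time,source,destination,protocol,class
0.01,10.0.0.2,10.0.0.1,TCP,N
0.02,10.0.0.3,10.0.0.1,TCP,F3
0.03,10.0.0.9,10.0.0.2,ARP,A2
0.04,10.0.0.4,10.0.0.1,TCP,A3
"""

for case in CASES:
    print(case, len(select_case(SAMPLE, case)))
```

Keeping the class label as an ordinary column means the same file can serve all three cases, with the case definition applied only at selection time.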
The evaluation metric results are shown in Table 4, followed by the differentiation operator results in Table 5, which uses the same notation as Table 3. The differentiation operator results are reported for the same classification algorithm and case as before, namely J48 and Case #1. Based on Table 4, the results are promising: for all four algorithms the average accuracy is above 85%, with MLP being slightly better. Based on Table 5, Low Energy Fault (F1) instances are misclassified as Packet Dropped Failure (F3) instances, and vice versa. As before, the remaining cases are omitted.
An important finding in Case #3 led us to further examine classes F1 and A2. These classes have a similar effect, as in both of them packets stop being delivered from a peripheral node to the coordinator. To this end, we considered case NF1A2, consisting of N, F1, and A2 instances. During this experiment, both TCP and ARP packet traffic was captured. The results using TCP packets only and using both TCP and ARP packets are shown in Table 6. The results improve further with the additional information from the ARP traffic: the injection of ARP packets during the MITM attack in A2 is captured in the second case, allowing our algorithms to differentiate the two classes.
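The effect of including ARP traffic can be illustrated with a small Python sketch that counts packets per protocol in a capture window. The function `protocol_counts`, the windowing, and the packet lists are assumptions made for illustration; the values are invented and do not come from the testbed captures.

```python
from collections import Counter


def protocol_counts(packets, protocols=("TCP", "ARP")):
    """Count the packets of each listed protocol in one capture window."""
    counts = Counter(p["protocol"] for p in packets)
    return {name: counts.get(name, 0) for name in protocols}


# Invented windows: in both, the affected node delivers no packets, so the
# remaining TCP traffic looks the same; only the attack window has ARP injection.
f1_window = [{"protocol": "TCP"}, {"protocol": "TCP"}]
a2_window = [{"protocol": "TCP"}, {"protocol": "TCP"},
             {"protocol": "ARP"}, {"protocol": "ARP"}, {"protocol": "ARP"}]

# With TCP only, the two windows yield identical feature values.
print(protocol_counts(f1_window, ("TCP",)), protocol_counts(a2_window, ("TCP",)))
# With TCP and ARP, the ARP count separates them.
print(protocol_counts(f1_window), protocol_counts(a2_window))
```

This mirrors the observation in the text: when the visible TCP behaviour of a fault and an attack coincides, a feature derived from the ARP traffic can carry the information needed to tell them apart.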

CONCLUSION
This paper presents preliminary results on the task of differentiating between a set of common failures and attacks in an EASH system. The results show that the use of supervised machine learning algorithms is a promising approach, as it differentiates between faults and attacks with high accuracy. Some misclassifications in cases with a similar impact on the network show the need for future improvements in data generation and evaluation. In future work, we aim to extend the current work by examining correlations between faults and attacks and by considering time series datasets.