IoT Network Attack Detection and Mitigation

Cyberattacks on the Internet of Things (IoT) can cause major economic and physical damage: they can disrupt production lines, manufacturing processes and supply chains, impact the physical safety of vehicles, and damage human health. We therefore describe and evaluate a distributed and robust attack detection and mitigation system for network environments, in which communicating decision agents use Graph Neural Networks to provide attack alerts. We also present an attack mitigation system that uses a Reinforcement Learning driven Software Defined Network to process the alerts generated by the attack detection system, together with Quality of Service measurements, so as to re-route sensitive traffic away from compromised network paths. Experimental results illustrate both the detection and the re-routing scheme.


I. INTRODUCTION
The IoT [1] has the potential to improve the critical processes that are at the heart of our socio-economic systems [2], [3]. However, it raises risks that go well beyond those of the individual technologies involved, such as the Internet, wireless networks and machine to machine systems [4], [5]. In addition to risks related to system malfunctions [6], quality of service (QoS) failures, and excessive energy consumption, the theft and tampering of data, conventional network attacks, and attacks that deplete the energy of autonomous sensors and actuators also need to be considered [7]-[13]. Since IoT devices can carry out real-time measurements and controls much faster than human reaction times, we must design IoT networks that both detect and mitigate security risks automatically and adaptively, while preserving QoS and energy efficiency [6], [14]. We therefore propose an autonomic [15] scheme offering (a) distributed attack detection based on deep learning (DL) and graph neural networks to achieve high detection probabilities with low false alarm rates [16], [17], and (b) mitigation that exploits network Self-Awareness [18], [19] centered on Software Defined Networks (SDN) [20] to achieve secure QoS based routing of traffic flows using machine learning and adaptivity [21], [22].
Section II discusses a multi-agent system (MAS) for network attack detection, and summarizes its performance. The overall system architecture for attack detection and mitigation is presented in Section III. The node attack detection probability estimated by the MAS is used to compute safer paths in the network using reinforcement learning, as described in Sections III-A and III-B. Experimental results are presented in Section III-C, and Section IV presents conclusions and future work.

II. DISTRIBUTED ATTACK DETECTION
IoT systems are distributed and have a heterogeneous structure, which is an additional challenge for real-time anomaly detection [23], [24]. The MAS for detecting attacks therefore monitors the network traffic in a distributed manner, and feeds its output to the novel routing system described in Section III, which mitigates attacks with an SDN based routing engine. The MAS's multiple, mutually communicating agents can improve its robustness by incorporating redundancy in the detection algorithm [25]. The MAS also offers scalability, since its modularity allows new agents to be added if the IoT network grows, and agents exchange information [26] in a structure inspired by Graph Neural Networks [16], [17].
The structure of the IoT network is reflected by the graph G(V, E), where V is the set of nodes of the IoT network, and E ⊂ V × V is the set of edges, which represent pairs of nodes that communicate (directly or indirectly) with each other through the IoT network. The nodes can represent sensors or actuators, edge nodes, servers or routers in the IoT network. We associate a real-valued feature vector x_i ∈ R^{N_V} with each i ∈ V, where N_V is its length. Similarly, we associate a real-valued feature vector e_ij ∈ R^{N_E}, of length N_E, with each edge (i, j) ∈ E. An example of the features for the nodes and edges is given in Table I. Measurements that collect the feature vector parameters are taken in the IoT network during successive time slots [(t − 1)T, tT), where T is the slot length and t is the slot index. The slots are long enough to provide representative data, but short enough to reflect time variations in the system. All feature vectors are therefore associated with individual slots and with successive values within a slot: x_i^{t,k} is the k-th successive value of x_i within the t-th slot, while e_ij^{t,k} is the k-th successive value of e_ij in the t-th slot. We denote by e_ij^t and X_i^t, respectively, the feature vector values at the end of the t-th slot, while e_ij^0 and X_i^0 are their values when the measurement system starts to operate and the first slot begins. The MAS uses four Deep Neural Networks (DNNs):
• The EDNN (edge DNN), which uses an edge's current features, and the features of the two nodes at its ends, to update the edge's features.
• The NDNN (node DNN), which updates a given node's features using the average value of the features of the nodes with which it communicates and of the related edges. The averages for node j ∈ V are taken over its m_j neighbours, where m_j = |{i s.t. (i, j) ∈ E}| and s.t. stands for "such that".
• The CLN, which builds p_{N_i}^{k,t}, the probability that node i is compromised, using only its own feature vector.
• The CLNEI, which builds p_{N_ij}^{k,t}, the probability with which i determines that its neighbour j is compromised.
These four networks constitute the node agent; they are duplicated in each node and can be trained off-line. They operate in each node separately and asynchronously. Starting from feature vectors built from data gathered during the previous time slot, they update the decision probabilities and communicate their updated feature values and decision probabilities to their neighbours. These computations are shown schematically in the accompanying figure. The multiple iterations of these operations, indexed by the integer k, allow nodes and edges to update and exchange information multiple times within each time interval, as shown in Figure 2. The entire network of nodes is trained in a supervised manner using the back-propagation algorithm with cross-entropy as the cost function for classification.
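The iteration performed by the node agents can be illustrated with a minimal sketch. It is not the trained system: the one-layer random networks below are hypothetical stand-ins for the EDNN, NDNN and CLN, the feature lengths and the toy graph are arbitrary, the CLNEI is omitted, and the choice to feed the freshly updated edge features into the node update is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
N_V, N_E = 4, 3  # hypothetical feature lengths for nodes and edges

def dense(n_in, n_out):
    """Random one-layer network standing in for a trained DNN (assumption)."""
    W = rng.normal(scale=0.1, size=(n_out, n_in))
    return lambda z: np.tanh(W @ z)

ednn = dense(N_E + 2 * N_V, N_E)     # edge update: edge + its two end nodes
ndnn = dense(N_V + N_V + N_E, N_V)   # node update: node + neighbour/edge averages
cln_w = rng.normal(scale=0.1, size=N_V)

def cln(x):
    """Probability that a node is compromised, from its own features only."""
    return 1.0 / (1.0 + np.exp(-(cln_w @ x)))

def iterate(x, e, edges):
    """One iteration k -> k+1 of the edge and node feature updates."""
    new_e = {(i, j): ednn(np.concatenate([e[(i, j)], x[i], x[j]]))
             for (i, j) in edges}
    new_x = {}
    for j in x:
        nbrs = [i for (i, jj) in edges if jj == j]   # the m_j neighbours of j
        if not nbrs:                                 # isolated node: unchanged
            new_x[j] = x[j]
            continue
        mean_x = np.mean([x[i] for i in nbrs], axis=0)
        mean_e = np.mean([new_e[(i, j)] for i in nbrs], axis=0)
        new_x[j] = ndnn(np.concatenate([x[j], mean_x, mean_e]))
    return new_x, new_e

# Toy graph with three nodes and bidirectional links 0-1 and 1-2.
edges = [(0, 1), (1, 0), (1, 2), (2, 1)]
x = {i: rng.normal(size=N_V) for i in range(3)}
e = {ij: rng.normal(size=N_E) for ij in edges}
for _ in range(3):                   # three iterations k within one time slot
    x, e = iterate(x, e, edges)
p = {i: float(cln(x[i])) for i in x}  # per-node compromise probabilities
```

In a deployment each node would run only its own part of this loop and exchange features with its neighbours; the centralized loop here is used purely for readability.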
In the system that we have described, each node agent can perform anomaly detection not only on itself, but also with regard to its neighbouring nodes. This redundancy improves the algorithm's robustness in cases where some agents may fail, since the agents at neighbouring nodes may still detect anomalies that occur at their neighbours. Finally, to combine the overlapping decisions of different agents into a single decision for each node in the IoT network, we use a simple aggregation method, where a node is considered anomalous if at least one agent has reported it as being anomalous. More sophisticated aggregation schemes can be considered in future work.
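The aggregation rule described above amounts to a logical OR over all agents' reports. A minimal sketch follows, in which the report format (a mapping from observer and target to a probability) and the decision threshold of 0.5 are illustrative assumptions rather than values taken from the system.

```python
def aggregate(reports, threshold=0.5):
    """Return the set of nodes flagged as anomalous by at least one agent.

    reports maps (observer, target) -> probability that `target` is
    compromised, as estimated by the agent at `observer`; observer == target
    is a node's report about itself. The threshold is a hypothetical choice.
    """
    flagged = set()
    for (observer, target), prob in reports.items():
        if prob >= threshold:
            flagged.add(target)
    return flagged

# Node 1 does not flag itself, but its neighbour 2 does; node 3 flags itself.
reports = {(1, 1): 0.1, (2, 1): 0.8, (2, 2): 0.2, (1, 2): 0.3, (3, 3): 0.9}
anomalous = aggregate(reports)
```

Because a single report suffices, a failed or silent agent at node 1 does not prevent node 1 from being flagged, at the price of a higher false alarm rate than a majority vote would give.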
To evaluate the proposed approach, we used a simulated infiltration attack, in which the attacker tries to infiltrate the network by scanning a range of IP addresses in order to run services, and performs a dictionary attack in order to find vulnerable IoT devices. The resulting Receiver Operating Characteristic (ROC) is shown in Figure 3. The overall results are summarized in Table II. The metrics used for the evaluation are the Area Under the Curve (AUC) score, the detection accuracy, the utilized bandwidth, and the power consumption [27]. For the first two metrics, which measure detection efficiency, the proposed approach outperforms all other methods we have tested for anomaly detection, achieving an AUC score and an accuracy of 0.99, compared to 0.97 for the second (random forest) and third (decision tree) classifiers. With respect to the last two metrics, i.e. bandwidth and power consumption, the proposed decentralized approach greatly reduces the bandwidth required for monitoring, which in turn reduces the power that is consumed. However, the execution time and the power consumed at each node together determine the energy consumed by our approach, so that a slower low power approach may consume more energy than a very fast method that uses higher power.

III. SYSTEM ARCHITECTURE AND ROUTING ENGINE
The architecture of the SerIoT system is shown in Figure 4, with interconnected smart forwarding engines (SFE) that are connected to fixed or mobile IoT devices, IoT gateways, and to Cloud Servers which may also be Fog servers. SFEs may be connected to Honeypots (H) whose role is to attract and interpret attacks. Specific software at IoT devices and gateways may be used to detect attacks [28], but here we use the decisions provided by the distributed attack detector of Section II. QoS, energy and security data are monitored and forwarded to "smart controllers and routing engines" (SRE), which operate as OpenFlow SDN Controllers to choose paths and download them to the SFEs [29], [30].
The SRE uses the Cognitive Packet Network (CPN) routing algorithm [31], implemented with the Random Neural Network (RNN) and Reinforcement Learning [32], which has also attracted interest from industry [33]. It extends a standard SDN network using the SRE, with SFEs which are extensions of SDN forwarders, and the Monitoring and Anomaly Detection (MAD) module which detects potential threats from data collected by SerCPN, Active Honeypots that attract attacks, deflect them to safe IP locations, and inform the SRE, and local attack detectors at nodes and gateways [28].
Each SFE, shown schematically in Figure 5, switches SDN flows according to the OpenFlow protocol. In addition to payload traffic, SFEs also forward smart packets (SPs) which gather security, QoS and energy usage data from the SFEs, IoT devices and gateways. Each SFE has a Cognitive Packet Agent (CPA) that unpacks the SP, adds its own data to the list stored inside, packs it again and forwards it to the SFE. SPs travel over paths, carrying information provided by the SFEs on the path. When a CPA recognizes that an SP has reached the end of its path, it encapsulates it and forwards it to the corresponding SRE, where its data is unloaded into the local Network State Database (NetStatDB). SFEs can also forward monitored data (such as packet counters or byte counters) to the MAD at the SRE.
Each SRE is based on ONOS [34], and its software is implemented as an ONOS application with the three main modules shown in Figure 6. The heart of the system is the Cognitive Routing Module (CRM), which implements decisions taken by an RNN [35] with Reinforcement Learning for path selection based on QoS, security or energy consumption in the network. The MAD detects attacks at nodes using the MAS of Section II. Other attack detection methods will also be considered in future work [28].
The SRE selects paths based on a Goal Function G(f, P), which takes non-negative real values and must be minimized, where f denotes the packet flow to or from an IoT device or end-user software, and P denotes a path travelled by f. The MAS in Section II provides the probability p_i that node i is under attack. For some SFE or network node i, the Trust Level T(f, i) is a non-negative number that is high when i is deemed secure enough to convey the flow f. Also, S(f, i) is defined as the sensitivity of f to attacks at node i.

A. Linking T(f, i) to the p_i from the MAS
Let A > 0 be a large positive constant, used so that T(f, i) may take values comparable to QoS values such as the delay of links, and let p_i be the probability that an attack is detected at node i by the MAS. Then T(f, i) = A(1 − p_i) is the security level of f related to node i. Let S(f, i) be the sensitivity of f to the security of i. The Insecurity Factor I(f, i) is then used to "separate" i and f: I(f, i) = [S(f, i) − T(f, i)]^+, where we use the notation [X]^+ = X if X > 0, and [X]^+ = 0 if X ≤ 0. If we take S(f, i) = A, then I(f, i) = A p_i, and we see that as p_i increases, the "security cost" incurred by f as it travels through i increases. The "Insecurity Factor" that relates flows to paths is I(f, P) = Σ_{i∈P} I(f, i). When less attention is paid to security, we may take the smaller value I(f, P) = max_{i∈P} I(f, i).
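The mapping from attack probabilities to a path-level security cost can be sketched as follows. The sketch assumes the form I(f, i) = [S(f, i) − T(f, i)]^+, which is the one consistent with I(f, i) = A p_i when S(f, i) = A, and it assumes that the stricter path-level factor is the sum over the nodes of the path; the value A = 100 and the function names are hypothetical.

```python
def trust(p_i, A=100.0):
    """Trust level T(f, i) = A * (1 - p_i); A is a hypothetical scale constant."""
    return A * (1.0 - p_i)

def insecurity(p_i, S, A=100.0):
    """Insecurity factor [S - T]^+ of one node for a flow with sensitivity S."""
    return max(S - trust(p_i, A), 0.0)

def path_insecurity(probs, S, A=100.0, strict=True):
    """Path-level insecurity: sum over nodes (strict) or worst node only."""
    values = [insecurity(p, S, A) for p in probs]
    return sum(values) if strict else max(values)

# Attack probabilities reported by the MAS for the three nodes of a path.
probs = [0.25, 0.5, 0.0]
strict_cost = path_insecurity(probs, S=100.0)                 # sum over nodes
relaxed_cost = path_insecurity(probs, S=100.0, strict=False)  # worst node only
low_sensitivity = path_insecurity(probs, S=40.0)              # less sensitive flow
```

With S = A the per-node cost reduces to A p_i, so the strict cost here is 25 + 50 + 0. A flow with lower sensitivity S only pays for nodes whose trust level falls below S, which is the "separation" effect of the [·]^+ operator.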
Let L(f, P) be the packet loss ratio, and D(f, P) be the forwarding delay for a packet of f on path P, while J_i is the energy consumption per packet at node i. The Goal Function accounts for the packet retransmissions due to packet losses [31], [36], and includes a security threshold θ ≥ 0 that can be chosen based on the importance of security considerations for this system. G(f, i) or G(f, P) are quantities to be minimized, but Reinforcement Learning (RL) requires a "reward" that should be maximized, namely R(f, i) = 1/G(f, i) and R(f, P) = 1/G(f, P).

B. Reinforcement Learning
The metrics that feed into the quantity R(f, i) are collected via measurements, except the ones that are initially fixed, namely θ, S(f, i) and parameters such as α, β, γ that describe the relative importance of different factors. Therefore the RL based routing scheme, which aims to improve network security, QoS and energy consumption, collects at each node i the quantity G(f, i), and hence R(f, i), at successive arrivals of an SP to an SDN controller. The SP collects and brings back the relevant data for R(f, i) concerning each node i that it has visited, to the SDN router that exploits the RL algorithm to compute a "next hop" for SPs. Let the integer l refer to the l-th value of the reward R_l(f, i) computed by the SDN router for the node i and flow f. The RL algorithm first computes the quantity T_l, which describes the historical behaviour of the reward and indicates how well the network has been doing. The RL algorithm then computes a set of RNN [35] weights as follows.
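The expression for the historical quantity T_l is not reproduced in this text. A form commonly used with CPN-style reinforcement learning is an exponentially smoothed average of past rewards, and the sketch below assumes that form; the smoothing constant a = 0.8, the function names and the goal values are hypothetical.

```python
def reward(G):
    """Reward as the reciprocal of a strictly positive goal value."""
    if G <= 0:
        raise ValueError("goal value must be strictly positive")
    return 1.0 / G

def update_threshold(T_prev, R_l, a=0.8):
    """Assumed smoothing rule: T_l = a * T_{l-1} + (1 - a) * R_l, 0 < a < 1."""
    return a * T_prev + (1.0 - a) * R_l

# Successive goal values measured by smart packets (illustrative numbers).
T = reward(2.0)            # initialise the history with the first reward
history = [T]
for G in (4.0, 1.0):
    T = update_threshold(T, reward(G))
    history.append(T)
```

A value of a close to 1 makes the threshold react slowly, so that a single poor measurement does not immediately change the routing decision, whereas a small a makes the scheme follow recent measurements closely.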
For an N node RNN, where N is the number of outgoing links for node i, we associate with each outgoing link i a neuron whose state is represented by the "excitation probability" q_i of the RNN. The RNN weights are real numbers W_ij^+, W_ij^- ≥ 0 for i, j ∈ {1, ... , N}. From RNN theory [35] we know that q_j = (λ_j^+ + Σ_{i=1}^{N} q_i W_ij^+) / (r_j + λ_j^- + Σ_{i=1}^{N} q_i W_ij^-), (12) where r_j = Σ_{l=1}^{N} [W_jl^+ + W_jl^-] is the "total firing rate" of the neuron j. Here λ_j^+ and λ_j^- are, respectively, the arrival rates of excitatory and inhibitory spikes to neuron j from outside the network, which are set so that when all connection weights are equal, all neurons in the network have an excitation probability of q_j = 0.5.
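The excitation probabilities are defined implicitly, since each q_j depends on all the others. A simple way to obtain them numerically is fixed-point iteration on the standard RNN steady-state equation, q_j = (λ_j^+ + Σ_i q_i W_ij^+) / (r_j + λ_j^- + Σ_i q_i W_ij^-), as in the sketch below. The iteration count, the toy weights and the external rates are assumptions chosen so that equal weights give q_j = 0.5.

```python
import numpy as np

def excitation_probabilities(W_plus, W_minus, lam_plus, lam_minus, iters=200):
    """Solve the RNN steady-state equations by fixed-point iteration."""
    r = (W_plus + W_minus).sum(axis=1)       # total firing rate r_j per neuron
    q = np.full(len(r), 0.5)                 # initial guess
    for _ in range(iters):
        num = lam_plus + q @ W_plus          # lambda+_j + sum_i q_i W+_ij
        den = r + lam_minus + q @ W_minus    # r_j + lambda-_j + sum_i q_i W-_ij
        q = np.minimum(num / den, 1.0)       # probabilities cannot exceed 1
    return q

N = 3
W_plus = np.ones((N, N))
W_minus = np.ones((N, N))
lam_plus = np.full(N, 3.0)     # chosen so that equal weights give q_j = 0.5
lam_minus = np.full(N, 1.5)

q_equal = excitation_probabilities(W_plus, W_minus, lam_plus, lam_minus)

# Strengthening the excitatory weights towards neuron 0 raises q_0.
W_biased = W_plus.copy()
W_biased[:, 0] += 1.0
q_biased = excitation_probabilities(W_biased, W_minus, lam_plus, lam_minus)
```

The neuron with the largest q then identifies the preferred outgoing link, which is how the learned weights translate into a routing decision.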
Let k be the index of the neuron for which, after the (v − 1)-th update of the RNN, we have q_k = max{q_1, ... , q_N}. Also save the current value r_j ← Σ_{l=1}^{N} [W_jl^+ + W_jl^-]. Note that the node from which an SP entered the node where the next-hop decision is being taken will not be used as the next hop, so that the decision at a given node selects one outgoing link among N − 1. The RNN's weights are then updated according to whether R_l ≥ T_{l−1} or R_l < T_{l−1}, in both cases ∀ j ≠ k, j ≠ i(Previous), i ≠ k, where we divide by N − 2 since we exclude i(Previous), from which the SP initially arrived, and since we neither increase the inhibitory weights of the winner node when R_l ≥ T_{l−1}, nor increase the excitatory weights of the loser node when R_l < T_{l−1}. We then renormalize the weights. Finally we calculate all the q_j from equation (12), and select the new output link k* for flow f at node i such that q_{k*} = max{q_j, j ≠ i, 1 ≤ j ≤ N}.
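The update equations themselves are not reproduced in this text, so the following sketch falls back on the form usually given for CPN routing with an RNN. On success the excitatory weights towards the winner k grow by R_l and the inhibitory weights towards the other candidates grow by R_l / (N − 2); on failure the roles are swapped; each row is then rescaled to its saved firing rate. The function names and the toy values are hypothetical.

```python
import numpy as np

def update_weights(W_plus, W_minus, k, prev, R_l, T_prev):
    """Reward or punish the last decision k, then renormalise each row."""
    N = W_plus.shape[0]
    if N < 3:
        raise ValueError("at least three neurons are needed to divide by N - 2")
    Wp, Wm = W_plus.copy(), W_minus.copy()
    r_old = (Wp + Wm).sum(axis=1)                  # saved firing rates
    others = [j for j in range(N) if j not in (k, prev)]
    for i in range(N):
        if i == k:
            continue
        if R_l >= T_prev:                          # success: reinforce winner k
            Wp[i, k] += R_l
            for j in others:
                Wm[i, j] += R_l / (N - 2)
        else:                                      # failure: favour the others
            Wm[i, k] += R_l
            for j in others:
                Wp[i, j] += R_l / (N - 2)
    r_new = (Wp + Wm).sum(axis=1)
    scale = (r_old / r_new)[:, None]               # keep each row's rate fixed
    return Wp * scale, Wm * scale

def select_link(q, prev):
    """Pick the most excited neuron, excluding the link the SP arrived on."""
    masked = np.array(q, dtype=float)
    masked[prev] = -np.inf
    return int(np.argmax(masked))

W0 = np.ones((4, 4))
Wp_ok, Wm_ok = update_weights(W0, W0, k=1, prev=0, R_l=0.5, T_prev=0.4)
Wp_bad, Wm_bad = update_weights(W0, W0, k=1, prev=0, R_l=0.2, T_prev=0.4)
next_hop = select_link([0.9, 0.6, 0.7, 0.2], prev=0)
```

The renormalisation keeps the weights from growing without bound over many updates while preserving their relative sizes, which is what determines the resulting q values.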

C. Experimental Results
Experiments were run on a network with several SFEs composed of Linux boxes with ARMv8 processors (1.4 GHz, 4 GBit Ethernet, 2.4GHz and 5GHz 802.11b/g/n/ac WiFi interface). They were configured to use Ethernet as the data plane interface shown in Figure 7, and WiFi for management, monitoring and for communications with the SRE. In the figure, s1, ... , s7 denote SFEs, and h1, ... , h4 are IoT devices, each with a 633MHz MIPS processor, a 100Mb/s Ethernet port, and a 2.4GHz WiFi connection used as a management port. The SRE is a workstation connected by WiFi to the test-bed, and the MAD is installed on a separate workstation connected by an Ethernet port to the SRE, and by WiFi to the test-bed. The type of experiment we run is represented by the measured event trace shown in Figure 8.
In our experiments, every distinct pair of IoT devices in {h1, ... , h4} forwards 20 packets/sec, or roughly 20-40 Kb/sec, with 12 ongoing connections, so that each packet rate is compatible with IoT connections monitoring temperature, water flow in pipes, etc. SPs are generated by every edge SFE at 10 packets/sec. SRE management traffic includes OpenFlow commands, link and topology discovery packets, and traffic statistics. Management packet traffic through SFEs, measured using the Wireshark packet analyzer, was four to five times higher than SP traffic.
The experiments illustrate the system's aptitude to be Self-Aware and to adapt. We measure the SRE's reaction time to abrupt changes in the security conditions expressed by the trust level for connections, and track changes to the parameters R_l and T_{l−1} given in (11) over the Reinforcement Learning algorithm's successive steps l. The SRE was programmed to change network paths every 5 seconds, so that the experimental results we present are limited by this constraint, which was imposed to avoid frequent changes that may increase system overheads. The effect of changing the trust T_F(., .) is shown in Figure 8. The quantity that is plotted is the proportion of the time it takes the SRE to respond to a large increase of 100 in the value of T_F(f, i) for a node i on the path that is currently used. We see that the reaction time is on average around 1 second, with a maximum value around 2 seconds.

IV. CONCLUSIONS AND FUTURE WORK
In this paper we have described a system that detects node attacks in an IoT network using a deep learning based Multi-Agent System, and exploits attack detection in order to automatically mitigate the attacks by re-routing sensitive traffic using Reinforcement Learning, while also taking into consideration the QoS of different network paths. We have also provided a preliminary evaluation of the performance of both the attack detection and the mitigation system. In future research, additional measurements, fine tuning of parameters, and experiments will be conducted to better evaluate the interaction of QoS and security in complex adaptive IoT networks. Using methods from diffusion processes [37], [38], we will investigate the transients due to frequent SDN based route updates in response to potential attacks and changes in QoS. We will also test locally operating anomaly and attack detection software at nodes, to reduce computation times for anomaly detection and improve the response of the system to QoS and security changes, while possibly reducing the accuracy offered by the proposed network-wide anomaly detection scheme.