MIRAI Botnet Attack Detection with Auto-Associative Dense Random Neural Network

Internet-connected IoT devices have often been particularly vulnerable to Botnet attacks of the Mirai family in recent years. Thus we develop an attack detection scheme for Mirai Botnets, using the Auto-Associative Dense Random Neural Network that has recently been successful for other attacks such as the SYN attack. The resulting method is trained with normal traffic and tested with attack traffic, and shown to result in high accuracy detection of attacks with low false alarms. The approach is compared on the same data set with two other common machine learning methods (Lasso and KNN) and shown to have higher accuracy, with much lower computation times than KNN and slightly higher (but comparable) computation times than Lasso.


I. INTRODUCTION
The need to use large numbers of low cost and low maintenance devices for the IoT created vulnerabilities which dramatically manifested themselves in 2016, when a massive distributed denial of service (DDoS) attack took down large numbers of web sites including Spotify, Twitter, Reddit, Netflix, through the DNS service for domain name management [1], [2]. It also created malicious accesses from IP addresses numbering tens of millions, towards servers of some leading cyber-security companies [3].
Known as the Mirai ("future" in Japanese) Botnet, this form of DDoS attack sends TCP SYN requests to a large number of IP addresses, and if the victim responds it then exploits the weak login credentials of many IoT devices, namely the default usernames and passwords initially set in the factories that produce the devices, when these credentials are not changed after installation and connection to the Internet. If the attacker is successful, it installs malware at its victims; it blocks the victim's ports that are used for updates and generates traffic to overwhelm other servers and devices with nonsense requests, also leading to threats and protection rackets [4], [5]. However, Botnets can also target other critical infrastructures including smart vehicles [6] as well as the core Internet itself and its data collection capabilities which are critical for managing the Internet in real-time [7].
This research has been supported by the European Commission H2020 Program under the IoTAC Research and Innovation Action, under Grant Agreement No. 952684.
If the attack is detected at a device, a rapid change of password and reboot would be needed, but the reboot itself can be hampered if the malware has blocked the victim's ports that are used for maintenance and updates.
Different forms of Mirai such as "Satori" have been known to infect mainstream network routers, while the "Okiru" version has aimed at infecting popular processors for embedded systems such as PowerPC, MIPS, ARM and x86, and Linux devices such as the popular Argonaut RISC Core processor (ARC), all of which are shipped in billions of devices per year. Other variants such as "Masuda" and "Wicked" have also targeted routers, while many versions of Mirai have been observed to attack IoT devices, machines equipped with the Linux operating system, and Android based mobile phones [8].
Thus detailed studies have been conducted to understand the characteristics of these attacks [9], [10], recent work has studied the characteristics of their attack traffic [11], [12], and blockchain has been suggested [13] to protect IoT devices against Botnets.
In this paper we develop a Mirai Botnet attack detection technique based on machine learning (ML) with a specific Random Neural Network (RNN) architecture [14]-[16], called a Dense RNN [17], which uses tight clusters of spiking neuronal cells for deep learning. Such techniques have been previously used with success to detect SYN attacks [18]. Earlier work on video quality evaluation [19] and network design [20] has shown the effectiveness of the conventional RNN model [21] in addressing problems in communication systems. There has also been other work [22] where the RNN has been used to optimize IoT systems for home climate control. An extension of the RNN has also successfully been used for modeling adaptation in Gene Regulatory Networks [23].
In [24] it was shown that the Dense RNN with an auto-associative learning algorithm provides more accurate SYN attack detection than some other ML models. Thus in this paper we also use Dense RNNs for deep learning in auto-associative mode, and compare the outcome to several other ML techniques both for detection accuracy and for speed, including the learning and detection computation times.

II. AUTO-ASSOCIATIVE DENSE RANDOM NEURAL NETWORK FOR ATTACK DETECTION
In this section we describe the attack detector which is based on the Auto-Associative Dense Random Neural Network (AA-Dense RNN-AD), whose architecture is shown in Figure 1. It is composed of three modules, namely Metric Extraction, Auto-Associative Dense RNN (AA-Dense RNN), and the Attack Decision Maker.
First, via a Metric Extraction Module we extract the relevant metrics of the packets in order to capture the footprints of Botnet attacks. Figure 1 shows the metrics $x_k^i$ for the packet $k$. These metrics are determined in advance, and the Metric Extraction Module only extracts them from the data packets of IoT traffic; i.e., this module does not select the metrics.
1) Selecting the Metrics: Recall that Mirai is a type of attack that spreads to IoT devices over the network. In addition, every device infected by Mirai generates additional traffic to cause a massive DDoS in the network and infect even more nodes, which changes the resulting traffic pattern of infected devices, including the inter-transmission time characteristics. Thus, we intuitively expect that a device affected by the Mirai attack will increase the total size of its traffic, generating more packets to overload the network. Accordingly we select the candidates for the important metrics of Mirai Botnet traffic as follows:
• Metric 1: The total size of the last K transmitted packets.
• Metric 2: The average inter-transmission time over the last K packets. (The inter-transmission time is the length of time separating the transmission of a packet from the transmission of the previous packet from the same source.)
• Metric 3: The total number of packets that are transmitted in a time window of duration T.
We analyze the importance of each of the candidate metrics in Section III-C in order to determine an importance coefficient for each of them with respect to its effect on the detection of Mirai attacks.
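The three candidate metrics can be computed in a single pass over the packets of one source. The following Python sketch is illustrative only: the `Packet` record, the function name `extract_metrics`, and the choice of a half-open time window (t - T, t] ending at the current packet are assumptions, not details given in the paper.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Packet:
    timestamp: float  # transmission time in seconds (hypothetical field)
    size: int         # packet size in bytes (hypothetical field)


def extract_metrics(packets, K=500, T=100.0):
    """Return one (metric1, metric2, metric3) tuple per packet.

    `packets` is assumed to be sorted by timestamp and to come from one source.
    """
    last_k = deque(maxlen=K)  # the last K packets, including the current one
    window = deque()          # timestamps inside the window (t - T, t]
    size_sum = 0              # running total size of the packets in last_k
    rows = []
    for pkt in packets:
        if len(last_k) == K:
            size_sum -= last_k[0].size  # this packet is about to be evicted
        last_k.append(pkt)
        size_sum += pkt.size

        window.append(pkt.timestamp)
        while window[0] <= pkt.timestamp - T:
            window.popleft()

        # The mean of consecutive gaps equals (last - first) / (count - 1).
        if len(last_k) > 1:
            span = last_k[-1].timestamp - last_k[0].timestamp
            mean_gap = span / (len(last_k) - 1)
        else:
            mean_gap = 0.0
        rows.append((size_sum, mean_gap, len(window)))
    return rows


demo = [Packet(0.0, 100), Packet(1.0, 200), Packet(2.0, 300)]
rows = extract_metrics(demo, K=2, T=1.5)
```

With the defaults K = 500 and T = 100 s, the same function yields the metric values used as detector inputs, up to whatever scaling is applied afterwards.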

A. Auto-Associative Dense RNN (AA-Dense RNN) Model
Next, we describe an auto-associative network constructed via the Dense RNN model, which we call the AA-Dense RNN. The Dense RNN model was introduced in [17], [25] as a mathematical model for neural networks with soma-to-soma interactions. In addition to the usual interactions between axons and dendrites, this model allows direct soma-to-soma connectivity, so that firing at a given soma (i.e. neuron cell) can have three effects: induced direct firing at a neighbouring neuron, plus excitation and inhibition of any other cell in the network via excitatory and inhibitory weights. In the Dense RNN the soma-to-soma interactions are represented by the probability $p$ that any other cell in the network fires when a given cell fires, which represents random saccades of cells that fire in unison. The Dense RNN that has been used so far also assumes that the internal structure of the network is homogeneous, i.e. all the cells have the same connectivity. However, this particular constraint can be relaxed if needed.

B. Clusters of Identical and Densely Connected Cells
Let us now consider the construction of a cluster that contains $n$ identically connected cells, each of which has firing rate $r$ and receives external inhibitory and excitatory arrivals of spikes denoted by $\lambda^-$ and $\lambda^+$, respectively. Since all the cells are identical, the state of each cell is denoted by $q$. Each cell also receives an inhibitory input from some cell $u$ which does not belong to the network, and whose activation probability is $q_u$. Thus for any cell in our network we will have an incoming inhibitory weight $w_u^- > 0$ from the external cell $u$ to this particular cell.
The dense network itself has no excitatory or inhibitory weights, but whenever a cell $i$ fires, it triggers the firing of any other cell $j$ at random with probability $\frac{p}{n-1}$ due to the soma-to-soma interactions, and creates a cascade of cells that fire until either a non-excited cell is reached so that it cannot fire, or a cell in the chain sends an excitatory spike to another cell which does not fire. Since all the cells behave in a statistically identical manner, the probability $q$ that a cell is excited satisfies an equation which reduces to a second degree polynomial in $q$, the expression (2). Since $q$ is a probability, only its positive root(s) which are less than one are of interest. When $n$ is large, the expression (2) simplifies further.

C. Structure of the Dense RNN used in this Work
We define the Dense RNN model by its inputs, outputs and its internal architecture. As shown in Figure 1, the input of the Dense RNN is the collection of the extracted metrics $\{x_{k-1}^i\}_{i\in\{1,\dots,l\}}$ related to the transmission of packet $k-1$, and its output is the collection of the "normally expected metrics" $\{\hat{x}_k^i\}_{i\in\{1,\dots,l\}}$ for the transmission of the following packet $k$. Note that $l$ denotes the total number of metrics that are collected for the analysis.
Let $X$ denote the input matrix whose entry $(k, i)$ is $x_k^i$, and $\hat{X}$ denote the output matrix whose entry $(k, i)$ is $\hat{x}_k^i$. Moreover, we let $O_m$ denote the output vector of layer $m$ and $W_m$ denote the connection weight matrix between layer $m$ and layer $m + 1$ for $m \in \{0, \dots, L\}$, where $m = 0$ is the input layer of the Dense RNN. Each hidden layer $m \in \{1, \dots, L\}$ of the Dense RNN contains $l$ Random Neural Network cell clusters, each of whose probability of activation is denoted by $\zeta(x)$, which is the positive root of the polynomial in $q$ obtained in Section II-B with $x = q_u w_u^-$. Accordingly, for the given input matrix $X$, the forward pass of the Dense RNN is computed layer by layer, where $\zeta(\cdot)$ is a term-by-term activation function for vectors or matrices.
The connection weights of the Dense RNN are computed with the efficient training procedure developed in [17], which combines unsupervised and supervised learning. In order to create the auto-associative memory, we train the Dense RNN by using "only" the data of benign IoT traffic, and in Section III-G we show that the training time of the Dense RNN is low and competitive with the computational time associated with simpler models.

D. The Attack Decision Maker
Finally, the Attack Decision Maker module gives the final attack decision for the current data packet based on the actual and the predicted metrics of the packet. To this end, in this module, we calculate the absolute difference $d_k^i = |x_k^i - \hat{x}_k^i|$ between the actual and the predicted value (which is the expected value for the normal traffic) of each Metric $i \in \{1,\dots,I\}$ and apply a threshold to the difference to obtain the decision $y_k$ for packet $k$, where $\Theta$ is a threshold for the binary decision. Note that small values of $\Theta$ cause false positive alarms, while large values cause false negatives. In addition, $\alpha_i$ is a coefficient for the attack decision with respect to Metric $i$, and $\sum_{i\in\{1,\dots,I\}} \alpha_i = 1$. Furthermore, $\mathbb{1}(\Xi) = 1$ if $\Xi$ is a true statement and $\mathbb{1}(\Xi) = 0$ otherwise.
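The decision step can be sketched in Python as follows. This is a minimal sketch under one assumed reading of the rule: the $\alpha_i$-weighted sum of the absolute differences is compared with $\Theta$, and the metrics are assumed to be scaled to [0, 1] so that a single threshold applies to all of them. The function name `attack_decision` is hypothetical. Under the alternative reading, in which each difference is thresholded before the weighted sum is formed, the comparison would move inside the sum.

```python
def attack_decision(actual, predicted, alpha, theta):
    """Return 1 (attack) or 0 (normal) for a single packet.

    actual, predicted: metric values x and x_hat for the packet, one per metric
    alpha: importance coefficients, required to sum to 1
    theta: decision threshold
    """
    if abs(sum(alpha) - 1.0) > 1e-9:
        raise ValueError("the coefficients alpha must sum to 1")
    # Assumed rule: weighted sum of absolute prediction errors against theta.
    score = sum(a * abs(x - x_hat)
                for a, x, x_hat in zip(alpha, actual, predicted))
    return int(score > theta)


alpha = [0.4, 0.2, 0.4]
flag_attack = attack_decision([0.9, 0.1, 0.8], [0.1, 0.1, 0.1], alpha, 0.25)
flag_normal = attack_decision([0.1, 0.1, 0.1], [0.1, 0.12, 0.1], alpha, 0.25)
```

Because the predictor is trained on benign traffic only, a large prediction error is taken as evidence that the packet does not follow the normal pattern.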

III. EXPERIMENTAL RESULTS
In order to evaluate the performance of our attack detection method, we use the Mirai botnet attack data from the publicly available Kitsune dataset [26], [27]. This dataset contains 764,137 packet transmissions including both normal and attack traffic. We use only 70% of the normal traffic packets for training and all of the packets (both normal and attack traffic) for testing the attack detector.

A. Parameters of the AA-Dense RNN
To implement the AA-Dense RNN, we use a two-layer Dense RNN (i.e. $L = 2$), with $n_l = I$ $\forall l \in \{1, \dots, L\}$, where $I = 3$, and recall that $I$ is the total number of metrics that are being used. In addition, we set $p = 0.05$, $r = 0.001$ and $\lambda^+ = \lambda^- = 0.1$. Moreover, we set $K = 500$ packets and $T = 100$ s in the Metric Extraction module.

B. Comparison with other Techniques
We first selected the Simple Threshold method as the simplest benchmark for attack detection. In this method, we use (8) with the difference $d_k^i$ replaced by the metric value $x_k^i$. In order to achieve the best performance of this method, we search for the best value of $\Theta$ on the test set, which includes both normal and attack traffic.
1) Least Absolute Shrinkage and Selection Operator (Lasso): We selected the Lasso as the linear model that replaces the AA-Dense RNN in the proposed method in Figure 1. To this end, we create an auto-associative memory based on Lasso by training it on 70% of the normal traffic. For the implementation of Lasso, we use the scikit-learn library [28] with "alpha = 0.1".
2) K-Nearest Neighbours Regressor (KNN): We use KNN to replace the AA-Dense RNN module in Figure 1. As with the other methods, we trained KNN on 70% of the normal traffic. In addition, for the implementation of this model, we use the scikit-learn library with "n_neighbors = I".
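The two baselines can be set up with scikit-learn roughly as below. This is a sketch, not the authors' code: it assumes that each model is fitted to map the metrics of packet $k-1$ to those of packet $k$, mirroring the input and output of the AA-Dense RNN in Figure 1, and it uses random numbers as a stand-in for the normal-traffic metrics. The helper name `fit_auto_associative` is hypothetical; `alpha=0.1` and `n_neighbors` equal to the number of metrics follow the settings quoted above.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.neighbors import KNeighborsRegressor


def fit_auto_associative(model, normal_metrics):
    """Fit `model` to predict the metrics of packet k from those of packet k-1."""
    X = np.asarray(normal_metrics, dtype=float)
    return model.fit(X[:-1], X[1:])


# Synthetic stand-in for the metrics of normal traffic (rows: packets).
rng = np.random.default_rng(0)
normal = rng.random((200, 3))
num_metrics = normal.shape[1]  # I = 3 in the paper

lasso = fit_auto_associative(Lasso(alpha=0.1), normal)
knn = fit_auto_associative(KNeighborsRegressor(n_neighbors=num_metrics), normal)

# Predicted "normally expected" metrics for packets 1..199.
pred_lasso = lasso.predict(normal[:-1])
pred_knn = knn.predict(normal[:-1])
```

Either set of predictions can then be passed to the Attack Decision Maker in place of the AA-Dense RNN output.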

C. Importance Analysis of the Metric Candidates to Determine the Values of $\alpha_i$
We now aim to analyze how important each candidate metric is for the detection of Mirai botnet attacks in the considered dataset. To this end, we first perform the following analysis:
1) Pearson Correlation: For each Metric $i$, we compute the Pearson correlation coefficient [29] between that metric and the attack label. This coefficient measures the strength and the direction of the linear relationship between the considered metric and the attack label. Since we desire to measure only the importance of Metric $i$ for the detection of attack, we need only the strength of the relationship, so we let $\rho_i$ denote the absolute value of the coefficient for Metric $i$.
2) ANOVA: We compute the other coefficients as the F-ratios [30] that are calculated via the Analysis of Variance (ANOVA) method. The value of the F-ratio corresponding to Metric $i$ measures the statistical significance of that metric for the decision of attack. We let $f_i$ denote the F-ratio for Metric $i$ normalized to the range [0, 1].
After we compute the coefficients via each of the Pearson correlation coefficient and ANOVA, we combine them to calculate the importance coefficients $\alpha_i$ of the metrics. In Figure 2, we present the value of $\alpha_i$ as well as the values of $\rho_i$ and $f_i$ for each Metric $i$, $i \in \{1, 2, 3\}$. In this figure, we see that the importance of each of Metrics 1 and 3 is higher than that of Metric 2 with respect to each of $\alpha_i$, $\rho_i$ and $f_i$. In addition, the values of $\alpha_1$, $\rho_1$ are close to those of $\alpha_3$, $\rho_3$, although there is a significant gap between $f_1$ and $f_3$.
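The two coefficients can be computed with a few lines of plain Python, as sketched below. The Pearson and one-way ANOVA formulas are standard. Two steps are assumptions made only for illustration, since they are not specified here: the F-ratios are normalized by dividing by their maximum, and $\alpha_i$ is taken as $\rho_i + f_i$ rescaled so that the coefficients sum to 1. All function names are hypothetical, and the sketch assumes every metric has non-zero variance within the groups.

```python
from math import sqrt


def pearson_abs(x, y):
    """Absolute Pearson correlation between a metric x and the labels y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return abs(cov / sqrt(vx * vy))


def anova_f(x, labels):
    """One-way ANOVA F-ratio of metric x grouped by label."""
    groups = {}
    for value, label in zip(x, labels):
        groups.setdefault(label, []).append(value)
    n, k = len(x), len(groups)
    grand_mean = sum(x) / n
    between = within = 0.0
    for values in groups.values():
        mean = sum(values) / len(values)
        between += len(values) * (mean - grand_mean) ** 2
        within += sum((v - mean) ** 2 for v in values)
    return (between / (k - 1)) / (within / (n - k))


def importance(metrics, labels):
    """Return (alpha, rho, f) with one entry per metric."""
    rho = [pearson_abs(m, labels) for m in metrics]
    f_raw = [anova_f(m, labels) for m in metrics]
    f = [v / max(f_raw) for v in f_raw]          # assumed normalization
    combined = [r + v for r, v in zip(rho, f)]   # assumed combination
    total = sum(combined)
    return [c / total for c in combined], rho, f


labels = [0, 0, 0, 1, 1, 1]
metric_a = [1, 2, 3, 7, 8, 9]   # separates the two classes well
metric_b = [1, 5, 3, 2, 6, 4]   # separates them poorly
alpha, rho, f = importance([metric_a, metric_b], labels)
```

In this toy example the well-separated metric receives the larger coefficient, which is the behaviour the importance analysis relies on.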

D. Performance Evaluation of AA-Dense RNN with respect to the Selection of Metrics
In Figure 3, we show the performance of the proposed attack detection method with the selection of different metrics, as well as the combination of all metrics, and see that AA-Dense RNN achieves the highest accuracy at 99.84% when we use the weighted combination of Metric 1, Metric 2, and Metric 3, as we do for the performance evaluations in the rest of this paper.
The high detection performance result shows that the AA-Dense RNN is able to classify normal and malicious traffic although it has been trained with only normal traffic. Moreover, our results show that the accuracy of the AA-Dense RNN is more than 95% under the selection of any metric. In addition, we observe a close relationship between the importance coefficients of metrics in Figure 2 and the performance of the AA-Dense RNN detector in Figure 3.
Figure 4 presents the True Positive and True Negative percentages with respect to the increasing value of $\Theta$ from 0 to 0.5 with 0.01 increments.
In Figure 4, we see that the AA-Dense RNN detector is highly robust with respect to $\Theta \in [0.01, 0.25]$. Thus, in the practical usage of the proposed method, we may select any value of $\Theta$ in the range [0.01, 0.25] without a significant performance loss. In addition, in this range the AA-Dense RNN is balanced in detecting both attack and normal traffic, and it achieves high performance for both.
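A sweep of this kind can be reproduced with the short Python sketch below. It assumes that each packet has already been reduced to a single scalar score that is compared with $\Theta$, and that the True Positive and True Negative percentages are taken over the attack packets and the normal packets respectively. The function name `sweep_threshold` and the toy scores are hypothetical.

```python
def sweep_threshold(scores, labels, start=0.0, stop=0.5, step=0.01):
    """Return (theta, true_positive_pct, true_negative_pct) for each theta.

    scores: one decision score per packet
    labels: 1 for attack packets, 0 for normal packets
    """
    attacks = [s for s, y in zip(scores, labels) if y == 1]
    normals = [s for s, y in zip(scores, labels) if y == 0]
    n_steps = round((stop - start) / step)
    results = []
    for j in range(n_steps + 1):
        theta = round(start + j * step, 10)  # avoid floating-point drift
        tp = 100.0 * sum(s > theta for s in attacks) / len(attacks)
        tn = 100.0 * sum(s <= theta for s in normals) / len(normals)
        results.append((theta, tp, tn))
    return results


# Toy scores: normal packets are predicted well, attack packets are not.
scores = [0.02, 0.03, 0.6, 0.7]
labels = [0, 0, 1, 1]
results = sweep_threshold(scores, labels)
```

A flat region in both percentages, as in the toy example for $\Theta$ between the two score groups, corresponds to the robust range reported above.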

F. Comparison of the AA-Dense RNN's Performance with KNN and Lasso
Let us now compare the attack detection performance of the AA-Dense RNN with the Simple Thresholding, Lasso, and KNN methods, where both the Lasso and KNN are trained as auto-associative memories.
In Table I, we present the comparison of the detection methods with respect to the accuracy and the percentages of true positives, false negatives, true negatives and false positives. The detection methods in this table are placed in descending order with respect to their accuracy. Our results show that AA-Dense RNN attack detection significantly outperforms the other methods with respect to accuracy. In addition, we see that this auto-associative network achieves much higher accuracy than Simple Thresholding. We see that the AA-Dense RNN achieves a 99.82% true positive and a 99.98% true negative percentage, higher than the other methods. Among all the methods, the Lasso obtains the true negative percentage closest to the AA-Dense RNN, and significantly higher than KNN and Simple Thresholding.

G. Computation Time
We now compare the AA-Dense RNN with KNN and Lasso with respect to the training and execution times, both measured on a workstation with 32 GB RAM and an AMD Ryzen 7 3700X processor (3.7 GHz). Figure 5 shows the training time of each of the AA-Dense RNN, KNN, and Lasso models, where the training is performed on 70% of the normal traffic (83138 samples). Since the attack detector may be trained offline in real-life usage, the training time is not a major issue as long as it is acceptable. In the same figure we see that the training time of the AA-Dense RNN is less than 0.1 s, which is highly acceptable. In addition, the training time of the AA-Dense RNN is significantly less than that of KNN; however, it is higher than that of the Lasso method.
Figure 6 shows the execution time of each of the AA-Dense RNN, KNN, and Lasso models for the classification, evaluated per single traffic packet. We see that the execution time of the AA-Dense RNN detector is around 0.5 µs. While the execution time of all other methods is less than 10 µs, that of KNN is comparatively high and that of Lasso is the shortest. This shows that the AA-Dense RNN and Lasso detectors are suitable for use in real-time attack detection.
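A per-packet execution time of this kind can be measured as in the Python sketch below. It is not the authors' benchmark: the helper `time_per_packet` is hypothetical, the predictor is a trivial stand-in, and taking the best of several repeats is an assumed choice to reduce the effect of background load.

```python
import time


def time_per_packet(predict, samples, repeats=5):
    """Return the best observed mean execution time per sample, in microseconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for sample in samples:
            predict(sample)
        elapsed = time.perf_counter() - start
        best = min(best, elapsed / len(samples))
    return best * 1e6


# Stand-in predictor; a trained detector would be called here instead.
samples = [(0.1, 0.2, 0.3)] * 1000
micros = time_per_packet(lambda s: sum(s), samples)
```

Absolute values depend on the hardware and the implementation language, so only comparisons made on the same machine are meaningful.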

IV. CONCLUSIONS AND FUTURE WORK
IoT devices are often rapidly installed with known factory parameters, which attract Mirai-type Botnet attacks. Therefore we have introduced an attack detection scheme for the IoT using the Auto-Associative Dense Random Neural Network (AA-Dense RNN) for Mirai-like Botnets.
The approach has the added advantage that it is trained with normal traffic to detect attack traffic, and it compares favorably with two known ML techniques: the Least Absolute Shrinkage and Selection Operator (Lasso) and the K-Nearest Neighbours (KNN), as well as with a simple thresholding technique.
Our experimental results on a publicly available dataset containing 764,137 packet transmissions show that the method introduced in this paper achieves 99.84% accuracy with 99.82% true positive and 99.98% true negative rates, which is much better than KNN detection and better than Lasso. Both the AA-Dense RNN and Lasso have training and testing times that are shorter than those of KNN.
The computation times and accuracies of AA-Dense RNN and Lasso are well within the needs of real-time on-the-fly lightweight attack detection, with AA-Dense RNN being best for accuracy and Lasso being best for computation times.
Future work will further evaluate the performance of the proposed attack detector on other available Botnet datasets, and extend the design to detect different attacks with a single AA-Dense RNN detector that is trained on benign traffic.