Intelligent system for IoT botnet detection using SVM and PSO optimization

Botnet attacks involving Internet-of-Things (IoT) devices have skyrocketed in recent years due to the proliferation of Internet-connected IoT devices that can be readily infiltrated. Botnets are a common threat: they exploit the absence of basic IoT security technologies and can perform a range of DDoS attacks. Existing IoT botnet detection methods still have issues, such as relying on labeled data, not being validated against newer botnets, and using very complex machine learning algorithms. New methods for detecting compromised IoT devices are therefore urgently needed to reduce the negative impact of IoT botnets. Because of the vast amount of normal data available, anomaly detection algorithms seem promising for identifying IoT botnet attacks. The One-Class Support Vector Machine (ONE-SVM) is a strong anomaly detection method. Many aspects influence the classification results of ONE-SVM, such as the subset of features used to train the model and the hyperparameters of the kernel. This paper describes an evolutionary IoT botnet detection algorithm. Particle Swarm Optimization (PSO) is used to tune the hyperparameters of ONE-SVM in order to detect IoT botnet attacks launched from compromised IoT devices. A new version of a real benchmark dataset is used to evaluate the proposed method with traditional anomaly detection evaluation measures. According to the test results, the technique outperforms all existing algorithms in terms of false positive rate, true positive rate, and G-mean for all IoT device categories. It also achieves the shortest detection time while significantly reducing the number of selected features.


Introduction
The Internet of Things (IoT) is a network of intelligent devices. It is used in many everyday applications, such as traffic monitoring services and healthcare [1]. By connecting a large number of devices to the Internet and allowing them to participate actively in the network, IoT enables machine-to-machine communication over the Internet [2]. In 2017, there were roughly 27 billion IoT devices, and this number is anticipated to reach 50 billion by 2020 [3]. IoT environments face many security issues because they generally consist of large numbers of heterogeneous, low-cost devices with little or no built-in security [4]. The growth in IoT devices and their lack of security have attracted malicious actors, who launch DDoS attacks by deploying massive IoT botnets [1]. A botnet is a group of compromised devices, operated by one or more malicious actors, that is used to carry out harmful actions. Among the most [...] of the model's ability to capture typical behaviour. This means that both known and previously unseen botnet attacks can be detected by the algorithm presented here. Because IoT botnets are constantly evolving [7], most anomaly detection techniques are ineffective [15].
-Since the suggested model is located on the gateway, it conserves the energy, compute, and memory resources of the constrained IoT devices. In this way, the functioning and lifespan of these limited-capability IoT devices are preserved.
The main contribution of this paper is a new model based on the PSO and ONE-SVM algorithms. The suggested model incorporates PSO and its unique hierarchical structure. This is the first time we have used PSO to optimize ONE-SVM in order to detect IoT botnet attacks. The technique is also evaluated on real IoT botnet datasets, whereas most studies on IoT botnet attacks use simulated data.
The rest of the paper is structured as follows. Section 2 reviews prior work on ONE-SVM-based anomaly detection approaches for IoT botnet attacks. Section 3 summarises the ONE-SVM and PSO algorithms. The suggested approach for detecting IoT botnet attacks is then described in depth, along with the evaluation measures used to assess the algorithm. Sect. 4 describes the experiments and their outcomes. Sect. 5 summarises and discusses the results of this research.

Related works
We cannot assume that all IoT device manufacturers will install the necessary security software on their devices, and access to some IoT devices, such as wearables, is restricted. Host-based anomaly detectors would have to run on IoT devices with limited computing, energy and memory resources. Because of these limited capabilities, such algorithms need to be lightweight to preserve the devices' functioning. In [15], a hierarchical classification of botnet detection algorithms is presented for both conventional networks and IoT networks.
According to [16], a hybrid of an artificial fish swarm algorithm and SVM can detect botnets. Studies of this kind sought to discover the most important botnet properties by evaluating data with optimized algorithms.
Using darknet Telnet scans as information, [17] created a honeypot that emulated specific Telnet services. With this honeypot, they were able to conduct a thorough analysis of ongoing attacks. That study, however, reaches no consensus on how to use honeypots to identify attacks on internet service providers. This paper was motivated by recently published research on network-based IoT botnet detection [18] [19], whose authors created a real IoT botnet dataset that was used as the basis for empirical testing. Instead of a deep autoencoder, this work uses far simpler algorithms that are widely used for anomaly detection, with the aim of obtaining better G-mean, FPR and TPR on a new version of the N-BaIoT dataset.
According to Wolpert and Macready's No Free Lunch (NFL) theorem [20], no optimization algorithm is more effective than all others across all optimization problems. Certain kinds of problems can therefore be solved more efficiently with new algorithms. Swarm intelligence algorithms such as PSO have the potential to surpass traditional optimization techniques. PSO excels because of its unique population structure, which is not present in other metaheuristics [13] [21]. It is therefore advisable to take advantage of PSO's special operators to tackle the IoT botnet detection problem.
The botnet detection approach developed by [22] is an unsupervised evolutionary approach for the Internet of Things (IoT). The primary objective of their strategy was to identify IoT botnet attacks launched from compromised IoT devices. To do so, they used the Grey Wolf Optimization (GWO) method, an efficient swarm intelligence algorithm, to optimize their baseline One-Class Support Vector Machine (OCSVM). As a result, their model tended to uncover the features that best describe the IoT botnet problem.
In conclusion, the suggested approach differs from earlier investigations because the proposed algorithm learns from normal data by constructing a model for each IoT device type, and then uses these models to identify both known and undiscovered IoT botnet assaults instantly.

Preliminaries
So that the suggested method can be better understood, this part briefly introduces the ONE-SVM in Sect. 3.1 and the PSO algorithm in Sect. 3.2.

ONE-SVM algorithm
ONE-SVM was proposed by Schölkopf as an extension of the two-class SVM [9]. It solves one-class classification problems by separating target instances from outliers. Let $x_i$, $i = 1, \dots, n$, be the training instances in the input space $X$, and let $\varphi(x_i)$ be the nonlinear transformation that maps an instance $x_i$ from $X$ to the feature space $F$ corresponding to the kernel. ONE-SVM constructs a hyperplane in $F$ that separates the training instances from the origin with maximum margin. The hyperplane is defined by
$$\Pi^{\star}: \langle w^{\star}, \varphi(x) \rangle - \rho^{\star} = 0,$$
where $w^{\star}$ is the hyperplane's normal vector and $\rho^{\star}$ is the threshold. Both $w$ and $\rho$ are obtained by solving the optimization problem in Eq. (1), which yields the hyperplane with the greatest margin [9]:
$$\min_{w,\, \xi,\, \rho} \; \frac{1}{2}\lVert w \rVert^{2} + \frac{1}{\nu n}\sum_{i=1}^{n} \xi_i - \rho \tag{1}$$
subject to $\langle w, \varphi(x_i) \rangle \ge \rho - \xi_i$ and $\xi_i \ge 0$, where $\nu \in (0, 1]$ is the ONE-SVM hyperparameter and the $\xi_i$ are slack variables that relax the inequality constraint so that some training instances may violate the margin. The kernel function is defined by $k(x_i, x_j) = \langle \varphi(x_i), \varphi(x_j) \rangle$. Once the optimal solution $\alpha^{\star}$ of the dual problem has been obtained, the normal vector is $w^{\star} = \sum_{i=1}^{n} \alpha_i^{\star} \varphi(x_i)$, and the threshold is recovered as $\rho^{\star} = \langle w^{\star}, \varphi(x_i) \rangle$ for any instance $x_i$ with $\alpha_i^{\star} \in (0, \tfrac{1}{\nu n})$. The points with $\alpha_i^{\star} > 0$ are the support vectors. The ONE-SVM decision function $f(x)$ is then given by
$$f(x) = \operatorname{sgn}\big(\langle w^{\star}, \varphi(x) \rangle - \rho^{\star}\big) = \operatorname{sgn}\Big(\sum_{i=1}^{n} \alpha_i^{\star}\, k(x_i, x) - \rho^{\star}\Big). \tag{2}$$
An instance is classified as positive if the value of the decision function is greater than or equal to zero; otherwise, it is classified as negative.
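The decision rule described above can be illustrated with a short sketch. It assumes an RBF kernel, and the support vectors, coefficients, threshold and kernel width are made-up toy values rather than the output of a trained model; the function names are hypothetical.

```python
import math


def rbf_kernel(a, b, gamma):
    """RBF kernel k(a, b) = exp(-gamma * ||a - b||^2) (kernel choice is an assumption)."""
    squared_distance = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * squared_distance)


def decision_value(x, support_vectors, alphas, rho, gamma):
    """Kernel expansion sum_i alpha_i * k(x_i, x) - rho."""
    weighted = sum(
        alpha * rbf_kernel(sv, x, gamma)
        for sv, alpha in zip(support_vectors, alphas)
    )
    return weighted - rho


def classify(x, support_vectors, alphas, rho, gamma):
    """Return +1 (target class) if the decision value is >= 0, else -1 (outlier)."""
    value = decision_value(x, support_vectors, alphas, rho, gamma)
    return 1 if value >= 0 else -1


# Toy values for illustration only (not taken from a trained model).
SUPPORT_VECTORS = [(0.0, 0.0), (1.0, 0.0)]
ALPHAS = [0.5, 0.5]
RHO = 0.5
GAMMA = 1.0

near = classify((0.5, 0.0), SUPPORT_VECTORS, ALPHAS, RHO, GAMMA)
far = classify((5.0, 5.0), SUPPORT_VECTORS, ALPHAS, RHO, GAMMA)
```

A point close to the support vectors receives a non-negative decision value and is treated as normal, while a distant point falls below the threshold and is flagged as an outlier.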

PSO algorithm
Particle swarm optimization (PSO) was created by [23] based on a model of social behaviour. Because of space constraints, PSO is only briefly introduced here. It differs from typical optimization approaches in that the search uses a population of candidate solutions instead of a single one. The search is guided directly by fitness information rather than by function derivatives or related knowledge. PSO has the potential to solve the 2-D maximum entropy problem. PSO is initialised with a set of random particles (solutions) and then searches for optimal solutions by updating the swarm over successive generations. A particle can improve its objective function value by using what other particles have discovered in their own exploration of the search space.
Let $i$ be the index of a particle in the swarm. Each particle moves through an $n$-dimensional search space $\mathbb{R}^n$ with a velocity $v_i$, which is continually adjusted according to the particle's own previous best solution $s_i$ and the previous best solution $s$ of the entire swarm. The velocity update is a linear combination of position and velocity vectors. The particles interact and move according to the following equations:
$$v_i^{t+1} = w\, v_i^{t} + c_1 r_1^{t}\,(s_i^{t} - p_i^{t}) + c_2 r_2^{t}\,(s^{t} - p_i^{t}) \tag{3}$$
$$p_i^{t+1} = p_i^{t} + v_i^{t+1} \tag{4}$$
Here $p_i^{t}$ is the position of particle $i$ at iteration $t$, $r_1^{t}$ and $r_2^{t}$ are random values between zero and one, $c_1$ and $c_2$ are the two learning factors, and $w$ is the inertia weight. Imposing upper and lower bounds on $v_i$ prevents particles from moving too fast through the search space. The search stops when the maximum number of iterations is reached or the minimum error condition is met. The standard procedure is as follows.
1. Set the iteration number $t$ to 0. Randomly initialise the swarm $S$ of $m$ particles (the population size) so that the position $p_i^{0}$ of each particle meets the prescribed constraints.
2. Evaluate the fitness of each particle using the objective function $F(p_i^{t})$.
3. Compare each particle's current fitness with its personal best and set $s_i^{t}$ to the better of the two, with $s_i^{0} = p_i^{0}$, i.e.
$$s_i^{t} = \begin{cases} p_i^{t}, & F(p_i^{t}) > F(s_i^{t-1}) \\ s_i^{t-1}, & \text{otherwise} \end{cases} \tag{5}$$
4. Set the global best $s^{t}$ to the personal best with the highest fitness in the swarm, i.e.
$$s^{t} = \arg\max_{s_i^{t}} F(s_i^{t}) \tag{6}$$
5. Update the velocity vector of each particle according to Eq. (3), then move each particle to its new position according to Eq. (4).
6. Set $t = t + 1$.
7. Go back to step 2 and repeat until the stop condition is reached.
Two steps are crucial when applying PSO to optimization problems: the representation of the solution and the fitness function. One of PSO's best features is that particles can be encoded directly as real numbers. Unlike a genetic algorithm [24], it requires no binary encoding transformation or special genetic operators. The full application of PSO, as well as the way to accomplish picture segmentation, will be presented in the next part.
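The update rules and the seven-step procedure above can be sketched in a few lines of plain Python. This is a minimal illustration, not the EvoloPy implementation used in the experiments: the function name `pso_maximize`, the parameter defaults, the velocity limit and the toy objective are all assumptions, and the random factors are drawn per dimension, as many implementations do.

```python
import random


def pso_maximize(fitness, dim, lower, upper, n_particles=20, n_iters=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Maximise `fitness` over the box [lower, upper]^dim with a basic PSO."""
    rng = random.Random(seed)
    v_max = 0.5 * (upper - lower)  # assumed velocity limit

    # Step 1: random initial positions, zero initial velocities.
    pos = [[rng.uniform(lower, upper) for _ in range(dim)]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]
    best_fit = [float("-inf")] * n_particles
    g_pos, g_fit = pos[0][:], float("-inf")

    for _ in range(n_iters):
        # Steps 2-3: evaluate each particle and update its personal best.
        for i in range(n_particles):
            f = fitness(pos[i])
            if f > best_fit[i]:
                best_fit[i], best_pos[i] = f, pos[i][:]

        # Step 4: the global best is the best personal best in the swarm.
        g = max(range(n_particles), key=best_fit.__getitem__)
        g_pos, g_fit = best_pos[g][:], best_fit[g]

        # Step 5: velocity update, then position update.
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][d]
                     + c1 * r1 * (best_pos[i][d] - pos[i][d])
                     + c2 * r2 * (g_pos[d] - pos[i][d]))
                vel[i][d] = max(-v_max, min(v_max, v))  # bound the velocity
                pos[i][d] = max(lower, min(upper, pos[i][d] + vel[i][d]))
        # Steps 6-7: the for-loop advances t and repeats until n_iters.

    return g_pos, g_fit


def negative_sphere(p):
    """Toy objective with its maximum (0) at the origin."""
    return -sum(x * x for x in p)


best_position, best_value = pso_maximize(negative_sphere, dim=2,
                                         lower=-5.0, upper=5.0)
```

Because the description above keeps the better (greater) fitness, the sketch maximises; a problem that minimises an error measure can be handled by negating the objective.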

The proposed IoT botnet detection method
As shown in Figure 1, the proposed anomaly detection approach has two key steps: (1) data preprocessing and (2) the proposed method. The botnet problem is defined first. Next, the preprocessing phases are described, which include normalisation, cleaning, and integration of data, among others. Then the ONE-SVM hyperparameters are found using PSO, and feature selection is performed for each device type on the optimization set. All phases of the suggested methodology are described at length in the following paragraphs.

Problem definition
Instantaneous botnet detection can help improve IoT security because compromised IoT devices are automatically disconnected from the network, which prevents the spread of attacks. IoT botnet detection approaches have traditionally relied on supervised learning, which has several drawbacks. First, a large amount of storage is needed to hold many examples of each type of malicious traffic, along with an equal number of benign examples. Second, an expert is required to determine whether each packet is malicious, which takes a lot of time. Third, a high number of IoT botnet attacks are launched every day, which makes supervised approaches ineffective at detecting them. These three issues create a growing need for unsupervised and faster approaches to detect IoT botnets. The goal of this work is to design an evolutionary network-based approach that tackles these detection challenges.

Data Pre-processing
Using raw datasets without any preprocessing may degrade the overall quality of the final model and increase the time required for training. A variety of preprocessing techniques can therefore be applied [25]. Because the N-BaIoT dataset is new, large, and based on real-world data, it may contain inconsistency and redundancy. Data normalization, data cleaning, data integration, and data reduction are among the preprocessing procedures used to make the most of it. Each of these is described in more detail in the following subsections.

Data cleaning
As a preprocessing step, data cleaning is critical for ensuring that a dataset appropriately reflects the problem being tackled. It seeks to remove inconsistency from the dataset, because inconsistency can reduce the accuracy of the generated models. Cleaning can therefore improve the overall quality of the final model. In the N-BaIoT dataset, a group of features represents the dispersion of the number of packets from the source MAC-IP, source IP, channels and sockets. Five further features represent the dispersion of the time between packet arrivals. These 15 features are calculated using the standard deviation. Because the standard deviation is the square root of the variance, the way the features are calculated can be unified by using either the standard deviation or the variance throughout. In this study, the variance is used as the common measure.
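Unifying the dispersion features amounts to squaring the columns that hold a standard deviation. The sketch below shows the idea on a made-up row; the function name and the column indices are hypothetical and do not correspond to the actual N-BaIoT column layout.

```python
def std_to_variance(row, std_columns):
    """Return a copy of `row` where the columns listed in `std_columns`
    (standard deviations) are replaced by their squares (variances)."""
    return [value * value if index in std_columns else value
            for index, value in enumerate(row)]


# Hypothetical example: columns 0 and 2 hold standard deviations.
example_row = [2.0, 3.0, 0.5]
cleaned_row = std_to_variance(example_row, std_columns={0, 2})
```

After this step every dispersion feature is expressed as a variance, so the features share one unit of measurement.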

Data integration
According to [18], IoT devices are generally task-oriented devices. Integrating the data for each device type is therefore a simple process. For each device type, this integration translates the device's functionality into a single normal traffic pattern. This can improve both the accuracy for each device type and the speed. In addition, the number of created models is reduced. Table 1 shows the updated N-BaIoT dataset, which was created by integrating data according to device type, together with the number of packets used for training, optimization and testing. This new version of the N-BaIoT dataset, called NN-BaIoT, was developed to test the proposed approach.
In the normalization step, $A_{oi}$ denotes the $i$-th instance before normalization and $A_i$ denotes the $i$-th instance after normalization. Here $n$ is the number of instances in the dataset, and $i$ is an integer from 1 to $n$.
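The normalization formula itself is not reproduced in this section. As an illustration only, the sketch below assumes min-max scaling of each feature to the range [0, 1], which is a common choice; the function name is hypothetical and the paper's actual formula may differ.

```python
def min_max_normalize(instances):
    """Scale every feature (column) to [0, 1] using its min and max.

    ASSUMPTION: min-max scaling is used here for illustration; it is not
    confirmed to be the normalization applied in the paper.
    """
    columns = list(zip(*instances))
    lows = [min(column) for column in columns]
    highs = [max(column) for column in columns]
    normalized = []
    for row in instances:
        normalized.append([
            (value - low) / (high - low) if high > low else 0.0
            for value, low, high in zip(row, lows, highs)
        ])
    return normalized


# Toy data: the second feature is constant and maps to 0.0.
raw = [[1.0, 10.0], [3.0, 10.0], [5.0, 10.0]]
scaled = min_max_normalize(raw)
```

In practice the minimum and maximum would be computed on the training set only and then reused for the optimization and testing sets.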

Data reduction
A real-world dataset may contain features that are redundant or irrelevant. By removing superfluous features, data reduction strategies can increase efficiency and classification accuracy. Feature selection is one of the most important data reduction methods: a predetermined evaluation metric is used to find the optimal subset of features. In this work, the PSO algorithm is used to optimize the feature subset for ONE-SVM. This technique is described below.

Proposed Method
This section explains how IoT botnet attacks are identified using PSO and ONE-SVM. PSO is used to optimize the hyperparameters of ONE-SVM and to perform feature selection at the same time. A wrapper feature selection strategy is used instead of a filter approach because it performs better than the filter approach [26]. There are three primary components to a classification task: a learning algorithm, a search algorithm, and an evaluation measure [27]. To evaluate a selected subset of features, the wrapper feature selection strategy incorporates the entire learning process. In the proposed approach, ONE-SVM is the learning algorithm, and FPR and 1-TPR serve as the evaluation measure. To solve the IoT botnet problem with PSO and ONE-SVM, the following two critical points must be addressed:
-Individual representation: the decision variables that make up an individual must be determined. They comprise the ONE-SVM hyperparameters and all features of the NN-BaIoT dataset. An individual is therefore expressed as a one-dimensional array of 117 real numbers, according to Eq (8). The last two elements of the array are the ONE-SVM hyperparameter ν and the kernel parameter. The remaining elements act as boolean variables, each representing one feature: if an element's value is greater than or equal to 0.55, the feature is selected; otherwise it is ignored.
-Fitness function: the quality of an individual must be evaluated with a defined measure, so the fitness function should be chosen to suit the problem at hand. For the IoT botnet problem, the fitness function is computed from a ONE-SVM model and aims to minimize FPR and 1-TPR. The ONE-SVM model is trained using the hyperparameters and the features encoded by the individual. The quality of individual I at iteration t is evaluated using the fitness function described in Eq. (9). To avoid overfitting, the fitness function is evaluated on the optimization set of the NN-BaIoT dataset, which results in more robust models.
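The individual encoding and the fitness idea can be sketched as follows. The 117-element layout (115 feature slots followed by two hyperparameters) and the 0.55 threshold come from the text; the function names are hypothetical, and combining FPR and 1-TPR as an unweighted sum is an assumption, since the exact form of Eq. (9) is not reproduced here.

```python
N_FEATURES = 115           # 23 recorded features x 5 factor values, per the text
SELECTION_THRESHOLD = 0.55  # a feature is kept if its slot is >= this value


def decode_individual(individual):
    """Split a 117-element particle into (feature mask, nu, kernel parameter)."""
    if len(individual) != N_FEATURES + 2:
        raise ValueError("expected %d elements" % (N_FEATURES + 2))
    feature_mask = [value >= SELECTION_THRESHOLD
                    for value in individual[:N_FEATURES]]
    nu = individual[-2]            # ONE-SVM hyperparameter nu
    kernel_param = individual[-1]  # kernel parameter (its scaling is not specified here)
    return feature_mask, nu, kernel_param


def fitness(fpr, tpr):
    """Value to minimise. ASSUMPTION: unweighted sum of FPR and (1 - TPR)."""
    return fpr + (1.0 - tpr)


# Toy particle: the first three features selected, the rest ignored.
particle = [0.6] * 3 + [0.2] * 112 + [0.1, 0.05]
mask, nu_value, kernel_value = decode_individual(particle)
```

In a full pipeline, the mask would select the training columns, a ONE-SVM would be trained with the decoded hyperparameters, and the FPR and TPR measured on the optimization set would be passed to `fitness`.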

Experiments and results
The suggested PSO-ONE-SVM approach is evaluated through experiments on the NN-BaIoT dataset. For verification, its performance is compared with other algorithms typically used for anomaly detection, such as OCSVM, GWO-OCSVM and IF [28]. The following sections describe and analyze the experiments and their outcomes in depth.

Evaluation measures
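The performance of the proposed IoT botnet detection approach is evaluated using the following three evaluation measures:
(1) True positive rate (TPR): the ratio of true positives to the total number of true positives and false negatives:
$$\mathrm{TPR} = \frac{TP}{TP + FN}$$
(2) False positive rate (FPR): the ratio of false positives to the total number of false positives and true negatives:
$$\mathrm{FPR} = \frac{FP}{FP + TN}$$
(3) G-mean: the geometric mean of the true positive rate and the true negative rate:
$$\text{G-mean} = \sqrt{\mathrm{TPR} \times (1 - \mathrm{FPR})}$$
The testing set of the dataset is used to determine the TPR, FPR, and G-mean. Greater TPR and G-mean values and lower FPR values indicate stronger anomaly detection models, with the perfect model occurring when TPR and G-mean are equal to 1 and FPR is equal to 0.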
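The three measures can be computed from predicted and true labels as in the sketch below. It assumes that attacks are the positive class (label 1) and benign traffic the negative class (label 0), and it uses the usual definition of G-mean as the geometric mean of TPR and the true negative rate; the function name and the toy labels are illustrative.

```python
import math


def evaluate(y_true, y_pred):
    """Return (TPR, FPR, G-mean) for binary labels, 1 = attack, 0 = benign."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    g_mean = math.sqrt(tpr * (1.0 - fpr))
    return tpr, fpr, g_mean


# Toy labels: 4 attacks (3 detected) and 4 benign samples (1 false alarm).
labels = [1, 1, 1, 1, 0, 0, 0, 0]
predictions = [1, 1, 1, 0, 0, 0, 0, 1]
tpr, fpr, g_mean = evaluate(labels, predictions)
```

Because G-mean multiplies the two class-wise rates, a detector that ignores one class scores zero regardless of how well it handles the other, which is why the measure suits imbalanced data.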

Experiments environment and setup
The proposed approach and the other anomaly detection algorithms were built with the Anaconda Python framework, version 5.3. The swarm optimization method is implemented with the open-source PSO code from EvoloPy [13], a Python-based open-source optimization framework that includes metaheuristic algorithms. As demonstrated by the designers of EvoloPy, the Python implementation of PSO has a faster running time than its Matlab counterpart for large-scale problems. All trials were run on a Windows Server 2012 64-bit OS with an Intel(R) Xeon(R) CPU E5 2609 and 64 GB RAM.
For each of the five IoT device types, 1/3 of the benign data was used as the training set, which serves to capture the normal traffic patterns. The PSO-ONE-SVM hyperparameters and the optimal feature subset were determined for each IoT device type on an optimization set made up of 1/6 of the benign data and 1/3 of the malicious data. The test data of each device is made up of 1/3 of the benign data and 2/3 of the malicious data. The other three algorithms use grid search for hyperparameter tuning so that they can be compared fairly.
For PSO-ONE-SVM, the number of iterations is set to 20, the value at which the method converged in numerous initial experiments. The number of participants is set at 26, and the number of search agents is limited to 7. PSO's additional parameters are set according to well-regarded publications. Because the dataset is normalized, the lower bound lb and the upper bound ub are set to 0 and 1, respectively. OCSVM, GWO-OCSVM and IF were also employed to detect anomalies in this paper. Table 2 lists the optimized hyperparameters for these algorithms. The hyperparameters of PSO-ONE-SVM are optimized using PSO, whereas grid search is used to optimize the hyperparameters of the other methods.
connection with the C&C server, and conducting various types of malicious actions [5]. According to [18], it is insufficient to detect only the early phases of infection. The N-BaIoT dataset, for example, covers the last stage of botnet formation, when the IoT bots start launching attacks. The proposed strategy therefore serves as a last line of defence. As far as we are aware, this is the first real dataset used in the literature to detect IoT botnet attacks. The data for the N-BaIoT dataset are collected from large organisational networks, which are likely to see an increase in the number of devices. The collection contains statistics on the network traffic of an IoT system in both normal and attack situations. On the IoT network, we installed a baby monitor, two doorbells, four security cameras and a thermostat. These devices are commonly connected to IoT networks via Wi-Fi. We collected data on every device both in its default operating mode and while under attack by the BASHLITE and Mirai botnets. With the 23 recorded features in Table 1 and the five values (0.01, 0.10, 1, 3, 5) of the factor that controls how information is forgotten over time in stream clustering, there are 23 multiplied by five, or 115, features.

Results and discussion
Table 3 shows how the proposed method and the other anomaly detection algorithms performed on the testing set of the NN-BaIoT dataset. Note that higher TPR values (detection of attacks after they occur) and lower FPR values (misclassification of benign data as malicious) indicate better results. Figures 2, 3 and 4 illustrate the performance of the proposed algorithm relative to the other anomaly detection algorithms. As these charts show, the TPR and FPR values of the suggested algorithm are significantly better than those of the other anomaly detection algorithms, indicating that the hyperparameters and feature subset identified by PSO-ONE-SVM are likely to be the most effective. OCSVM, on the other hand, has a lower FPR than the IF, but a lower TPR than OCSVM. With OCSVM and LOF, the TPR and the FPR, respectively, were found to vary widely across IoT devices. According to [18], the thermostat, webcam, doorbell and security camera are examples of IoT devices with limited and deterministic capabilities, whereas other IoT devices, such as the baby monitor, have a wide range of functions. Note that the suggested approach outperformed OCSVM in terms of TPR and FPR, while the deep autoencoder was unable to capture the regular traffic pattern for baby monitor devices, as described by [18]. Because IoT devices tend to be task-oriented, their restricted capability translates into a limited number of normal traffic patterns.
Because the NN-BaIoT dataset is imbalanced, the G-mean is also used. Table 4 shows the classification results on the NN-BaIoT dataset in terms of G-mean at the testing stage. The suggested algorithm surpasses the others: its average G-mean across IoT devices is 0.988, whereas the other algorithms are all below 0.7. Only the proposed algorithm achieves a suitable balance between TPR and FPR values.
Table 5 compares the proposed approach with the existing anomaly detection techniques in terms of average detection time (DT) on the NN-BaIoT dataset at the testing stage. The suggested approach is superior to the other three anomaly detection algorithms, requiring an average of only 5 seconds to detect the attacks. The average DDoS attack lasts between 30 and 100 seconds, with 10% of attacks lasting more than a day and 3% lasting more than a month [30]. Because compromised IoT devices are disconnected from the network as soon as an attack is detected, these attacks can be stopped in less than five seconds. This is a significant reduction in DT.

Conclusions and future works
IoT botnet attacks are on the rise because the number of deployed IoT devices has exploded, these devices lack security, and compromised IoT devices may not show any obvious indicators of infection. The algorithm proposed in this research aims to identify IoT botnet attacks by using PSO to optimize the hyperparameters of ONE-SVM and to perform feature selection. Applying PSO on the NN-BaIoT data significantly improved the performance of the ONE-SVM classifier. In terms of G-mean, the proposed method surpasses the three other unsupervised algorithms typically used for anomaly detection for all types of IoT devices. In addition, it reduces the number of selected features while achieving the lowest detection time. In the future, the suggested technique will be tested on different types of IoT devices using Big Data training data.