Federated Intrusion Detection In NG-IoT Healthcare Systems: An Adversarial Approach

In recent years, with the advancement of IoT networks, malicious intrusions that aim to disrupt services and gain access to confidential information in medical environments have been steadily increasing. To that end, this paper proposes a Federated Layered Architecture for Medical Cyber-Physical Systems (MCPS) networks that introduces multiple aggregation layers to add further security to the model training process. Moreover, two Generative Adversarial Networks (GANs) are presented for use with data found in the MCPS environment. The evaluation of the presented work showed that the models trained in the Federated system detect possible intrusions in the MCPS network better than the conventionally trained models.


I. INTRODUCTION
In the modern age and the era of global network interconnection, most contemporary digital systems are characterized by a plethora of Internet of Things (IoT) devices that are connected with each other and exchange information in order to provide a given quality of service to their environment [1]. By extension, the same holds for Next-Generation IoT (NG-IoT) networks, which comprise human-centric approaches, AI-enabled devices, fog networks and more. MCPSs are no exception, as they communicate critical information over a seamlessly interconnected network of priority devices. Specifically, MCPSs are networks of medical and healthcare-related devices in charge of medical-oriented tasks or the dissemination of health- and patient-related information in medical environments such as hospitals and infirmaries. They help monitor patient condition, oversee medical data analysis and archiving, and even support patient treatment through specialized equipment [2] [3], such as ICU monitors, medical databases, laboratory equipment and healthcare actuators. Important as they are, though, MCPSs face certain problems such as component heterogeneity, interoperability, high-assurance requirements and other serious challenges that interfere with their critical role [4] [5]. Among these problems is the security and privacy of MCPS networks as part of IoT/NG-IoT networks [6]. The amount of diverse information, the increased use of wireless networks for information sharing and the lack of proper security infrastructure leave MCPSs vulnerable to suspicious activity and, due to the critical nature of the network, to possibly catastrophic consequences.
To tackle possible threats and protect the network's privacy, a variety of security systems have been developed on top of common practices. Intrusion Detection and Prevention Systems (IDPS), which leverage techniques such as traffic monitoring, Machine Learning (ML), Deep Learning (DL) and signature pattern identification, have produced encouraging results towards ensuring the protection and integrity of their environments. A problem arises, though, from the sensitive nature of the data (patient records or laboratory tests and results), from their volume, since every device constantly produces data over the network, and from their heterogeneity, due to the different nature of each device network. Federated Learning (FL), a recent DL methodology for decentralized model training, copes with these issues by training DL models locally on the different edge IoT devices [7]. The data never need to be communicated to a central system; only model weight updates are sent, which avoids possible network overhead and keeps the data private.
In this paper we introduce a Federated Learning architecture for Intrusion Detection (ID) in MCPS networks that utilizes the Generative Adversarial Network (GAN) DL scheme. GANs have become powerful tools for ID because their adversarial training pushes them towards strong DL models. Specifically, a multi-layer federated heap is produced, which utilizes different data descriptions from the MCPS network environment in order to identify anomalies that may indicate an ongoing cyber-attack on the medical network.
In this system, GAN models are trained on two distinct data categories, namely a) Patient Medical Data and b) Network Traffic Data, using publicly available datasets to simulate an MCPS data ecosystem. Quantitative metrics are then used to measure the performance of both the GAN models and the FL architecture. The contributions of this paper can be summarized as follows:
• Producing a Federated Layered Architecture to be used in MCPS Networks: The FL scheme was devised with respect to the medical environment as a Critical Infrastructure (CI) and to its need for continuous operation.
• Delivering an Online Federated Adversarial Training Method for core GAN Training logic: The produced system offers a generalized method for GAN training in order to accommodate a variety of GAN designs.
• Presenting a GAN network for two Intrusion Detection problems: Two GAN models are produced and tested based on two morphologically different Intrusion Detection Data approaches.
The rest of this paper is organised as follows. Section II presents relevant work. Section III provides a basic background to the tools and methods used in this work. Section IV analyses the Federated architecture and the produced DL models, while Section V presents the evaluation of the introduced system. Finally, Section VI concludes this work.

II. RELATED WORK
Some noteworthy work has previously been carried out in the field of MCPS Intrusion Detection. In [2], the authors produce a distributed intrusion detection system for MCPS networks. Specifically, the work presents a Federated Learning IDS (FLIDS), based on the architecture developed in [8] and adapted to the MCPS use case by utilizing distributed Multilayer Perceptrons trained on the public MIMIC dataset, which contains Intensive Care Unit (ICU) data. The designed system architecture supports distributed IDSs on mobile devices that control Body Area Network (BAN) communication [9] with a hospital back-end server. The work targets only the i) Denial of Service (DoS), ii) Data Modification, iii) Data Injection and iv) Eavesdropping attack categories usually seen on healthcare networks. The FL system groups data based on identifying factors in order to generalize the produced global models and reduce computation and communication costs. The evaluation of the proposed system shows a significant increase in communication efficiency while keeping a high accuracy score and a low false positive rate (FPR) in the FL anomaly detection.
In [10], the authors explore the vulnerabilities of personal healthcare systems by applying a series of available exploits to those devices. Subsequently, they present the HEKA IDS, which is trained with different Machine Learning techniques to recognize medical data traffic in the network. The traffic features are extracted with an n-gram based approach from the sequential patterns produced by the Personal Medical Devices (PMDs). The proposed system, tested against a series of available cyber attacks, achieves high accuracy scores over different configurations.
In [11], a novel method for distributed Generative Adversarial Deep Neural Networks is proposed. In particular, the authors present MD-GAN, which adapts the GAN system to a Federated setting, and test it on three major public DL datasets, namely a) MNIST, b) CIFAR10 and c) CelebA, while addressing distributed scalability issues. The results show that the produced implementation achieves high scores across the datasets when adapted to the Federated procedure.

III. BACKGROUND
In this section, the necessary background for the tools and methods used in this work is presented.

A. GAN Architecture
The GAN architecture is based on a pair of neural sub-networks, a) the Generator G and b) the Discriminator D, that play an adversarial game with each other [12], [13]. G usually takes random noise as input and tries to produce data similar to the real data of the given use case. D, on the other hand, is trained to distinguish between the fake data produced by G and the real samples. The GAN architecture aims at training both rival sub-networks so that the Generator produces realistic samples that the Discriminator cannot differentiate from the real input data, while the Discriminator is in turn pushed to tell them apart. Equation 1 below shows the relation of G and D.
$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))] \qquad (1)$$
where G takes noise z from space Z and maps it to the space X from which D takes its input x. $p_{data}(x)$ and $p_z(z)$ denote the probability distributions over the spaces X and Z, respectively.
For the Anomaly Detection procedure, an intermediate model is exported from the Discriminator module after training. This model is the part of the Discriminator from the input up to a latent layer before the output sequence of the network, and it is used to reduce the dimension of the input samples to a specified latent space. To calculate the anomaly score of each sample, the Adversarial Loss ($L_{adv}$) between the input sample and a generated sample is computed. $L_{adv}$ is the distance between the generated/fake and the real sample, measured on the output of the latent model. Since the Generator has learned to produce normal samples, the greater the Adversarial Loss, the higher the probability that the input sample is abnormal. Equation 2 describes the Adversarial Loss, where $L_{adv}$ is the Adversarial Loss score and $d_r$ and $d_p$ are the predictions of the latent model on the real and the generated sample, respectively.
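The scoring step described above can be illustrated with a minimal sketch. It assumes that the latent model is any callable mapping a sample to a latent vector and that the distance is the Euclidean (L2) norm, i.e. $L_{adv} = \lVert d_r - d_p \rVert_2$; the text does not state which norm is used. The names `adversarial_loss` and `toy_latent_model` and all sample values are hypothetical and serve illustration only.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def adversarial_loss(latent_model: Callable[[Vector], Vector],
                     real: Vector, generated: Vector) -> float:
    """Distance between the latent predictions d_r (real) and d_p (generated).

    Assumption: Euclidean (L2) norm; the paper only speaks of a distance.
    """
    d_r = latent_model(real)
    d_p = latent_model(generated)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(d_r, d_p)))


def toy_latent_model(sample: Vector) -> Vector:
    """Hypothetical stand-in for the truncated Discriminator: reduces a
    sample to two latent features (its mean and its value range)."""
    return [sum(sample) / len(sample), max(sample) - min(sample)]


generated = [0.50, 0.52, 0.49, 0.51]  # what G might produce for normal data
normal = [0.51, 0.50, 0.50, 0.52]     # untouched input sample
tampered = [0.51, 0.95, 0.05, 0.52]   # input sample with modified values

score_normal = adversarial_loss(toy_latent_model, normal, generated)
score_tampered = adversarial_loss(toy_latent_model, tampered, generated)
```

In this toy setting the tampered sample receives a clearly larger score than the normal one, so a threshold on the score would separate the two; how such a threshold is chosen is not covered by this sketch.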

B. Federated Learning
Federated Learning is a stochastic, distributed and privacy-preserving learning method that undertakes the distribution, orchestration, training and aggregation of Deep Learning models across a large corpus of devices or edge nodes in the cloud [14] [15]. It works by stochastically distributing a central Deep Learning model to a given corpus of devices, where it is trained locally on the data collected on-device. Subsequently, the models are sent back to the centralized system to be aggregated in a process called Federated Averaging [7], which is defined as the aggregation of the edge-computed weights with the central model.
Specifically, the central server distributes a global model $w^0_{Global}$ along with training instructions to a Federated population $P_f \in [1, N]$, where $N \in \mathbb{N}^*$, each member holding a set of local data $D_i$, $i \in [1, N]$, and a local model $w^i_l$. The distributed models are subsequently trained on the local data $D_i$, and the weights $w^i_{Global}$ are then sent back to the central system to be aggregated through the Federated Averaging (3), or a similar, process in order to produce an updated global model $w^k_{Global}$ [16].
where $w^k_G$ is the global model at the $k$-th training iteration and $w^k_i$ denotes the model of the $i$-th member of the Federated population at that iteration.
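As an illustration of the aggregation step, the sketch below averages client models element-wise. It assumes an unweighted mean, $w^k_G = \frac{1}{N} \sum_{i=1}^{N} w^k_i$, and represents each model as a flat list of floats; both are simplifying assumptions, since Federated Averaging as commonly formulated weights each client by its number of local samples. The function name and the example values are hypothetical.

```python
from typing import List

Weights = List[float]


def federated_average(client_weights: List[Weights]) -> Weights:
    """Element-wise mean of the client models.

    Assumption: every client contributes equally (unweighted mean).
    """
    if not client_weights:
        raise ValueError("at least one client model is required")
    n_clients = len(client_weights)
    return [sum(column) / n_clients for column in zip(*client_weights)]


# Three hypothetical clients returning their locally trained weights.
local_models = [
    [1.0, 2.0, 3.0],
    [3.0, 2.0, 1.0],
    [2.0, 2.0, 2.0],
]
global_model = federated_average(local_models)
```

Only the weight vectors cross the network in this scheme; the local data $D_i$ never leave the clients.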

IV. DESIGN & IMPLEMENTATION
In this section the architecture design is laid out and the implemented methodology is explained. In particular, the devised Federated MCPS Environment and its interactions are described. Moreover, the MCPS Federated GAN architecture is presented with respect to the use cases considered in this work.

A. Federated MCPS Network
As mentioned, the MCPS ecosystem comprises critical equipment and data transmission pathways that play an important role in the stability of the underlying goals of the healthcare system and, by extension, in the preservation of human life. As such, this environment has little to no tolerance for any obstruction of the operation of those critical elements. With that in mind, the proposed architecture supports the abstraction of the Federated aggregation process into single or multiple aggregating subsystems that can be decoupled from the internal MCPS network while preserving the privacy of the data. Specifically, it is composed of a number of sub-layers that support both vertical and horizontal expansion. The base layer of the system, as can be seen in Figure 1, consists of the medical equipment of the different networks in a healthcare system, which produce data. The data acquired from those devices are aggregated in K gateway devices responsible for monitoring the said equipment, where K is the total number of the smallest medical device networks in an MCPS environment. These edge IoT devices host the local IDS associated with their connected devices. Above the Data Aggregation Layer (DAL) exist $L \in [1, K/2]$ Federated Aggregation Layers (FAL). The design leaves room for extra Federated Aggregation Layers in the case of a Wide Area MCPS ecosystem in order to optimize the information abstraction of homogeneous DL models and thus minimize any possible data interception at any level of the architecture. Every DAL hosts N Federated Aggregators, where $N \in [1, P_f]$, depending on the layers' configuration. Finally, the last DAL is connected to the Global MCPS Model server, which keeps track of the MCPS Federated populations, the global shared models and the model versions, and to which the aggregated models are communicated.
To understand the averaging of the Federated models throughout the Federated procedure we have the following relation (4), where $w^L_{Group}$ denotes the averaged model weights of a specific group in a layer over the $k$-th iteration, defined as the average of the weights of the Aggregators of that population, of size $N_A$, at that iteration. Generalizing the notation, to find the averaged model of a layer L we use (5), where $f^k_w(l, group)$ returns the averaged weight of the $l$-th FAL, $l \in [0, L]$, of a certain Federated group, and $w_j$ denotes the weights of the $j$-th Aggregator of the group in a layer. The process begins with the FAL receiving a request for a model update on a specific Federated population. The FAL stochastically selects a subset of the data gateways of that group and sends a training signal to the selected gateways. Since the GAN architecture consists of two sub-networks, the FAL initializes the weights of both networks, or, in the case of multiple FALs, forwards the global model it received from the global server, and sends them to the gateways. Because the training is online, in every iteration of the Federated training the FAL sends the networks to the nodes and, at the end of the iteration, receives them back in order to aggregate them with the rest of the received models. Algorithm 1 shows the in-depth procedure of the GAN model training.
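The process described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: models are flat lists of floats, aggregation is an unweighted mean over the selected gateways, a fixed fraction of gateways is sampled per iteration, and local training is replaced by a hypothetical stub (`make_trainer`). All function and variable names are invented for this example.

```python
import random
from typing import Callable, Dict, List, Tuple

Weights = List[float]
GanModel = Tuple[Weights, Weights]  # (generator weights, discriminator weights)
LocalTrainer = Callable[[GanModel], GanModel]


def average(models: List[Weights]) -> Weights:
    """Unweighted element-wise mean of equally shaped weight vectors."""
    return [sum(col) / len(models) for col in zip(*models)]


def select_subset(clients: List[str], fraction: float,
                  rng: random.Random) -> List[str]:
    """Stochastically pick at least one gateway for this iteration."""
    size = max(1, round(fraction * len(clients)))
    return rng.sample(clients, size)


def federated_gan_training(group: Dict[str, LocalTrainer],
                           global_model: GanModel,
                           num_iterations: int,
                           fraction: float = 0.5,
                           seed: int = 0) -> GanModel:
    """Online federated training of a (G, D) pair over one Federated group."""
    rng = random.Random(seed)
    for _ in range(num_iterations):
        chosen = select_subset(sorted(group), fraction, rng)
        # Each chosen gateway trains a copy of the current global pair.
        updates = [group[name]((list(global_model[0]), list(global_model[1])))
                   for name in chosen]
        # G and D are aggregated separately before the next iteration.
        global_model = (average([g for g, _ in updates]),
                        average([d for _, d in updates]))
    return global_model


def make_trainer(target: float) -> LocalTrainer:
    """Hypothetical local step: move every weight halfway towards `target`."""
    def train(model: GanModel) -> GanModel:
        step = lambda w: [x + 0.5 * (target - x) for x in w]
        return step(model[0]), step(model[1])
    return train


gateways = {f"gateway-{i}": make_trainer(1.0) for i in range(4)}
g_weights, d_weights = federated_gan_training(
    gateways, ([0.0, 0.0], [0.0]), num_iterations=3)
```

With several FALs, the same `average` function could be applied again to the group results of the layer below, which is one way to read the layered averaging in relations (4) and (5).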

Algorithm 1: Federated GAN Training
Data:
  $F_{group}$ - Federated population of a certain group
  $w^0_{G_G}$ - Global G model before the Federated Training
  $w^0_{G_D}$ - Global D model before the Federated Training
  numOfIterations - Number of Federated Iterations
Result: $w^k_{G_G}$, $w^k_{G_D}$ - Global models after k iterations
  F_clients = SelectSubsetOfFedClients(F_group);
  S = F_group.size();

It is worth mentioning that a security problem arises from the nature of the GAN architecture itself. Since the Generator of the GAN is trained to produce realistic samples that mimic the data it was trained on, any interception of the G module could potentially reveal confidential data to the malicious party that intercepted it. This particular problem can be overcome by enforcing a Differential Privacy scheme and further cryptographic approaches on the Federated communication model [17]- [19], though this is out of the scope of this paper and will be addressed in a future extension of this work.

C. Federated GAN Model For Medical Data
One of the goals of this work is to produce a GAN Deep Neural Network model to be used for intrusion detection from healthcare data. To that end, a compact GAN architecture was produced in order to accommodate the communicational needs of the Federated environment and the ID demands. The network is trained on normal medical data from homogeneous sources and tested on data with added abnormalities. The network is defined as follows.
The Generator G takes in a noise vector of size 10 and is trained to produce realistic samples of size $N_{D_i}$, close to the real data, where $D_i$ is the length of the feature vector.
The Generator's structure consists of eight layers: an input Dense layer, an output Tanh layer (6) and a sequence of Dense, ReLU, LeakyReLU, 1D Batch Normalization and Dropout layers. The Discriminator D, on the other hand, takes in a sample vector of size $N_{D_i}$ and is trained to discriminate the input samples as real or fake. The Discriminator's structure consists of eight layers: an input Dense layer, an output Sigmoid layer (7) and a sequence of Dense, ReLU, LeakyReLU, 1D Batch Normalization and Dropout layers. The overall structure of the network and its sub-networks can be seen in Figure 2.
$$\tanh(x) = 2\,s(2x) - 1 \qquad (6)$$
$$s(x) = \frac{1}{1 + e^{-x}} \qquad (7)$$
where equation (6) describes the Tanh function, tanh(x) is the output of the Tanh function, s(x) is the Sigmoid function (7) and x is the input vector.
Both sub-networks are compiled with the Binary Cross-Entropy function (8) and the Adam [20] optimizer with a learning rate parameter of 0.01.
$$H = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log p(y_i) + (1 - y_i) \log\left(1 - p(y_i)\right) \right] \qquad (8)$$
The Binary Cross-Entropy function is defined as above, where N is the number of samples given and $y_i$ is the label of the $i$-th sample. $p(y_i)$ is the probability of the sample matching the label, while $1 - p(y_i)$ is the complement of that probability. Finally, H represents the result of the Binary Cross-Entropy loss at a given point.
Although the presented GAN is sized for a modest number of features, chosen on the basis of the data distribution of common MCPS equipment, it can be scaled using relation (9), which was empirically found to work in the presented scheme of the GAN structure for one-dimensional data input.
where $h_{size}$ is the size of the hidden layer, M denotes the number of features in the dataset, and $l_i$ is the current layer.
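To make the Binary Cross-Entropy function (8) used by both sub-networks concrete, the following minimal sketch computes it for a small batch. The clipping constant `eps` is an implementation detail added here to keep the logarithm finite and is not part of the definition; the function name and the example probabilities are hypothetical.

```python
import math
from typing import Sequence


def binary_cross_entropy(labels: Sequence[int],
                         probabilities: Sequence[float],
                         eps: float = 1e-12) -> float:
    """Mean Binary Cross-Entropy H over N samples.

    `eps` clips probabilities away from 0 and 1 so that log() stays finite.
    """
    if len(labels) != len(probabilities):
        raise ValueError("labels and probabilities must have equal length")
    total = 0.0
    for y, p in zip(labels, probabilities):
        p = min(max(p, eps), 1.0 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(labels)


# Discriminator view: label 1 = real sample, label 0 = generated sample.
confident = binary_cross_entropy([1, 0], [0.9, 0.1])
undecided = binary_cross_entropy([1, 0], [0.5, 0.5])
```

A Discriminator that cannot tell real from generated samples outputs probabilities near 0.5, which yields a loss of about ln 2; confident correct predictions drive the loss towards zero.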

D. Federated GAN Model For Medical Network Traffic Data
Besides the data produced on healthcare devices, i.e. patient monitoring data, reports and others, Intrusion Detection can also be performed at the network flow level. By utilizing the network flows that describe the communication over the MCPS network, across the different protocols used by medical equipment, a significant portion of incoming intrusions or malicious third-party meddling can be detected. Consequently, this work presents a novel compact GAN architecture for network flow Anomaly Detection. The produced models are trained on normal network traffic data, which is by its nature homogeneous, and tested on data containing a variety of different network exploits. The GAN sub-networks are defined as follows.
G takes in a noise vector of size z = 10, where z is the length of the noise vector, and is trained to produce realistic samples of size M, the number of flow features, imitating the real data. The Generator's structure consists of ten layers: an input Dense layer, an output Tanh layer (6) and a sequence of Dense, ReLU, LeakyReLU, 1D Batch Normalization and Dropout layers. On the other hand, D takes in a sample vector of size $N_{D_i}$ and is trained to discriminate the input samples as real or fake. The Discriminator's structure consists of eight layers: an input Dense layer, an output Sigmoid layer (7) and a sequence of Dense, ReLU, LeakyReLU, 1D Batch Normalization and Dropout layers. The overall structure of the network and its sub-networks can be seen in Figure 3.
Both sub-networks are compiled with the Binary Cross-Entropy function (8) and the Adam optimizer with a learning rate parameter of 0.001.

V. EVALUATION
In order to evaluate the produced architecture and Deep Learning model, the presented system was tested and measured using a series of quantitative metrics.

A. Evaluation Environment
To test the proposed Federated scheme, an experimental testbed was devised. Because MCPS environments, as Critical Infrastructures, are characterized by the need for continuous and unperturbed operation as well as by their confidential nature, a virtual testbed was built. This installation consists of an Ubuntu 20.04 system in the role of the global Control and Command (C&C) server, a subsystem within this server serving as the lower FAL layer, and 6 virtual remote Data Aggregators, three for each use case, namely a) Medical Data and b) Network Traffic Data, providing the respective data. A subset of the respective data was distributed to each remote worker. The training process began with a new, untrained model sent to the remote workers for training, aiming to evaluate the full potential of the Federated procedure.

B. Medical Data Anomaly Detection
For the Medical Data Anomaly Detection use case the CHARIS [21] public clinical dataset was used. This dataset contains multi-channel recordings of ECG, arterial blood pressure (ABP) and intracranial pressure (ICP) of patients diagnosed with traumatic brain injury (TBI). To simulate malicious meddling with the data, a portion was selected as testing data and half of it was perturbed with Gaussian noise scaled by a factor of 2 relative to the deviation of the same data. The altered samples differ by a fraction from the real data while introducing the effects of the i) Data Modification and ii) Data Scrambling cyber attacks. The data were fed into the Federated GAN network as time-series with a window size of 10 records and normalized in the range of (0, 1) using a Min-Max Scaler.
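The preparation of the test data described above could look roughly like the sketch below. Several details are assumptions made for illustration: the windows overlap with a stride of 1, the noise standard deviation is 2 times the population standard deviation of each window, scaling is applied before the noise is added, and the synthetic `signal` merely stands in for one physiological channel. None of the function names come from this work.

```python
import random
import statistics
from typing import List, Tuple


def min_max_scale(values: List[float]) -> List[float]:
    """Scale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def sliding_windows(values: List[float], size: int = 10) -> List[List[float]]:
    """Overlapping windows of `size` consecutive records (stride 1 assumed)."""
    return [values[i:i + size] for i in range(len(values) - size + 1)]


def tamper(window: List[float], factor: float,
           rng: random.Random) -> List[float]:
    """Add zero-mean Gaussian noise scaled by factor * std of the window."""
    sigma = factor * statistics.pstdev(window)
    return [v + rng.gauss(0.0, sigma) for v in window]


def build_test_set(signal: List[float], factor: float = 2.0,
                   seed: int = 0) -> List[Tuple[List[float], int]]:
    """Return (window, label) pairs; label 1 marks a tampered window."""
    rng = random.Random(seed)
    windows = sliding_windows(min_max_scale(signal))
    tampered_ids = set(rng.sample(range(len(windows)), len(windows) // 2))
    return [(tamper(w, factor, rng), 1) if i in tampered_ids else (w, 0)
            for i, w in enumerate(windows)]


# Hypothetical stand-in for one recorded channel (29 records).
signal = [80.0 + (i % 7) for i in range(29)]
test_set = build_test_set(signal)
```

Because the noise is added after scaling in this sketch, tampered values may fall slightly outside the (0, 1) range; applying the scaler after the perturbation would be an equally plausible reading of the text.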
After a number of experiments with the Federated Medical Data Anomaly Detection part of this work, it was observed that the model converged within a small number of Federated iterations. The results show that the Federated procedure produces better metrics than the non-Federated model, as can be seen in Table I.

C. Network Traffic Data Anomaly Detection
For the Network Traffic Data Anomaly Detection use case the UNSW-NB15 [22], [23] public Intrusion Detection dataset was utilized. This dataset contains network flows with normal and abnormal traffic from a test bed environment. The abnormal samples contain a series of cyber attacks used for the detection test of the algorithm, namely i) Fuzzers, ii) Analysis, iii) Backdoors, iv) DoS, v) Exploits, vi) Generic, vii) Reconnaissance, viii) Shellcode and ix) Worms. For the anomaly detection part the attack classes were converted to binary ground-truth labels denoting Normal and Abnormal traffic. The data were fed into the Federated GAN networks as single records and normalized in the range of (0, 1) using a Min-Max Scaler.
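The label conversion and per-feature scaling described above can be sketched as follows. The three example flows, their feature values and the helper names are hypothetical and only illustrate the idea; they are not taken from the dataset or from the authors' pipeline.

```python
from typing import List


def to_binary_label(attack_category: str) -> int:
    """Return 0 for normal traffic and 1 for any attack class."""
    return 0 if attack_category.strip().lower() in ("", "normal") else 1


def scale_features(rows: List[List[float]]) -> List[List[float]]:
    """Column-wise Min-Max scaling to [0, 1]; constant columns map to 0."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    spans = [max(c) - min(c) for c in columns]
    return [[(v - lo) / span if span else 0.0
             for v, lo, span in zip(row, lows, spans)]
            for row in rows]


# Hypothetical flows: (numeric flow features, attack category).
flows = [
    ([0.12, 500.0, 300.0], "Normal"),
    ([0.01, 90000.0, 0.0], "DoS"),
    ([2.50, 1200.0, 800.0], "Exploits"),
]
features = scale_features([f for f, _ in flows])
labels = [to_binary_label(category) for _, category in flows]
```

Each scaled record is then a single input vector for the GAN, and the binary labels are used only at test time to score the detections.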
After a number of experiments with the Network Traffic Data Anomaly Detection part of this work, it was observed that, even though the scores of the network on the Anomaly Detection task were not very high, the Federated model showed an improvement over the non-Federated model, as can be seen in Table I.

VI. CONCLUSION
While Intrusion Detection and Prevention systems are evolving continuously in most modern networked systems, some cyber-physical systems that were supposed to be isolated, like most healthcare networks, are left unprotected due to the limited work done in their field. Due to their critical nature, security plays a major role in safeguarding the integrity and continuous operation of those systems. To that end, this paper proposes a distributed layered IDS scheme, enriched with the power of a novel Deep Learning training method, Federated Learning. The presented system leverages the Adversarial architecture of GAN Deep Neural Networks, tested on two distinct use cases, namely a) Patient Medical Data and b) Network Traffic Data, both to produce Federated models capable of detecting possible intrusions in an MCPS network in a distributed manner and to establish the use of Federated Learning in the IDS methodology. The evaluation of the proposed networks showed that the Federated models detect anomalies better than the non-Federated models in both use cases.
Our future plans in this field include implementing a range of DL models for Federated Training and augmenting them with privacy-preserving policies to further secure the implementation.

VII. ACKNOWLEDGMENT
This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No. 957406 (TERMINET).