A GDPR-Compliant Approach to Real-Time Processing of Sensitive Data

Cyber-attacks represent a serious threat to public authorities, whose agencies are regularly targeted by hackers. The public sector as a whole collects large amounts of data on citizens, but that data is often kept on vulnerable systems. For Local Public Administrations (LPAs) in particular, protection against cyber-attacks is difficult because of outdated technologies and budget constraints. Furthermore, the General Data Protection Regulation (GDPR) places many constraints on data usage when “special categories of data” are processed. This paper presents the approach of the EU project COMPACT (H2020), highlighting the solutions used to guarantee data privacy during the real-time monitoring performed by COMPACT’s security tools.


Introduction
The advent of the Internet has opened new opportunities for Public Administrations (PAs) to improve their efficiency while providing better services to citizens via an ever larger set of specialized network applications, including e-government, e-health, and more. This is at the heart of a Europe-wide eGovernment action plan, whose latest update covers the years 2016 to 2020 and which also names trustworthiness and security as key guiding values. Indeed, as a potential channel for accessing personal information, these specialized applications also expose the public sector to new risks. The cybersecurity landscape is changing, and Local Public Administrations (LPAs) and Critical Infrastructures (CIs) are rapidly becoming an attractive target for cybercriminals [1,2,3,4,5], who might access sets of personal data or gain control over smartly operated city resources through LPA/CI infrastructures. The consequences of cyber-threats can be considerable, causing business interruptions, data losses, and thefts of intellectual property, and significantly impacting both individuals and organizations.
It is claimed that cyber threats are the most significant and fastest-rising risk that public sector organizations are facing. Reports show that nearly 40% of malware attacks, and of cyber threats in general [6], are directed against public sector organizations [1], i.e. more than against sectors (e.g. finance) that have traditionally been thought of as top targets. The interconnection of operational environment systems, used by public bodies on an ever growing scale, exacerbates the problem, especially as malware distribution periods (on both fixed and mobile platforms) are becoming increasingly short [7]. The increase in severity of cyber-attacks coincides with a boom in the different types of connected devices, as well as with a huge expansion in virtualization and public clouds.
In particular, the Information Commissioner's Office (ICO) reports that: "In a change to the previous quarter, the second most prevalent sector in Q4 (January to March 2016) was local government. The number of data security incidents in this sector increased by 34% compared to the previous quarter (from 32% in Q3 to 43% in Q4). Coupled with the overall decrease in data security incidents during Q4, this means that the percentage of total incidents suffered by the local government sector has also increased, from 6% in Q3 to 10% in Q4." [8].
Therefore, LPAs need to understand the cyber risks to which they are exposed and take proper actions to protect their infrastructures from cyber disruptions, in order to safeguard the citizens' and enterprises' information they manage. The DBIR 2016 report [9] provides the number of security incidents by victim industry and organization size (2015 dataset). The category "Public Industry", which refers to PA organizations, is by far the most targeted, with 47000+ attacks out of a total of about 64000. The report also shows the distribution of incidents by pattern: the vast majority of incidents in the public sector can be traced to: 1) miscellaneous errors (24%), 2) privilege misuse (22%), 3) stolen assets (20%), and 4) crimeware (16%).
The most notable issues that have been identified as hampering the ability of PA organizations to improve their cyber security level are:
1. Lack of standardized data classification - 45% of public sector respondents do not use standardized data classification techniques/procedures. As a consequence, LPAs run a higher risk of accidentally exposing private data in their rush to comply with emerging regulations, at both the national and the EU level, that promote transparency of the Public Sector. Also, only 12% stated that they use standardized policies and that they proactively verify and enforce those policies.
2. Lack of effective Non-Disclosure Agreements (NDAs) - 40% of public sector organizations still rely on paper-based NDAs, and use them inconsistently. This amplifies risks related to the human factor, which is already one of the biggest, since malicious or disgruntled personnel with access to important information assets can be a significant threat to the security of those assets.
3. Lack of plans for responding to security breaches and for disaster recovery - 36% of public sector organizations do not have a plan for responding to security breaches, and only 10% of public sector organizations test for the worst-case scenario. 34% of public sector organizations do not have budgeted disaster recovery plans. These are major obstacles to containing the damage, since proper and timely action is key when a security incident or a disaster occurs.
4. Lack of uniformly enforced security policies - 33% of public sector organizations do not have uniformly enforced security policies (that is, a consistent security policy is applied only partially, if at all, throughout the organization). This condition hinders their ability to comply with regulations, such as the European Union Data Protection Directive (EUDPD).
5. Lack of adequate policies and practices for data disposal - 76% of public sector organizations do not have adequate policies and practices for secure and reliable data disposal. In particular, only 16% of public sector organizations have written policies that require destruction records to be actually collected, practiced, and audited. The enforcement of strong policies to govern the proper disposal of electronic and paper records, based on sound technical and organizational guidelines and best practices, is the prerequisite for protecting private data from unauthorized disclosure.
6. Lack of effective access control mechanisms - 20% of public sector organizations do not use roles to manage access, and more than 26% of public sector organizations have no official procedure for terminated or reassigned employees. This creates vulnerabilities, since it allows inappropriate access to resources.
7. Large set of legacy, unmaintained and undocumented systems, representing an attack surface of unknown dimension.
8. Inappropriate management of security updates (patches), as well as usage of out-of-date software in computers, mobile devices and central servers.
9. Limited capacity, and motivation, of LPA personnel in detecting and reporting cyber-attacks. This is due to a number of interconnected factors including (i) the aging of the LPA workforce, (ii) its limited technological skills and (iii) the lack of acknowledgment of employees' achievements. This makes the PA workforce less responsive to traditional educational measures (like classroom training).
It is clear that innovative cyber security tools are needed in order to guarantee the protection of LPAs. In addition, these tools must deal with: (1) limited resources, both economic and structural; and (2) strong privacy requirements coming from the recent adoption of the General Data Protection Regulation (GDPR) on the protection of natural persons with regard to the processing of personal data and on the free movement of such data.

Background - Homomorphic encryption
Homomorphic Encryption (HE) is a recent cryptographic method that allows computation to be performed on encrypted data without decrypting it. This way, confidential data can be protected not only during storage and exchange/transfer but also during processing. Avoiding intrusions from semi-honest or malicious cloud providers when outsourcing data processing to the Cloud is crucial for the sensitive data that are to be processed within the COMPACT solution. The first HE algorithms, i.e., Partially Homomorphic Encryption (PHE) [14] [15], were able to carry out just one type of operation (e.g., addition or multiplication). Clearly, this limitation in the type of executable computations hampered the usage of HE in practical contexts. Gentry et al. [16] provided the first implementation of a Fully Homomorphic Encryption (FHE) scheme. Gentry's algorithm allows the execution of an arbitrary number of additions and multiplications over encrypted data. The security of the system is based on the noise introduced into the ciphertext. This noise grows with every homomorphic operation, and when it reaches some maximum amount the ciphertext can no longer be decrypted. This solution was very costly in terms of performance, with a heavy impact on CPU and memory resources.
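To make the "one type of operation" limitation of PHE concrete, the following minimal sketch uses textbook RSA without padding, which is multiplicatively homomorphic: multiplying two ciphertexts yields an encryption of the product of the plaintexts. This is an illustration chosen here, not necessarily one of the schemes cited as [14] [15]; the tiny primes and the function names are assumptions made for readability, and unpadded RSA with such parameters is insecure.

```python
# Toy textbook RSA (no padding, tiny primes): insecure, illustration only.
P, Q = 61, 53
N = P * Q                    # public modulus
PHI = (P - 1) * (Q - 1)
E = 17                       # public exponent
D = pow(E, -1, PHI)          # private exponent (modular inverse, Python 3.8+)


def encrypt(m: int) -> int:
    """Encrypt an integer 0 <= m < N with the public key."""
    return pow(m, E, N)


def decrypt(c: int) -> int:
    """Decrypt with the private key."""
    return pow(c, D, N)


def multiply_encrypted(c1: int, c2: int) -> int:
    """Homomorphic multiplication: needs only the public modulus N."""
    return (c1 * c2) % N


if __name__ == "__main__":
    c = multiply_encrypted(encrypt(6), encrypt(7))
    print(decrypt(c))  # 42, as long as the product stays below N
```

There is no comparable way to add two plaintexts under this scheme, which is exactly the restriction that motivated the search for fully homomorphic schemes.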
An attempt to simplify the method was made by Van Dijk et al. [17], who proposed a conceptually simpler construction, i.e., Somewhat Homomorphic Encryption (SHE) over the integers. The price to pay with SHE is the limited number of mathematical operations that can be performed. However, in many real-world applications (e.g., medical, financial) this seems reasonable since, as the analysis of Naehrig et al. [9] reports, most of the evaluations required, i.e., one-time statistical functions, fit well within SHE constraints.
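The trade-off between noise and the number of supported operations can be sketched with a toy symmetric scheme in the spirit of the integer-based construction: a bit m is hidden as c = p*q + 2*r + m, where the odd integer p is the secret key and r is small random noise. The parameter sizes and function names below are assumptions chosen for illustration and provide no real security.

```python
import secrets

P_BITS = 64      # size of the secret odd integer p (toy value)
Q_BITS = 32      # size of the random multiplier q
NOISE_BITS = 8   # size of the fresh noise r


def keygen() -> int:
    """Return a random odd P_BITS-bit integer used as the secret key."""
    return secrets.randbits(P_BITS) | (1 << (P_BITS - 1)) | 1


def encrypt(p: int, bit: int) -> int:
    """Hide one bit as p*q + 2*r + bit."""
    q = secrets.randbits(Q_BITS) + 1
    r = secrets.randbits(NOISE_BITS)
    return p * q + 2 * r + bit


def noise(p: int, c: int) -> int:
    """Noise term of a ciphertext; decryption works while it stays below p."""
    return c % p


def decrypt(p: int, c: int) -> int:
    """Strip the multiple of p, then read the parity of the noise term."""
    return noise(p, c) % 2


if __name__ == "__main__":
    p = keygen()
    c1, c2 = encrypt(p, 1), encrypt(p, 1)
    print(decrypt(p, c1 + c2))   # XOR of the two bits
    print(decrypt(p, c1 * c2))   # AND of the two bits
    # Addition adds the noise terms, multiplication multiplies them:
    print(noise(p, c1).bit_length(), noise(p, c1 * c2).bit_length())
```

Adding ciphertexts adds the noise terms and multiplying ciphertexts multiplies them, so with these toy sizes only a handful of chained multiplications fit before the noise exceeds p and decryption becomes unreliable. This is the "limited number of operations" that characterizes SHE.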
COMPACT aims to adopt Fully Homomorphic Encryption (FHE) schemes capable of evaluating arbitrary functions in a homomorphic way, and to mitigate the performance overhead introduced by homomorphic computation by using recent dedicated compilation and parallelism techniques and mechanisms.

The COMPACT project
COMPACT's overarching objective is to enable LPAs to become the main actors of their own cyber-resilience improvement process, by providing them with effective tools and services for removing security bottlenecks. This can be broken down into five finer-grained objectives:
• Objective #1 - Making the PA personnel aware of the basic cyber security threats they are exposed to.
• Objective #2 - Improving the skills, both technical and behavioral, of the PA personnel via innovative training techniques that are well received by the (non IT-expert) workforce.
• Objective #3 - Providing protection tools against basic cyber security threats, i.e. those with a higher impact on LPAs. These include [10,11,12]: phishing, ransomware, Bring Your Own Device (BYOD), jailbreaking the cloud, cross-site scripting, code (particularly SQL) injection, and more.
• Objective #4 - Creating an LPA-level information hub, to favour reliable and timely exchange of information among LPAs on cyber security guidelines and best practices, as well as on Indicators of Compromise (IoC).
• Objective #5 - Creating a link between the COMPACT LPA-level information hub and major EU-level initiatives, to support LPAs in improving cyber-resilience in a complex European context.

Fig. 1. COMPACT objectives
To achieve its objectives, COMPACT will develop four types of tools/services (Fig. 1):
1. Risk assessment tools - Tailored to the LPA context, these will allow LPAs to evaluate and monitor their exposure to the most relevant (i.e. with the highest impact) cyber threats. They will enable LPAs to prioritize the adoption of preventive and reactive countermeasures, for maximum efficiency of resource usage for cyber protection purposes.
2. Education services - Delivered through dedicated game-based training, focused not only on specific cyber-threats but also on psychological and behavioral factors, to maximize the effectiveness of the learning experience while also containing the training time.
3. Monitoring services (SOC) - These continuously process events related to the status of the infrastructure and correlate them with information from threat intelligence feeds, to spot anomalies in a timely manner and also suggest recovery actions that can be implemented.
4. Knowledge Sharing services - These will include best practices and guidelines, focused on the specific needs of LPAs, that can be easily adopted to quickly increase the cyber security level of the organization. Just as importantly, they are also used (i) at the Member State level as an input for the activity of national cybersecurity stakeholders (like national CERTs) and (ii) at the EU level as an input for European boards, agencies, and initiatives (like ENISA and the CSIRT [13] network foreseen in the NIS directive).

COMPACT monitoring service
The Security Operations Centre (SOC) provides the real-time monitoring capability of the organization through an advanced Security Information and Event Management (SIEM) system. The SOC platform is an integrated technology platform that allows accurate, timely and trustworthy detection and diagnosis of security attacks by combining information from physical and logical event sources. It is implemented as a distributed, loosely coupled architecture in which components depend on each other to the least extent practicable, and which enables: i) collection of security-relevant data from a variety of data feeds; ii) correlation of events and context information, via combined use of stream and batch processing; and iii) production and secure storage of incident-related evidence.
The event sources for the SOC platform can be physical or logical. Physical event sources include systems installed in the buildings, such as video surveillance, physical access control, fire alarm and other physical security systems, as well as automation and building management systems. Logical event sources are the software safeguards for an organization's systems, including user identification and password access, authentication, access rights and authority levels.
The SOC platform can combine event information from multiple event sources and make sophisticated diagnoses based on the received information. As the outcome of this analysis, the end user receives ranked alerts and forensic evidence.
The architecture of the current solution is shown in Fig. 2.

Fig. 2. SOC architecture
The SOC platform consists of the following main components:
• Correlation Engine: The Correlation Engine is the component in charge of the event diagnosis process. It operates by correlating a huge amount of security-relevant events/information from the physical and the electronic domain in real time, through Complex Event Processing (CEP) techniques and stream processing technologies. The attack diagnosis process is driven by correlation rules that aggregate the parameters of attack symptoms, such as the attack type, the target component and the temporal proximity. Alerts are generated only when the correlation among such symptoms indicates a potential attack, thus exhibiting low false positive rates and improved detection capability w.r.t. single probes.
• Rule Engine: The Rule Engine provides the logical rules to be followed by the Correlation Engine. The Rule Engine includes two main components, Signature Based Support and Anomaly Based Support.
• Forensic Module: The Forensic Module provides a set of services that enable the end user (SOC operator) to trace from an event back to the log data from which it was identified. The module will ensure that the events and their associated logs are stored in a forensically sound manner. It will support processes that ensure, to the greatest extent possible, that the event data will be acceptable as evidence.
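As an illustration of how a correlation rule can aggregate attack symptoms by attack type, target component and temporal proximity, the following minimal sketch raises an alert only when several distinct probes report the same attack type against the same target within a sliding time window. The class names, field names and probe names are hypothetical and are not taken from the COMPACT prototype.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Symptom:
    timestamp: float   # seconds since an arbitrary epoch
    probe: str         # sensor that reported the symptom (hypothetical names)
    target: str        # component under suspected attack
    attack_type: str


class CorrelationRule:
    """Alert when enough distinct probes report the same attack type
    against the same target within a sliding time window."""

    def __init__(self, attack_type: str, window_s: float, min_probes: int):
        self.attack_type = attack_type
        self.window_s = window_s
        self.min_probes = min_probes
        self._recent = defaultdict(deque)  # target -> symptoms in the window

    def feed(self, s: Symptom) -> Optional[dict]:
        # Assumes symptoms arrive in non-decreasing timestamp order.
        if s.attack_type != self.attack_type:
            return None
        window = self._recent[s.target]
        window.append(s)
        # Evict symptoms too old to be temporally close to the new one.
        while s.timestamp - window[0].timestamp > self.window_s:
            window.popleft()
        probes = {x.probe for x in window}
        if len(probes) < self.min_probes:
            return None      # a single probe is not enough evidence
        window.clear()       # avoid re-alerting on the same symptoms
        return {"target": s.target, "attack_type": s.attack_type,
                "probes": sorted(probes), "time": s.timestamp}


if __name__ == "__main__":
    rule = CorrelationRule("intrusion", window_s=60, min_probes=2)
    print(rule.feed(Symptom(0, "badge_reader", "server_room", "intrusion")))
    print(rule.feed(Symptom(30, "ids", "server_room", "intrusion")))
```

Requiring agreement between independent probes is what keeps the false positive rate lower than that of any single probe, at the cost of missing attacks that only one source can observe.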
In terms of data collection, the prototype is equipped with a number of adapters for receiving events from a wide variety of Commercial Off The Shelf (COTS) products for logical and physical security monitoring. In terms of data processing, the prototype enables: 1) pre-processing of data at the edge of the system and 2) stream and batch processing in the core of the system. The business logic that drives the correlation process can be easily customized by means of a user-friendly graphical interface. The SIEM is the main component of the SOC system and includes:
• A runtime engine that supports the distributed streaming dataflow
• Two data processing APIs, one for Stream Processing and one for Batch Processing
• Three classes of libraries:
1. Complex Event Processing (CEP), to detect event patterns in an endless stream of events. CEP combines data from multiple sources to infer events or patterns that highlight specific situations. Its goal is to identify meaningful events (such as threats) and respond to them as quickly as possible. This real-time processing can be based on either a time window or an event-driven approach.
2. Machine Learning, which gives the SIEM the ability to learn without being explicitly programmed. It relies on algorithms that can learn from and make predictions on data; by building a model from sample inputs, such algorithms go beyond strictly static program instructions and make data-driven predictions or decisions.
3. Homomorphic Data Processing, to allow the processing of homomorphically encrypted data without decrypting them.
The communication between the SOC components is provided by a Publish/Subscribe communication channel, which is in charge of delivering data and messages between the data sources, the SIEM GUI and the SIEM Core.
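The publish/subscribe decoupling between data sources, SIEM Core and SIEM GUI can be sketched with a minimal in-process broker. This is only an illustration of the pattern: the actual messaging technology used in the SOC is not specified here, and the topic names, the severity field and the threshold are assumptions.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List

Handler = Callable[[Any], None]


class Broker:
    """Minimal in-process publish/subscribe channel (illustrative only)."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: Any) -> int:
        """Deliver a message to every subscriber; return how many got it."""
        handlers = list(self._subscribers.get(topic, ()))
        for handler in handlers:
            handler(message)
        return len(handlers)


def wire_demo() -> list:
    """Wire a data source, a stand-in SIEM Core and a stand-in GUI."""
    broker = Broker()
    gui_inbox: list = []

    def siem_core(event: dict) -> None:
        # Hypothetical rule: forward only high-severity events as alerts.
        if event.get("severity", 0) >= 7:
            broker.publish("alerts", {"alert": event["name"]})

    broker.subscribe("events.raw", siem_core)     # Core listens to sources
    broker.subscribe("alerts", gui_inbox.append)  # GUI listens to the Core
    broker.publish("events.raw", {"name": "port_scan", "severity": 8})
    broker.publish("events.raw", {"name": "login_ok", "severity": 1})
    return gui_inbox


if __name__ == "__main__":
    print(wire_demo())  # [{'alert': 'port_scan'}]
```

Because publishers and subscribers know only topic names, a new data collection adapter can be added without changing the SIEM Core, which matches the loosely coupled design described earlier.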

Fig. 3. SIEM components
Although a SOC prototype is already available, it will be evolved along several dimensions to meet the COMPACT requirements.
The first improvement concerns the adaptation of the SOC data collection to the data that must be acquired during LPA monitoring. Many data collection features are already available in the current SOC prototype and will be adapted to the LPA environments; others will be developed to meet specific requirements, such as the acquisition of information from the Windows Management Instrumentation tool and from other common and uncommon security tools (Nagios, Sophos, etc.).
The second improvement is the implementation of the Data Management and Policy Enforcement component (DMPE). This component will be integrated in each data collection tool in order to enforce the privacy requirements imposed by the LPA (to be compliant with the GDPR). In particular, the DMPE will be in charge of applying the most appropriate techniques needed to meet the privacy requirements, such as anonymization and pseudonymization to remove or mask special categories of data, or homomorphic encryption to keep the data encrypted while it is processed.
The third improvement is the technology update of the current correlation and processing features of the SOC, by exploiting a best-of-breed selection of Open Source technologies for CEP, machine learning, and data mining.
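The kind of per-field policy enforcement that a component like the DMPE could apply at collection time can be sketched as follows. The policy table, field names and key handling are hypothetical; the sketch uses a keyed hash (HMAC-SHA256) for pseudonymization so that the same identifier always maps to the same token and events can still be correlated, while fields without an explicit rule are dropped.

```python
import hashlib
import hmac

# Hypothetical policy: field name -> action ("keep", "pseudonymize", "drop").
POLICY = {
    "timestamp": "keep",
    "event": "keep",
    "user": "pseudonymize",
    "src_ip": "pseudonymize",
    "health_note": "drop",   # stands in for a special category of data
}


def pseudonymize(value: str, key: bytes) -> str:
    """Keyed hash: stable per value, not reversible without the key."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]


def enforce(event: dict, policy: dict, key: bytes) -> dict:
    """Apply the policy to one event before it leaves the collector."""
    out = {}
    for field, value in event.items():
        action = policy.get(field, "drop")   # default-deny unknown fields
        if action == "keep":
            out[field] = value
        elif action == "pseudonymize":
            out[field] = pseudonymize(str(value), key)
        # "drop": the field is simply omitted
    return out


if __name__ == "__main__":
    key = b"demo-key-kept-by-the-LPA"   # assumption: key stays with the LPA
    raw = {"timestamp": 1700000000, "event": "login_failed",
           "user": "alice", "src_ip": "10.0.0.5", "health_note": "n/a"}
    print(enforce(raw, POLICY, key))
```

Dropping unknown fields by default is a conservative choice: a new log source cannot leak a sensitive attribute simply because nobody wrote a rule for it yet.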
The fourth improvement is the implementation of specific correlation operators (CEP operators) able to process homomorphically encrypted data without decrypting it. Finally, the SOC graphical user interface will be developed/adapted in order to meet the guidelines defined by the COMPACT consortium and to be integrated with the COMPACT unified dashboard.
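A very simple operator of this kind is an encrypted counter: each collector encrypts a 0/1 flag (for example, "this login attempt failed"), the correlation engine aggregates the ciphertexts for a window without seeing any flag, and only the key holder decrypts the total. The sketch below uses a toy Paillier cryptosystem, which is additively homomorphic; it is an assumption made for illustration rather than the scheme adopted by COMPACT, and the small primes offer no security.

```python
import math
import secrets

# Toy Paillier parameters (tiny primes, insecure, illustration only).
P, Q = 1009, 1013
N = P * Q                  # public key
N2 = N * N
G = N + 1                  # standard generator choice
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)   # private: lcm(p-1, q-1)
MU = pow(LAM, -1, N)       # private: inverse of LAM modulo N


def encrypt(m: int) -> int:
    """Encrypt 0 <= m < N using only the public values N and G."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return (pow(G, m, N2) * pow(r, N, N2)) % N2


def decrypt(c: int) -> int:
    """Decrypt with the private values LAM and MU."""
    return ((pow(c, LAM, N2) - 1) // N) * MU % N


def encrypted_count(flags: list) -> int:
    """CEP-style operator: sum encrypted 0/1 flags in a window.
    Multiplying Paillier ciphertexts adds the underlying plaintexts."""
    total = encrypt(0)
    for c in flags:
        total = (total * c) % N2
    return total


if __name__ == "__main__":
    window = [encrypt(b) for b in (1, 0, 1, 1, 0, 1)]  # done by collectors
    aggregate = encrypted_count(window)                # done by the engine
    print(decrypt(aggregate))                          # done by key holder: 4
```

In this sketch the comparison of the count against an alert threshold happens after decryption on the key holder's side; evaluating such a comparison directly on ciphertexts requires a richer scheme such as SHE or FHE.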

Conclusion
This paper has presented a brief overview of the COMPACT approach to implementing an LPA-specific Security Monitoring Center, highlighting how this component will guarantee the privacy of the data during the processing phase.