Integrity verification of Docker containers for a lightweight cloud environment

Virtualisation techniques are growing in popularity and importance, given their application to server consolidation and to cloud computing. Remote Attestation is a well-known technique to assess the software integrity of a node. It works well with physical platforms, but not so well with virtual machines hosted in a full virtualisation environment (such as the Xen hypervisor or Kernel-based Virtual Machine), and it is simply not available for a lightweight virtualisation environment (such as Docker). Nevertheless, the latter is increasingly used, especially in lightweight cloud platforms, because of its flexibility and limited overhead as compared to virtual machines. This paper presents a solution for security monitoring of a lightweight cloud infrastructure, which exploits Remote Attestation to verify the software integrity of cloud applications during their whole life-cycle. Our solution leverages mainstream tools and architectures, like the Linux Integrity Measurement Architecture, the OpenAttestation platform and the Docker container engine, making it practical and readily available in a real-world scenario. Compared to a standard Docker deployment, our solution enables run-time verification of container applications at the cost of a limited overhead.


Introduction
Current ICT infrastructures benefit from modern technologies such as those of the cloud computing paradigm, wherein virtualisation is used to optimise the usage of hardware platforms and to simplify service management, leading to improved flexibility, availability, and reduced costs. We refer to these architectures as softwarised environments, as they use commodity hardware customised via software. While virtualisation offers various advantages from the security point of view (such as isolation and sandboxing), at the same time it creates issues because services execute through a virtualisation manager, which in principle can alter their data and operations. It is therefore important to guarantee code and data integrity for services running in softwarised environments.

Remote Attestation (RA), as specified by the Trusted Computing Group (TCG), relies on a hardware root of trust, the Trusted Platform Module (TPM), to collect measurements of the software executed on a platform; a remote party exploits this evidence to appraise the software integrity state of a node designed to the TCG specifications. RA can be used to evaluate the integrity of services running directly on a physical platform, because a direct link to the TPM is needed. In virtualised environments, the hypervisor can be a target of the attestation process because it is a service running in the host system and therefore it has direct access to all hardware components. However, the TPM-based integrity verification of physical compute nodes is not easily extensible to virtual instances. In fact, the virtualisation layer offered by the hypervisor may not export TPM resources to the Virtual Machine (VM). Even if the TPM resources are exported to the VM (as in the Kernel-based Virtual Machine (KVM) system, which implements a pass-through driver for direct interaction between the VM and the hardware TPM), the TCG specification assumes a one-to-one relationship between the operating system and the TPM [2], which makes the chip unable to provide authentic evidence for all the VMs. This problem is the main obstacle to using RA effectively in a hypervisor-based virtualisation environment.
A lot of interest is currently being paid to lightweight virtualisation techniques, as they incur a lower performance penalty than full virtualisation because they create smaller and more agile execution environments, i.e. containers. According to a survey conducted by Datadog [3], 18.8% of its customers had adopted containers in their infrastructures as of April 2017, which represents a 40% year-over-year growth. Deutsche Telekom started experimenting with containers in cloud-based Network Function Virtualisation (NFV) environments in 2015 [4] and, more recently, AT&T has introduced containers in its enterprise cloud solution [5]. Lightweight virtualisation is especially important in target environments where the nodes have limited computational resources. Nonetheless, containers introduce security risks for the cloud platform as they provide less isolation than VMs, which may expose the host system to privilege escalation. Moreover, containers are based on images (i.e. archives which include the binaries, libraries, and root file-system of each virtual instance) whose management and distribution may introduce vulnerabilities for the target platform, as noted by the NIST [6].
Well-known container technologies propose solutions to ensure the integrity of images, but they do not cover the whole service lifetime, as the image and its internals may be changed at run-time by the host or by external attackers exploiting some vulnerability. For instance, an attacker who has gained privileged access to a running container may compromise it by modifying service configurations and binaries, launching malicious scripts, or starting new processes, all without being detected by static verification techniques (since they are concerned only with load-time integrity).
To tackle this problem, we have developed a solution for software integrity attestation of a lightweight cloud environment at run-time, which covers both the host and the services in the containers. This solution is named Docker Integrity Verification Engine (DIVE), as it targets the Docker container engine [7]. Docker is the de-facto standard for containerisation of applications on a Linux platform, as it is widely supported by vendors and production-oriented cloud environments. Nonetheless, our proposal is based on Linux kernel functionalities independent of the specific container runtime; hence it can be generalised to cover a wide spectrum of lightweight virtualisation technologies.
Our focus is only on the integrity of the services running in the host and in the containers themselves, not on the privacy of the data or the correctness of the computation, which are well-known problems of any virtualisation and cloud environment. However, note that if the software running in the host and the containers has been evaluated and no erroneous or malicious behaviour is present, then DIVE demonstrates that no other software component is executed in the environment, which in turn supports trust in the correct behaviour of the node.
From the authors' perspective, this work represents a concrete proposal to provide run-time integrity evidence of services running inside virtualised instances with a direct link to the hardware TPM, in a reliable and scalable manner. On the one hand, this solution enables the deployment of critical services in containers whose integrity state can be securely attested. On the other hand, the ability to attest services in containers fulfils the requirement of integrity verification in lightweight cloud infrastructures. However, we do not claim to cover the full range of software attacks on virtualised instances, as TPM-based attestation is not able to detect in-memory manipulations of code or data.
This limitation can be mitigated by modern Operating System (OS) protection techniques such as Address Space Layout Randomisation (ASLR) [8]. In turn, DIVE can be used to attest whether the host kernel has activated system hardening protections (such as ASLR).
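As a concrete illustration (our own sketch, Linux-specific and not part of DIVE itself), the kind of hardening evidence mentioned above can be read directly from the kernel's procfs interface:

```python
# Sketch: checking whether the host kernel has ASLR enabled, the kind
# of hardening state that an attestation framework can report on.
# Linux-only; requires no special privileges.
with open("/proc/sys/kernel/randomize_va_space") as f:
    level = int(f.read().strip())

# 0 = disabled, 1 = conservative randomisation, 2 = full randomisation
print("ASLR level:", level)
```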
The rest of this paper is organised as follows: Section 2 presents some background about containers and Trusted Computing. Section 3 details our proposed integrity verification process and Section 4 presents the prototype of the proposed architecture. Section 5 discusses performance and scalability of the proposed architecture. Section 6 presents the state of the art and discusses our improvement over it. Section 7 summarises the contributions of this article.

Background
This section introduces the technological concepts at the base of our proposal. The lightweight virtualisation concept is briefly explained, along with a comparison among popular containerisation technologies. Then, the section covers the different aspects of the specific case study of our work, i.e. the Docker virtualisation engine. Moreover, an overview of container vulnerabilities is given, to justify the need for integrity verification in lightweight virtualisation. Afterwards, the section covers the Trusted Computing principles on integrity verification by means of a hardware-based root of trust. Finally, the Remote Attestation workflow is explained in detail, along with the run-time integrity measurement architecture.

Containers
Containers represent an OS-level virtualisation technique, also called paenevirtualisation [9] or lightweight virtualisation, where the OS kernel allows for multiple isolated user-space instances. Containers remove the overhead introduced by the hypervisor, which makes them smaller and faster to start and stop than hypervisor-based VMs [10].

Containerisation technologies
Lightweight virtualisation exploits the resource isolation features of the Linux kernel (such as cgroups [11], namespaces [12] and kernel capabilities [13]) and a union-capable file-system to allow independent containers to run within a single Linux instance. Docker [7] and Rkt [14] are well-known implementations of process containers, which build an isolated execution environment comprising a target application (e.g. a MySQL server) and its software dependencies (e.g. binaries, libraries). Linux Containers (LXC) [15] and LXD [16] represent alternative technologies that exploit the same kernel isolation features to offer a slightly different container run-time. These technologies are known as machine containers as they target isolation of multiple processes and services within a single container. They are easier to manage because they can be customised as traditional VMs, but they ultimately offer less flexibility, reuse, and composability (very important in highly dynamic virtualised environments such as the cloud ones). More recently, Unikernels [17] have been proposed as an alternative to Linux container technologies. These represent a radical shift from general-purpose virtual instances (such as VMs and machine containers) to fixed-purpose minimal images that run a single application, similarly to process containers. Unikernels are built into machine executables which embed the target application and its dependencies, and they can be run on a hypervisor or on bare metal. Because of their minimalistic nature, they typically require limited system resources to run. In this work we focus on Docker containers, rather than unikernels, as they are exploited in production-oriented environments and they can benefit from orchestration systems that are particularly relevant to cloud platforms.
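To make these kernel primitives concrete, the following minimal sketch (ours, not taken from any container engine) starts a shell inside fresh PID and mount namespaces using the util-linux unshare(1) tool; a real engine such as Docker combines these namespaces with cgroups, a union file-system, and reduced kernel capabilities.

```python
# Minimal sketch of the kernel isolation primitives containers build on.
# Requires Linux, root privileges, and util-linux's unshare(1).
import subprocess

subprocess.run([
    "unshare",
    "--pid", "--fork",   # new PID namespace: the shell becomes PID 1 there
    "--mount-proc",      # private /proc mount, so ps shows only the namespace
    "sh", "-c", "echo shell PID in new namespace: $$; ps ax",
], check=True)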

Docker
Docker is an open-source project that uses containers to simplify the deployment of services by providing an additional layer of abstraction, the Docker Container Engine. It focuses on the one process per container methodology, which requires each virtualised instance to run a single application in foreground, meaning that the container lifetime equals the run-time of the target application. A fundamental building block in Docker is the image: containers are launched from images, which can be considered as the "source code" for the containers. Images are built in a layered way, and layers are ordered and stored in a single file-system (Figure 1). At the base there is a boot file-system, i.e. bootfs, which contains the data required to boot the container (e.g. the bootloader).

The second level is composed of one or more root file-systems, called rootfs or base images. Each base image hosts a read-only layer of the OS image, comprising the files and directories required for the overall functionality of the container. Every time a new container is launched, the Docker daemon constructs a read-write layer on top of the image stack, where all changes made at run-time are stored, while each image also carries its own metadata (e.g. parent image, build-time commands, and exposed network ports).

Trusted Computing

The protected capabilities of the TPM allow free read access to the Platform Configuration Registers (PCRs) by the platform's users, but direct writing is prevented. Hence PCRs act as accumulators: when the value of a register is updated, the new value depends both on the new measure and on the old value, to guarantee that, once initialised, it is not possible to forge the value of a PCR. This is the extend operation, which works as follows:

PCR_new = H(PCR_old || M)

where || denotes concatenation, PCR_old is the value present in the register before the extend operation, M is the new measurement being accumulated, and H is the hash function of the TPM (SHA-1 for TPM 1.2).
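As an illustration, the extend operation can be sketched in a few lines (our own sketch, assuming the SHA-1 digests used by TPM 1.2):

```python
# Sketch of the TPM extend operation: PCR_new = H(PCR_old || M).
# SHA-1 assumed (TPM 1.2); the register accumulates every measurement,
# so a past value cannot be forged without resetting the platform.
import hashlib

PCR_SIZE = 20  # length of a SHA-1 digest in bytes

def extend(pcr_old: bytes, measurement: bytes) -> bytes:
    return hashlib.sha1(pcr_old + measurement).digest()

pcr = bytes(PCR_SIZE)  # PCRs start zeroed at platform reset
for component in (b"bootloader", b"kernel", b"initrd"):
    pcr = extend(pcr, hashlib.sha1(component).digest())
print(pcr.hex())
```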

The Remote Attestation process
An RA operation is initiated by a remote party, the Verifier, to request evidence about the integrity state from an attesting party, the Attester. The most appropriate format for integrity evidence is the TCG Integrity Report (IR) [23], which comprises the values stored in the PCRs and their digital signature, computed with a key that never leaves the TPM (a guarantee that the key can be used only by the TPM itself, as it is held in the device memory). However, note that the PCR values depend not only on the software measured on the platform, but also on the order of the extend operations performed on these measurements. Hence, it is very difficult to know in advance which final PCR values are "good": the Verifier instead validates the individual entries of the measurement log and replays them, in order, to check that they reproduce the signed PCR values.
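The order-dependence, and the resulting verification strategy, can be illustrated with a small sketch (ours; the quoted value below is a stand-in for a real TPM-signed quote):

```python
# Sketch: the same measurements extended in a different order yield a
# different PCR, so a Verifier replays the measurement log in order and
# checks the result against the TPM-signed (quoted) PCR value.
import hashlib

def extend(pcr: bytes, m: bytes) -> bytes:
    return hashlib.sha1(pcr + m).digest()

def replay(log):
    pcr = bytes(20)
    for m in log:
        pcr = extend(pcr, m)
    return pcr

a = hashlib.sha1(b"/bin/sh").digest()
b = hashlib.sha1(b"/usr/sbin/sshd").digest()
assert replay([a, b]) != replay([b, a])   # order matters

quoted_pcr = replay([a, b])               # stand-in for a signed TPM quote
assert replay([a, b]) == quoted_pcr       # log entries explain the quote
```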

Run-time Integrity Measurement Architecture
We use the Linux IMA [24], which measures every file before it is loaded (e.g. executed or memory-mapped) and extends each measurement into a TPM PCR, so that the run-time state of the platform can be attested. A related approach is the Trusted Compute Pools solution proposed by Intel [26], which offers trust in the computing nodes but not in the hosted VMs. In this work we focus on enabling IMA support for containers, which, unlike VMs, share the kernel with the host. Because of this, we are able to keep the chain of trust intact regardless of the virtualisation layer.
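For reference, the measurement list that IMA exposes to user space can be inspected directly; the sketch below (ours, assuming the common ima-ng template, a kernel with IMA enabled, and root privileges) parses its entries:

```python
# Sketch: reading the IMA measurement list (ima-ng template assumed).
# Each line carries the PCR index, a template digest, the template name,
# the measured file's hash, and its pathname.
IMA_LOG = "/sys/kernel/security/ima/ascii_runtime_measurements"

with open(IMA_LOG) as log:
    for line in log:
        pcr, tmpl_digest, tmpl_name, file_hash, path = line.split(maxsplit=4)
        print(f"PCR {pcr}: {path.strip()} -> {file_hash}")
```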

Prototype of the architecture
In this section we describe the mapping of architectural components to the software modules required for the DIVE functional prototype, and the modifications to existing software needed to implement the proposed workflow.

Prototype software and tools
We use the OpenAttestation (OAT) SDK v1.7 [27] to implement the Remote Attestation workflow. Finally, the Infrastructure Manager can be mapped onto a container management engine or orchestration platform such as OpenStack, which encompasses several projects that tackle support for Docker containers in an orchestrated cloud environment [28], or the Kubernetes container management engine [29].
Our initial prototype focuses on the implementation of the Attester and Verifier elements of the DIVE architecture; hence it does not take into account integration with any Infrastructure Manager, which is left for future work. To support per-container verification, we extended the IR schema to carry container UUIDs and device IDs in the XML Integrity Report, as shown in Figure 6. The new elements are named <Container>, providing the mapping between a Docker container UUID and the associated virtual device ID created by Device Mapper in the host system, and <Host>, containing the list of all physical device IDs associated with the host system.
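The sketch below illustrates how a Verifier could consume such a report; the element names <Container> and <Host> come from our extension, while the attribute names and values are hypothetical, since the exact schema is not reproduced here.

```python
# Sketch: extracting the container/device mapping from an extended IR.
# Attribute names (uuid, device, devices) are hypothetical placeholders.
import xml.etree.ElementTree as ET

IR_FRAGMENT = """
<Report>
  <Host devices="8:1 8:2"/>
  <Container uuid="3fa9c0de12b4" device="253:4"/>
  <Container uuid="77b0d1a2f39c" device="253:5"/>
</Report>
"""

root = ET.fromstring(IR_FRAGMENT)
host_devices = root.find("Host").get("devices").split()
container_of = {c.get("device"): c.get("uuid")
                for c in root.findall("Container")}
print(host_devices, container_of)
```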

Finally, we extended the Docker Command Line Interface (CLI) to retrieve, in a timely manner, the mapping between each container UUID (provided to the Verifier by the Infrastructure Manager) and its device ID.
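On a stock Docker installation with the Device Mapper storage driver, a similar (though slower) mapping can be recovered through docker inspect; the sketch below approximates what our CLI extension returns, rather than reproducing the extension itself, and the container name is hypothetical.

```python
# Sketch: recovering a container's Device Mapper device via docker inspect.
# Assumes the devicemapper storage driver; "web1" is a hypothetical name.
import json
import subprocess

def device_of(container: str) -> str:
    out = subprocess.check_output(["docker", "inspect", container])
    info = json.loads(out)[0]
    driver = info["GraphDriver"]
    assert driver["Name"] == "devicemapper", "devicemapper driver required"
    return driver["Data"]["DeviceName"]

print(device_of("web1"))
```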

In a previous work [30], an implementation of the binary attestation feature was developed with the help of IMA. This implementation creates a reference database storing the names and digests of all known "good" binaries.
For example, we could initially populate the database with all the elements of the packages available in the official repositories of Linux distributions. One drawback of this solution is that it considers the platform as a whole: if the integrity report contains just one unknown digest for a loaded binary or applied configuration, the Verifier will consider the whole platform as untrusted, and a trusted state can be restored only by resetting the whole platform. Unfortunately, resetting a physical platform hosting tens or hundreds of VMs or containers is not a viable option.
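A per-container appraisal avoids this all-or-nothing behaviour: each measurement can be attributed to the device (hence the container) it belongs to, and only that container is condemned. The following sketch (ours, with illustrative digests and device IDs) shows the idea:

```python
# Sketch: per-container appraisal against a reference database of
# known-good digests. All digests and device IDs are illustrative.
REFERENCE_DB = {
    "/usr/sbin/mysqld": "a41f29c0",        # e.g. imported from distro packages
    "/usr/bin/redis-server": "9c0d22e1",
}

# (device_id, path, digest) tuples, as attributed from the IMA log
measurements = [
    ("253:4", "/usr/sbin/mysqld", "a41f29c0"),
    ("253:5", "/usr/bin/redis-server", "deadbeef"),  # unknown digest
]

untrusted = {dev for dev, path, digest in measurements
             if REFERENCE_DB.get(path) != digest}
print("devices hosting compromised containers:", untrusted)
```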

Performance evaluation
We have performed a preliminary evaluation of the DIVE technology, mainly focused on its performance impact on a single-host deployment of Docker containers. Our goal is to demonstrate the negligible impact of this solution compared to an insecure Docker deployment. We considered a deployment scenario comprising 512 containers. We repeated each test ten times, to get statistically meaningful results, and executed the test set (comprising ten runs) first without IMA and RA, then with only IMA active, and finally with both IMA and RA active. Three essential operations are involved in the test: run (to start a container), stop (to send SIGTERM to the process running inside a container) and remove (to remove a container along with its assigned resources, e.g. disk storage and network bridge). The average time for these operations is depicted in Figures 7, 8, and 9, where the X-axis shows the number of active containers and the Y-axis the time to complete the operation. The remove operation removes the resources assigned to a container; the time needed grows linearly with the number of active containers (Figure 9) and does not show any significant difference among the three set-ups.

Related work

Terra [37] is a trusted virtual machine monitor that isolates and protects independent VMs. However, its integrity guarantee is limited to the hypervisor and to computing and certifying the hashes of the images loaded in the VMs; there is no attestation of the internal behaviour of the VMs. As such, it is more similar to the load-time image integrity attestation offered by Trusted Compute Pools for OpenStack [26] and by Docker Content Trust. DIVE can use the latter for image integrity verification at load time, but additionally it offers continuous monitoring and reporting of the operations performed by each container. In this respect, it improves on Terra, although the latter offers other interesting security properties not relevant for the current discussion. Compared to [38], which generates a model for Docker containers and verifies the deployability of the container architecture at design time, our work is more focused on verifying the integrity of the services running inside the containers in a reliable manner.
Of course, these two approaches could be coupled to provide a holistic Docker container execution environment.
The Intel Open CIT framework [39] implements image integrity verification at load time, but its list of reference measurements must be manually updated if any measured file is changed.

Conclusion and future directions
We have presented DIVE, an architecture to support integrity verification of Docker containers using well-known trusted computing techniques. With this solution, the services in the containers can be attested as if they were running on a physical platform, and their integrity can be assessed by a third party in a reliable manner. The capability to directly interact with the TPM makes the integrity reports from the Attesters non-forgeable, which provides strong protection against remote attacks. DIVE is also a practical tool, as it has a nearly negligible performance impact on the hosted services. This is due both to its design and to the usage of mainstream tools, such as IMA (a standard feature of the Linux kernel) and OpenAttestation (a well-known tool for attestation of cloud services). The most data-intensive part of the work is offloaded to the Verifier, which is a third party not directly involved in the actual service provision. Another important feature of DIVE is that the Verifier can identify which container or hosting system is compromised. Thus, the Infrastructure Manager can take an informed decision about the roll-back strategy: disable just the compromised container and replace it with a new instance, or stop all containers and restart the whole physical machine. DIVE is transparent to the services running in the containers (which do not need to be modified in any way) and interacts with a normal Docker environment. All modifications to enable integrity verification are minor and performed directly in the host system. This makes DIVE very easy to adopt, with no impact on the hosted services.

DIVE has some limitations that we plan to address in future work. First, its dependency on OpenAttestation makes it non-portable to hosts equipped with TPM 2.0. In this regard, integration of DIVE into the Open CIT load-time integrity verification process will be investigated. Moreover, our solution introduces a lock-in on the Device Mapper storage driver for Docker, although this driver is generally supported on Linux distributions. Finally, DIVE cannot ensure protection against in-memory attacks, which would require the introduction of separate user-space and kernel-space protections (e.g. address space randomisation).

Planned future work for DIVE is twofold: performance and proactivity.
Application to softwarised network environments typically requires a small footprint and fast reaction time, which can be obtained by rewriting the attestation agent and by investigating hardware trust anchors with better performance than the standard TPM chip. Being informed that a service has been compromised is important for its management, but avoiding the compromise altogether would be even better. Along this line, we want to make DIVE a proactive service by coupling it with policy enforcement solutions, such as SELinux.