Project deliverable (Open Access)
This document describes the first release and evaluation of the runtime environment developed by the AI-SPRINT project; the second release and evaluation of the runtime environment (D3.3) is due at M24. It describes the components involved in this first release that support continuous deployment and the programming framework runtime, as well as the results of the preliminary tests on the technologies employed, which support the design decisions.
The technological choices were made considering the requirements elicited from the analysis of the three use cases specified in the project proposal and detailed in AI-SPRINT Deliverable D1.2 - Requirements Analysis. In addition, an application based on face mask detection has been used as a “lead by example” approach to showcase an inference workflow in which images are anonymised at the edge using resource-constrained computing resources, i.e. a cluster of Raspberry Pis, while face mask detection is performed on a dynamically provisioned elastic Kubernetes cluster on top of an OpenStack-based cloud deployment, showing the initial integration among several components of the AI-SPRINT portfolio.
Overall, the first release of the runtime environment includes components in the following categories. For each one, the main tools are identified and a brief summary of the developments performed for the first release is included. First, deployment tools provide custom virtualised computing resources from cloud back-ends and resources located at the edge to support cloud-edge orchestration, enabling the automatic deployment of AI application models and components without manual provisioning. The main tool involved is the Infrastructure Manager (IM), which has been extended in this first release to provision minified Kubernetes distributions such as K3s, to be used for edge-based resources, and to include SGX support in its OpenStack connector as well as in its elasticity connector.
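To illustrate the kind of automatic provisioning the IM enables, the sketch below assembles a request for its REST API, which accepts a TOSCA template in the body and credentials in the Authorization header. The endpoint URL, credentials, and the minimal TOSCA template are illustrative placeholders, not a tested AI-SPRINT configuration; a real K3s deployment would reference the appropriate node types and configuration recipes.

```python
# Sketch: creating an infrastructure through the Infrastructure Manager (IM)
# REST API. Endpoint, credentials, and TOSCA body are placeholders.

# Hypothetical IM endpoint (placeholder).
IM_ENDPOINT = "https://im.example.org:8800/infrastructures"

# IM expects one auth line per credential, separated by newlines.
AUTH_LINES = "\n".join([
    "id = im; type = InfrastructureManager; username = user; password = pass",
    "id = ost; type = OpenStack; host = https://keystone.example.org:5000; "
    "username = user; password = pass; tenant = demo",
])

# Minimal, illustrative TOSCA body asking for one VM.
TOSCA_BODY = """\
tosca_definitions_version: tosca_simple_yaml_1_0
topology_template:
  node_templates:
    edge_node:
      type: tosca.nodes.Compute
      capabilities:
        host:
          properties: {num_cpus: 2, mem_size: 2 GB}
"""

def build_deploy_request(endpoint: str, auth: str, tosca: str) -> dict:
    """Assemble the HTTP request that would create the infrastructure."""
    return {
        "method": "POST",
        "url": endpoint,
        "headers": {"Authorization": auth, "Content-Type": "text/yaml"},
        "body": tosca,
    }

req = build_deploy_request(IM_ENDPOINT, AUTH_LINES, TOSCA_BODY)
# A real deployment would now send the request, e.g. with requests:
#   requests.post(req["url"], headers=req["headers"], data=req["body"])
print(req["method"], req["url"])
```

The same template-driven approach allows the IM to target cloud back-ends and edge resources uniformly, which is what removes the need for manual provisioning.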
Second, monitoring tools gather infrastructure- and application-level metrics based on NoSQL time series databases, which are responsible for storing and analysing the collected metrics and provide visual dashboards. This includes the definition of synchronisation mechanisms among instances of the monitoring infrastructure, so that data collected at the edge can be stored in cloud-based resources. The main tools involved are InfluxDB, for metrics collection and analysis, together with Telegraf, for gathering metrics data and local buffering. Automated deployment procedures have been developed in the first release.
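The data exchanged between Telegraf and InfluxDB travels in InfluxDB's line protocol; the sketch below renders one metric sample in that format to make the pipeline concrete. Measurement, tag, and host names are illustrative, and the escaping rules of the protocol are simplified for readability.

```python
# Sketch: formatting a metric sample in InfluxDB line protocol, the wire
# format used when forwarding (possibly locally buffered) metrics from edge
# nodes to a cloud InfluxDB instance. Names are illustrative; escaping of
# special characters in tags/fields is omitted for brevity.

def to_line_protocol(measurement, tags, fields, timestamp_ns):
    """Render one point: measurement,tag=... field=... timestamp."""
    tag_part = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    field_part = ",".join(
        f'{k}="{v}"' if isinstance(v, str) else f"{k}={v}"
        for k, v in sorted(fields.items())
    )
    return f"{measurement},{tag_part} {field_part} {timestamp_ns}"

# Example: a CPU metric gathered on a hypothetical edge Raspberry Pi.
line = to_line_protocol(
    "cpu",
    {"host": "rpi-edge-01", "tier": "edge"},
    {"usage_percent": 37.5},
    1650000000000000000,
)
print(line)
# cpu,host=rpi-edge-01,tier=edge usage_percent=37.5 1650000000000000000
```

Because each point carries its own timestamp, points buffered locally by Telegraf during a connectivity outage can be flushed to the cloud instance later without losing temporal ordering.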
Third, scheduling for accelerated devices, including both local and remote GPUs, to jointly solve resource planning, i.e. to decide the appropriate number of GPUs to assign to jobs. This includes the shared usage of remote GPU-based computing. One of the tools involved is rCUDA, which has been extended in the first release to support Docker containers and newer versions of both TensorFlow and CUDA.

Fourth, the programming framework runtime, to execute workflows along the computing continuum and to exploit the parallelism of the underlying computing resources. This ranges from detecting the data dependencies among the components to allocating parallel tasks to the available computing resources along the continuum, also using the FaaS (Functions as a Service) computing model. OSCAR, which provides event-driven file-processing serverless workflow execution along the computing continuum, is employed. For the first release, OSCAR was extended to be deployed on Raspberry Pi clusters, used for AI inference at the edge, to support synchronous invocations and to include initial support for GPUs. Also, COMPSs is used to orchestrate the execution of tasks on top of any distributed platform to exploit its parallelism, targeting the computing continuum. In the first release, COMPSs has been extended to be compliant with the FaaS paradigm, to improve resource management for the dynamic addition and removal of resources, and to facilitate agent deployment to set up hierarchies.
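The dependency-detection idea behind COMPSs can be sketched with PyCOMPSs-style task annotations: the runtime inspects task inputs and outputs and schedules independent tasks in parallel. The fallback decorator below makes the example runnable without a COMPSs installation; the two-stage anonymise/detect pipeline and function bodies are illustrative stubs, not the project's actual application code.

```python
# Sketch: expressing a workflow with PyCOMPSs task annotations so the COMPSs
# runtime can detect data dependencies and parallelise execution. The
# ImportError fallback runs the tasks sequentially when COMPSs is absent.

try:
    from pycompss.api.task import task            # real COMPSs runtime
    from pycompss.api.api import compss_wait_on
except ImportError:                                # sequential fallback
    def task(**kwargs):
        def wrapper(f):
            return f
        return wrapper

    def compss_wait_on(obj):
        return obj

@task(returns=1)
def anonymise(image):
    """Edge stage: blur faces before data leaves the device (stub)."""
    return f"anon({image})"

@task(returns=1)
def detect_mask(image):
    """Cloud stage: run the face-mask detector (stub)."""
    return f"mask?{image}"

# The runtime infers that detect_mask depends on anonymise's output, while
# different images carry no mutual dependency and can run in parallel.
results = [detect_mask(anonymise(img)) for img in ["img0.jpg", "img1.jpg"]]
results = compss_wait_on(results)
print(results)
```

Under the fallback the pipeline runs sequentially and prints the composed stub results; under a real COMPSs deployment the same source is distributed across the available agents.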
Fifth, application reconfiguration, to dynamically reconfigure the computational resources and the execution workflow to account for changes in the performance of the underlying infrastructure, including the migration of tasks. This involves generating optimal component placements to be adapted at runtime depending on the underlying state of the resources. One of the tools employed is SPACE4AI-R, which provides optimal component placement and is planned to be delivered by M24. Another tool is Krake, an orchestration engine for containerised workloads, used for rCUDA client migration when the network used to access remote GPUs becomes a bottleneck (support for stateful applications and for QoS-, performance- and energy-related scheduling is under development).
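To give the placement problem a concrete shape, the sketch below assigns components to nodes with a simple greedy heuristic and can be re-run whenever the monitored node metrics change. This is purely illustrative: SPACE4AI-R relies on its own optimisation models, and the node names, capacities, and latencies here are made up.

```python
# Sketch: the shape of a runtime placement decision. A greedy heuristic —
# NOT the SPACE4AI-R algorithm — assigns each component to the lowest-latency
# node that still has enough capacity. All numbers are illustrative.

def place(components, nodes):
    """Map each component to a feasible node, preferring low latency."""
    placement = {}
    free = {name: n["capacity"] for name, n in nodes.items()}
    # Place the most demanding components first to reduce fragmentation.
    for comp, demand in sorted(components.items(), key=lambda c: -c[1]):
        candidates = [n for n in nodes if free[n] >= demand]
        if not candidates:
            raise RuntimeError(f"no feasible node for {comp}")
        best = min(candidates, key=lambda n: nodes[n]["latency_ms"])
        placement[comp] = best
        free[best] -= demand
    return placement

# Hypothetical snapshot of the continuum as seen by the monitoring layer.
nodes = {
    "rpi-edge": {"capacity": 2, "latency_ms": 5},
    "cloud-vm": {"capacity": 8, "latency_ms": 40},
}
components = {"anonymiser": 1, "mask-detector": 4}

print(place(components, nodes))
# Re-running place() with updated node metrics yields the adapted placement,
# which a reconfiguration engine would then enact, e.g. by migrating tasks.
```

The important point is the loop structure: monitoring feeds fresh node state, the optimiser recomputes a placement, and the orchestration layer (Krake, in the rCUDA migration case) carries out the resulting moves.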
Finally, work on federated learning and privacy-preserving continuous training tasks has also started; the corresponding software components will be made available in the second year of the project.