Published December 5, 2022 | Version v1
Journal article | Open access

Enabling Transparent Acceleration of Big Data Frameworks Using Heterogeneous Hardware

Description

The ever-increasing demand for high-performance Big Data analytics and data processing has paved the way for heterogeneous hardware accelerators, such as Graphics Processing Units (GPUs) and Field Programmable Gate Arrays (FPGAs), to be integrated into modern Big Data platforms. Currently, this integration comes at the cost of programmability, since the end-user Application Programming Interfaces (APIs) must be altered to access the underlying heterogeneous hardware. For example, current Big Data frameworks, such as Apache Spark, provide a new API that combines the existing Spark programming model with GPUs. For other Big Data frameworks, such as Flink, the integration of GPUs and FPGAs is achieved via external API calls that bypass their execution models completely.

In this paper, we rethink current Big Data frameworks from a systems and programming language perspective, and introduce a novel co-designed approach for integrating hardware acceleration into their execution models. The novelty of our approach is attributed to two key design decisions: a) support for arbitrary User Defined Functions (UDFs), and b) no modifications to the user-level API. The proposed approach has been prototyped in the context of Apache Flink, and enables unmodified applications written in Java to run on heterogeneous hardware, such as GPUs and FPGAs, transparently to the users. The evaluation of the proposed solution shows speedups of up to 65x on GPUs and 184x on FPGAs for suitable workloads of standard benchmarks and industrial use cases, compared with vanilla Flink running on traditional multi-core CPUs.
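To make the two design decisions more concrete, the following is a minimal, self-contained sketch of the general idea of keeping the user-level API fixed while the runtime chooses where a UDF executes. It is not the prototype described in the paper and does not use the Flink API: `Backend`, `CpuBackend`, `SimulatedAcceleratorBackend`, and `DataSet` are hypothetical names introduced only for illustration, and the "accelerator" is simulated with a parallel Java stream rather than a real GPU or FPGA.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

public class TransparentOffloadSketch {

    /** Hypothetical execution backend; an assumption, not a Flink interface. */
    interface Backend {
        <I, O> List<O> map(List<I> input, Function<I, O> udf);
    }

    /** Applies the UDF element by element on the calling thread. */
    static final class CpuBackend implements Backend {
        public <I, O> List<O> map(List<I> input, Function<I, O> udf) {
            List<O> out = new ArrayList<>(input.size());
            for (I item : input) {
                out.add(udf.apply(item));
            }
            return out;
        }
    }

    /** Stand-in for a GPU/FPGA backend: simulated here with a parallel stream. */
    static final class SimulatedAcceleratorBackend implements Backend {
        public <I, O> List<O> map(List<I> input, Function<I, O> udf) {
            // collect() on an ordered parallel stream preserves encounter order.
            return input.parallelStream().map(udf).collect(Collectors.toList());
        }
    }

    /** User-facing collection: map() never mentions a device. */
    static final class DataSet<T> {
        private final List<T> data;
        private final Backend backend;

        DataSet(List<T> data, Backend backend) {
            this.data = data;
            this.backend = backend;
        }

        /** Accepts an arbitrary UDF; the backend decides how to run it. */
        <R> DataSet<R> map(Function<T, R> udf) {
            return new DataSet<>(backend.map(data, udf), backend);
        }

        List<T> collect() {
            return data;
        }
    }

    /** The user's program is identical for every backend. */
    static List<Double> userProgram(DataSet<Double> input) {
        return input.map(x -> x * x + 1.0).collect();
    }

    public static void main(String[] args) {
        List<Double> values = Arrays.asList(1.0, 2.0, 3.0);
        List<Double> onCpu =
            userProgram(new DataSet<>(values, new CpuBackend()));
        List<Double> onAccelerator =
            userProgram(new DataSet<>(values, new SimulatedAcceleratorBackend()));
        System.out.println(onCpu);                       // prints [2.0, 5.0, 10.0]
        System.out.println(onCpu.equals(onAccelerator)); // prints true
    }
}
```

In this sketch, `userProgram` plays the role of the unmodified application: swapping the backend changes how the UDF is executed but not the code the user writes. A real system would additionally have to compile the UDF for the target device, which this illustration does not attempt.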

Files

xekalaki-VLDB.pdf (2.1 MB)
md5:184d726aff0aab7e99000591653c9e1e

Additional details

Funding

European Commission
ELEGANT – Secure and Seamless Edge-to-Cloud Analytics 957286
European Commission
E2DATA – European Extreme Performing Big Data Stacks 780245
European Commission
ENCRYPT – A SCALABLE AND PRACTICAL PRIVACY-PRESERVING FRAMEWORK 101070670
European Commission
TANGO – Digital Technologies ActiNg as a Gatekeeper to information and data flOws 101070052