Thesis Artifacts for: Benchmarking the Scalability of Distributed Stream Processing Engines in Case of Load Peaks
Description
A detailed description can be found in the README.md. The corresponding thesis can be found here.
Abstract:
Traditional databases and batch processing systems are not able to handle the loads
experienced by many modern applications in real time. Consequently, Distributed Stream
Processing Engines (DSPEs) are used to process the increasing amounts of data in real
time. Furthermore, increasing data rates make scalability
an important quality attribute of a DSPE. On top of that, production environments encounter
fluctuating loads and load peaks.
To tackle these challenges, in this thesis we develop a model of load and empirically evaluate
the scalability of the DSPEs Kafka Streams and Apache Flink in case of load peaks. For this
purpose, we benchmark the impact of different loads. To do so, we use
the Theodolite benchmarking method and extend the Theodolite framework. Moreover, in
order to express and compare different loads, we develop a model capable of expressing
variable loads over time. Furthermore, we model two different classes of load profiles with
load peaks in the frequency dimension and describe how to scale these load profiles. To
make use of these load profiles, we implement a method to generate loads with variable
frequency. For the evaluation, we develop a new Service Level Objective (SLO) which can
be used to evaluate whether a System Under Test (SUT) can process a load profile. Finally,
we benchmark Kafka Streams and Flink with three different load profiles and compare the
results to similar benchmarks.
The results of our benchmarks indicate that load peaks have a significant impact on
the scalability of stream processing engines. Consequently, the handling of load peaks
should be considered when deploying applications in cloud environments. Furthermore,
our results show that both Kafka Streams and Flink can be considered scalable in case
of load peaks for the evaluated benchmarks. Depending on the task sample, either Kafka
Streams or Flink has a higher resource demand. In direct comparison to other benchmarks
conducted with Theodolite with similar configurations, we observed deviations in the
results. Additionally, the results of our benchmarks indicate that scaling the number of
keys of a load has a greater impact on scalability than scaling the height of the peak of a
load.
Files
| Name | Size | Checksum |
|---|---|---|
| thesis-data.zip | 66.8 MB | md5:cc208ad87219a85f31c138fdff9b5a22 |
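
The MD5 checksum listed for thesis-data.zip can be used to check that the archive was downloaded intact. The following is a minimal Python sketch, assuming the archive was saved locally; the function names `md5_of_file` and `verify_md5` are illustrative and not part of the thesis artifacts.

```python
import hashlib

# Checksum as listed for thesis-data.zip in the file listing of this record.
EXPECTED_MD5 = "cc208ad87219a85f31c138fdff9b5a22"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at `path`, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so a large archive is never held fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches `expected` (case-insensitive)."""
    return md5_of_file(path) == expected.lower()
```

Calling `verify_md5("thesis-data.zip")` from the download directory should return `True` for an intact copy. The path is an assumption about where the file was saved. MD5 is suitable here only as a transfer integrity check, not as a guarantee of authenticity.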