Replication Package for: Theodolite: Scalability Benchmarking of Distributed Stream Processing Engines in Microservice Architectures
Description
This repository contains a replication package and experimental results for our study Theodolite: Scalability Benchmarking of Distributed Stream Processing Engines in Microservice Architectures.
The following description can also be found in the README.md file.
Repeating Benchmark Execution
The following introduction describes how to repeat our scalability experiments. If you plan to conduct your own studies, we suggest using the latest version of Theodolite, which offers significantly enhanced usability.
The Apache Kafka Streams scalability experiments of our study were executed with Theodolite v0.1.2. To repeat our Kafka Streams experiments (a consolidated command sketch follows the list):

- Clone and install Theodolite v0.1.2 according to the official documentation located in `execution`.
- Copy the file `repeat-kstream.sh` into Theodolite's `execution` directory.
- Run the repetition file with `./repeat-kstream.sh` from within the `execution` directory.
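The list above can be condensed into a short shell sketch. The repository URL and the assumption that the replication package files are unpacked in the current directory are ours and may need to be adapted to your setup:

```sh
# Consolidated sketch of the Kafka Streams repetition steps; repository URL and
# local paths are assumptions, not part of the original instructions.
git clone --branch v0.1.2 https://github.com/cau-se/theodolite.git
cp repeat-kstream.sh theodolite/execution/
cd theodolite/execution
# Install Theodolite as described in the documentation in this directory
# (Kubernetes cluster, Kafka, Prometheus, etc.) before running the script.
./repeat-kstream.sh
```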
Our Apache Flink benchmark implementations are currently being migrated to the latest version of Theodolite. Theodolite's `apache-flink` branch provides the basis for our Flink scalability experiments. To repeat them (a consolidated command sketch follows the list):

- Clone Theodolite's `apache-flink` branch and install Theodolite according to the official documentation located in `execution` (the installation should be identical to the one for Kafka Streams, see above).
- Copy the files `repeat-flink-without-checkpointing.sh` and `repeat-flink-with-checkpointing.sh` into Theodolite's `execution` directory.
- Switch to the `execution` directory.
- Run the first repetition file with `./repeat-flink-with-checkpointing.sh`.
- Disable checkpointing by reconfiguring the Kubernetes resources `jobmanager-job.yaml` and `taskmanager-job-deployment.yaml` for each benchmark (`uc{1,2,3,4}-application`), setting the environment variable `CHECKPOINTING` to `"false"`.
- Run the second repetition file with `./repeat-flink-without-checkpointing.sh`.
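As before, the steps can be condensed into a shell sketch; the repository URL and file locations are assumptions, and the checkpointing change remains a manual edit of the Kubernetes resources:

```sh
# Consolidated sketch of the Flink repetition steps; repository URL and paths
# are assumptions.
git clone --branch apache-flink https://github.com/cau-se/theodolite.git theodolite-flink
cp repeat-flink-with-checkpointing.sh repeat-flink-without-checkpointing.sh theodolite-flink/execution/
cd theodolite-flink/execution
# Install Theodolite as for the Kafka Streams experiments, then run the first
# repetition:
./repeat-flink-with-checkpointing.sh
# Manually set the environment variable CHECKPOINTING to "false" in
# jobmanager-job.yaml and taskmanager-job-deployment.yaml of each
# uc{1,2,3,4}-application, then run the second repetition:
./repeat-flink-without-checkpointing.sh
```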
Please note that the naming of our benchmarks recently changed. While our publication already uses the new naming, the corresponding Theodolite versions still use the old one. Specifically, this means that UC1 in the publication is UC1 in Theodolite, UC2 in the publication is UC3 in Theodolite, UC3 in the publication is UC4 in Theodolite, and UC4 in the publication is UC2 in Theodolite.
Raw Measurements
The results of the above benchmark executions can be found in the `measurements` directory. These are CSV files containing the measured lag trend over time for a certain subexperiment. Theodolite creates several additional files, which serve debugging and preliminary interpretation. As these files are not required for replication, we did not include them in this package.
The CSV files are named according to the schema `exp{id}_{uc}_{load}_{inst}_totallag.csv`, where `{id}` represents the experiment ID assigned by Theodolite, `{uc}` the benchmark name, `{load}` the generated load, and `{inst}` the number of evaluated instances.
The CSV table `experiments.csv` provides an overview of the configurations used in each experiment.
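As a rough illustration of the naming schema, the following shell sketch splits each measurement file name into its fields. It assumes the package is unpacked in the current directory and that benchmark names contain no additional underscores:

```sh
# List the raw measurement files and print the schema fields
# (experiment ID, benchmark, load, instances) for each of them.
for f in measurements/exp*_totallag.csv; do
  name=$(basename "$f" _totallag.csv)        # e.g., exp<id>_<uc>_<load>_<inst>
  IFS=_ read -r id uc load inst <<< "$name"  # split on underscores
  echo "experiment=$id benchmark=$uc load=$load instances=$inst"
done
```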
Reproducing Scalability Analysis
The following introduction describes how to repeat our scalability analysis, either with our measurements or with your own. If you plan to conduct your own studies, we suggest using the latest version of Theodolite, which offers significantly enhanced usability.
Theodolite's measurements are analyzed using two Jupyter notebooks. In general, these notebooks should be runnable by any Jupyter server. Python 3.7 or 3.8 is required (e.g., in a virtual environment), as well as some Python libraries, which can be installed via `pip install -r requirements.txt`. See the Theodolite documentation for additional installation guidance.
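A minimal setup could, for example, look like the following sketch, assuming `requirements.txt` and the notebooks reside in the current directory and that `python3` points to Python 3.7 or 3.8:

```sh
# Create an isolated Python environment and start a local Jupyter server.
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
jupyter notebook   # then open the notebooks described below in the browser
```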
Obtaining a Scalability Graph as a CSV File
The `scalability-graph.ipynb` notebook combines the measurements (i.e., the `totallag.csv` files) of one experiment. It produces a CSV file that provides a mapping of load intensities to the minimum required resources for that load (i.e., the scalability graph). The CSV files are named according to the schema `exp{id}_min-suitable-instances.csv`, where `{id}` represents the experiment ID. Additional guidance is provided in the notebook.
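If an interactive session is not desired, the notebook can also be executed headlessly, for instance with nbconvert; this sketch assumes that the input paths and the experiment ID have already been configured inside the notebook:

```sh
# Execute the notebook non-interactively and store the executed copy.
jupyter nbconvert --to notebook --execute scalability-graph.ipynb \
  --output scalability-graph-executed.ipynb
```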
Resulting Scalability Graph CSV Files
The `results` directory provides the scalability graphs for all our executed experiments.
Visualization of the Scalability Graph
The `scalability-graph-plotter.ipynb` notebook creates PDF plots of a scalability graph and allows combining multiple scalability graphs in one plot. It can be adjusted to match the desired visualization.
Acknowledgments
This research is funded by the German Federal Ministry of Education and Research (BMBF) under grant no. 01IS17084 and is part of the Titan project.
Files
theodolite-replication-package.zip
Files
(1.8 MB)
Name | Size | Download all |
---|---|---|
md5:ebe7d41031949aca872f0a138ea99e60
|
1.8 MB | Preview Download |
Additional details
Related works
- Is supplement to: Journal article, DOI 10.1016/j.bdr.2021.100209