Scheduler architectures and mechanisms characterization
Creators
- 1. VU Amsterdam
- 2. University of Wuerzburg
- 3. ESPOL
Description
This repository contains the data, the simulator, validation software, and the support scripts necessary to reproduce the plots from our paper.
We use data from the IBM COS[1] dataset. We extract a subset of it and convert it into a format suitable for simulation. These simulable traces are in the input_traces4 folder.
We implement our architectures and mechanisms by extending the OpenDC[2] datacenter simulator. The specific version of the source code we use in this work can be found at https://github.com/sacheendra/opendc/tree/cachearch. We include a pre-compiled jar file of the simulator in this repository. We also include jars of the validation software in the validation folder.
The three support scripts, numbered 1, 2, and 3, use the Ray distributed programming framework to run multiple simulations simultaneously and produce the plot for one experiment in the paper. Each simulation requires a dedicated CPU core and 10 GB of RAM. On a single server with 20 cores, all the simulations complete in about half an hour.
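As a rough sizing guide, the per-simulation requirement stated above (one dedicated core and 10 GB of RAM) bounds how many simulations a machine can run at once. The helper below is a hypothetical sketch for illustration and is not part of the repository's scripts; the function name and its parameters are assumptions.

```python
def max_concurrent_simulations(cpu_cores: int, ram_gb: float,
                               cores_per_sim: int = 1,
                               ram_per_sim_gb: float = 10.0) -> int:
    """Return how many simulations fit on one machine at the same time.

    Defaults follow the stated requirement: one core and 10 GB RAM per simulation.
    """
    if cpu_cores < 0 or ram_gb < 0:
        raise ValueError("resources must be non-negative")
    by_cpu = cpu_cores // cores_per_sim          # limit imposed by CPU cores
    by_ram = int(ram_gb // ram_per_sim_gb)       # limit imposed by memory
    return min(by_cpu, by_ram)
```

Under these assumptions, a 20-core server needs at least 200 GB of RAM to keep every core busy; with 128 GB, at most 12 simulations fit at once.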
We also provide scripts, in the full_experiments folder, to run all the simulations required to produce every plot in the paper. Those scripts are likewise executed in the order 1, 2, and 3. However, the full experiments can take multiple days on a single node. We currently run them on 20 nodes using a setup specific to our cluster, and we plan to provide scripts for any SLURM-based cluster soon.
The executables and scripts for the validation experiment are in the validation folder. Currently, these require manual setup, but we plan to automate the process soon.
1 REQUIREMENTS
Our experiments require installing the following software on all cluster nodes (or on a single node for the small subset):
(1) Linux-based OS
(2) JDK 17 (for the simulator and real-world experiments)
(3) Python 3.11 (for the support scripts)
(4) A LaTeX distribution (for the plots)
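Before running anything, it can help to confirm that the listed tools are visible on a node. The following is a minimal sketch, not part of the repository: the function name is hypothetical, the executable names `java` and `latex` are assumptions about how the JDK and the LaTeX distribution are installed, and the check only tests for presence on the PATH, not for JDK 17 specifically.

```python
import shutil
import sys


def check_prerequisites(required_python=(3, 11), tools=("java", "latex")):
    """Return a dict mapping each requirement name to True (found) or False."""
    # Compare only major.minor of the running interpreter with the required version.
    status = {"python": sys.version_info[:2] == tuple(required_python)}
    for tool in tools:
        # shutil.which returns None when the executable is not on the PATH.
        status[tool] = shutil.which(tool) is not None
    return status


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```

Run it with the interpreter you intend to use for the support scripts, since the Python check inspects the running interpreter.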
2 BUILD THE SIMULATOR
This repository comes with a pre-built version of the simulator.
To build your own instead, follow this procedure:
(1) Clone the latest source code from https://github.com/sacheendra/opendc/tree/cachearch.
(2) Build the simulator using the command ./gradlew :opendc-storage:opendc-distributed-cache:fatJar.
(3) The jar file will be in the opendc-storage/opendc-distributed-cache/build/libs folder.
(4) Copy the jar file opendc-distributed-cache-3.0-SNAPSHOT.jar to this repository.
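The last two steps can be scripted. The sketch below is a hypothetical helper, not part of the repository; it uses the folder and jar name given in the steps above and assumes the jar belongs at the top level of this repository, which the steps do not state explicitly.

```python
import shutil
from pathlib import Path

# Folder and file name as given in the build steps above.
JAR_NAME = "opendc-distributed-cache-3.0-SNAPSHOT.jar"
LIBS_DIR = Path("opendc-storage") / "opendc-distributed-cache" / "build" / "libs"


def copy_simulator_jar(opendc_checkout, artifact_repo):
    """Copy the freshly built jar into the artifact repository; return its new path."""
    source = Path(opendc_checkout) / LIBS_DIR / JAR_NAME
    if not source.is_file():
        raise FileNotFoundError(f"Built jar not found at {source}; run the Gradle build first")
    # Assumption: the jar is placed at the top level of this repository.
    destination = Path(artifact_repo) / JAR_NAME
    shutil.copy2(source, destination)
    return destination
```

For example, `copy_simulator_jar("../opendc", ".")` would copy the jar from a sibling checkout into the current directory; both paths are placeholders.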
3 RUN THE SIMULATIONS AND GENERATE THE PLOTS
(1) Create a Python virtual environment using a tool of your choice. Conda and virtualenv both work.
(2) Use Python 3.11.
(3) Install the required libraries using pip install -r requirements.txt.
(4) Run the simulations using python 1. run_simulations.py.
(5) Next, summarize the results using python 2. summarize_results.py.
(6) Finally, generate plots from the summaries using python 3. plot_summary.py.
(7) The plots will be in the result_plots4 folder.
The above process generates the plot for one experiment in the paper, specifically the leftmost quarter of Figure 6. To generate the plots for all experiments, repeat the process in the full_experiments folder. Running the simulations for one experiment takes around 30 minutes on a 20-core server. Running all the simulations takes around 45 minutes on 20 such servers. The three validation runs in the paper take 3 hours each.
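The three script invocations can also be chained so that a failure in one step stops the rest. This is a minimal sketch under the assumption that the script files are named exactly as written in the steps above (including the space after the number); the function names are hypothetical and not part of the repository.

```python
import subprocess
import sys

# Script names as written in the steps above; adjust if your files are named differently.
STEPS = ["1. run_simulations.py", "2. summarize_results.py", "3. plot_summary.py"]


def build_commands(python=sys.executable, steps=STEPS):
    """Return one argument list per step, in execution order."""
    # Passing each command as a list avoids shell-quoting issues with spaces in file names.
    return [[python, step] for step in steps]


def run_pipeline(commands):
    """Run the commands one after another, stopping at the first failure."""
    for cmd in commands:
        print("Running:", " ".join(cmd))
        subprocess.run(cmd, check=True)  # raises CalledProcessError on a non-zero exit
```

Calling `run_pipeline(build_commands())` from the repository root, inside the activated virtual environment, would run the simulations, summarize them, and generate the plots in sequence.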
4 EXPERIMENT RESULT DESCRIPTION
Figure 6 shows the results of an experiment that evaluates the performance of different scheduler architectures across different system utilizations. Performance is evaluated across 55 traces using the slowdown metric, which is the ratio of a task's actual execution time to its ideal execution time. We evaluate both the median slowdown and the tail slowdown for every trace. In the topmost subfigure, the trend line represents the median of the median slowdowns. In the bottommost subfigure, the trend line represents the median of the tail slowdowns. The whiskers represent the 25th and 75th percentiles of the median and tail slowdowns for the top and bottom subfigures, respectively.
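The aggregation described above can be illustrated with a short sketch. It is not the repository's summarization code: the function names are hypothetical, the tail is assumed to be the 99th percentile (the text does not say which percentile is used), and percentiles are computed with the standard library's inclusive interpolation, which may differ from the method used for the paper.

```python
from statistics import median, quantiles


def slowdown(actual_time, ideal_time):
    """Slowdown of one task: actual execution time divided by ideal execution time."""
    return actual_time / ideal_time


def percentile(values, p):
    """p-th percentile (1..99) using linear interpolation over the given values."""
    return quantiles(values, n=100, method="inclusive")[p - 1]


def summarize(per_trace_slowdowns, tail_percentile=99):
    """Aggregate per-trace slowdowns into a trend value and whisker bounds.

    per_trace_slowdowns: one list of task slowdowns per trace.
    tail_percentile: assumed definition of the tail (hypothetical default).
    """
    medians = [median(s) for s in per_trace_slowdowns]
    tails = [percentile(s, tail_percentile) for s in per_trace_slowdowns]

    def spread(values):
        # Trend line is the median across traces; whiskers are the 25th/75th percentiles.
        return {"trend": median(values),
                "low": percentile(values, 25),
                "high": percentile(values, 75)}

    return {"median": spread(medians), "tail": spread(tails)}
```

Each point on a trend line would correspond to one call to `summarize` with the slowdowns of all traces at a given system utilization.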
Files
sc_reprod.zip (914.8 MB)
md5:605d9224012ada8b486aaabcac7cb90c