An Uncertainty-Aware Approach to Optimal Configuration of Stream Processing Systems

Jamshidi, Pooyan; Casale, Giuliano

doi:10.5281/zenodo.56238

Published June 22, 2016 | Version v1

Dataset Open

1. Imperial College London

The datasets in this release support the results presented in the paper:

P. Jamshidi, G. Casale, "An Uncertainty-Aware Approach to Optimal Configuration of Stream Processing Systems", accepted for presentation at MASCOTS 2016.

An open-access version of the paper is available at https://arxiv.org/abs/1606.06543

The open-source code is also available at https://github.com/dice-project/DICE-Configuration-BO4CO

The archive contains 10 comma-separated datasets of performance measurements (throughput and latency) for 3 different stream benchmark applications. The measurements were collected experimentally on 5 different cloud clusters over the course of 3 months (24/7). Each row in a dataset represents a different configuration setting for the application, and the last two columns give the average performance of the application measured over 10 minutes under that specific setting. The datasets contain full factorial, exhaustive measurements for all possible settings, with each variable limited to a predetermined interval. Each dataset is named in the following format: "benchmark_application-dimensions-cluster_name". For example, "wc-6d-c1" refers to the WordCount benchmark application with 6 dimensions (i.e., we varied 6 configuration parameters), deployed on the c1 cluster (OpenNebula; see the Appendix). This resulted in a dataset of 2880 rows, i.e., collecting the data took 2880 × 10 min = 480 h = 20 days.
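The naming scheme and the collection-time arithmetic above can be sketched in a few lines of Python. This is only an illustration: the helper names (`parse_dataset_name`, `collection_time`, `split_row`) are hypothetical and not part of the released code, and the sketch assumes that dataset names carry no file extension and that every row ends with exactly two performance columns, as the description states. The order of throughput and latency within those two columns is not assumed.

```python
from dataclasses import dataclass

# Each row is the average over a 10-minute run (taken from the description).
MINUTES_PER_MEASUREMENT = 10


@dataclass(frozen=True)
class DatasetName:
    benchmark: str   # e.g. "wc" for WordCount
    dimensions: int  # number of configuration parameters varied
    cluster: str     # e.g. "c1"


def parse_dataset_name(name: str) -> DatasetName:
    """Split 'benchmark_application-dimensions-cluster_name' into its parts."""
    parts = name.rsplit("-", 2)
    if len(parts) != 3 or not parts[1].endswith("d"):
        raise ValueError(f"unexpected dataset name: {name!r}")
    benchmark, dims, cluster = parts
    return DatasetName(benchmark, int(dims[:-1]), cluster)


def collection_time(rows: int) -> tuple[float, float]:
    """Return (hours, days) needed to measure `rows` configurations."""
    hours = rows * MINUTES_PER_MEASUREMENT / 60
    return hours, hours / 24


def split_row(row: list[str]) -> tuple[list[str], list[str]]:
    """Separate configuration values from the two trailing performance columns."""
    return row[:-2], row[-2:]


if __name__ == "__main__":
    print(parse_dataset_name("wc-6d-c1"))
    print(collection_time(2880))
```

For "wc-6d-c1" with 2880 rows, `collection_time` reproduces the 480 hours (20 days) quoted above.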

For more information about the data, refer to the appendix of the paper: https://arxiv.org/abs/1606.06543.

When referring to the dataset or code, please cite the paper above.

Files

Name	Size
bo4co_dataset.zip md5:c84ef2ba1d2e2500affa80b72ee38d98	139.4 kB
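The MD5 checksum listed for the archive can be used to confirm that a download is intact. The following is a minimal sketch using only the Python standard library; the function names and the local path `bo4co_dataset.zip` are assumptions for illustration, while the expected digest is the one shown in the file listing.

```python
import hashlib

# Digest published in the file listing for bo4co_dataset.zip.
EXPECTED_MD5 = "c84ef2ba1d2e2500affa80b72ee38d98"


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path: str, expected: str = EXPECTED_MD5) -> bool:
    """Return True if the file at `path` matches the expected digest."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    import sys

    # Hypothetical default location of the downloaded archive.
    target = sys.argv[1] if len(sys.argv) > 1 else "bo4co_dataset.zip"
    print("checksum OK" if verify_archive(target) else "checksum MISMATCH")
```

MD5 is adequate here for detecting accidental corruption during download; it is not a safeguard against deliberate tampering.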

Additional details

Is cited by: arXiv:1606.06543 (arXiv)
Is supplemented by: https://github.com/dice-project/DICE-Configuration-BO4CO (URL); http://www.slideshare.net/pooyanjamshidi/transfer-learning-for-optimal-configuration-of-big-data-software (URL)

European Commission
DICE - Developing Data-Intensive Cloud Applications with Iterative Quality Enhancements 644869

