Published 2025 | Version v2 | Dataset | Open
The SAP Cloud Infrastructure Dataset: A Reality Check of Scheduling and Placement of VMs in Cloud Computing
Description
This repository accompanies the paper:
Arno Uhlig, Iris Braun, Matthias Wählisch
*The SAP Cloud Infrastructure Dataset: A Reality Check of Scheduling and Placement of VMs in Cloud Computing.*
Proceedings of the ACM Internet Measurement Conference (IMC ’25), October 28–31, 2025, Madison, WI, USA.
DOI: 10.1145/3730567.3764480
If you use this dataset, please cite our paper and dataset:
    @inproceedings{uhlig2025sapdataset,
      author    = {Arno Uhlig and Iris Braun and Matthias W\"ahlisch},
      title     = {The SAP Cloud Infrastructure Dataset: A Reality Check of Scheduling and Placement of VMs in Cloud Computing},
      booktitle = {Proceedings of the 2025 ACM Internet Measurement Conference (IMC '25)},
      year      = {2025},
      doi       = {10.1145/3730567.3764480}
    }

    @dataset{zenodo17141306,
      author    = {Arno Uhlig and Iris Braun and Matthias W\"ahlisch},
      title     = {The SAP Cloud Infrastructure Dataset: A Reality Check of Scheduling and Placement of VMs in Cloud Computing},
      year      = {2025},
      publisher = {Zenodo},
      doi       = {10.5281/zenodo.17141306},
      url       = {https://doi.org/10.5281/zenodo.17141306}
    }

If you have questions, please contact:
- Arno Uhlig – arno.uhlig@sap.com
- Iris Braun – iris.braun@tu-dresden.de
- Matthias Wählisch – m.waehlisch@tu-dresden.de
---
Overview
This repository provides data and artifacts used in the paper, including:
- Telemetry data from ~1,800 hypervisors and 48,000 VMs over a 30-day observation period
- Resource utilization metrics (CPU, memory, network, storage)
- Scheduling-relevant events (creation, migration, resize, deletion)
- Scripts for preprocessing, analysis, and visualization
The dataset captures real-world enterprise workloads and enables reproducible research on VM placement and scheduling in large-scale cloud environments.
---
Repository Structure
    .
    ├── data/      # Raw, anonymized datasets
    ├── scripts/   # Analysis and visualization scripts
    └── LICENSE    # License file

Getting Started - Requirements
- Python ≥ 3.10
- Python packages listed in [requirements.txt](./requirements.txt)
- For large-scale analysis: sufficient memory and storage
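
Since storage is called out as a requirement, it can help to check how large the archive becomes once unpacked before extracting it. The following is a minimal sketch using only the Python standard library. The function names `summarize_archive` and `fits_on_disk`, the 20% headroom factor, and the local archive path are assumptions made for illustration and are not part of the dataset's scripts. The internal layout of the archive may also differ from the tree shown above, for example if everything is wrapped in a single top-level folder.

```python
import shutil
import zipfile
from collections import Counter
from pathlib import Path


def summarize_archive(zip_path):
    """Return (total uncompressed bytes, file count per top-level entry)."""
    total_bytes = 0
    counts = Counter()
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # directory entries carry no data
            total_bytes += info.file_size  # uncompressed size stored in the zip
            top_level = info.filename.split("/", 1)[0]
            counts[top_level] += 1
    return total_bytes, dict(counts)


def fits_on_disk(required_bytes, target_dir=".", headroom=1.2):
    """Check that target_dir has room for required_bytes plus a safety margin."""
    free_bytes = shutil.disk_usage(target_dir).free
    return free_bytes >= required_bytes * headroom


if __name__ == "__main__":
    # Assumed download location; adjust to wherever the archive was saved.
    archive_path = Path("sap-cloud-infrastructure-dataset-1.0.1.zip")
    if archive_path.exists():
        size, per_entry = summarize_archive(archive_path)
        print(f"uncompressed size: {size / 1e9:.2f} GB")
        print(f"files per top-level entry: {per_entry}")
        print(f"enough free space here: {fits_on_disk(size)}")
    else:
        print(f"{archive_path} not found; download it first")
```

The sizes are read from the zip's central directory, so nothing is decompressed while computing them.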
License
This dataset and accompanying material are released under the **Creative Commons Attribution 4.0 International License (CC BY 4.0)**.
See [LICENSE](./LICENSE) for details.
Files
| Name | Size | MD5 |
|---|---|---|
| sap-cloud-infrastructure-dataset-1.0.1.zip | 1.0 GB | f2e34d5f060d9d825d475d7b320d4772 |
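
After downloading, the archive can be compared against the MD5 digest published in the file listing. The sketch below uses only `hashlib` from the Python standard library. The helper name `md5_of_file`, the 1 MiB chunk size, and the assumption that the archive sits in the current working directory are illustrative choices, not part of the dataset's tooling.

```python
import hashlib
from pathlib import Path

# File name and digest are taken from the file listing; the path is an assumption.
ARCHIVE = Path("sap-cloud-infrastructure-dataset-1.0.1.zip")
EXPECTED_MD5 = "f2e34d5f060d9d825d475d7b320d4772"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    if not ARCHIVE.exists():
        print(f"{ARCHIVE} not found; download it first")
    elif md5_of_file(ARCHIVE) == EXPECTED_MD5:
        print("checksum OK")
    else:
        print("checksum mismatch; the download may be incomplete or corrupted")
```

Reading in chunks avoids loading the full 1.0 GB archive into memory. MD5 is adequate here for detecting transfer errors, although it is not a safeguard against deliberate tampering.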
Additional details
Related works
- Is supplement to: Conference paper, DOI 10.1145/3730567.3764480
Software
- Programming language: Python