Published 2025 | Version v2
Dataset Open

The SAP Cloud Infrastructure Dataset: A Reality Check of Scheduling and Placement of VMs in Cloud Computing

Description

This repository accompanies the paper:
    Arno Uhlig, Iris Braun, Matthias Wählisch
    The SAP Cloud Infrastructure Dataset: A Reality Check of Scheduling and Placement of VMs in Cloud Computing.
    Proceedings of the ACM Internet Measurement Conference (IMC ’25), October 28–31, 2025, Madison, WI, USA.
    DOI: 10.1145/3730567.3764480

If you use this dataset, please cite our paper and dataset:

@inproceedings{uhlig2025sapdataset,
  author = {Arno Uhlig and Iris Braun and Matthias W\"ahlisch},
  title = {The SAP Cloud Infrastructure Dataset: A Reality Check of Scheduling and Placement of VMs in Cloud Computing},
  booktitle = {Proceedings of the 2025 ACM Internet Measurement Conference (IMC '25)},
  year = {2025},
  doi = {10.1145/3730567.3764480}
}

@dataset{zenodo17141306,
  author = {Arno Uhlig and Iris Braun and Matthias W\"ahlisch},
  title = {The SAP Cloud Infrastructure Dataset: A Reality Check of Scheduling and Placement of VMs in Cloud Computing},
  year = {2025},
  publisher = {Zenodo},
  doi = {10.5281/zenodo.17141306},
  url = {https://doi.org/10.5281/zenodo.17141306}
}

If you have questions, please contact:
- Arno Uhlig – arno.uhlig@sap.com
- Iris Braun – iris.braun@tu-dresden.de
- Matthias Wählisch – m.waehlisch@tu-dresden.de

---

Overview

This repository provides data and artifacts used in the paper, including:

- Telemetry data from ~1,800 hypervisors and 48,000 VMs over a 30-day observation period
- Resource utilization metrics (CPU, memory, network, storage)
- Scheduling-relevant events (creation, migration, resize, deletion)
- Scripts for preprocessing, analysis, and visualization

The dataset captures real-world enterprise workloads and enables reproducible research on VM placement and scheduling in large-scale cloud environments.

---

Repository Structure

.
├── data/ # Raw, anonymized datasets
├── scripts/ # Analysis and visualization scripts
└── LICENSE # License file
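
To compare the downloaded archive against the layout above before extracting it, a minimal sketch like the following may help. It only counts files per leading directory and makes no assumption about file formats. The function name `summarize_archive` and the `depth` parameter are illustrative and not part of the repository's scripts.

```python
import zipfile
from collections import Counter
from pathlib import PurePosixPath


def summarize_archive(zip_path, depth=1):
    """Count files per leading directory inside a ZIP archive.

    `depth` controls how many leading path components form the group key.
    Files stored at the archive root are grouped under ".".
    """
    counts = Counter()
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # directory entries carry no data, count files only
            parts = PurePosixPath(info.filename).parts
            # Drop the file name, then keep at most `depth` directory levels.
            key = "/".join(parts[:-1][:depth]) or "."
            counts[key] += 1
    return counts
```

Calling `summarize_archive("sap-cloud-infrastructure-dataset-1.0.1.zip")` returns a `Counter` keyed by directory name. If the archive wraps everything in a single top-level folder (an assumption, not confirmed here), `depth=2` gives the more useful breakdown.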


Getting Started - Requirements

- Python ≥ 3.10
- Python packages listed in [requirements.txt](./requirements.txt)
- For large-scale analysis: sufficient memory and storage
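
The requirements above can be checked up front with a small sketch such as the one below. The Python version bound comes from the list above. The free-storage threshold `min_free_gb` is a placeholder assumption, since the uncompressed size of the dataset is not stated here, and `check_environment` is a hypothetical helper, not one of the repository's scripts.

```python
import shutil
import sys

MIN_PYTHON = (3, 10)  # minimum version stated in the requirements


def check_environment(workdir=".", min_free_gb=5.0):
    """Return a list of human-readable problems; an empty list means OK.

    `min_free_gb` is an assumed placeholder, adjust it to your analysis.
    """
    problems = []
    if sys.version_info < MIN_PYTHON:
        found = ".".join(str(part) for part in sys.version_info[:3])
        problems.append(f"Python {found} found, but >= 3.10 is required")
    # disk_usage reports bytes for the filesystem that contains `workdir`.
    free_gb = shutil.disk_usage(workdir).free / 1e9
    if free_gb < min_free_gb:
        problems.append(
            f"only {free_gb:.1f} GB free in {workdir!r}, "
            f"wanted at least {min_free_gb:.1f} GB"
        )
    return problems
```

Printing each entry of `check_environment()` before running the analysis scripts gives an early warning instead of a failure partway through a long job.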

License

This dataset and accompanying material are released under the **Creative Commons Attribution 4.0 International License (CC BY 4.0)**.
See [LICENSE](./LICENSE) for details.

Files

Files (1.0 GB)

sap-cloud-infrastructure-dataset-1.0.1.zip (1.0 GB)
md5:f2e34d5f060d9d825d475d7b320d4772
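
Because the record lists an MD5 checksum for the archive, a download can be verified before use. The following is a minimal sketch using only the standard library. The helper names `md5_of_file` and `verify_download` are illustrative, and the expected digest is copied from the file listing above.

```python
import hashlib

# Digest copied from the file listing of this record.
EXPECTED_MD5 = "f2e34d5f060d9d825d475d7b320d4772"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file at `path` matches the expected MD5 digest."""
    return md5_of_file(path) == expected.lower()
```

For example, `verify_download("sap-cloud-infrastructure-dataset-1.0.1.zip")` should return `True` for an intact download. Reading in chunks keeps memory use constant regardless of archive size. MD5 is used here only because it is the checksum the record publishes; it detects transfer corruption but is not a security guarantee.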

Additional details

Related works

Is supplement to
Conference paper: 10.1145/3730567.3764480 (DOI)

Funding

Federal Ministry for Economic Affairs and Climate Action
European Union
European Union - NextGenerationEU 13IPC007

Software

Programming language
Python