Published October 13, 2022 | Version v1
Dataset Open

Dataset of Automatically Orchestrable GitHub Projects

Description

This dataset accompanies the submission "Generating representative, live network traffic out of millions of code repositories" at HotNets'22: The 21st ACM Workshop on Hot Topics in Networks.

Please see the files:
- `list_of_github_repositories.txt` for a list of GitHub repositories in which we found a `docker-compose*.yml` file
- `list_of_executed_repositories.csv` for more detailed information on whether traffic capture succeeded with the specific orchestration files found in ~67% of the repositories
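
As an illustration of the `docker-compose*.yml` pattern mentioned above, the following minimal sketch shows which file names such a glob would select. It assumes plain shell-style glob semantics applied to the file name only. The helper names and the example paths are hypothetical and are not taken from the dataset or from our tooling.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Glob pattern quoted in the dataset description.
PATTERN = "docker-compose*.yml"


def is_compose_file(path: str) -> bool:
    """Return True if the file name (ignoring directories) matches PATTERN."""
    return fnmatchcase(PurePosixPath(path).name, PATTERN)


def compose_files(paths):
    """Keep only the paths whose file name matches PATTERN."""
    return [p for p in paths if is_compose_file(p)]


# Hypothetical repository paths, for illustration only.
example_paths = [
    "docker-compose.yml",
    "deploy/docker-compose.prod.yml",
    "docker-compose.yaml",  # different extension, not matched
    "compose.yml",          # different prefix, not matched
]

print(compose_files(example_paths))
```

Under these assumptions, variants such as `docker-compose.prod.yml` are selected, while files ending in `.yaml` are not.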

If you use our dataset, please cite our work as follows:

Tobias Bühler, Roland Schmid, Sandro Lutz, and Laurent Vanbever. 2022. Generating representative, live network traffic out of millions of code repositories. In The 21st ACM Workshop on Hot Topics in Networks (HotNets ’22), November 14–15, 2022, Austin, TX, USA. ACM, New York, NY, USA, 7 pages. https://doi.org/10.1145/3563766.3564084

Files

Files (227.7 MB)

Checksums and sizes:
- md5:4383ffa014ed6a17314482705bfcbb1a (216.4 MB)
- md5:33a6c0ef0e6b139caf679b69687916c7 (11.4 MB)

Additional details

Related works

Is supplement to
Conference paper: 10.1145/3563766.3564084 (DOI)

Funding

SyNET – From Network Verification to Synthesis: Breaking New Ground in Network Automation (grant 851809)
European Commission