Dataset of Automatically Orchestrable GitHub Projects
Description
This dataset accompanies the submission "Generating representative, live network traffic out of millions of code repositories" at HotNets'22: The 21st ACM Workshop on Hot Topics in Networks.
Please see the files:
- `list_of_github_repositories.txt` for the list of GitHub repositories in which we found a `docker-compose*.yml` file
- `list_of_executed_repositories.csv` for details on how successfully traffic could be captured with the specific orchestration files found in ~67% of the repositories
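As a starting point for working with the two files, the following is a minimal sketch. It assumes, without confirmation from the dataset description, that the text file holds one repository per line, that the CSV is comma-separated with a header row, and that both are UTF-8 encoded. The helper names `load_repository_list` and `peek_csv` are hypothetical. Because the column names of the CSV are not documented here, the sketch only inspects the header and the first rows instead of relying on specific fields.

```python
import csv
from itertools import islice


def load_repository_list(path):
    """Return all non-empty lines of the repository list.

    Assumption: one repository entry per line.
    """
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def peek_csv(path, n_rows=5):
    """Return (header, first n_rows rows) without reading the whole file.

    Assumption: comma-separated values with a header row.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])  # empty list if the file is empty
        rows = list(islice(reader, n_rows))
    return header, rows
```

For example, `peek_csv("list_of_executed_repositories.csv")` returns the column names and a handful of rows, which is enough to decide how to parse the full file. Reading only a few rows avoids loading the whole CSV into memory just to discover its layout.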
If you use our dataset, please cite our work as follows:
Tobias Bühler, Roland Schmid, Sandro Lutz, and Laurent Vanbever. 2022. Generating representative, live network traffic out of millions of code repositories. In The 21st ACM Workshop on Hot Topics in Networks (HotNets ’22), November 14–15, 2022, Austin, TX, USA. ACM, New York, NY, USA, 7 pages. https://doi.org/10.1145/3563766.3564084
Files
Total size: 227.7 MB

| Checksum | Size |
|---|---|
| md5:4383ffa014ed6a17314482705bfcbb1a | 216.4 MB |
| md5:33a6c0ef0e6b139caf679b69687916c7 | 11.4 MB |
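The MD5 checksums listed above can be used to check that a download is intact. The following is a minimal sketch using Python's standard `hashlib` module. The helper names `md5_of_file` and `matches_listed_checksum` are hypothetical, and the file is read in chunks so that the larger file does not have to fit in memory.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the hexadecimal MD5 digest of a file, reading 1 MiB at a time."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path, listed):
    """Compare a file against a checksum written as 'md5:<hex>' or plain '<hex>'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

Pass the path of the downloaded file together with the checksum listed for it, for example `matches_listed_checksum(path, "md5:4383ffa014ed6a17314482705bfcbb1a")`. A result of `False` suggests an incomplete or corrupted download, or that the checksum belongs to the other file.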
Additional details
Related works
- Is supplement to conference paper: 10.1145/3563766.3564084 (DOI)