A dataset of Bot Commits
Creators
- 1. University of Tennessee, Knoxville
- 2. Carnegie Mellon University
- 3. GitHub
Description
This dataset contains information about 13,762,430 Git commits made by 461 bots, each of which has made more than 1,000 commits.
The data is stored in a gzip-compressed CSV file that uses ";" as the separator, with one commit per line in the following format:
"author_id"; "commit-sha"; "time-of-the-commit"; "timezone"; "files-modified-by-the-commit"; "projects-the-commit-is-associated-with"; "commit-message". If a commit modifies multiple files or is associated with multiple projects, the entries within that field are separated by ','.
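The line format above could be read along the following lines. This is only a minimal sketch under several assumptions that the description does not confirm: that each line has exactly the seven fields in the listed order, that the commit message is the only field that may itself contain ";", that fields are not quoted or escaped, and that the text is UTF-8. The names `BotCommit`, `parse_line`, and `iter_commits` are hypothetical, and the file path is whatever you saved the download as.

```python
import gzip
from typing import Iterator, List, NamedTuple


class BotCommit(NamedTuple):
    """One record, following the field order given in the description."""
    author_id: str
    sha: str
    time: str
    timezone: str
    files: List[str]
    projects: List[str]
    message: str


def parse_line(line: str) -> BotCommit:
    # Split on the first six ';' only, so that a ';' inside the commit
    # message (assumed to be the last field) is kept intact.
    parts = line.rstrip("\n").split(";", 6)
    if len(parts) != 7:
        raise ValueError(f"expected 7 fields, got {len(parts)}")
    author, sha, time, tz, files, projects, message = (p.strip() for p in parts)
    return BotCommit(
        author_id=author,
        sha=sha,
        time=time,
        timezone=tz,
        # Multi-valued fields are comma-separated; drop empty entries.
        files=[f for f in files.split(",") if f],
        projects=[p for p in projects.split(",") if p],
        message=message,
    )


def iter_commits(path: str) -> Iterator[BotCommit]:
    # Stream the gzipped file line by line rather than loading 2.3 GB at once.
    # errors="replace" is an assumption to tolerate non-UTF-8 bytes.
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as fh:
        for line in fh:
            yield parse_line(line)
```

Streaming with a generator keeps memory use flat. If file paths in the data can contain "," or ";", this simple split would mis-parse those lines, so it is worth checking a sample before relying on it.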
These bots were detected by applying the BIMAN bot detection approach to the World of Code (http://worldofcode.org/) dataset.
For details of the approach, see the corresponding paper in MSR 2020.
If you use this data in your research, please cite it.
Files
MD5 checksum | Size |
---|---|
md5:ed6f50bc68a9246001ce4d5551f805e5 | 2.3 GB |