Published January 16, 2020 | Version 1.0
Dataset Open

A dataset of Bot Commits

  • 1. University of Tennessee, Knoxville
  • 2. Carnegie Mellon University
  • 3. GitHub

Description

This dataset contains information about 13,762,430 Git commits made by 461 bots, each of which has created more than 1,000 commits.


The data is stored in a gzipped CSV file that uses ";" as the separator. Each line has the following format:

"author_id"; "commit-sha"; "time-of-the-commit"; "timezone"; "files-modified-by-the-commit"; "projects-the-commit-is-associated-with"; "commit-message". When a commit has multiple files and/or projects, they are separated by ','.
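The line format above can be read with a few lines of Python. The following is a minimal sketch, not the authors' tooling. It assumes that the quotes in the format description are notation rather than literal characters in the file, that the first six fields contain no ";" themselves, and that the text is UTF-8. The field names in `FIELDS`, the helper names `parse_line` and `read_commits`, and the file name in the usage note are all hypothetical.

```python
import gzip
from typing import Dict, Iterator, List, Union

# Hypothetical field names mirroring the seven columns described above.
FIELDS = ["author_id", "commit_sha", "time", "timezone",
          "files", "projects", "message"]

Record = Dict[str, Union[str, List[str]]]


def parse_line(line: str) -> Record:
    """Parse one ';'-separated line into a record.

    Splitting on at most six separators keeps any ';' that occurs
    inside the commit message in the final field.
    """
    parts = line.rstrip("\n").split(";", 6)
    if len(parts) != 7:
        raise ValueError(f"expected 7 fields, got {len(parts)}")
    # Tolerate padding around the first six fields; keep the message as-is.
    head = [p.strip() for p in parts[:6]]
    record: Record = dict(zip(FIELDS, head + [parts[6]]))
    # Files and projects are ','-separated lists (empty entries dropped).
    record["files"] = [f for f in head[4].split(",") if f]
    record["projects"] = [p for p in head[5].split(",") if p]
    return record


def read_commits(path: str) -> Iterator[Record]:
    """Stream records from the gzipped file without loading it into memory."""
    # errors="replace" is an assumption: commit messages may contain bytes
    # that are not valid UTF-8.
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as fh:
        for line in fh:
            yield parse_line(line)
```

A caller would iterate with something like `for rec in read_commits("bot_commits.csv.gz"): ...`, substituting the actual name of the downloaded file. Because files are split on ',', a path that itself contains a comma would be split incorrectly; the sketch does not attempt to handle that case.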


These bots were detected by applying the BIMAN bot detection approach to the World of Code (http://worldofcode.org/) dataset.

For details of the approach, see the corresponding paper in MSR 2020. 


If you use this data in your research, please cite it.

Files

Files (2.3 GB)

md5:ed6f50bc68a9246001ce4d5551f805e5 (2.3 GB)
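
Since the archive is large, it may be worth checking the download against the MD5 checksum listed above before processing it. The following is a minimal sketch using Python's standard `hashlib`. The helper name `md5_of_file` and the file name in the usage note are hypothetical.

```python
import hashlib

# Checksum as listed in the Files section of this record.
EXPECTED_MD5 = "ed6f50bc68a9246001ce4d5551f805e5"


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in 1 MiB chunks
    so the 2.3 GB archive never has to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

For example, `md5_of_file("bot_commits.csv.gz") == EXPECTED_MD5` should evaluate to `True` for an intact download, with the placeholder replaced by the actual file name.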