A dataset of bot and human activities in NumFOCUS
Description
This repository provides a dataset of public GitHub events performed by 24,400 contributors, together with the higher-level activities performed by the 853 most active contributors (34 bot accounts, 13 GitHub Apps, 4 built-in GitHub services and 802 human accounts). These contributors were active in GitHub repositories belonging to NumFOCUS organisations during July, August and September 2024.
The dataset is used for the empirical study in the paper "Observing bots in the wild: A quantitative analysis of a large open source ecosystem", published at the 6th International Workshop on Bots in Software Engineering (BotSE) 2025. DOI: https://www.doi.org/10.1109/BotSE67031.2025.00008. The paper is co-authored by Natarajan Chidambaram and Tom Mens (Software Engineering Lab, University of Mons, Belgium).
This work is supported by Service Public de Wallonie Recherche under grant number 2010235 - ARIAC by DigitalWallonia4.AI, and by the Fonds de la Recherche Scientifique – FNRS under grant numbers J.0147.24 and T.0149.22.
File descriptions:
RawEvents.zip: Contains the raw events that were performed by contributors in GitHub repositories belonging to NumFOCUS organisations
activities.csv: A CSV file containing the activities of all the contributors
basic_features.csv: A CSV file containing the basic features (contributor type, number of activities performed, number of repositories contributed to, number of organisations involved with) for all contributors
activities_per_activity_type.csv: A CSV file containing the number of activities of each activity type that contributors performed in GitHub repositories belonging to NumFOCUS organisations
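Example: the CSV files listed above could be inspected with a few lines of Python. This is only a minimal sketch using the standard library. It assumes basic_features.csv has been downloaded into the working directory and has a header row. The column name contributor_type is hypothetical and is used purely for illustration; check the printed header for the real column names. The helper names load_rows and count_by are likewise our own and are not part of the dataset.

```python
import csv
from collections import Counter
from pathlib import Path


def load_rows(path):
    """Read a CSV file into a list of dicts keyed by the header row."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def count_by(rows, column):
    """Count how many rows share each value of `column`."""
    return Counter(row[column] for row in rows)


if __name__ == "__main__":
    # Assumption: the file sits in the current working directory.
    path = Path("basic_features.csv")
    if path.exists():
        rows = load_rows(path)
        # Print the actual header so the real column names can be checked.
        print("columns:", list(rows[0].keys()) if rows else [])
        # Hypothetical column name; replace it with one from the header.
        column = "contributor_type"
        if rows and column in rows[0]:
            print(count_by(rows, column))
```

The same two helpers should work for activities.csv and activities_per_activity_type.csv, provided those files also start with a header row.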