Dataset Open Access

Mining the Technical Roles of GitHub Users

João Eduardo; Luciana L.; Marco Tulio

This record contains the scripts and data used in the study reported in the paper "Mining the Technical Roles of GitHub Users". The files are described in more detail below:

  • processed_ground_truth.csv: A CSV file with information about the developers considered in the study. For privacy reasons, the dataset has been preprocessed to remove identifying information. Please contact the authors if you need the original version.
  • processed_ground_truth_fullstack.csv: The same CSV file, but including full-stack developers.
  • script.ipynb, utils.py: Source code used in the study (a Jupyter notebook and a Python module).
  • Dockerfile, docker-compose.yml, requirements.txt: Files for reproducing the software environment used in this study.
  • BoW-tuning.csv: Classification results for different bag-of-words (BoW) parameter settings.
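
Because the column layout of the CSV files is not documented here, a quick look at the header and the first few rows may help before running the notebook. The sketch below is only an illustration: the helper name peek_csv is hypothetical, and the comma delimiter and UTF-8 encoding are assumptions that may need adjusting.

```python
import csv
import os
from itertools import islice


def peek_csv(path, n_rows=3):
    """Return (header, first n_rows data rows) of a CSV file.

    Assumes a comma-delimited, UTF-8 encoded file with a header line;
    neither is confirmed by the dataset description.
    """
    # Raise the per-field limit in case some text fields are very long.
    csv.field_size_limit(10**9)
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])
        rows = list(islice(reader, n_rows))
    return header, rows


if __name__ == "__main__":
    name = "processed_ground_truth.csv"
    if os.path.exists(name):
        header, rows = peek_csv(name)
        print("columns:", header)
        print("sample rows read:", len(rows))
```

The same call should work for processed_ground_truth_fullstack.csv and BoW-tuning.csv if they follow the same conventions.
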
Files (32.8 MB)
Name                                   Size        Checksum
BoW-tuning.csv                         7.4 kB      md5:129d88996d8db01ec8fae56f9c5e2771
docker-compose.yml                     341 Bytes   md5:6e99b1c4dd52adc0197d1d6006db2890
Dockerfile                             349 Bytes   md5:37a8b04ae13f8a985416a0cb155c345b
processed_ground_truth.csv             13.6 MB     md5:18334c98e1ec6aac84068717371889cb
processed_ground_truth_fullstack.csv   19.1 MB     md5:a81c3873fc5a778a9f493da94446b5db
requirements.txt                       1.5 kB      md5:9321b0ae73d2ef267541b8f720696c50
script.ipynb                           23.2 kB     md5:43fd542071544ccc1e647f8126a56311
utils.py                               16.1 kB     md5:ec7035aef2364ff9cf9d8f476082d49a
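
The MD5 checksums listed above can be used to confirm that the downloaded files are intact. The following is a minimal sketch using only the Python standard library; the function names md5_of and verify are hypothetical, and the script assumes the files were saved under their original names in a single directory.

```python
import hashlib
from pathlib import Path

# Checksums copied from the file list of this record.
EXPECTED_MD5 = {
    "BoW-tuning.csv": "129d88996d8db01ec8fae56f9c5e2771",
    "docker-compose.yml": "6e99b1c4dd52adc0197d1d6006db2890",
    "Dockerfile": "37a8b04ae13f8a985416a0cb155c345b",
    "processed_ground_truth.csv": "18334c98e1ec6aac84068717371889cb",
    "processed_ground_truth_fullstack.csv": "a81c3873fc5a778a9f493da94446b5db",
    "requirements.txt": "9321b0ae73d2ef267541b8f720696c50",
    "script.ipynb": "43fd542071544ccc1e647f8126a56311",
    "utils.py": "ec7035aef2364ff9cf9d8f476082d49a",
}


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(directory, expected=EXPECTED_MD5):
    """Map each expected file name to 'ok', 'mismatch', or 'missing'."""
    results = {}
    for name, checksum in expected.items():
        path = Path(directory) / name
        if not path.is_file():
            results[name] = "missing"
        elif md5_of(path) == checksum:
            results[name] = "ok"
        else:
            results[name] = "mismatch"
    return results


if __name__ == "__main__":
    # Assumes the files were downloaded into the current directory.
    for name, status in verify(".").items():
        print(f"{status:8} {name}")
```

A file reported as "mismatch" was probably truncated or altered in transit and is worth downloading again.
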
Statistics         All versions   This version
Views              528            308
Downloads          677            516
Data volume        2.0 GB         1.6 GB
Unique views       422            271
Unique downloads   336            254
