Dataset Open Access

Software Developer Expertise GitHub and Stack Overflow data sets

Norbert Eke; Olga Baysal

Thesis supervisor(s)

Olga Baysal

Cross-Platform Software Developer Expertise Learning by Norbert Eke

This data set is part of my Master's thesis project on developer expertise learning by mining Stack Overflow (SOTorrent) and GitHub (GHTorrent) data. Check out my portfolio website at norberte.github.io

Files (6.8 GB)
Name | Size | Checksum
bestLDA_GH_full_MDS_topic16_beta0.05.html | 413.4 kB | md5:5684d71aecde6b47903e1012cc0d3574
bestLDA_GH_full_TSNE_topic16_beta0.05.html | 413.5 kB | md5:443aa1952181c38a0facb957c3677732
bestLDA_SO_full_MDS_topic33_beta1.html | 2.0 MB | md5:5250d80cd0a788bb47b7bd23908f5546
bestLDA_SO_full_TSNE_topic33_beta1.html | 2.0 MB | md5:a6ee8128488a9e217d5fd2929b836061
GH_annotations_processed.csv | 72.3 kB | md5:a55ab897a02d0f6b067e2d1846f64d25
GH_annotations_raw.csv | 40.7 kB | md5:15b95df1b46a2f19c97c435ed4f8fb29
GH_full.csv | 1.2 GB | md5:6ec5f55a520a8b2079fe6926245689ec
GH_full.sql | 1.2 GB | md5:27796b11c3068baec55d436a81cb228e
GH_past.csv | 591.1 MB | md5:efecbed0ee91de52bb45cded93cabaf9
GH_past_SO_past Most Common keywords.txt | 790 Bytes | md5:54ef672adead3f1b7fc01ee4946efeaf
GH_recent.csv | 661.9 MB | md5:4fd864c2c7404139feb298ead44d1250
GH_recent_active_users.csv | 891.3 kB | md5:e74ed9467cec86e02c8f3dcf020eb13c
GH_recent_SO_recent_Most Common keywords.txt | 779 Bytes | md5:536f58545c5b546354f1b16c73969070
GH_survey_setup.txt | 5.9 kB | md5:7ba3bd09eaa4a6c9cbf6d60f1d8c877e
RQ2_3_GH_SO_past_word_regr_data.csv | 13.1 MB | md5:dcfe611317c8bafc4f2f3129080e7dbe
RQ2_3_GH_SO_recent_word_regr_data.csv | 11.2 MB | md5:0272e7edefdaa8e1015930d960b63d6d
RQ4_GH_past_to_GH_recent.csv | 10.1 MB | md5:daab386f4687f2d9aa858f964812c953
RQ4_SO_past_to_SO_recent.csv | 8.8 MB | md5:3e9f9efe9a492ca7e65d9a390588edfe
SO_annotations_processed.csv | 73.0 kB | md5:b39914118aaaf183ce23a63d1f0a86a5
SO_annotations_raw.csv | 41.8 kB | md5:67ed398a0bf8efc6739ad47a51e38382
SO_full.csv | 112.3 MB | md5:c3b746d170471cee172e719f1f95f77d
SO_full.sql | 112.4 MB | md5:d62ab51a59ddf1e8ec72aba93ccdec4b
SO_past.csv | 91.9 MB | md5:a4e01df0b69b0c1df324087be8d97e55
SO_pre-trained_vectors.kv | 106.1 MB | md5:d9bda57d2ad46151526de3790470eff9
SO_pre-trained_vectors.kv.vectors.npy | 1.4 GB | md5:40fae295205ba250cc61d556c0de4083
SO_recent.csv | 18.9 MB | md5:20431d032594999e9d673c9e8f289d50
SO_recent_active_users.csv | 37.6 kB | md5:8406f1d62d367988f9b5e739c46bdef7
SO_survey_setup.txt | 8.1 kB | md5:4205d4c44975ec60a1f75fde20c595ba
text_corpus_GH_full_activity.csv | 579.6 MB | md5:25a099c0b3ada5cc7ea7c411cdc24ba4
text_corpus_GH_past_activity.csv | 237.0 MB | md5:c0b143ba038a37489b26249409f49c14
text_corpus_SO_full_activity.csv | 45.1 MB | md5:ab424e95da226bf18fcc3ab079fa9fbb
text_corpus_SO_past_activity.csv | 56.1 MB | md5:43ddcfb43fcb60deebc28956b7739735
text_corpus_SO_recent_activity.csv | 11.1 MB | md5:0a66983612a3cac3068924bc53eb3782
texts_corpus_GH_recent_activity.csv | 258.2 MB | md5:d96cc8ccfdca82d8c098d89657b16dd1
Topic_Labeling.xlsx | 12.4 kB | md5:80584662697fc7779b1884d36c63a9a0
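
Because several of these files exceed a gigabyte, it may be worth checking a download against the md5 value listed with it before use. The following is a minimal sketch of such a check, not part of the data set itself: the helper names md5_of_file and verify_download are hypothetical, and the example assumes GH_survey_setup.txt has been saved to the current working directory.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks
    so that multi-gigabyte files are never loaded into memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected):
    """Compare a file's digest with a listed value such as 'md5:7ba3...'.
    The optional 'md5:' prefix and letter case are ignored."""
    expected_hex = expected.strip().lower().split(":")[-1]
    return md5_of_file(path) == expected_hex


if __name__ == "__main__":
    # Assumed local path; the checksum is copied from the file listing.
    local_copy = Path("GH_survey_setup.txt")
    listed = "md5:7ba3bd09eaa4a6c9cbf6d60f1d8c877e"
    if local_copy.exists():
        status = "OK" if verify_download(local_copy, listed) else "MISMATCH"
        print(f"{local_copy}: {status}")
    else:
        print(f"{local_copy} not found; download it first")
```

A mismatch most likely indicates a truncated or corrupted download. MD5 is adequate for detecting that kind of accidental damage, but it is not a safeguard against deliberate tampering.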
  • Gousios, Georgios. "The GHTorrent dataset and tool suite." 2013 10th Working Conference on Mining Software Repositories (MSR). IEEE, 2013.

  • Baltes, Sebastian, et al. "SOTorrent: Reconstructing and analyzing the evolution of Stack Overflow posts." Proceedings of the 15th International Conference on Mining Software Repositories. 2018.

  • Vasilescu, Bogdan, Vladimir Filkov, and Alexander Serebrenik. "StackOverflow and GitHub: Associations between software development and crowdsourced knowledge." 2013 International Conference on Social Computing. IEEE, 2013.

Statistic | All versions | This version
Views | 1,269 | 1,269
Downloads | 569 | 569
Data volume | 28.2 GB | 28.2 GB
Unique views | 1,096 | 1,096
Unique downloads | 523 | 523
