Dataset Open Access

A Panel Data Set of Cryptocurrency Development Activity on GitHub

van Tonder, Rijnard; Trockman, Asher; Le Goues, Claire

Contents:

  • all-sorted-recovered-normalized-2018-01-21-to-2019-02-04.csv: CSV format of all data, sorted by date. This file contains some imputed values for missing data, and all fields are present for all repositories, with missing values normalized to "null". This is the most convenient form to use.
  • all-sorted-2018-01-21-to-2019-02-04.csv: CSV format of all data, sorted by date. This is the data obtained by processing the raw format, without imputation or normalization.
  • raw-data-2018-01-21-to-2019-02-04.tar.gz: The raw format of data collected (S-expressions). Contains additional contributor data and CoinMarketCap data not currently in the CSV datasets.
  • recovered.patch: The modification of all-sorted-2018-01-21-to-2019-02-04.csv after recovering (imputing) data, showing what was recovered.
  • recovered-normalized.patch: The modification of all-sorted-2018-01-21-to-2019-02-04.csv after normalizing the recovered data set. Thus, patching all-sorted-2018-01-21-to-2019-02-04.csv with recovered.patch and then with recovered-normalized.patch gives all-sorted-recovered-normalized-2018-01-21-to-2019-02-04.csv.
  • missing-dates.txt: Days for which we missed GitHub data collection (partially or completely).
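
The patching order described above can be sketched as follows. This is a minimal sketch, not the authors' tooling: it assumes the two .patch files are in a diff format that the standard `patch` command accepts and that `patch` is installed. The helper names `patch_commands` and `rebuild` are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path

# File names as listed in the contents above.
BASE = "all-sorted-2018-01-21-to-2019-02-04.csv"
PATCHES = ["recovered.patch", "recovered-normalized.patch"]
TARGET = "all-sorted-recovered-normalized-2018-01-21-to-2019-02-04.csv"


def patch_commands(target, patches):
    """Build one `patch` invocation per patch file, in application order.

    Assumption: each patch file is readable by the `patch` CLI.
    """
    return [["patch", "-i", str(p), str(target)] for p in patches]


def rebuild(base=BASE, patches=PATCHES, target=TARGET):
    """Copy the unmodified CSV, then apply the patches in place, in order."""
    shutil.copyfile(base, target)
    for cmd in patch_commands(target, patches):
        # check=True raises CalledProcessError if a patch does not apply.
        subprocess.run(cmd, check=True)
    return Path(target)
```

Calling `rebuild()` in a directory that holds the unmodified CSV and both patch files should produce the recovered, normalized CSV. Because the patches are applied to a copy, the original file is left untouched.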

Related publications:

@inproceedings{van-tonder-crypto-oss-2019, 
  title = {{A Panel Data Set of Cryptocurrency Development Activity on GitHub}},
  booktitle = "International Conference on Mining Software Repositories",
  author = "{van~Tonder}, Rijnard and Trockman, Asher and {Le~Goues}, Claire",
  series = {MSR '19},
  year = 2019
} 

@inproceedings{trockman-striking-gold-2019, 
  title = {{Striking Gold in Software Repositories? An Econometric Study of Cryptocurrencies on GitHub}},
  booktitle = "International Conference on Mining Software Repositories",
  author = "Trockman, Asher and {van~Tonder}, Rijnard and Vasilescu, Bogdan",
  series = {MSR '19},
  year = 2019
}

Related code: https://github.com/rvantonder/CryptOSS

Files (6.3 GB)

  • all-sorted-2018-01-21-to-2019-02-04.csv (204.2 MB), md5:546004c45e2bad619f003c8954178e9c
  • all-sorted-recovered-normalized-2018-01-21-to-2019-02-04.csv (378.9 MB), md5:f47f9c115e42261a9bd44cbc489fba3a
  • missing-dates.txt (484 Bytes), md5:98b8f5edb0d0c6fe976b7da8b3e905a5
  • raw-data-2018-01-21-to-2019-02-04.tar.gz (5.4 GB), md5:42172152b23447682251d03ea5e9b8e9
  • recovered-normalized.patch (310.1 MB), md5:5e2cfb9e48f96cffedb538dfd4b98816
  • recovered.patch (9.5 MB), md5:4e6c58da1bf16a62b856f5395e81fb31
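
The MD5 checksums listed above can be used to check downloads for corruption. The following is a minimal sketch using Python's standard `hashlib`. The helper names `md5_of` and `verify` are hypothetical, and it assumes the files sit in a single local directory under their listed names.

```python
import hashlib
from pathlib import Path

# Checksums copied from the file listing above.
EXPECTED_MD5 = {
    "all-sorted-2018-01-21-to-2019-02-04.csv": "546004c45e2bad619f003c8954178e9c",
    "all-sorted-recovered-normalized-2018-01-21-to-2019-02-04.csv": "f47f9c115e42261a9bd44cbc489fba3a",
    "missing-dates.txt": "98b8f5edb0d0c6fe976b7da8b3e905a5",
    "raw-data-2018-01-21-to-2019-02-04.tar.gz": "42172152b23447682251d03ea5e9b8e9",
    "recovered-normalized.patch": "5e2cfb9e48f96cffedb538dfd4b98816",
    "recovered.patch": "4e6c58da1bf16a62b856f5395e81fb31",
}


def md5_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so multi-gigabyte files fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(directory="."):
    """Return {file name: checksum matches} for every listed file found."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        if path.exists():
            results[name] = md5_of(path) == expected
    return results
```

Files that have not been downloaded are skipped rather than reported as failures, so `verify()` can be run on a partial download.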