GitHub data privacy commits from JSS 2025
Description
Dataset on commits (and repositories) on GitHub making reference to data privacy legislation (covering laws: GDPR, CCPA, CPRA, UK DPA).
The dataset contains:
+ all_commits_info_merged-v2-SHA.csv : commits information as collected from various GitHub REST API calls (all data merged together).
+ repos_info_merged_USED-v2_with_loc.csv: repository information with some calculated data.
+ top-70-repos-commits-for-manual-check_commits-2coders.xlsx: results of the manual coding of the commits of the 70 most popular repositories in dataset.
+ user-rights-ω3.csv: different terms for user rights teriminology in legislation.
+ github_commits_analysis_replication.r: main analysis pipeline covering all RQs in the R programming language.
In order to perform also the initial data collection, the GitHub REST API can be used, collecting data using time intervals, for instance:
https://api.github.com/search/commits?q=%22GDPR%22+committer-date:2018-05-25..2018-05-30&sort=committer-date&order=asc&per_page=100&page=1
This dataset accompanies the following publication, so please cite it accordingly:
Georgia M. Kapitsaki, Maria Papoutsoglou, Evolution of repositories and privacy laws: commit activities in the GDPR and CCPA era, accepted for publication at Elsevier Journal of Systems & Software, 2025.
Files
Additional details
Related works
- Continues
- Conference proceeding: 10.1145/3639478.3643109 (DOI)