Published May 28, 2025 | Version 1.0
Dataset Restricted

GitHub data privacy commits from JSS 2025

  • 1. University of Cyprus

Description

Dataset of GitHub commits (and their repositories) that reference data privacy legislation. The laws covered are GDPR, CCPA, CPRA, and UK DPA.
 
The dataset contains:
+ all_commits_info_merged-v2-SHA.csv: commit information collected through various GitHub REST API calls (all data merged together).
+ repos_info_merged_USED-v2_with_loc.csv: repository information, including some calculated data.
+ top-70-repos-commits-for-manual-check_commits-2coders.xlsx: results of the manual coding of the commits of the 70 most popular repositories in the dataset.
+ user-rights-ω3.csv: the different terms used for user rights in the legislation.
+ github_commits_analysis_replication.r: main analysis pipeline covering all RQs, written in the R programming language.

To also reproduce the initial data collection, the GitHub REST API can be queried over successive time intervals, for instance:
https://api.github.com/search/commits?q=%22GDPR%22+committer-date:2018-05-25..2018-05-30&sort=committer-date&order=asc&per_page=100&page=1
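The interval-based collection can be sketched in Python as follows. This is a minimal illustration, not the authors' collection script: the helper names (`date_windows`, `search_url`), the six-day window length, and the end date are assumptions chosen only to reproduce the example URL above. Splitting the period into short windows presumably keeps each query within the Search API's limit of 1,000 results per search. The sketch only builds the request URLs; sending them, authenticating, and handling rate limits are left out.

```python
from datetime import date, timedelta
from urllib.parse import urlencode

SEARCH_URL = "https://api.github.com/search/commits"


def date_windows(start, end, days):
    """Yield inclusive (first_day, last_day) pairs that cover start..end."""
    current = start
    while current <= end:
        # The last window is clipped so it never runs past the end date.
        last = min(current + timedelta(days=days - 1), end)
        yield current, last
        current = last + timedelta(days=1)


def search_url(term, first_day, last_day, page=1, per_page=100):
    """Build a commit-search URL for one term, one window, and one result page."""
    params = {
        "q": f'"{term}" committer-date:{first_day.isoformat()}..{last_day.isoformat()}',
        "sort": "committer-date",
        "order": "asc",
        "per_page": per_page,
        "page": page,
    }
    # safe=":" leaves the qualifier separator unescaped, as in the example URL.
    return f"{SEARCH_URL}?{urlencode(params, safe=':')}"


if __name__ == "__main__":
    # Hypothetical period: two six-day windows starting on 2018-05-25.
    for first, last in date_windows(date(2018, 5, 25), date(2018, 6, 5), days=6):
        print(search_url("GDPR", first, last))
```

For each window, the `page` parameter would then be incremented until a page returns no further items, and the same loop repeated for each search term (for example CCPA, CPRA, and UK DPA).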

This dataset accompanies the following publication, so please cite it accordingly:

Georgia M. Kapitsaki, Maria Papoutsoglou, Evolution of repositories and privacy laws: commit activities in the GDPR and CCPA era, accepted for publication at Elsevier Journal of Systems & Software, 2025.

Files

Restricted

The record is publicly accessible, but files are restricted to users with access.

Additional details

Related works

Continues
Conference proceeding: 10.1145/3639478.3643109 (DOI)

Software

Programming language
R