Published November 10, 2025 | Version v1
Dataset Open

HackRep: A Large-Scale Dataset of GitHub Hackathon Projects

  • 1. Eindhoven University of Technology
  • 2. Carnegie Mellon University

Description

Hackathons are time-bound collaborative events that often target software creation. Although hackathons have been studied before, prior work has largely consisted of in-depth studies of a few events, which limits our understanding of hackathons as a software engineering activity.

To complement the existing body of knowledge, we introduce HackRep, a dataset of 100,356 hackathon GitHub repositories. We illustrate how HackRep can benefit software engineering researchers through preliminary investigations of hackathon continuation, the composition of hackathon teams, and the feasibility of estimating the geographical location of hackathons. These investigations highlight the opportunities the dataset opens up, for instance the possibility of estimating hackathon durations from commit timestamps.
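As an illustration of what estimating a hackathon's duration from commit timestamps could look like, the following is a minimal sketch and not the method used to build or analyse HackRep. It assumes a hypothetical heuristic: the event corresponds to the largest burst of commits in which consecutive commits are at most `max_gap` apart. The function name, the 12-hour default, and the input format (a list of `datetime` objects) are all assumptions made for this example.

```python
from datetime import datetime, timedelta


def estimate_event_duration(commit_times, max_gap=timedelta(hours=12)):
    """Estimate event duration as the time span of the largest commit burst.

    Hypothetical heuristic: a "burst" is a run of commits in which each
    commit follows the previous one by at most `max_gap`. The burst with
    the most commits is taken to be the hackathon itself.
    """
    times = sorted(commit_times)
    if not times:
        return timedelta(0)

    best_start, best_end, best_count = times[0], times[0], 1
    run_start, run_count = times[0], 1

    for prev, cur in zip(times, times[1:]):
        if cur - prev <= max_gap:
            run_count += 1  # still inside the current burst
        else:
            run_start, run_count = cur, 1  # gap too large: start a new burst
        if run_count > best_count:
            best_start, best_end, best_count = run_start, cur, run_count

    return best_end - best_start


# Made-up timestamps for illustration only (not taken from the dataset).
commits = [
    datetime(2025, 1, 10, 18, 0),
    datetime(2025, 1, 10, 23, 0),
    datetime(2025, 1, 11, 9, 0),
    datetime(2025, 1, 12, 15, 0),  # isolated follow-up commit
    datetime(2025, 3, 1, 12, 0),   # much later maintenance commit
]
duration = estimate_event_duration(commits)
```

With these made-up timestamps, the first three commits form the largest burst, so the estimate is the span between the first and the third commit. A gap threshold that is too small would split an overnight pause into two bursts, so the choice of `max_gap` would need validation against events with known schedules.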

Files

Scripts.zip

Files (2.5 GB)

Name Size
md5:9d9acd77e05b82c57d6a1053e75254fd
2.5 GB
md5:470cb811137078a0da4c4deeb6c11493
24.5 kB
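
The MD5 checksums listed above can be used to check that a download completed without corruption. The sketch below shows one way to do this with Python's standard `hashlib` module; the helper names and the local file path in the usage note are hypothetical, and the file is read in chunks so that a multi-gigabyte archive does not need to fit in memory.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file's MD5 digest with a value such as 'md5:<hex digest>'."""
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected_hex
```

For example, assuming the smaller archive was saved locally as `Scripts.zip` (a hypothetical path), `matches_checksum("Scripts.zip", "md5:470cb811137078a0da4c4deeb6c11493")` would return `True` only if the local copy matches the file that checksum belongs to.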