Published June 23, 2025 | Version v7
Dataset Open

The Public Jira Dataset

  • 1. University of Hamburg

Description

Jira is an issue tracking system that helps software companies (among other types of companies) manage their projects, community, and processes. This dataset is a collection of public Jira repositories downloaded using the Jira API V2. We collected data from 16 public Jira repositories containing 1822 projects and 2.7 million issues. The data include historical records of 32 million changes, 9 million comments, and 1 million issue links that connect the issues in complex ways. This artefact repository contains the data as a MongoDB dump, the scripts used to download the data, the scripts used to interpret the data, and qualitative work conducted to make the data more approachable.
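As a rough illustration of what paginated retrieval through the Jira API V2 can look like, the sketch below builds the request URLs for the `/rest/api/2/search` endpoint using its `jql`, `startAt`, and `maxResults` query parameters. It is not the download script shipped with this dataset: the host `https://jira.example.org`, the project key `DEMO`, the page size, and the helper name `search_page_urls` are all assumptions made for the example, and no network request is sent.

```python
from urllib.parse import urlencode


def search_page_urls(base_url, jql, total, page_size=100):
    """Yield one Jira API V2 search URL per page of results.

    base_url, jql, total and page_size are illustrative inputs; a real
    crawler would read the total from the first search response.
    """
    for start in range(0, total, page_size):
        query = urlencode({"jql": jql, "startAt": start, "maxResults": page_size})
        yield f"{base_url.rstrip('/')}/rest/api/2/search?{query}"


# Hypothetical host and project key, used only to show the URL shape.
urls = list(search_page_urls("https://jira.example.org", "project = DEMO", total=250))
for url in urls:
    print(url)
```

Each URL requests one window of issues, so a crawler can fetch a repository incrementally and resume from a known `startAt` offset if a request fails.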

Note: This data has been anonymised to remove all personal information stored inside the user records within each object. This includes information about the assignee, creator, and reporter for each issue, as well as the identifying information contained in the records of comment authors and evolution authors. Using UUID4 masks, uniqueness has been maintained while removing all personally identifying information.
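The idea of masking users with UUID4 values while keeping them distinguishable can be sketched as follows. This is a minimal sketch under stated assumptions, not the anonymisation script used for this dataset: the class `UserMasker`, the function `anonymise_issue`, and the flat issue layout with plain string user fields are hypothetical and chosen only to show the principle that one user always maps to the same random mask.

```python
import uuid


class UserMasker:
    """Map each original user identifier to one stable, random UUID4 mask."""

    def __init__(self):
        self._masks = {}

    def mask(self, identifier):
        # Reuse the existing mask so the same user stays recognisable as
        # "the same person" across issues, comments and changes.
        if identifier not in self._masks:
            self._masks[identifier] = str(uuid.uuid4())
        return self._masks[identifier]


def anonymise_issue(issue, masker, fields=("assignee", "creator", "reporter")):
    """Return a copy of the issue with the given user fields masked.

    The flat field layout is an assumption for illustration; real Jira
    user records are nested objects.
    """
    masked = dict(issue)
    for field in fields:
        if masked.get(field) is not None:
            masked[field] = masker.mask(masked[field])
    return masked


masker = UserMasker()
example = {"key": "DEMO-1", "assignee": "alice", "creator": "bob", "reporter": "alice"}
print(anonymise_issue(example, masker))
```

Because the mapping lives only in memory and the masks are random, the original identifiers cannot be recovered from the output, yet analyses that count or group by user still work.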

Please cite this work as:

Montgomery L, Lüders C, Maalej W. An Alternative Issue Tracking Dataset of Public Jira Repositories. In 2022 IEEE/ACM 19th International Conference on Mining Software Repositories (MSR) 2022 May 23 (pp. 73-77). IEEE.

Files

2025-06-23 ThePublicJiraDataset.zip

Files (5.8 GB)

md5: 02f85309d966092ea130ca0797aea795
Size: 5.8 GB
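After downloading, the archive can be checked against the MD5 value listed above. The snippet below is one possible way to do this with Python's standard `hashlib` module; the helper names `file_md5` and `matches_published_md5` and the 1 MiB chunk size are assumptions, and the path passed in should be wherever the archive was saved locally.

```python
import hashlib

# Checksum as published in the file listing for this record.
EXPECTED_MD5 = "02f85309d966092ea130ca0797aea795"


def file_md5(path, chunk_size=1024 * 1024):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Chunked reads keep memory use low for a multi-gigabyte archive.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_md5(path):
    """Return True if the file at path has the published checksum."""
    return file_md5(path) == EXPECTED_MD5
```

A mismatch usually points to an incomplete or corrupted download, in which case downloading the archive again is the simplest fix.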