Published April 4, 2026 | Version 1.0.0
Dataset Open

Itemlet Dataset: A Multi-project, Multi-domain, and Feature-engineered Dataset for Empirical Software Engineering

  • 1. Wrocław University of Science and Technology

Description

The Itemlet dataset is a large-scale, multi-project collection of Jira issues (727,282 items in total). It was built from 204 open-source projects spanning 19 domains, including healthcare, finance, developer tools, and e-commerce. Every row has 108 variables. Sixty of them were extracted from the Jira REST API and cover the entire life cycle of an issue, including sprint metadata, the users involved with the issue, and the effort applied to resolve it. The remaining 48 are pre-computed features that encode collaboration dynamics, effort risk, temporal patterns, and business-value signals.

The dataset supports three formally defined prediction tasks: effort estimation, issue prioritization, and complexity classification. The ground truth for each task is derived directly from one or more variables in the dataset. Four concurrent effort measures are provided: story points, cycle time, total time logged, and completion time. Sprint metadata is available for 267,203 issues, and domain labels are assigned across the 19 categories.

All 108 variables are documented in the accompanying data dictionary. The dataset achieves a dimension-weighted average score of 94.75% across 15 FAIR criteria. It is publicly available under the CC BY 4.0 license, together with supplementary materials, a project summary file, a domain classification file, and a fixed requirements file.
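The counts stated in the description can be checked against a downloaded copy. The following is a minimal sketch, not part of the published materials: it assumes the main file is `itemlet_dataset.csv` (the name shown under Files), that it is UTF-8 encoded with a single header row, and that each data row corresponds to one issue. The function names are hypothetical.

```python
import csv

# Figures taken from the description above.
EXPECTED_ROWS = 727_282
EXPECTED_COLUMNS = 108

# Assumption: some text fields may be long, so raise the csv module's
# per-field size limit (this value is safe on all platforms).
csv.field_size_limit(2**31 - 1)


def summarize_csv(path):
    """Stream a CSV file and return (row_count, column_count).

    The file is read row by row, so it is never loaded fully into memory.
    """
    # newline="" lets the csv module handle line breaks inside quoted fields.
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader)                 # first row: column names
        row_count = sum(1 for _ in reader)    # remaining rows: data
    return row_count, len(header)


def matches_description(path, expected_rows=EXPECTED_ROWS,
                        expected_columns=EXPECTED_COLUMNS):
    """Return True if the file has the stated number of rows and columns."""
    rows, columns = summarize_csv(path)
    return rows == expected_rows and columns == expected_columns
```

Calling `matches_description("itemlet_dataset.csv")` on the downloaded file should return `True` if the assumptions above hold. A mismatch may only mean that one of those assumptions (encoding, header layout, one row per issue) is wrong.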

Files

itemlet_dataset.csv

Files (546.4 MB)

Checksum Size
md5:8f7905125bb240eee3f4d5d8aeb505b1 2.4 kB
md5:e62d3884e7deca03a106b86469ea9d48 16.0 kB
md5:8c6502de0ab9aa41c5410f75a1085709 1.7 kB
md5:6caa3c6f10eb9cf629f3f8e7014df772 49.0 kB
md5:ea0426869723cd4792b37e928019bb81 438.5 MB
md5:e80d4f6b521d5ace92c039fb49325480 107.9 MB
md5:3a49d0835731aab84f18ceca108ad0ae 16.4 kB
md5:f8f4cf3ffd64c8cd663db3c0d8a07f05 18.8 kB
md5:3788c7bd3b0b430fdb30627e863db640 322 Bytes
md5:891de9fe9c9ddc0a1890ba4a4dcb8888 9.3 kB
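
The MD5 checksums listed above can be used to confirm that a download is intact. The following is a minimal sketch using Python's standard `hashlib` module. The function names are hypothetical, and because this listing does not show which checksum belongs to which file name, the checksum passed in is assumed to be the one shown next to the downloaded file on the record page.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, listed_checksum):
    """Compare a file's MD5 digest with a listed value such as 'md5:8f79...'."""
    expected = listed_checksum.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

For example, `verify_download("itemlet_dataset.csv", "md5:<value listed for that file>")` returns `True` only when the local copy matches the published checksum. MD5 is suitable here for detecting corrupted or truncated downloads, not as a security guarantee.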