Published January 12, 2021 | Version 1.0.0
Dataset Open

Replication Package: Assessing time-based and range-based strategies for commit assignment to releases

Description

Abstract:

Releases are a ubiquitous concept in software development: a release groups multiple independent changes into a deliverable piece of software. Mining releases can help developers understand software evolution at a coarse grain, identify which features were delivered or which bugs were fixed, and pinpoint who contributed to a given release. A typical first step in release mining is identifying which commits compose a given release. We found two main strategies in the literature for this task: time-based and range-based. Some release-mining studies acknowledge that these strategies are subject to misclassification but do not quantify the impact of this threat. This paper analyzed 13,419 releases and 1,414,997 commits from 100 relevant open source projects hosted on GitHub to assess both strategies in terms of precision and recall. We observed that, in general, the range-based strategy yields better results than the time-based strategy. Nevertheless, even when the range-based strategy is used, some releases still show misclassifications. Thus, our paper also discusses situations in which each strategy degrades, potentially biasing the mining results if not adequately known and avoided.
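To make the two strategies and the evaluation metrics concrete, the following is a minimal sketch on a toy commit history. It is not the paper's implementation. It assumes that the time-based strategy assigns to a release every commit timestamped after the previous release and up to the current one, and that the range-based strategy assigns every commit reachable from the current release but not from the previous one. The `Commit` class, the helper names, the toy history, and the hand-labelled reference set are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    sha: str
    parents: tuple  # SHAs of parent commits
    time: int       # toy integer timestamp


def reachable(commits, head):
    """Return the SHAs reachable from `head` by following parent links."""
    seen, stack = set(), [head]
    while stack:
        sha = stack.pop()
        if sha not in seen:
            seen.add(sha)
            stack.extend(commits[sha].parents)
    return seen


def time_based(commits, prev_head, head):
    """Assumed time-based rule: prev release time < commit time <= release time."""
    lo, hi = commits[prev_head].time, commits[head].time
    return {c.sha for c in commits.values() if lo < c.time <= hi}


def range_based(commits, prev_head, head):
    """Assumed range-based rule: reachable from `head` but not from `prev_head`."""
    return reachable(commits, head) - reachable(commits, prev_head)


def precision_recall(assigned, actual):
    """Precision and recall of an assigned commit set against a reference set."""
    hits = len(assigned & actual)
    precision = hits / len(assigned) if assigned else 0.0
    recall = hits / len(actual) if actual else 0.0
    return precision, recall


# Hypothetical history: "b" is release v1, "e" is release v2.
# "c" was written before v1 but merged only in v2; "x" lives on an
# unmerged branch and is not part of v2.
history = {c.sha: c for c in [
    Commit("a", (), 1),
    Commit("c", ("a",), 2),
    Commit("b", ("a",), 3),
    Commit("d", ("b",), 4),
    Commit("x", ("b",), 4),
    Commit("e", ("d", "c"), 5),
]}
actual_v2 = {"c", "d", "e"}  # hand-labelled reference for this toy example

time_result = precision_recall(time_based(history, "b", "e"), actual_v2)
range_result = precision_recall(range_based(history, "b", "e"), actual_v2)
```

In this toy history the time-based rule wrongly includes the unmerged commit `x` and misses the late-merged commit `c`, while the range-based set matches the hand-labelled reference. As the abstract notes, that match is not guaranteed in real repositories.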

Instructions:

Visit https://github.com/gems-uff/release-mining for instructions about how to use this dataset.

Files:

  • repos.tgz contains our project corpus, comprising 13,419 releases and 1,414,997 commits from 100 relevant open source projects.
  • repos.sha1 contains the SHA-1 checksum of repos.tgz.
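Because the archive is large, it may be worth checking the download against the checksum before extracting it. The following is a minimal sketch, assuming that repos.sha1 uses the common `sha1sum` layout (hex digest, whitespace, file name). That layout is an assumption, not something stated on this page, and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha1_of_file(path, chunk_size=1 << 20):
    """Compute the SHA-1 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def read_expected_digest(checksum_path):
    """Read the digest from a checksum file.

    Assumes the sha1sum-style layout "<hex digest>  <file name>".
    """
    return Path(checksum_path).read_text().split()[0].lower()


def verify(archive_path, checksum_path):
    """Return True if the archive's SHA-1 matches the recorded digest."""
    return sha1_of_file(archive_path) == read_expected_digest(checksum_path)
```

With both files in the current directory, `verify("repos.tgz", "repos.sha1")` should return `True` for an intact download.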

Disclaimer:

This replication package contains the source code of 100 relevant open source projects. Its purpose is to enable the replication of the study conducted in the paper "Assessing time-based and range-based strategies for commit assignment to releases."  

It is essential to check each project's license before using the source code or any attached file for any purpose other than replicating the study.

 

Files

Files (15.5 GB)

  • repos.sha1 (52 Bytes), md5:a9300503ee8bec5c18aa8141ed385ab1
  • repos.tgz (15.5 GB), md5:6d908675f6dc170d8a27a7bee35d611f