Published March 6, 2018 | Version v1
Dataset Open

Data set for the paper What are the Effects of History Length and Age on Mining Software Change Impact?

  • 1. Simula
  • 2. Simula Research Laboratory, Norway
  • 3. Loyola University Maryland, USA

Description

Data set for the paper "What are the Effects of History Length and Age on Mining Software Change Impact?"
by Leon Moonen, Thomas Rolfsnes, David Binkley and Stefano di Alesio.
In Journal of Empirical Software Engineering (EMSE), 2018, Springer. https://doi.org/10.1007/s10664-017-9588-z
Available from https://evolveit.bitbucket.io/publications/emse2018/

Please cite this work by referring to the corresponding journal publication (a preprint is included in this package).

The goal of Software Change Impact Analysis is to identify artifacts (typically source-code files or individual methods therein) potentially affected by a change. Recently, there has been increased interest in mining software change impact based on evolutionary coupling. A particularly promising approach uses association rule mining to uncover potentially affected artifacts from patterns in the system’s change history. Two main considerations when using this approach are the history length (the number of transactions from the change history used to identify the impact of a change) and the history age (the number of transactions that have occurred since patterns were last mined from the history). Although history length and age can significantly affect the quality of mining results, few guidelines exist on how best to select appropriate values for these two parameters.

In this paper, we empirically investigate the effects of history length and age on the quality of change impact analysis using mined evolutionary coupling. Specifically, we report on a series of systematic experiments using three state-of-the-art mining algorithms that involve the change histories of two large industrial systems and 17 large open-source systems. In these experiments, we vary the length and age of the history used to mine software change impact, and assess how this affects precision and applicability. Results from the study are used to derive practical guidelines for choosing history length and age when applying association rule mining to conduct software change impact analysis.
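
To make the two parameters concrete, the following minimal sketch shows one way a history window could be selected by length and age before mining. It is an illustration only and is not taken from the paper or this data package: the function names `select_history` and `co_change_counts`, the representation of a transaction as a list of changed file names, and the use of plain co-change counting in place of the three mining algorithms studied are all assumptions.

```python
from collections import Counter
from itertools import combinations


def select_history(transactions, length, age):
    """Return the window of `length` transactions that ends `age`
    transactions before the newest one (hypothetical helper).

    `transactions` is assumed to be ordered oldest to newest, each entry
    being a list of artifacts changed together.
    """
    if length <= 0 or age < 0:
        raise ValueError("length must be positive and age non-negative")
    end = len(transactions) - age      # skip the `age` newest transactions
    if end <= 0:
        return []                      # the history is older than everything we have
    start = max(0, end - length)       # keep at most `length` transactions
    return transactions[start:end]


def co_change_counts(history):
    """Count how often each pair of artifacts changed in the same transaction.

    This is a simple stand-in for association rule mining, used here only
    to show that the mined patterns depend on the selected window.
    """
    counts = Counter()
    for changed in history:
        for pair in combinations(sorted(set(changed)), 2):
            counts[pair] += 1
    return counts


# Toy change history, oldest first (made-up file names).
transactions = [["a", "b"], ["a", "b"], ["b", "c"], ["a", "c"], ["c", "d"]]
window = select_history(transactions, length=2, age=1)
pairs = co_change_counts(window)
```

In this toy history, a length of 2 and an age of 1 select only the third and fourth transactions, so the frequent early co-change of `a` and `b` no longer contributes to the mined patterns.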

Notes

This work is supported by the Research Council of Norway through the EvolveIT project (#221751/F20) and the Certus SFI (#203461/030). Dr. Binkley is supported by NSF grant IIA-1360707 and a J. William Fulbright award.

Files

data_for_what_are_the_effects_of_history_length_and_age_on_mining_software_change_impact_emse2018.zip

Additional details

Related works

Is supplement to
10.1007/s10664-017-9588-z (DOI)