Data set for the paper "Predicting Relevance of Change Recommendations"

Rolfsnes, Thomas; Moonen, Leon; Binkley, David

doi:10.5281/zenodo.1040118

Published November 1, 2017 | Version v2

Dataset Open

1. Simula Research Laboratory, Norway
2. Loyola University Maryland, USA

Data set for the paper "Predicting Relevance of Change Recommendations" by Thomas Rolfsnes, Leon Moonen, and David Binkley, in International Conference on Automated Software Engineering (ASE), pp. 694–705, IEEE, 2017.

Please cite this work by referring to the corresponding conference publication (a preprint is included in this package).

Abstract: Software change recommendation seeks to suggest artifacts (e.g., files or methods) that are related to changes made by a developer, and thus identifies possible omissions or next steps. While one obvious challenge for recommender systems is to produce accurate recommendations, a complementary challenge is to rank recommendations based on their relevance. In this paper, we address this challenge for recommendation systems that are based on evolutionary coupling. Such systems use targeted association-rule mining to identify relevant patterns in a software system's change history. Traditionally, this process involves ranking artifacts using interestingness measures such as confidence and support. However, these measures often fall short when used to assess recommendation relevance. We propose the use of random forest classification models to assess recommendation relevance. This approach improves on past use of various interestingness measures by learning from previous change recommendations. We empirically evaluate our approach on fourteen open source systems and two systems from our industry partners. Furthermore, we consider complementing two mining algorithms: CO-CHANGE and TARMAQ. The results show that random forest classification significantly outperforms previous approaches, receives lower Brier scores, and has a superior trade-off between precision and recall. The results are consistent across software systems and mining algorithms.

Notes

This work is supported by the Research Council of Norway through the EvolveIT project (#221751/F20) and the Certus SFI (#203461/030). Dr. Binkley is supported by NSF grant IIA-1360707 and a J. William Fulbright award.

Files

Files (1.7 GB)

Name	Size	Checksum
data_for_predicting_relevance_of_change_recommendations_ase2017.zip	1.7 GB	md5:4231fc68d495fac740867d6d236888e4
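
Since the record lists an MD5 checksum for the archive, a downloaded copy can be checked against it before unpacking. The following is a minimal sketch, assuming the archive has been saved locally under the file name listed above; the helper names md5_of_file and archive_is_intact are illustrative and not part of the data set.

```python
import hashlib

# Checksum as listed in this record for the archive.
EXPECTED_MD5 = "4231fc68d495fac740867d6d236888e4"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_is_intact(path, expected=EXPECTED_MD5):
    """Compare the digest of the file at path with the expected checksum."""
    return md5_of_file(path) == expected.lower()
```

For example, archive_is_intact("data_for_predicting_relevance_of_change_recommendations_ase2017.zip") should return True for an undamaged download, assuming the file sits in the current working directory.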

	All versions	This version
Views	891	702
Downloads	43	32
Data volume	79.8 GB	60.7 GB
