
Published March 15, 2015 | Version v1
Dataset Open

bugclassify

Description

About the Data

They downloaded Herzig et al.'s datasets, which include the identifiers of the issue reports that Herzig et al. manually analyzed. A description of that dataset follows.

The authors conducted a study on five open-source JAVA projects described in Table I (see paper). They aimed to select projects that were under active development and were developed by teams that follow strict commit and bug-fixing procedures similar to industry. They also aimed for a more or less homogeneous data set, which eased the manual inspection phase. Projects from APACHE and MOZILLA seemed to fit their requirements best. Additionally, they selected the five projects such that they cover at least two different and popular bug tracking systems: Bugzilla and Jira. Three of the five projects (Lucene-Java, Jackrabbit, and HTTPClient) use a Jira bug tracker. The remaining two projects (Rhino, Tomcat5) use a Bugzilla tracker.

For each of the five projects, they selected all issue reports that were marked as RESOLVED, CLOSED, or VERIFIED and whose resolution was set to FIXED, and performed a manual inspection of these issues. They disregarded issues whose resolution was in progress or not yet accepted, as their features may change in the future. The number of inspected reports per project can be found in Table I. In total, they obtained 7,401 closed and fixed issue reports. 1,810 of these reports originate from the Rhino and Tomcat5 projects and represent Bugzilla issue reports. The remaining 5,591 reports were filed in a Jira bug tracker.
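The selection rule described above can be sketched as a simple filter. This is a hedged illustration, not code from the dataset: the field names ("status", "resolution") are assumptions, since Bugzilla and Jira expose these fields under tracker-specific names.

```python
# Sketch of the report-selection rule: keep only reports marked
# RESOLVED, CLOSED, or VERIFIED whose resolution is FIXED.
# Field names here are hypothetical, not taken from the dataset.

CLOSED_STATUSES = {"RESOLVED", "CLOSED", "VERIFIED"}

def is_inspectable(report):
    """True for closed-and-fixed reports, per the selection described above."""
    return report["status"] in CLOSED_STATUSES and report["resolution"] == "FIXED"

# Toy examples (invented data, for illustration only):
reports = [
    {"id": 1, "status": "RESOLVED", "resolution": "FIXED"},
    {"id": 2, "status": "OPEN", "resolution": ""},
    {"id": 3, "status": "CLOSED", "resolution": "WONTFIX"},
]
selected = [r for r in reports if is_inspectable(r)]
# Only report 1 survives the filter.
```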

Abstract

Bug localization refers to the task of automatically processing bug reports to locate the source code files that are responsible for the bugs. Many bug localization techniques have been proposed in the literature. These techniques are often evaluated on issue reports that are marked as bugs by their reporters in issue tracking systems. However, recent findings by Herzig et al. show that a substantial number of issue reports marked as bugs are not bugs but other kinds of issues, such as refactorings, requests for enhancement, documentation changes, test case creation, and so on. Herzig et al. report that these misclassifications affect bug prediction, namely the task of predicting which files are likely to be buggy in the future. In this work, we investigate whether these misclassifications also affect bug localization. To do so, we analyze issue reports that have been manually categorized by Herzig et al. and apply a bug localization technique to recover a ranked list of candidate buggy files for each issue report. We then evaluate whether the quality of the ranked lists for reports marked as bugs is the same as that for real bug reports. Our findings highlight the need for additional cleaning steps to be performed on issue reports before they are used to evaluate bug localization techniques.
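The ranked lists the abstract describes are typically scored with a rank-aware metric. As a hedged sketch (the paper's exact metric may differ), here is mean reciprocal rank (MRR), a common choice for bug localization evaluation; the file names and queries are invented for illustration.

```python
# Mean reciprocal rank over a set of localization queries.
# Each query pairs a ranked list of candidate files with the set of
# files actually changed to fix the bug (hypothetical data below).

def reciprocal_rank(ranked_files, buggy_files):
    """1/rank of the first truly buggy file in the list, 0.0 if none appears."""
    for rank, f in enumerate(ranked_files, start=1):
        if f in buggy_files:
            return 1.0 / rank
    return 0.0

def mean_reciprocal_rank(queries):
    """queries: iterable of (ranked_list, set_of_truly_buggy_files)."""
    queries = list(queries)
    return sum(reciprocal_rank(r, b) for r, b in queries) / len(queries)

queries = [
    (["A.java", "B.java", "C.java"], {"B.java"}),  # first hit at rank 2 -> 0.5
    (["D.java", "E.java"], {"D.java"}),            # first hit at rank 1 -> 1.0
]
# mean_reciprocal_rank(queries) -> 0.75
```

Comparing such scores between misclassified reports and real bug reports is the kind of quality comparison the abstract describes.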

Files (170.5 kB)

httpclient_classification_vs_type.csv

- 21.2 kB (md5:6c84192e56e61b3bc918c51b91573de7)
- 54.0 kB (md5:c007d74b73217385447436d8ce49fecd)
- 64.6 kB (md5:96a6cc5b217d0c41eaf694b4b6d95e76)
- 230 Bytes (md5:34f9ff45699b79cbdc01f983cc52623c)
- 10.1 kB (md5:9ea69c3dee20f25393516f0f140caaff)
- 20.4 kB (md5:1ed10d4415e74b5fae3249671461b505)