Rafael-Michael Karampatsis
Charles Sutton
2020-02-07
<p>The ManySStuBs4J corpus is a collection of simple fixes to Java bugs, designed for evaluating program repair techniques.<br>
We collect all bug-fixing changes using the SZZ heuristic, and then filter these to obtain a data set of small bug fix changes.<br>
These are single statement fixes, classified where possible into one of 16 syntactic templates which we call SStuBs.<br>
The dataset contains simple statement bugs mined from open-source Java projects hosted on GitHub.<br>
There are two variants of the dataset: one mined from 100 Java Maven projects and one mined from the top 1000 Java projects.<br>
A project's popularity is determined by computing the sum of z-scores of its forks and watchers.<br>
We kept only bug-fixing commits that contain only single-statement changes, and ignored stylistic differences such as spaces or empty lines, as well as differences in comments.<br>
Some single-statement changes can be caused by refactorings, such as changing a variable name, rather than by bug fixes.<br>
We attempted to detect and exclude refactorings such as variable, function, and class renamings, function argument renamings, and changes to the number of arguments in a function.<br>
Commits are classified as bug fixes or not by checking whether the commit message contains any of a set of predetermined keywords, such as bug, fix, or fault.<br>
We evaluated the accuracy of this method on a random sample of 100 commits that contained SStuBs from the smaller version of the dataset and found it to achieve a satisfactory 94% accuracy.<br>
This method has also been used before to extract bug datasets (Ray et al., 2015; Tufano et al., 2018) where it achieved an accuracy of 96% and 97.6% respectively.</p>
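<p>The two heuristics described above, popularity as a sum of z-scores and keyword-based commit classification, can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the input layout, the use of the population standard deviation, and case-insensitive substring matching are all assumptions, and the keyword list contains only the three examples named in the description.</p>

```python
from statistics import mean, pstdev

# Assumed keyword list: only the three examples named in the description.
# The full set used to build the dataset is not given there.
BUG_KEYWORDS = ("bug", "fix", "fault")


def z_scores(values):
    """Standardize values to zero mean and unit population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values identical: every z-score is defined as 0 here.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def popularity(projects):
    """Map each project name to z(forks) + z(watchers).

    `projects` is a hypothetical dict: name -> {"forks": int, "watchers": int}.
    """
    names = list(projects)
    forks_z = z_scores([projects[n]["forks"] for n in names])
    watchers_z = z_scores([projects[n]["watchers"] for n in names])
    return {n: f + w for n, f, w in zip(names, forks_z, watchers_z)}


def looks_like_bug_fix(message):
    """Return True if the commit message contains any keyword (case-insensitive)."""
    lowered = message.lower()
    return any(keyword in lowered for keyword in BUG_KEYWORDS)


if __name__ == "__main__":
    sample = {
        "project-a": {"forks": 10, "watchers": 100},
        "project-b": {"forks": 20, "watchers": 200},
        "project-c": {"forks": 30, "watchers": 300},
    }
    ranked = sorted(popularity(sample).items(), key=lambda kv: kv[1], reverse=True)
    print(ranked)
    print(looks_like_bug_fix("Fix NPE in parser"))
```

<p>Substring matching is deliberately loose: it also accepts inflected forms such as "fixed" or "bugfix", but it would accept unrelated words such as "prefix" as well, which is one way a keyword heuristic of this kind can misclassify commits.</p>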
<p>The bugs are stored in a JSON file (each version of the dataset has its own instance of this file).<br>
Any bugs that fit one of 16 patterns are also annotated with the pattern(s) they fit in a separate JSON file (each version of the dataset has its own instance of this file).<br>
We refer to bugs that fit any of the 16 patterns as simple stupid bugs (SStuBs).</p>
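<p>As a rough sketch of how the annotated JSON file might be consumed, the snippet below counts records per pattern. The record layout is an assumption: the description here does not specify the schema, so the field name <code>bugType</code>, the field <code>projectName</code>, and the pattern labels in the inline sample are placeholders to be checked against the repository documentation.</p>

```python
import json
from collections import Counter

# Inline stand-in for the annotated JSON file. Field names and pattern
# labels are hypothetical placeholders, not taken from the dataset itself.
SAMPLE = """
[
  {"bugType": "PATTERN_A", "projectName": "example.project-a"},
  {"bugType": "PATTERN_B", "projectName": "example.project-a"},
  {"bugType": "PATTERN_A", "projectName": "example.project-b"}
]
"""


def count_by_pattern(records, key="bugType"):
    """Count records per pattern label, skipping records that lack the key."""
    return Counter(record[key] for record in records if key in record)


if __name__ == "__main__":
    records = json.loads(SAMPLE)
    for pattern, count in count_by_pattern(records).most_common():
        print(pattern, count)
```

<p>To run this against a real file, replace <code>json.loads(SAMPLE)</code> with <code>json.load</code> on an opened file handle and adjust <code>key</code> to the actual field name.</p>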
<p>For more information on extracting the dataset and detailed documentation of the software, visit our GitHub repo: https://github.com/mast-group/SStuBs-mining</p>
https://doi.org/10.5281/zenodo.3653444
oai:zenodo.org:3653444
Zenodo
https://doi.org/10.7488/ds/2628
https://doi.org/10.5281/zenodo.3653443
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
Program Repair
Fault Localization
Mutators
ManySStuBs4J Dataset
info:eu-repo/semantics/other