Impact of Methodological Choices on the Analysis of Code Metrics and Maintenance
Description
The repo-data folder contains 53 .json files, each corresponding to one of the 53 Java open-source projects. Each file contains various metrics for methods in the project.
{
  "hawtio-3976.json": {
    "Age": 794,
    "sloc": [11, 11, 11],
    "slocAsItIs": [11, 14, 14],
    "slocNoCommentPretty": [11, 11, 11],
    "diffSizes": [0, 7, 0],
    "bodychanges": [0, 1, 0],
    "newAdditions": [0, 5, 0],
    "isGetter": [false, false, false],
    "isSetter": [false, false, false],
    "changeDates": [0, 3, 794],
    "isEssentialChange": [false, true, false],
    "isBuggy": [false, false, false],
    "changeTypes": ["Yintroduced", "Ybodychange", "Yfilerename"],
    "filename": "hawtio-3976.json",
    "authors": ["X", "Y", "Z"],
    "editDistance": [0, 68, 0],
    "repo": "hawtio"
  },
  "method_id": {...},
  "method_id": {...}
}
The above method with id hawtio-3976.json has a total of 3 revisions, which is why the array of values for each metric (e.g., sloc: [11,11,11]) has length 3. Index 0 of each array holds the value of that metric at the method's introduction.
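As a minimal sketch of how the per-revision arrays line up, the snippet below parses an inline copy of part of the sample record and turns the parallel arrays into one record per revision. The helper name revisions is hypothetical and not part of the dataset; with the real data you would load a file from the repo-data folder with json.load instead of the inline string.

```python
import json

# Inline copy of part of the sample record shown above (not the full file).
SAMPLE = """
{
  "hawtio-3976.json": {
    "Age": 794,
    "sloc": [11, 11, 11],
    "diffSizes": [0, 7, 0],
    "changeDates": [0, 3, 794],
    "changeTypes": ["Yintroduced", "Ybodychange", "Yfilerename"]
  }
}
"""


def revisions(method):
    """Yield one dict per revision by slicing the parallel metric arrays.

    Scalar fields such as Age are skipped; only list-valued metrics are kept.
    """
    list_keys = [k for k, v in method.items() if isinstance(v, list)]
    n_revisions = len(method["changeTypes"])
    for i in range(n_revisions):
        yield {k: method[k][i] for k in list_keys}


methods = json.loads(SAMPLE)
revs = list(revisions(methods["hawtio-3976.json"]))

# Index 0 is the introduction of the method; later indices are its changes.
for i, rev in enumerate(revs):
    print(i, rev["changeTypes"], rev["changeDates"], rev["sloc"])
```

Keeping one record per revision makes it easier to filter on a single revision (for example, only the introduction at index 0) without tracking array indices by hand.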
Description of the metrics
Age: Age of the method in days
sloc: Source lines of code of the method, excluding comments and blank lines
slocAsItIs: Source lines of code of the method, including comments and blank lines
slocNoCommentPretty: Source lines of code after pretty-printing, excluding comments and blank lines
diffSizes: Total number of lines added and removed in the git diff
bodychanges: 0 or 1, where 1 indicates that a body change occurred
newAdditions: Total number of lines added in the git diff
isGetter: true or false, where true indicates the method is a get method
isSetter: true or false, where true indicates the method is a set method
changeDates: Number of days since the method was introduced. Index 0 is always 0, which marks the introduction date
isEssentialChange: true or false, where true indicates an essential change. Essential changes are Ybodychange, Ymodifierchange, Yexceptionschange, Yrename, Yparameterchange, Yreturntypechange and Yparametermetachange, as detected by CodeShovel
isBuggy: true or false, where true indicates that a bug in the method was fixed at that revision
changeTypes: All transformations applied to the method at each revision. The full list of transformations detected by CodeShovel is: Ybodychange, Ymodifierchange, Yexceptionschange, Yrename, Yparameterchange, Yreturntypechange, Yparametermetachange, Yannotationchange, Ydocchange, Yformatchange, Yfilerename and Ymovefromfile
filename: The method id
The bugData folder contains 53 .json files with bug information, each belonging to one of the 53 Java open-source projects. A sample of the JSON structure of a file is given below:
{
  "hawtio-3976.json": {
    "exactBug0Match": [false, false, false],
    "exactBug1Match": [false, false, false],
    "exactBug2Match": [false, false, false],
    "exactBug3Match": [false, false, false],
    "regExBug0": [false, false, false],
    "regExBug1": [false, false, false],
    "regExBug2": [false, false, false],
    "regExBug3": [false, false, false]
  },
  "method_id": {...},
  "method_id": {...}
}
The above method can be mapped to its metrics dataset using the method_id. For example, the method with id hawtio-3976.json in bugData/hawtio.json, which has 3 revisions, can be found in the metrics dataset under the same id hawtio-3976.json in the file repo-data/hawtio.json.
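The mapping by method id can be sketched as a simple dictionary join. The example below uses small inline dictionaries in place of the parsed contents of repo-data/hawtio.json and bugData/hawtio.json; the function name join_bug_labels is hypothetical, and the check that both files hold the same number of revisions per method is an assumption based on the sample, not a documented guarantee.

```python
# Stand-ins for the parsed contents of repo-data/hawtio.json and
# bugData/hawtio.json, reduced to the fields used here.
metrics = {
    "hawtio-3976.json": {
        "changeDates": [0, 3, 794],
        "changeTypes": ["Yintroduced", "Ybodychange", "Yfilerename"],
    }
}
bugs = {
    "hawtio-3976.json": {
        "exactBug0Match": [False, False, False],
        "regExBug0": [False, False, False],
    }
}


def join_bug_labels(metrics, bugs, key="exactBug0Match"):
    """Pair each revision of a method with its bug-fix flag, joined on method id."""
    joined = {}
    for method_id, method in metrics.items():
        labels = bugs.get(method_id)
        if labels is None:
            continue  # no bug information for this method id
        flags = labels[key]
        # ASSUMPTION: both datasets list the same revisions in the same order.
        if len(flags) != len(method["changeTypes"]):
            raise ValueError(f"revision count mismatch for {method_id}")
        joined[method_id] = list(
            zip(method["changeDates"], method["changeTypes"], flags)
        )
    return joined


joined = join_bug_labels(metrics, bugs)
print(joined["hawtio-3976.json"])
```

Passing a different key, such as regExBug0, selects another of the bug-fix classifications from the same file.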
Description of bug dataset
Each key in the above example holds true or false values indicating whether the method was buggy at each revision. The hawtio-3976.json method has 3 revisions (including the method's introduction), which is why each array has length 3. The keys in the above JSON represent bug-fix classifications based on buggy keywords adopted from prior work.
We identified bug-fix commits using two approaches:
Exact case-insensitive match of buggy keywords in the commit message (keys prefixed with exact represent this)
Partial case-insensitive substring match (using a regular expression), excluding words that end with fix or bug (keys prefixed with regEx represent this)
Each approach is applied with four keyword lists, identified by the Bug0 to Bug3 part of the key:
Bug0: This is the approach that we used for classifying bug-fix commits. Buggy keyword list: ["error", "bug", "fixes", "fixing", "fix", "fixed", "mistake", "incorrect", "fault", "defect", "flaw"]
Bug1: Same keyword list as Bug0, with the addition of the keyword issues
Bug2: Buggy keyword list from prior work: ["bug", "fix", "error", "issue", "crash", "problem", "fail", "defect", "patch"]
Bug3: Buggy keyword list from prior work: ["error", "bug", "fix", "issue", "mistake", "incorrect", "fault", "defect", "flaw", "type"]
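The two matching approaches can be illustrated roughly as follows, using the Bug0 keyword list. This is only a sketch of one plausible reading, not the script used to build the dataset: the function names, the tokenisation into alphanumeric words, and in particular the interpretation of the exclusion rule (a longer word such as "prefix" or "debug" that merely ends with fix or bug is ignored) are assumptions.

```python
import re

BUG0 = ["error", "bug", "fixes", "fixing", "fix", "fixed",
        "mistake", "incorrect", "fault", "defect", "flaw"]

# ASSUMPTION: a "word" is a run of letters or digits in the lowercased message.
WORD = re.compile(r"[a-z0-9]+")


def exact_match(message, keywords=BUG0):
    """True if any whole word of the message equals a keyword (case-insensitive)."""
    words = set(WORD.findall(message.lower()))
    return any(keyword in words for keyword in keywords)


def regex_match(message, keywords=BUG0):
    """True if any keyword occurs as a substring of a word (case-insensitive).

    ASSUMPTION: words that end with "fix" or "bug" without being keywords
    themselves (e.g. "prefix", "debug") are skipped.
    """
    pattern = re.compile("|".join(re.escape(k) for k in keywords))
    for word in WORD.findall(message.lower()):
        if word not in keywords and word.endswith(("fix", "bug")):
            continue
        if pattern.search(word):
            return True
    return False


for msg in ["Fix NPE in parser", "Bugfixes for UI", "Add prefix option"]:
    print(msg, exact_match(msg), regex_match(msg))
```

Under these assumptions, a message like "Bugfixes for UI" is missed by the exact match but caught by the substring match, which is the kind of difference the exact and regEx keys would capture.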
Files
dataset.zip (108.2 MB)
md5:80e105dadc039129dd76c3a49636c855