Dataset of Commit Classification via Diff-Code GCN based on System Dependency Graph
Description
Commit Classification via Diff-Code GCN based on System Dependency Graph
The dataset is based on the datasets of Ghadhab et al. [1] and Levin et al. [2]. From these we extracted all commits that consist purely of Java code, keeping both versions of the code for each commit.
In the dataset, every commit folder has two sub-folders, called before and after, which contain the two versions of the code. We extracted them with PyDriller.
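The before/after layout described above can be read with a few lines of Python. This is a minimal sketch, not the authors' loader: it assumes the .java files sit at matching relative paths inside before and after, and the function name load_commit_versions is hypothetical.

```python
from pathlib import Path


def load_commit_versions(commit_dir):
    """Return {relative_path: (before_source, after_source)} for one commit folder.

    Assumption: a file keeps the same relative path in `before` and `after`.
    A file present on only one side gets None for the missing version.
    """
    commit_dir = Path(commit_dir)
    versions = {}
    for side_index, side in enumerate(("before", "after")):
        side_dir = commit_dir / side
        if not side_dir.is_dir():
            continue  # tolerate a commit folder that lacks one side
        for java_file in sorted(side_dir.rglob("*.java")):
            rel = java_file.relative_to(side_dir).as_posix()
            pair = versions.setdefault(rel, [None, None])
            pair[side_index] = java_file.read_text(encoding="utf-8", errors="replace")
    return {rel: tuple(pair) for rel, pair in versions.items()}
```

Calling load_commit_versions on one commit folder yields a mapping whose values are (before, after) source pairs, which can then be passed to a diff or graph-construction step.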
The dataset has 1213 commits, each with two versions of Java code, and it contains three categories:
(1) The first category is Corrective, which involves rectifying errors and faults identified during software usage.
(2) The second category is Perfective, which entails enhancing software quality attributes, such as performance, maintainability, and usability.
(3) The third category is Adaptive, which encompasses adapting the software to new environments (e.g., software or hardware) or introducing new functionality.
The dataset has 450 commits labeled Corrective, 441 labeled Perfective, and the remaining 322 labeled Adaptive.
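The class distribution can be checked against the label file once it is downloaded. The sketch below is illustrative only: the column name "label" is an assumption (the CSV schema is not documented here) and should be replaced with the actual header. The Adaptive count of 322 follows from 1213 - 450 - 441.

```python
import csv
from collections import Counter

# Counts stated in the description; Adaptive is derived as 1213 - 450 - 441.
EXPECTED_COUNTS = {"Corrective": 450, "Perfective": 441, "Adaptive": 322}


def count_labels(csv_path, label_column="label"):
    """Count how many rows carry each label.

    `label_column` is a hypothetical column name; adjust it to the real header.
    """
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        return Counter(row[label_column].strip() for row in reader)
```

Comparing dict(count_labels("CommitDataset.csv")) with EXPECTED_COUNTS is a quick sanity check that the download is complete and that the labels were parsed as intended.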
[1] L. Ghadhab, I. Jenhani, M. W. Mkaouer, and M. Ben Messaoud, "Augmenting commit classification by using fine-grained source code changes and a pretrained deep neural language model," Information and Software Technology, vol. 135, p. 106566, 2021.
[2] S. Levin and A. Yehudai, "Using Temporal and Semantic Developer-Level Information to Predict Maintenance Activity Profiles," in 2016 IEEE International Conference on Software Maintenance and Evolution (ICSME), 2016.
Files
CommitDataset.csv
Total size: 54.5 MB
| File (MD5) | Size |
|---|---|
| md5:46821275eb98026e905f857c5d6936d9 | 481.8 kB |
| md5:75b594f5ca629111f77eed6dc2f13187 | 54.0 MB |
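The MD5 checksums listed for the files can be used to verify a download. The following is a minimal sketch using only the Python standard library; the function name md5_of_file and the chunk size are illustrative choices.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large archives need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

A downloaded file is intact if the returned digest equals the hexadecimal string that follows "md5:" in the corresponding entry.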