Published August 7, 2023 | Version v1
Dataset Open

Dataset of Commit Classification via Diff-Code GCN based on System Dependency Graph

Authors/Creators

  • 1. Southeast University

Description

Commit Classification via Diff-Code GCN based on System Dependency Graph

The dataset is based on the datasets of Ghadhab et al. [1] and Levin et al. [2]. From these we extracted every commit for which both versions of the code (before and after the change) consist purely of Java.

In the dataset, every commit folder has two subfolders, named before and after, which contain the code as it was before and after the commit, respectively. We extracted both versions with PyDriller.
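The folder layout described above can be walked with a short script. The following is a minimal sketch, not part of the published dataset tooling: the function names `java_files` and `iter_commit_pairs` are hypothetical, and it assumes that a file keeps the same relative path inside before and after. Files that exist on only one side (added or deleted by the commit) are skipped here.

```python
from pathlib import Path
from typing import Dict, Iterator, List, Tuple


def java_files(root: Path) -> Dict[str, Path]:
    """Map each .java file under root to its path, keyed by the path relative to root."""
    if not root.is_dir():
        return {}
    return {p.relative_to(root).as_posix(): p for p in sorted(root.rglob("*.java"))}


def iter_commit_pairs(
    dataset_root: Path,
) -> Iterator[Tuple[str, List[Tuple[str, Path, Path]]]]:
    """Yield (commit_folder_name, [(relative_path, before_file, after_file), ...])."""
    for commit_dir in sorted(p for p in dataset_root.iterdir() if p.is_dir()):
        before = java_files(commit_dir / "before")
        after = java_files(commit_dir / "after")
        # Assumption: a file changed by the commit has the same relative
        # path in both subfolders, so only shared paths are paired.
        shared = sorted(before.keys() & after.keys())
        yield commit_dir.name, [(rel, before[rel], after[rel]) for rel in shared]
```

Calling `iter_commit_pairs(Path("path/to/extracted/dataset"))` then gives, per commit folder, the pairs of files whose contents can be diffed or parsed. The root path is a placeholder for wherever the archive was unpacked.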

The dataset contains 1213 commits, each with two versions of its Java code, and every commit is labeled with one of three categories:

(1) The first category is Corrective, which involves rectifying errors and faults identified during software usage.

(2) The second category is Perfective, which entails enhancing software quality attributes, such as performance, maintainability, and usability.

(3) The third category is Adaptive, which encompasses adapting the software to new environments (e.g., software or hardware) or introducing new functionalities.

The dataset has 450 commits labeled Corrective and 441 labeled Perfective; the rest are labeled Adaptive.
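The label distribution stated above can be checked against the CSV file once it is downloaded. The sketch below is only an illustration under stated assumptions: the column name passed as `label_column` and the exact spelling of the label values in the file are not documented here, so both are hypothetical and must be adjusted to the actual header. The Adaptive count is derived from the totals given in the description rather than read from the data.

```python
import csv
from collections import Counter
from typing import Dict, Iterable

# Figures taken from the description; Adaptive is "the rest".
TOTAL_COMMITS = 1213
EXPECTED = {"Corrective": 450, "Perfective": 441}
EXPECTED["Adaptive"] = TOTAL_COMMITS - sum(EXPECTED.values())


def label_counts(csv_lines: Iterable[str], label_column: str) -> Dict[str, int]:
    """Count how often each label occurs in the given column of a CSV stream."""
    reader = csv.DictReader(csv_lines)
    return dict(Counter(row[label_column].strip() for row in reader))


# Example use (the column name "label" is an assumption, not a documented header):
#   with open("CommitDataset.csv", newline="") as f:
#       print(label_counts(f, "label") == EXPECTED)
```

Taking the column name as a parameter keeps the sketch usable whatever the real header turns out to be.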

 

[1] L. Ghadhab, I. Jenhani, M. W. Mkaouer, and M. Ben Messaoud, "Augmenting commit classification by using fine-grained source code changes and a pretrained deep neural language model," Information and Software Technology, vol. 135, p. 106566, Jul. 2021.

[2] S. Levin and A. Yehudai, "Using Temporal and Semantic Developer-Level Information to Predict Maintenance Activity Profiles," in 2016 IEEE International Conference on Software Maintenance and Evolution (ICSME), 2016.

 

Files

CommitDataset.csv

Files (54.5 MB)

Checksum Size
md5:46821275eb98026e905f857c5d6936d9 481.8 kB
md5:75b594f5ca629111f77eed6dc2f13187 54.0 MB