Dataset of Commit Classification via Diff-Code GCN based on System Dependency Graph

Zaixing

doi:10.5281/zenodo.8220024

Published August 7, 2023 | Version v1

Dataset | Open access


Zaixing¹

1. Southeast University

Commit Classification via Diff-Code GCN based on System Dependency Graph

The dataset is built from the datasets of Lobna Ghadhab et al. [1] and Levin et al. [2]. From them, we extracted all commits that contain only Java code, keeping both versions of that code (before and after the commit).
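The selection criterion can be sketched as a predicate over the file paths a commit changes. This is an assumption about how "only Java code" was decided (every changed file has a .java extension); the function name is_pure_java_commit is hypothetical and is not part of the dataset or the authors' tooling.

```python
from pathlib import PurePosixPath
from typing import Iterable


def is_pure_java_commit(changed_paths: Iterable[str]) -> bool:
    """Hypothetical filter: True if the commit changes at least one file
    and every changed file has a ".java" extension (case-sensitive)."""
    paths = list(changed_paths)
    # A commit with no changed files has no Java code to compare, so reject it.
    if not paths:
        return False
    return all(PurePosixPath(p).suffix == ".java" for p in paths)


print(is_pure_java_commit(["src/Main.java", "src/util/Io.java"]))  # True
print(is_pure_java_commit(["src/Main.java", "README.md"]))         # False
```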

In the dataset, each commit folder has two sub-folders, before and after, which contain the code before and after the commit. We extracted the code with PyDriller.
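A minimal sketch for reading one commit folder with the before/after layout described above, using only the Python standard library. It assumes each commit folder holds .java files (possibly in nested packages) under before and after; the helper names load_commit and changed_files are illustrative, and the name of the top-level folder produced by unzipping data.zip is not stated here, so it must be checked locally.

```python
from pathlib import Path
from typing import Dict, List, Tuple


def load_commit(commit_dir: Path) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Read every .java file under <commit_dir>/before and <commit_dir>/after.

    Returns two dicts mapping each file's path (relative to its version
    folder, with "/" separators) to its source text. A missing sub-folder
    yields an empty dict.
    """
    versions: List[Dict[str, str]] = []
    for name in ("before", "after"):
        root = commit_dir / name
        files: Dict[str, str] = {}
        if root.is_dir():
            for path in sorted(root.rglob("*.java")):
                if path.is_file():
                    rel = path.relative_to(root).as_posix()
                    files[rel] = path.read_text(encoding="utf-8", errors="replace")
        versions.append(files)
    return versions[0], versions[1]


def changed_files(before: Dict[str, str], after: Dict[str, str]) -> List[str]:
    """Paths that were added, deleted, or whose content differs."""
    return sorted(p for p in set(before) | set(after) if before.get(p) != after.get(p))
```

Iterating over every commit folder in the extracted archive and calling load_commit on each gives the paired versions that a diff-based model would consume.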

The dataset contains 1,213 commits, each with two versions of Java code, and each commit is labeled with one of three categories:

(1) The first category is Corrective, which involves rectifying errors and faults identified during software usage.

(2) The second category is Perfective, which entails enhancing software quality attributes such as performance, maintainability, and usability.

(3) The third category is Adaptive, which encompasses adapting the software to new environments (e.g., software or hardware) or introducing new functionality.

The dataset contains 450 Corrective commits and 441 Perfective commits; the remaining 322 are Adaptive.
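The class sizes follow from the stated totals (1213 - 450 - 441 = 322). The sketch below records that arithmetic, computes class proportions, and tallies labels from CSV rows so the counts can be cross-checked against CommitDataset.csv. The column name "label" is a hypothetical placeholder; the actual header of the CSV should be checked first.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping

TOTAL_COMMITS = 1213
# Class sizes stated above; Adaptive is whatever remains.
CLASS_COUNTS: Dict[str, int] = {"Corrective": 450, "Perfective": 441}
CLASS_COUNTS["Adaptive"] = TOTAL_COMMITS - sum(CLASS_COUNTS.values())  # 1213 - 891


def class_proportions(counts: Mapping[str, int]) -> Dict[str, float]:
    """Fraction of commits in each class."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def count_labels(rows: Iterable[Mapping[str, str]], label_key: str = "label") -> Counter:
    """Tally labels from CSV rows, e.g. those produced by csv.DictReader.

    "label" is a hypothetical column name, not confirmed by the dataset.
    """
    return Counter(row[label_key] for row in rows)


print(CLASS_COUNTS["Adaptive"])  # 322
```

To cross-check the file, one could open CommitDataset.csv, wrap it in csv.DictReader, and pass the reader to count_labels with the real column name.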

[1] L. Ghadhab, I. Jenhani, M. W. Mkaouer, and M. Ben Messaoud, "Augmenting commit classification by using fine-grained source code changes and a pretrained deep neural language model," Information and Software Technology, vol. 135, p. 106566, Jul. 2021.

[2] S. Levin and A. Yehudai, "Using Temporal and Semantic Developer-Level Information to Predict Maintenance Activity Profiles," in 2016 IEEE International Conference on Software Maintenance and Evolution (ICSME), 2016.

Files (54.5 MB)

Name	Size	MD5
CommitDataset.csv	481.8 kB	46821275eb98026e905f857c5d6936d9
data.zip	54.0 MB	75b594f5ca629111f77eed6dc2f13187
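Because an MD5 checksum is listed for each file, a downloaded copy can be checked before use. A minimal standard-library sketch; the helper names md5sum and verify are illustrative, and the expected values are copied from the file table above.

```python
import hashlib
from pathlib import Path

# Checksums as listed in the file table above.
EXPECTED_MD5 = {
    "CommitDataset.csv": "46821275eb98026e905f857c5d6936d9",
    "data.zip": "75b594f5ca629111f77eed6dc2f13187",
}


def md5sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so the 54 MB archive need not fit in memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path) -> bool:
    """True only if the file name is known and its MD5 matches the listed value."""
    expected = EXPECTED_MD5.get(Path(path).name)
    return expected is not None and md5sum(Path(path)) == expected
```

For example, verify(Path("data.zip")) returns True for an intact download saved under its original name in the current directory.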

