Published July 19, 2019 | Version v2
Dataset Open

GROOT (chanGe pROneness Open daTaset)

  • 1. Federal University of Ceará

Description

General Information

Data Set Characteristics: Multivariate

Associated Tasks: Classification

Number of Instances: 4183

Number of Attributes: 8

Missing Values? No

Area: Software Engineering

Source

Cristiano Sousa Melo

Matheus Mayron Lima da Cruz

Antônio Diogo Forte Martins

José Maria da Silva Monteiro Filho

Javam de Castro Machado

Data Set Information

  • The data set was generated from the backend source code of a web application started in 2013. Eight releases were collected through 2018 to analyze change-prone classes.
  • The dependent variable "will change" was obtained according to [1].
  • The dependent variable "will change" is imbalanced, containing 3871 "not change" (0) labels and 312 "will change" (1) labels.
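
To put the imbalance stated above in numbers, the following is a minimal sketch that rebuilds the label counts (3871 zeros, 312 ones) and derives inverse-frequency class weights. The helper name `balanced_class_weights` and the weighting formula `n_samples / (n_classes * count)` are assumptions chosen for illustration (a common "balanced" heuristic); the dataset description does not prescribe any weighting scheme.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return inverse-frequency weights: n_samples / (n_classes * count).

    Hypothetical helper for illustration; not part of the dataset.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count)
            for label, count in counts.items()}


# Label counts taken from the data set description above.
labels = [0] * 3871 + [1] * 312

weights = balanced_class_weights(labels)
positive_share = labels.count(1) / len(labels)

print(f"instances: {len(labels)}")
print(f"share of 'will change' labels: {positive_share:.4f}")
print(f"class weights: {weights}")
```

Because only about 7.5% of the instances are labelled 1, plain accuracy can look high for a model that never predicts a change; weights like these, or a resampling step, are one possible way to compensate.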

Attribute Information

instanceID: Uniquely identifies each instance of the dataset.

classID: This column indicates the class from which the information of the row was extracted. It is a value obtained by hashing the name of the class.

releaseID: This column indicates the release from which the information of the row was extracted.

CBO (Coupling Between Objects) [2]: A class is coupled to another one if it uses that class's member functions and/or instance variables. CBO is the number of classes to which a given class is coupled. Range: 0 - 162

CC (Cyclomatic Complexity) [3]: It is a software metric used to indicate the complexity of a program. It is a quantitative measure of the number of linearly independent paths through a program's source code. Range: 0 - 488.0

DIT (Depth of Inheritance Tree) [2]: It is defined as the maximum depth of the inheritance graph of each class. Range: 0 - 7

LCOM (Lack of Cohesion in Methods) [2]: This is the number of pairs of member functions without shared instance variables, minus the number of pairs of member functions with shared instance variables. Range: 0 - 1
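
The pair-counting definition above can be illustrated with a small sketch. It follows the wording of the definition and floors negative results at zero, as in the original formulation in [2]. The function name `lcom_ck` and the toy class are hypothetical. Note that the range reported for this column (0 - 1) suggests the extraction tool may use a normalized variant, so this sketch is not expected to reproduce the values in the dataset.

```python
from itertools import combinations


def lcom_ck(method_fields):
    """Pair-counting LCOM.

    method_fields maps each method name to the set of instance
    variables that method uses. Hypothetical helper for illustration.
    """
    disjoint = 0  # pairs of methods with no shared instance variable
    sharing = 0   # pairs of methods with at least one shared variable
    for (_, fields_a), (_, fields_b) in combinations(method_fields.items(), 2):
        if fields_a & fields_b:
            sharing += 1
        else:
            disjoint += 1
    # Negative differences are reported as 0.
    return max(disjoint - sharing, 0)


# Toy class: methods a and b share x; c touches only z.
toy_class = {"a": {"x"}, "b": {"x", "y"}, "c": {"z"}}
print(lcom_ck(toy_class))  # 2 disjoint pairs - 1 sharing pair = 1
```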

LOC: Number of lines of code. Imports and comments are not included. Range: 0 - 1369

NOC (Number Of Children) [2]: It is the number of direct descendants for each class. Range: 0 - 189

RFC (Response For a Class) [2]: This is the number of methods that can potentially be executed in response to a message received by an object of that class. Range: 0 - 413

WMC (Weighted Methods per Class) [2]: It is the number of methods of a class. Range: 0 - 56

class_frequency: It is the number of releases in which the class appears. Range: 0 - 8

number_of_changes: It is the number of changes that a class underwent across all the releases. Range: 0 - 7

will_change: It is the dependent variable, indicating whether a class is change-prone (1) or not (0).

change_probability: It is the number of changes divided by the class frequency. Range: 0 - 1
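
As a small worked example of this ratio, the sketch below divides the number of changes by the class frequency. The function name and the handling of a zero class frequency (returning 0.0) are assumptions made for illustration; the description does not say how that case is treated in the dataset.

```python
def change_probability(number_of_changes, class_frequency):
    """Number of changes divided by class frequency.

    Hypothetical helper; returning 0.0 when class_frequency is 0
    is an assumption, not documented dataset behaviour.
    """
    if number_of_changes < 0 or class_frequency < 0:
        raise ValueError("counts must be non-negative")
    if class_frequency == 0:
        return 0.0
    return number_of_changes / class_frequency


# A class present in all 8 releases that changed twice.
print(change_probability(2, 8))  # 0.25
```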

References

[1] H. Lu, Y. Zhou, B. Xu, H. Leung, and L. Chen, "The ability of object-oriented metrics to predict change-proneness: a meta-analysis," in Empirical Software Engineering, vol. 17, no. 3, 2012.

[2] S. R. Chidamber and C. F. Kemerer, "A metrics suite for object oriented design," in IEEE Transactions on Software Engineering, vol. 20, no. 6, pp. 476-493, June 1994.

[3] T. J. McCabe, "A Complexity Measure," in IEEE Transactions on Software Engineering, vol. SE-2, no. 4, pp. 308-320, Dec. 1976.

Files

groot.csv

Files (865.3 kB)

md5:c06588b9993dd29e4efe284f360321d6 (231.2 kB)
md5:c779a2171f2502eeb4fc2a2e0c05bd17 (634.1 kB)