Published January 9, 2024 | Version 1.0
Dataset Open

C# Dataset of Data Class, Feature Envy and Refused Bequest code smells

  • 1. University of Novi Sad, Faculty of Technical Sciences

Description

This dataset includes open-source projects written in the C# programming language, annotated for the presence of the Data Class, Feature Envy, and Refused Bequest code smells. Each code snippet was manually annotated by at least two annotators.

The dataset contains three Excel datasheets:

  • DataSet_Data_Class.xlsx - C# classes annotated for the Data Class code smell
  • DataSet_Feature_Envy.xlsx - C# methods annotated for the Feature Envy code smell
  • DataSet_Refused_Bequest.xlsx - C# classes annotated for the Refused Bequest code smell

The columns in each datasheet are:

  • Code Snippet ID - the full name of the code snippet.
    • for classes, this is the package/namespace name followed by the class name. The full name of inner classes also contains the names of any outer classes (e.g., namespace.subnamespace.outerclass.innerclass).
    • for methods, this is the full name of the class followed by the method's signature (e.g., namespace.class.method(param1Type, param2Type)).
  • Link - the GitHub link to the code snippet, including the commit and the start and end lines of code.
  • Code Smell - the code smell for which the code snippet is examined (Data Class, Feature Envy or Refused Bequest).
  • Project Link - the link to the version of the code repository that was annotated.
  • Metrics - a list of metrics for the code snippet, calculated by our platform. Our dataset provides 31 class-level metrics for Data Class and Refused Bequest detection and 19 method-level metrics for Feature Envy detection. The list of metrics and their definitions is available here.
  • Final annotation - a single severity score calculated by a majority vote.
  • Annotators - each annotator's (1, 2, or 3) assigned severity score.
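
The ID format and the majority vote described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names are hypothetical, severity scores are assumed to be integers, and returning None when no score has a strict majority is our choice, since the description does not say how such cases are resolved.

```python
from collections import Counter
from typing import Optional, Sequence, Tuple


def split_snippet_id(snippet_id: str) -> Tuple[str, Optional[str]]:
    """Split a Code Snippet ID into (qualified class name, method signature).

    Hypothetical helper: the signature is None when the ID names a class.
    """
    paren = snippet_id.find("(")
    if paren == -1:
        # No parameter list, so the ID names a class (possibly an inner class).
        return snippet_id, None
    # The method name starts after the last dot before the parameter list;
    # dots inside parameter types (e.g. System.String) are left untouched.
    dot = snippet_id.rfind(".", 0, paren)
    if dot == -1:
        return "", snippet_id
    return snippet_id[:dot], snippet_id[dot + 1:]


def majority_vote(scores: Sequence[Optional[int]]) -> Optional[int]:
    """Return the score given by more than half of the annotators.

    Assumption: missing scores are passed as None and ignored, and None is
    returned when no score reaches a strict majority.
    """
    given = [s for s in scores if s is not None]
    if not given:
        return None
    score, count = Counter(given).most_common(1)[0]
    return score if count * 2 > len(given) else None
```

For example, split_snippet_id("namespace.class.method(param1Type, param2Type)") yields the class name "namespace.class" and the signature "method(param1Type, param2Type)". Recomputing the vote from the Annotators columns and comparing it with the Final annotation column is one way to sanity-check a loaded sheet.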

To guide their reasoning when evaluating the presence and severity of a code smell, three annotators independently recorded whether each considered heuristic applies to the evaluated code snippet. We provide these results in three separate Excel datasheets:

  • DataClass_Heuristics.xlsx - C# classes annotated for the presence of heuristics relevant for the Data Class code smell.
  • FeatureEnvy_Heuristics.xlsx - C# methods annotated for the presence of heuristics relevant for the Feature Envy code smell.
  • RefusedBequest_Heuristics.xlsx - C# classes annotated for the presence of heuristics relevant for the Refused Bequest code smell.

The columns of these three datasheets are:

  • Code Snippet ID - the full name of the code snippet (matching the IDs from DataSet_Data_Class.xlsx, DataSet_Feature_Envy.xlsx and DataSet_Refused_Bequest.xlsx).
  • Annotators - heuristics labelled by each of the annotators (1, 2, or 3).
  • Heuristics - whether the heuristic is applicable to the examined code snippet or not.
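
Because the heuristics sheets reuse the Code Snippet ID values of the main sheets, the two can be joined on that column. The sketch below assumes each sheet has already been read into a list of dictionaries (one per row) by a spreadsheet reader of your choice. The function name and the "heuristics" key are hypothetical, and the exact row layout of the heuristics sheets is not specified in this description.

```python
from typing import Any, Dict, Iterable, List

ID_COLUMN = "Code Snippet ID"  # column name as given in the description

Row = Dict[str, Any]


def attach_heuristics(annotations: Iterable[Row],
                      heuristics: Iterable[Row]) -> List[Row]:
    """Left-join heuristic rows onto annotation rows by Code Snippet ID.

    Each returned row is a copy of an annotation row with an extra
    "heuristics" key (a hypothetical name) holding every matching
    heuristic row, or an empty list when there is none.
    """
    by_id: Dict[str, List[Row]] = {}
    for row in heuristics:
        by_id.setdefault(row[ID_COLUMN], []).append(row)

    joined = []
    for row in annotations:
        merged = dict(row)  # copy so the input rows stay unchanged
        merged["heuristics"] = by_id.get(row[ID_COLUMN], [])
        joined.append(merged)
    return joined
```

Keeping the heuristic rows as a list avoids assuming whether a sheet stores one row per snippet or one row per annotator.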

Annotators labelled the dataset following the annotation procedure and guidelines available here.

Notes

This research was supported by the Science Fund of the Republic of Serbia, Grant No. 6521051, AI-Clean CaDET.

Files

Files (209.5 kB)

  • md5:c719dbcbecedb10908fdfa508e718c8c (17.0 kB)
  • md5:7e6446de483b4166202ccae1f2d76311 (55.7 kB)
  • md5:94e29c01288472a5232b34a02be46262 (44.5 kB)
  • md5:11330c613e921003450f3c62d384d57f (50.5 kB)
  • md5:0f8cf3d1b3df28f8ec2c1a4e4622c48a (23.5 kB)
  • md5:ef740b048a1a2a76df3ef2c5cef3185f (18.3 kB)
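
The MD5 checksums listed above can be used to confirm that a downloaded file is intact. The following is a minimal sketch using Python's standard hashlib module. The function names are hypothetical, and because the listing here does not show which checksum belongs to which file name, the expected value has to be supplied by the caller.

```python
import hashlib


def md5_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal MD5 digest of the file at `path`."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path: str, listed: str) -> bool:
    """Compare a file against a listed checksum such as 'md5:c719...'.

    The optional 'md5:' prefix used in the file listing is stripped first.
    """
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of(path) == expected
```

A call such as matches_listing("some_file.xlsx", "md5:...") returns True only when the local file matches the published checksum; the file name in that call is a placeholder.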