DACOS - Dataset
Description
DACOS - DAtaset of COde Smells
The dataset offers annotated code snippets for three code smells: multifaceted abstraction, complex method, and long parameter list.
In addition to a manually annotated dataset of potentially subjective snippets, we offer a larger set of snippets that are either definitely benign or definitely smelly.
The upload contains three files:
- DACOSMain.sql - This is the SQL file containing the main DACOS dataset.
- DACOSExtended.sql - This is the SQL file containing the Extended DACOS dataset.
- Files.zip - The zip file containing all the source code files.
Required Software
The dataset was created in MySQL, so a local or remote MySQL installation is needed, with privileges to create and modify schemas.
Importing the Dataset
Each dataset is a self-contained SQL file. To import them, run the following commands, replacing username and database_name with your own values (the target database must already exist):
mysql -u username -p database_name < DACOSMain.sql
mysql -u username -p database_name < DACOSExtended.sql
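For scripted setups, the same two imports could be driven from Python. The following is only a minimal sketch: it assumes the mysql client is on the PATH and that the target database already exists. The function names build_import_command and import_dump are hypothetical helpers, not part of the dataset.

```python
import subprocess
from pathlib import Path


def build_import_command(user: str, database: str) -> list[str]:
    """Return the argument list equivalent to `mysql -u user -p database`."""
    return ["mysql", "-u", user, "-p", database]


def import_dump(sql_path: str, user: str, database: str) -> None:
    """Feed one SQL dump to the mysql client on standard input.

    Mirrors `mysql -u user -p database < sql_path`; the client prompts
    for the password interactively because of the -p flag.
    """
    with Path(sql_path).open("rb") as dump:
        subprocess.run(build_import_command(user, database), stdin=dump, check=True)


def import_dacos(user: str, database: str) -> None:
    """Import both dumps named in this description, main dataset first."""
    for sql_file in ("DACOSMain.sql", "DACOSExtended.sql"):
        import_dump(sql_file, user, database)
```

Calling import_dacos("username", "database_name") from the directory holding the two SQL files would run both imports in sequence and raise an error if either one fails.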
Understanding the Datasets
The two datasets differ in structure. The main dataset contains a table named annotations, which holds every annotation collected from users. The sample table contains the samples presented to users for annotation. The class_metrics and method_metrics tables contain class and method metrics, respectively. These metrics were used to filter samples that are likely to contain smells and hence can be shown to users.
The extended dataset is created by selecting samples whose metric values fall below or above the selected metric range for each smell. Hence, these samples are definitely benign or definitely smelly. The extended dataset does not contain a table for annotations, since its samples were not presented to users. Instead, it has an 'entry' table in which each sample is classified according to the smell it contains. The codes for identifying smells are listed below:
Condition | Smell ID |
---|---|
Multifaceted Abstraction Present | 1 |
Multifaceted Abstraction not detected | 4 |
Long Parameter List Present | 2 |
Long Parameter List Absent | 5 |
Complex Method Present | 3 |
Complex Method Absent | 6 |
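When post-processing rows from the extended dataset, it can help to turn these numeric codes back into readable labels. The following is a minimal Python sketch of such a lookup. The names SmellLabel, SMELL_CODES, and decode_smell_id are hypothetical and not part of the dataset; only the code values come from the table above.

```python
from typing import NamedTuple


class SmellLabel(NamedTuple):
    smell: str      # name of the code smell
    present: bool   # True if the smell is present, False otherwise


# Codes copied from the table above: 1-3 mark a smell as present,
# 4-6 mark the same three smells as absent / not detected.
SMELL_CODES = {
    1: SmellLabel("Multifaceted Abstraction", True),
    2: SmellLabel("Long Parameter List", True),
    3: SmellLabel("Complex Method", True),
    4: SmellLabel("Multifaceted Abstraction", False),
    5: SmellLabel("Long Parameter List", False),
    6: SmellLabel("Complex Method", False),
}


def decode_smell_id(smell_id: int) -> SmellLabel:
    """Translate a numeric smell code into a (smell, present) label."""
    try:
        return SMELL_CODES[smell_id]
    except KeyError:
        raise ValueError(f"unknown smell id: {smell_id}") from None
```

For example, decode_smell_id(5) yields the label for Long Parameter List with present set to False. In this table, each "absent" code is the matching "present" code plus 3.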