Published January 24, 2023 | Version v2
Dataset Open

DACOS - Dataset

  • 1. Dalhousie University

Description

DACOS - DAtaset of COde Smells

 

The dataset offers annotated code snippets for three code smells: multifaceted abstraction, complex method, and long parameter list.

In addition to a manually annotated dataset of potentially subjective snippets, we offer a larger set of snippets that are either definitely benign or definitely smelly.

The upload contains three files:

  1. DACOSMain.sql - This is the SQL file containing the main DACOS dataset. 
  2. DACOSExtended.sql - This is the SQL file containing the Extended DACOS dataset. 
  3. Files.zip - The zip file containing all the source code files. 

Required Software

The dataset was created in MySQL, so a local or remote MySQL installation is needed, with privileges to create and modify schemas.

Importing the Dataset

Each dataset is a self-contained SQL file. To import both datasets, run the following commands:

 

mysql -u username -p database_name < DACOSMain.sql
mysql -u username -p database_name < DACOSExtended.sql
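The two import commands above can also be scripted. The following is a minimal Python sketch, not part of the dataset: the helper names build_import_command and import_dump are hypothetical, username and database_name are the same placeholders used above, and the target database is assumed to exist already.

```python
import subprocess
from pathlib import Path


def build_import_command(user: str, database: str) -> list[str]:
    """Return the argv for the mysql client; -p makes it prompt for a password."""
    return ["mysql", "-u", user, "-p", database]


def import_dump(sql_file: Path, user: str, database: str) -> None:
    """Feed one SQL dump to the mysql client on standard input."""
    with sql_file.open("rb") as dump:
        # check=True raises CalledProcessError if mysql exits with an error.
        subprocess.run(build_import_command(user, database), stdin=dump, check=True)


if __name__ == "__main__":
    # Placeholder credentials; replace them with real values before running.
    for name in ("DACOSMain.sql", "DACOSExtended.sql"):
        import_dump(Path(name), user="username", database="database_name")
```

Passing the dump on standard input mirrors the shell redirection in the commands above, so no SQL parsing is needed on the Python side.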

 

Understanding the Datasets

The two datasets differ in structure. The main dataset contains a table named annotations that holds every annotation collected from users. The sample table contains the samples presented to users for annotation. The class_metrics and method_metrics tables contain class and method metrics respectively. These metrics were used to select samples that are likely to contain smells and hence can be shown to users.

The extended dataset was created by selecting samples whose metrics fall below or above the selected metric range for each smell. Hence, these samples are definitely smelly or benign. The extended dataset does not contain an annotation table, since its samples were not presented to users. Instead, it has an 'entry' table in which each sample is classified according to the smell it contains. The codes identifying the smells are listed below:

Condition                              Smell Id
Multifaceted Abstraction Present       1
Multifaceted Abstraction Not Detected  4
Long Parameter List Present            2
Long Parameter List Absent             5
Complex Method Present                 3
Complex Method Absent                  6
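When reading the extended dataset programmatically, the smell codes listed above can be translated back into a smell name and a present/absent flag. The sketch below is only an illustration: the names SmellLabel, SMELL_CODES, and decode_smell are hypothetical, and the sketch does not assume any column names of the 'entry' table.

```python
from typing import NamedTuple


class SmellLabel(NamedTuple):
    smell: str      # name of the code smell
    present: bool   # True if the sample contains the smell


# Mapping copied from the table of smell codes in this description.
SMELL_CODES = {
    1: SmellLabel("Multifaceted Abstraction", True),
    4: SmellLabel("Multifaceted Abstraction", False),
    2: SmellLabel("Long Parameter List", True),
    5: SmellLabel("Long Parameter List", False),
    3: SmellLabel("Complex Method", True),
    6: SmellLabel("Complex Method", False),
}


def decode_smell(code: int) -> SmellLabel:
    """Translate a smell Id into its smell name and presence flag."""
    try:
        return SMELL_CODES[code]
    except KeyError:
        raise ValueError(f"unknown smell Id: {code}") from None
```

For example, decode_smell(5) yields the label for a sample in which a long parameter list is absent.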

 

Files

contributors.txt

Files (139.8 MB)

Size       Checksum
451 Bytes  md5:453d08349206d1d99822f2e23e72f9f6
38.9 MB    md5:081b9725362b34a78e18de303aa35fb2
35.4 MB    md5:30fe42bd366da825908e86d901702b79
65.4 MB    md5:d38d4a2526123c66f6b30e871e50cd4c
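A downloaded file can be checked against the md5 checksums listed above. The following Python sketch is one possible way to do this. The helper names md5_of and matches_published are hypothetical. Because this listing does not say which checksum belongs to which file name, the sketch only tests whether a file's digest is among the published values.

```python
import hashlib
from pathlib import Path

# md5 digests copied from the file listing in this description.
PUBLISHED_MD5 = {
    "453d08349206d1d99822f2e23e72f9f6",
    "081b9725362b34a78e18de303aa35fb2",
    "30fe42bd366da825908e86d901702b79",
    "d38d4a2526123c66f6b30e871e50cd4c",
}


def md5_of(path, chunk_size: int = 1 << 20) -> str:
    """Compute the md5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path) -> bool:
    """Return True if the file's digest is one of the published checksums."""
    return md5_of(path) in PUBLISHED_MD5
```

Reading in chunks keeps memory use low for the larger files in the upload.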