Dataset Open Access
Sotto-Mayor, Bruno; Elmishali, Amir; Kalech, Meir; Abreu, Rui
The archived file datasets.zip contains the datasets that support the conclusions of the article Exploring Design Smells for Smell-Based Defect Prediction.
In this paper, we answer two research questions:
RQ1. Do Design code smells contribute to the performance of defect prediction models trained with Traditional code smells?
RQ2. How do the different categories of Design smells impact the performance of the defect prediction models?
After extracting the archive, you will find two sub-directories, named "RQ1" and "RQ2". They contain the results obtained for each research question and thus support our conclusions.
(You will also find a README.pdf file with these same instructions regarding the datasets.)
Inside "RQ1," you will find two directories, named "configuration_1" and "configuration_2", which represent the two experimental configurations. "configuration_1" contains the datasets with the results for the ten classifier configurations with the highest scores, and "configuration_2" contains the datasets with the results for the classifier configuration with the best overall results: Support Vector Machine with C=0.1. Within each configuration directory, there are three sub-directories, named "designite," "designite_traditional," and "traditional," which hold the datasets for each of the smell sets considered in our study.
Inside "RQ2," you will find four directories, each corresponding to one category of design smells for the "designite_traditional" dataset. These datasets were built with the same configuration as "configuration_2".
Then, within every directory, there are 97 sub-directories representing the 97 projects analyzed in this study.
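As a quick sanity check after extraction, the layout described above can be walked programmatically. The following is a minimal sketch, not part of the published dataset: the extraction root "datasets" and the helper names subdirs and project_counts are assumptions made for illustration. The RQ2 category names are not listed here, so the sketch discovers them instead of hard-coding them.

```python
from __future__ import annotations

from pathlib import Path

# Directory names taken from the description above.
RQ1_CONFIGURATIONS = ("configuration_1", "configuration_2")
SMELL_SETS = ("designite", "designite_traditional", "traditional")


def subdirs(path: Path) -> list[Path]:
    """Return the immediate sub-directories of `path`, sorted by name."""
    return sorted(p for p in path.iterdir() if p.is_dir())


def project_counts(root: Path) -> dict[str, int]:
    """Map each dataset directory (relative to `root`) to its number of project folders."""
    counts: dict[str, int] = {}
    for config in RQ1_CONFIGURATIONS:
        for smell_set in SMELL_SETS:
            dataset_dir = root / "RQ1" / config / smell_set
            if dataset_dir.is_dir():
                key = dataset_dir.relative_to(root).as_posix()
                counts[key] = len(subdirs(dataset_dir))
    # RQ2 category names are not given in the description, so discover them.
    rq2 = root / "RQ2"
    if rq2.is_dir():
        for category_dir in subdirs(rq2):
            key = category_dir.relative_to(root).as_posix()
            counts[key] = len(subdirs(category_dir))
    return counts


if __name__ == "__main__":
    # "datasets" is an assumed extraction root; adjust it to your local path.
    for name, n in project_counts(Path("datasets")).items():
        print(f"{name}: {n} projects")
```

If the archive matches the description, each listed directory should report 97 projects; any other count suggests an incomplete extraction.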
Every project folder follows the same structure, which we define as follows.
| | All versions | This version |
| Data volume | 1.8 GB | 1.8 GB |