Replication Kit: "Are Unit and Integration Test Definitions Still Valid for Modern Java Projects? An Empirical Study on Open-Source Projects"

doi:10.5281/zenodo.2267946

Published December 13, 2018 | Version 2.0.2

Dataset Open


1. University of Goettingen

Replication Kit for the Paper "Are Unit and Integration Test Definitions Still Valid for Modern Java Projects? An Empirical Study on Open-Source Projects"
This additional material enables other researchers to replicate our results. We also want to facilitate further insights that might be gained from our data sets.

Structure
The structure of the replication kit is as follows:

additional_visualizations: Contains additional visualizations (Venn diagrams) for each project and for each of the data sets that we used.
data_analysis: Contains the Python scripts that we used to analyze our raw data.
data_collection_tools: Contains all source code used for the data collection, including the versions of the COMFORT framework, the BugFixClassifier, and the SmartSHARK tools that we used.
mongodb_no_authors: Archived dump of the MongoDB that we created by executing our data collection tools. The "comfort" database can be restored via the mongorestore command.
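
The restore step mentioned above can be sketched as follows. This is a minimal sketch, not the procedure used by the authors: it assumes that the .agz file is a gzip-compressed archive as produced by `mongodump --archive --gzip`, and that a MongoDB server is reachable at the default local address. The helper name `build_restore_command` and the default URI are hypothetical.

```python
def build_restore_command(archive_path, uri="mongodb://localhost:27017"):
    """Build a mongorestore argument list for a gzip-compressed archive dump.

    Assumption: the dump was created with `mongodump --archive --gzip`.
    """
    return [
        "mongorestore",
        "--uri=" + uri,
        "--gzip",                      # the archive is gzip-compressed
        "--archive=" + archive_path,   # read the dump from a single archive file
        "--nsInclude=comfort.*",       # restore only the "comfort" database
    ]


cmd = build_restore_command("mongodb_no_authors.agz")
# Print the command instead of running it; to execute it, pass the list to
# subprocess.run(cmd, check=True) on a machine where mongorestore is installed.
print(" ".join(cmd))
```

Since the archive is 16.0 GB compressed, it is worth checking the available disk space before restoring.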

Additional Visualizations
We provide two additional visualizations for each project:
1) <project_name>_disj_ieee_venn (visualizations for the DISJ data set)
2) <project_name>_all_ieee_venn (visualizations for the ALL data set)

For each of these data sets, there is one visualization per project that shows four Venn diagrams for the different defect types. These Venn diagrams show the number of defects that were detected by unit tests, by integration tests, or by both.
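
How the three regions of such a Venn diagram relate to the underlying defect sets can be illustrated with a small sketch. The function name `venn_regions` and the defect identifiers are hypothetical and serve only as an illustration; they are not taken from the data sets or the analysis scripts.

```python
def venn_regions(unit_detected, integration_detected):
    """Split detected defects into the three regions of a two-set Venn diagram."""
    unit = set(unit_detected)
    integration = set(integration_detected)
    return {
        "unit_only": unit - integration,
        "integration_only": integration - unit,
        "both": unit & integration,
    }


# Hypothetical defect identifiers, for illustration only.
regions = venn_regions({"D1", "D2", "D3"}, {"D3", "D4"})
counts = {name: len(ids) for name, ids in regions.items()}
print(counts)  # {'unit_only': 2, 'integration_only': 1, 'both': 1}
```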

Furthermore, we added boxplots for each of the data sets (i.e., ALL and DISJ) showing the scores of unit and integration tests for each defect type.

Analysis scripts
Requirements:
- Python 3.5
- tabulate
- scipy
- seaborn
- mongoengine
- pycoshark
- pandas
- matplotlib
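
Before running the scripts, one may want to check that the listed requirements are available. The following is a minimal sketch that assumes each package is importable under the lowercase name given in the list above (for example, `pycoshark` for pycoSHARK); the helper `missing_packages` is hypothetical and not part of the kit.

```python
import importlib.util
import sys

# Assumption: the import names match the requirement names listed above.
REQUIRED = ["tabulate", "scipy", "seaborn", "mongoengine",
            "pycoshark", "pandas", "matplotlib"]


def missing_packages(names):
    """Return the names that cannot be found as importable top-level modules."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if sys.version_info < (3, 5):
    print("Python 3.5 or newer is expected by the requirements list.")
print("Missing packages:", missing_packages(REQUIRED))
```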

Together, the two Python files contain all code for the statistical analysis that we performed.

Data Collection Tools
We provide all data collection tools that we have implemented and used throughout our paper:

BugFixClassifier: Used to classify our defects.
comfort-core: Core of the comfort framework. Used to classify our tests into unit and integration tests and calculate different metrics for these tests.
comfort-jacoco-listner: Used to intercept the coverage collection process while we executed the tests of our case study projects.
jSHARK: Library that contains the models for the ORM mapper used inside the SmartSHARK environment (for Java).
pycoSHARK: Library that contains the models for the ORM mapper used inside the SmartSHARK environment (for Python).
tools-changedistiller: Version of ChangeDistiller that we used within our comfort-core framework.
vcsSHARK: Used to collect data from the VCSs of the projects.

Files (16.0 GB)

Name	MD5	Size
additional_visualizations.tar.gz	f6bb3d5fbfa5528a963aa17d7c15cdef	191.7 kB
data_analysis.tar.gz	d533ed09eadcdde41d082804fe3bf9d3	5.2 kB
data_collection_tools.tar.gz	7b517f38fbeaf47c9bbde33e76024343	8.8 MB
mongodb_no_authors.agz	8ab894a849d10cb626d2a49ca9df6e88	16.0 GB
README.md	ea10d6d12cd3e9aa355bb43ed41664ba	2.9 kB
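
The MD5 checksums listed above can be used to verify the downloaded files. The following is a minimal sketch using Python's standard `hashlib` module; the helper name `md5_of_file` is hypothetical, and the script assumes that the files were downloaded into the current working directory.

```python
import hashlib
import os

# Checksums as listed in the file table above.
EXPECTED_MD5 = {
    "additional_visualizations.tar.gz": "f6bb3d5fbfa5528a963aa17d7c15cdef",
    "data_analysis.tar.gz": "d533ed09eadcdde41d082804fe3bf9d3",
    "data_collection_tools.tar.gz": "7b517f38fbeaf47c9bbde33e76024343",
    "mongodb_no_authors.agz": "8ab894a849d10cb626d2a49ca9df6e88",
    "README.md": "ea10d6d12cd3e9aa355bb43ed41664ba",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


for name, expected in EXPECTED_MD5.items():
    if not os.path.exists(name):
        print(name, "not found, skipped")
    elif md5_of_file(name) == expected:
        print(name, "OK")
    else:
        print(name, "checksum mismatch")
```

Reading in chunks keeps memory use low, which matters for the 16.0 GB database archive.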

	All versions	This version
Views	1,222	330
Downloads	297	51
Data volume	844.4 GB	272.6 GB
