Replication package for the Journal of Systems and Software (JSS) submitted article : "Evaluating Time-Dependent Methods and Seasonal Effects in Code Technical Debt Prediction"
Creators
Description
This replication package contains all the Python code to conduct the data collection, preprocessing and analysis of this study.
Getting started
These instructions will get you a copy of the project up and running on your local machine.
Prerequisites
Running the code requires Python3.9. See installation instructions here.
The dependencies needed to run the code are all listed in the file requirements.txt
. They can be installed using pip:pip install -r requirements.txt
. Such requirements include classic statistics and ML libraries such as scikit-learn, scipy or statsmodels among others.
You might also want to consider using virtual env. Which we did.
Structure of the replication package
The replication package provides two different directories. The first one provides the codes for running the study and the second one
provides the initial data files obtained from the dataset specified in the paper.
Codes
The folder ../codes/
should contain the following files:commons.py
main.py
ml_modelling.py
ml_modelling_backward.py
modules.py
related_work.py
related_work_speed.py
result_combiner.py
ts_modelling.py
ts_modelling_speed.py
visualization.py
preprocessing.py
tsDataPreparation.py
Data
The folder ../data/
should contain the following files:biweekly_data/
monthly_data/
complete_data/
raw-data/
Furthermore, if needed, the database file of the Technical Debt dataset is also provided in the replication package as td_V2.db
file.
Running the code
NOTE 1: Please, find the DATA_PATH
global variable in the commons.py
script and define the path where the program should create all the needed results.
The logic would be that you provide the base path is the location of this replication package in your machine, and you add data
as the location for the data files.
NOTE 2: The different stages of the study execution are splitted in the main.py
script, from the boolean definitions incommons.py
practitioners can decide which stages want to be manipulated or re-executed again without affecting the other stages.
For a complete execution, set all the boolean global variables to True
Stage 1: PREPROCESSING
- Executes scripts
preprocessing.py
andtsDataPreparation.py
. - In these scripts the raw data from the Technical Debt dataset are converted into project-divided csv files repeated in
biweekly
,monthly
andcomplete
format by first performing data cleaning and preprocessing using the techniques
described in the paper.
Stage 2: SARIMAX
- Executes script
ts_modelling_speed.py
for executing the multivariate Time Series Analysis approach proposed:seasonality=True
: Addresses the seasonality effect within the data through SARIMAXseasonality=False
: Does not address the seasonality effect and implements just ARIMAX.
Stage 3: RELATED_WORK
- Executes script
related_work_speed.py
for executing the multivariate Time Series Analysis provided from the related work:seasonality=True
: Addresses the seasonality effect within the data by implementing univariate SARIMA+LM.seasonality=False
: Does not address the seasonality effect and implements just ARIMAX+LM.
Stage 4: ML_MODELS
- Executes script
ml_modelling_backward.py
for executing the considered ML models with backward variable selection procedure.
Stage 5: COMBINE_RESULTS
- Executes script
result_combiner.py
for combining the performance results from all the resulting projects.- Generates the final results from all the models by collecting their average results. The
csv
as well asLaTex
tables can be found in../data/final_results/
.
- Generates the final results from all the models by collecting their average results. The
Stage 6: Visualization of the results
- For the sake of flexibility, multiple visualization options apart from the ones displayed in the paper can be obtained by running all the cells existing in the Jupyter Notebook
visualization.ipynb
.
Stage 7: Long-term forecasting
- Executes script
revision_demo.py
for performing long-term forecasting for both ARIMAX (with biweekly data) ans SARIMAX (with monthly data) given their results in the previous stages of the study. The multiple results should be directly generated
in the locations../data/arimax_demo
and../data/sarimax_demo
accordingly.
Files
JSS_TDPred_replication_package.zip
Files
(359.0 MB)
Name | Size | Download all |
---|---|---|
md5:4524cd27e820321c3f0139d739879b11
|
359.0 MB | Preview Download |
Additional details
Dates
- Available
-
2025-03-05Creation date of this replication package link
Software
- Programming language
- Python, SQL
- Development Status
- Active