# README for "Enhancing Software Development: Predicting Task Resolution Time through Affective and Personality Factors"
Welcome to the companion data and resources for the paper titled "Enhancing Software Development: Predicting Task Resolution Time through Affective and Personality Factors." This README provides step-by-step instructions to reproduce the study, validate results, analyze charts, and review metrics.
## Folder Structure
- `dataset/`: Contains all datasets used in the study.
- `machineLearningCompanyCollab/`: Python project for running the machine learning algorithms.
- `notebooks/`: Jupyter notebook files used in the analysis.
## Step-by-Step Instructions
### Step 1 - Data Analysis - Tasks
This step involves pre-processing the original tasks data from the software company to derive additional features.
**Key Operations:**
1. Reading the original tasks file.
2. Converting specific project task statuses into broader categories.
3. Calculating derived variables like time from development to test, fix to test, and development to completion.
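The derived duration variables can be sketched as below with pandas. The column names here (`dev_start`, `test_start`, `completed`) are hypothetical; the actual tasks file from the company uses its own schema.

```python
import pandas as pd

# Hypothetical task rows; real column names in the tasks file may differ.
tasks = pd.DataFrame({
    "dev_start": pd.to_datetime(["2023-01-02", "2023-01-05"]),
    "test_start": pd.to_datetime(["2023-01-04", "2023-01-09"]),
    "completed": pd.to_datetime(["2023-01-06", "2023-01-12"]),
})

# Derived variables: elapsed days between lifecycle stages
tasks["dev_to_test_days"] = (tasks["test_start"] - tasks["dev_start"]).dt.days
tasks["dev_to_completion_days"] = (tasks["completed"] - tasks["dev_start"]).dt.days
print(tasks[["dev_to_test_days", "dev_to_completion_days"]])
```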
### Step 2 - Data Analysis - Surveys
This step handles the pre-processing of survey data, focusing on developers' emotional states and stress levels.
**Key Operations:**
1. Reading the survey data.
2. Converting emotional and stress data into appropriate categories.
3. Calculating mean emotional polarity and stress level.
4. Selecting eligible surveys for further analysis and merging them with task data.
5. Adding personality traits and developer experience information to the dataset.
The `Step 2 - Data Analysis - Surveys (Weighted)` notebook performs similar operations but on a weighted dataset.
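Computing the mean emotional polarity and stress level per developer can be sketched as follows; the column names and coding scheme shown are assumptions for illustration.

```python
import pandas as pd

# Hypothetical survey rows; the real coding of emotions/stress may differ.
surveys = pd.DataFrame({
    "developer": ["alice", "alice", "bob"],
    "polarity": [1, -1, 0],   # e.g. positive = 1, neutral = 0, negative = -1
    "stress": [3, 5, 2],      # e.g. Likert scale 1-5
})

# Mean emotional polarity and stress level per developer
per_dev = surveys.groupby("developer")[["polarity", "stress"]].mean()
print(per_dev)
```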
### Step 3 - Categorize Target Variable
Categorizes the target variable based on the Interquartile Range (IQR) method.
**Key Operations:**
1. Reading the dataset.
2. Categorizing the target variable according to defined criteria.
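An IQR-based categorization of the target could look like the sketch below. The three-way split (below Q1, between Q1 and Q3, above Q3) and the category names are assumptions; the paper defines the exact criteria.

```python
import numpy as np

# Assumed rule: below Q1 -> "fast", above Q3 -> "slow", otherwise "typical".
resolution_times = np.array([1, 2, 2, 3, 4, 5, 8, 20])
q1, q3 = np.percentile(resolution_times, [25, 75])

def categorize(t):
    if t < q1:
        return "fast"
    if t > q3:
        return "slow"
    return "typical"

labels = [categorize(t) for t in resolution_times]
print(labels)
```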
### Step 4 - Remove Outliers
Removes outliers from the dataset using the IQR method to ensure clean data for machine learning.
**Key Operations:**
1. Reading the dataset.
2. Applying the IQR method to identify and remove outliers.
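The standard IQR outlier rule (drop values outside `[Q1 - 1.5*IQR, Q3 + 1.5*IQR]`) can be sketched as:

```python
import numpy as np

# Keep only points inside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]
values = np.array([1, 2, 2, 3, 4, 5, 8, 100])
q1, q3 = np.percentile(values, [25, 75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
clean = values[(values >= lower) & (values <= upper)]
print(clean)  # 100 is removed as an outlier
```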
### Step 5 - Generate SMOTE Instances
Generates synthetic instances with SMOTE (Synthetic Minority Over-sampling Technique) to balance the datasets for the machine learning models.
**Key Operations:**
1. Generating SMOTE instances of various sizes: 250, 500, 1000, 3000, and 5000.
2. Ensuring proper selection of weighted or non-weighted datasets.
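The core idea of SMOTE is to create synthetic minority-class points by interpolating between existing minority samples. The minimal sketch below illustrates only that interpolation idea; the notebooks most likely use a library implementation (e.g. imbalanced-learn), and the data here is made up.

```python
import numpy as np

rng = np.random.default_rng(42)

def smote_like(minority, n_new):
    """Create synthetic points by interpolating between random pairs of
    minority-class samples (the core idea behind SMOTE; a full SMOTE
    implementation interpolates toward k-nearest neighbors)."""
    idx_a = rng.integers(0, len(minority), n_new)
    idx_b = rng.integers(0, len(minority), n_new)
    gaps = rng.random((n_new, 1))  # interpolation factor in [0, 1)
    return minority[idx_a] + gaps * (minority[idx_b] - minority[idx_a])

minority = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 5.0]])
synthetic = smote_like(minority, n_new=5)
print(synthetic.shape)  # (5, 2)
```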
### Step 6 - Run Machine Learning Algorithms
Runs machine learning algorithms on prepared datasets to evaluate their performance.
**Key Operations:**
1. Configuring output paths and dataset weighting in the `utils.py` and `main.py` scripts.
2. Running the algorithms with command-line instructions, allowing flexibility in model selection.
Examples:

```shell
python3 main.py -t False --models knn,svc,random-forest,decision-tree,mlp-classifier,logistic-regression -b True
python3 main.py -t False --models knn -b True
```
### Step 7 - Consolidate Results
Consolidates machine learning model results into a single output file.
**Key Operations:**
1. Reading output files from each model's results.
2. Creating a unified file with all results for further analysis.
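Consolidation of per-model output files can be sketched with pandas as below; the file layout (one CSV per model, merged into `all_results.csv`) is assumed for illustration.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical layout: one CSV of results per model, merged into one file.
with tempfile.TemporaryDirectory() as outdir:
    out = Path(outdir)
    pd.DataFrame({"model": ["knn"], "f1": [0.71]}).to_csv(out / "knn.csv", index=False)
    pd.DataFrame({"model": ["svc"], "f1": [0.74]}).to_csv(out / "svc.csv", index=False)

    # Read every per-model file and stack them into a single frame
    combined = pd.concat(
        (pd.read_csv(f) for f in sorted(out.glob("*.csv"))),
        ignore_index=True,
    )
    combined.to_csv(out / "all_results.csv", index=False)
    print(combined)
```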
### Step 8 - Correlation Analysis
Performs correlation analysis on the datasets.
**Key Operations:**
1. Analyzing correlation between features in the dataset.
2. Identifying significant correlations.
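A correlation analysis of this kind can be sketched as follows. The feature names, the choice of Spearman correlation, and the 0.7 threshold for "significant" are all assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical features; the real dataset includes emotional, stress,
# and personality columns.
df = pd.DataFrame({
    "stress":     [1, 2, 3, 4, 5],
    "polarity":   [5, 4, 3, 2, 1],
    "experience": [2, 1, 4, 3, 5],
})

corr = df.corr(method="spearman")

# Blank the diagonal, then keep pairs above an (assumed) |rho| > 0.7 threshold
off_diag = corr.mask(np.eye(len(corr), dtype=bool))
strong = off_diag.stack().loc[lambda s: s.abs() > 0.7]
print(strong)
```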
### Step 9 - Charts - Datasets Distribution
Generates boxplots to visualize the distribution of F1 scores across models.
**Key Operations:**
1. Generating boxplots to visualize F1 score distributions.
2. Comparing results for weighted and non-weighted datasets.
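A boxplot of F1 distributions per model can be sketched with matplotlib as below; the scores shown are invented placeholders, and in practice they would come from the consolidated results of Step 7.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # render to a file; no display needed
import matplotlib.pyplot as plt

# Hypothetical F1 scores per model
f1_scores = {
    "knn": [0.70, 0.72, 0.68],
    "svc": [0.74, 0.75, 0.73],
    "random-forest": [0.78, 0.80, 0.77],
}

fig, ax = plt.subplots()
ax.boxplot(f1_scores.values())
ax.set_xticklabels(f1_scores.keys())
ax.set_ylabel("F1 score")

out_path = os.path.join(tempfile.gettempdir(), "f1_boxplot.png")
fig.savefig(out_path)
print(out_path)
```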
### Step 10 - Wilcoxon Signed-Rank Test
Performs statistical analysis to assess significant differences in F1 scores between weighted and non-weighted datasets.
**Key Operations:**
1. Applying the Wilcoxon test to compare F1 score results.
2. Identifying statistically significant differences.
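The Wilcoxon signed-rank test applies to paired samples, here the F1 scores of the same model/instance configurations under the two dataset variants. A sketch with SciPy, on invented paired scores:

```python
from scipy.stats import wilcoxon

# Hypothetical paired F1 scores: same configurations, weighted vs non-weighted
f1_weighted     = [0.72, 0.75, 0.80, 0.68, 0.77, 0.74, 0.79, 0.71]
f1_non_weighted = [0.70, 0.74, 0.78, 0.69, 0.75, 0.72, 0.76, 0.70]

stat, p_value = wilcoxon(f1_weighted, f1_non_weighted)
print(f"W={stat:.2f}, p={p_value:.4f}")
# Reject H0 (no difference between variants) at alpha = 0.05 when p < 0.05
```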
### Step 11 - Identify the Best Models
Identifies the best models for each SMOTE instance and each algorithm.
**Key Operations:**
1. Reading model results.
2. Determining the best-performing models for each instance and algorithm.
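Selecting the best row per (SMOTE size, algorithm) pair can be sketched with a pandas group-by; the column names below are assumed, not the notebooks' actual schema.

```python
import pandas as pd

# Hypothetical consolidated results (see Step 7); column names are assumed.
results = pd.DataFrame({
    "algorithm":  ["knn", "knn", "svc", "svc"],
    "smote_size": [250, 250, 250, 250],
    "config":     ["a", "b", "a", "b"],
    "f1":         [0.70, 0.73, 0.72, 0.78],
})

# Best-performing row for each (SMOTE size, algorithm) pair, by F1 score
best = results.loc[results.groupby(["smote_size", "algorithm"])["f1"].idxmax()]
print(best)
```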
### Step 12 - Charts - Confusion Matrix
Calculates the confusion matrix for the best-performing model.
**Key Operations:**
1. Creating confusion matrices to understand model classification accuracy.
2. Using these insights to assess the reliability of the best model.
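A confusion matrix tabulates, for each true class, how often each class was predicted; the diagonal holds the correct classifications. A minimal sketch on invented predictions (three hypothetical resolution-time classes):

```python
import numpy as np

# Hypothetical predictions from the best model
classes = ["fast", "typical", "slow"]
y_true = ["fast", "slow", "typical", "typical", "slow", "fast"]
y_pred = ["fast", "typical", "typical", "typical", "slow", "fast"]

# Rows = true class, columns = predicted class
idx = {c: i for i, c in enumerate(classes)}
cm = np.zeros((len(classes), len(classes)), dtype=int)
for t, p in zip(y_true, y_pred):
    cm[idx[t], idx[p]] += 1
print(cm)

# The diagonal holds correct predictions, so the trace gives accuracy
accuracy = np.trace(cm) / cm.sum()
print(f"accuracy = {accuracy:.2f}")
```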
---
By following these steps, you can reproduce the study, verify the results, and analyze the outcomes presented in the paper. If you have any questions or issues, feel free to contact us.
## Files

| Name | Size | MD5 |
|---|---|---|
| `companionData.zip` | 109.3 MB | `cdd3decd3f0a7acbbd55f4899c7f6cd8` |
## Additional details

- Programming language: Python