Differentiating Refactoring Practices: A Comparative Analysis of ML and Non-ML Frameworks
Research Overview:
This research is a longitudinal empirical study comparing refactoring practices in machine learning (ML) frameworks and non-ML frameworks. We mined over 1 million commits and over 4 million refactorings from 130 ML frameworks and 800 non-ML frameworks (65 ML and 400 non-ML Java frameworks; 65 ML and 400 non-ML Python frameworks). We used RefactoringMiner to extract the Java and Python refactorings, and a custom Python script that calls the GitHub API to extract the commits from the repositories. We then grouped the extracted commits and refactorings into development stages (early, middle and late), following the empirical proposition of a prior study.
This research is motivated by the need to understand how refactoring differs per development stage in ML versus non-ML frameworks, and how these findings can benefit software engineering practitioners working on software quality assurance. The research is primarily quantitative, as we analyzed only the refactorings extracted by RefactoringMiner.
Follow the guide below to reproduce our study findings:
(A) Requirements
- Build RefactoringMiner version 3.1.2 from GitHub to extract Python and Java refactorings
- Build the Lizard framework version 1.21.3 to evaluate code quality before and after refactoring
- Install all libraries listed in requirements.txt
(B) Dataset Availability
- The zip file "Datasets_and_analysis" contains the extracted refactoring and commit JSON files for both Java and Python frameworks.
- All commit JSON files are located in the folder "./project_commits", while the refactoring JSON files are located in the folder "./raw_json_data".
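As a quick sanity check after unpacking the archive, the two folders named above can be inventoried with a few lines of Python. This is a minimal sketch: the helper name `list_json_files` is hypothetical, and it assumes the files use a lowercase `.json` extension.

```python
from pathlib import Path


def list_json_files(root, folders=("project_commits", "raw_json_data")):
    """Map each dataset folder name to a sorted list of the JSON files in it."""
    root = Path(root)
    found = {}
    for name in folders:
        folder = root / name
        # A missing folder yields an empty list instead of raising an error.
        found[name] = sorted(folder.rglob("*.json")) if folder.is_dir() else []
    return found


if __name__ == "__main__":
    # Assumes the archive was extracted into the current directory.
    for folder, files in list_json_files(".").items():
        print(f"{folder}: {len(files)} JSON files")
```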
(C) Implementation Steps
- First, generate a GitHub access token so that RefactoringMiner and our custom Python scripts can call the GitHub API. Then extract the contents of the "Datasets_and_analysis" zip file.
Step 1: RefactoringMiner can be built as a Gradle or Maven project. We also provide a custom Python script (refactoringminer.py) that wraps RefactoringMiner's data extraction function to extract the refactorings of multiple frameworks in a loop; see the official RefactoringMiner GitHub page for more implementation details.
- Run refactoringminer.py to extract the refactorings of the studied subjects
- Run extract_commit.py to extract the commits of the studied subjects from GitHub
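For readers who want to see what token-authenticated commit extraction looks like, the sketch below pages through the GitHub REST endpoint for listing commits. It is an illustration under stated assumptions, not the contents of extract_commit.py: the helper names and the `GITHUB_TOKEN` environment variable are hypothetical, and rate-limit handling is left out.

```python
import json
import os
import urllib.request

API_ROOT = "https://api.github.com"


def auth_headers(token):
    """Headers for an authenticated GitHub REST API request."""
    return {
        "Accept": "application/vnd.github+json",
        "Authorization": f"Bearer {token}",
    }


def commits_url(owner, repo, page=1, per_page=100):
    """URL of one page of the 'list commits' endpoint for a repository."""
    return f"{API_ROOT}/repos/{owner}/{repo}/commits?per_page={per_page}&page={page}"


def fetch_all_commits(owner, repo, token):
    """Collect every commit by requesting pages until an empty page comes back."""
    commits, page = [], 1
    while True:
        request = urllib.request.Request(
            commits_url(owner, repo, page), headers=auth_headers(token)
        )
        with urllib.request.urlopen(request) as response:
            batch = json.load(response)
        if not batch:
            return commits
        commits.extend(batch)
        page += 1


if __name__ == "__main__":
    # GITHUB_TOKEN is an assumed variable name; use wherever you store the token.
    token = os.environ["GITHUB_TOKEN"]
    print(len(fetch_all_commits("tsantalis", "RefactoringMiner", token)))
```

Reading the token from the environment keeps it out of the scripts and out of version control.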
Step 2: Run the Python file "./pull_data_raw_json.py" to extract the required nodes from the commit and refactoring JSON files into CSV files inside the folder named "extracted_refactorings".
Step 3: Run "group_into_stages.py" to group the refactorings and commits into their respective development stages (early, middle and late).
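To illustrate the kind of flattening Step 2 performs, here is a minimal sketch that turns one refactoring report into CSV rows. It assumes the JSON follows the layout RefactoringMiner commonly emits (a top-level `commits` list whose entries carry `repository`, `sha1`, and a `refactorings` list with `type` and `description`); the nodes that pull_data_raw_json.py actually selects may differ, and the function names here are hypothetical.

```python
import csv
import io

FIELDS = ["repository", "sha1", "type", "description"]


def flatten_refactorings(report):
    """Return one row per refactoring found in a parsed JSON report."""
    rows = []
    for commit in report.get("commits", []):
        for refactoring in commit.get("refactorings", []):
            rows.append({
                "repository": commit.get("repository", ""),
                "sha1": commit.get("sha1", ""),
                "type": refactoring.get("type", ""),
                "description": refactoring.get("description", ""),
            })
    return rows


def write_csv(rows, handle):
    """Write the flattened rows, with a header line, to an open text handle."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    # Made-up sample data, used only to show the shape of the output.
    sample = {"commits": [{
        "repository": "https://github.com/example/project.git",
        "sha1": "abc123",
        "refactorings": [{"type": "Rename Method", "description": "Rename Method foo() to bar()"}],
    }]}
    buffer = io.StringIO()
    write_csv(flatten_refactorings(sample), buffer)
    print(buffer.getvalue())
```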
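The stage boundaries used in the study come from a prior study and are not restated in this guide. Purely as an illustration, the sketch below assumes the simplest possible rule: sort the commits chronologically and cut them into three parts of roughly equal size. Both the rule and the `date` field name are assumptions; group_into_stages.py holds the actual criteria.

```python
def group_into_stages(commits, key=lambda commit: commit["date"]):
    """Split commits into early, middle and late thirds by chronological order.

    ASSUMPTION: equal thirds. The study follows a prior study's proposition,
    which may define the boundaries differently.
    """
    ordered = sorted(commits, key=key)
    total = len(ordered)
    first_cut, second_cut = total // 3, (2 * total) // 3
    return {
        "early": ordered[:first_cut],
        "middle": ordered[first_cut:second_cut],
        "late": ordered[second_cut:],
    }


if __name__ == "__main__":
    # ISO 8601 date strings in the same format sort chronologically as text.
    history = [{"sha1": str(i), "date": f"2020-01-{i:02d}"} for i in range(1, 10)]
    for stage, items in group_into_stages(history).items():
        print(stage, [c["sha1"] for c in items])
```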
Step 4: Analysis. We provide the implementation scripts for each RQ as follows:
- Python scripts with the prefixes "rq1_", "rq2_", and "rq3_" are the implementation scripts that analyze and generate figures for RQ1, RQ2, and RQ3, respectively.
- Other Python scripts without these prefixes are named after their functionality.
- lizard_refactoring_impact.py contains the Python implementation for the Lizard framework, which we used to estimate the code quality of the subject systems before and after refactoring.
Step 5: Manual investigation records for RQ1, RQ2, and RQ3 can be found in the folder labeled "manual_verification".
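A before/after comparison with Lizard can be sketched as follows. This is not the contents of lizard_refactoring_impact.py: the helper names and the choice of metrics (NLOC, function count, maximum cyclomatic complexity) are assumptions made for illustration. Lizard picks the language parser from the file name extension.

```python
def summarize(filename, source):
    """Compute a few Lizard metrics for one version of a source file."""
    # Imported here so that metric_delta stays usable without lizard installed.
    import lizard

    info = lizard.analyze_file.analyze_source_code(filename, source)
    functions = info.function_list
    return {
        "nloc": info.nloc,
        "functions": len(functions),
        "max_ccn": max((f.cyclomatic_complexity for f in functions), default=0),
    }


def metric_delta(before, after):
    """Return after minus before for every metric present in `before`."""
    return {name: after[name] - before[name] for name in before}


if __name__ == "__main__":
    old = "def f(x):\n    if x:\n        if x > 1:\n            return 2\n    return 0\n"
    new = "def f(x):\n    return 2 if x and x > 1 else 0\n"
    print(metric_delta(summarize("f.py", old), summarize("f.py", new)))
```

A negative delta for `max_ccn` or `nloc` would suggest that the refactored version is simpler by that metric, though interpreting the change still depends on the refactoring type.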
Note:
- All figures generated by the implementation scripts can be found inside the generated "./outputs" folder
- Table data are generated from the analysis for each RQ
Files (2.4 GB)
Datasets_and_analysis.zip
| MD5 | Size |
|---|---|
| 55ccd5cbf770871129f44561175cddbd | 2.4 GB |
| 3a66d0fba58117b7ed9893ad99b21b21 | 7.9 kB |
| 731632e7a345c59d5cd624d547c7a39c | 9.9 kB |
| 12e734de8d45a2b1ebd3d09cc40fdff0 | 10.2 kB |
| bb1796e42ed69127fd5ab3206530bf68 | 11.0 kB |
| 0d66a602812e2c170c0f4015faf4f053 | 12.4 kB |
| e2017acfd2f0bc02c9ddd7ef85f1024f | 16.2 kB |
| 36655ffe277a389b0559c504f8ae947d | 2.4 MB |
| c7e40dae61bebc87602d7d0e870084ee | 23.3 kB |
| 34fcada8df0fd702075bcf0de2629347 | 403 Bytes |
| 2484681728f463745093e853c36dca3f | 6.0 kB |
| b45f3edf5a5a763c344d0df2067b65e3 | 1.5 kB |
| 15f71e311b2509080b1fc78dd354a414 | 24.7 kB |
| 244be3d5296f654d8f3baab068ab6415 | 3.8 kB |
| 1e37034cb764d0b8c8a048e4d4523343 | 24.7 kB |
| 48f459d6290c38087bad6f3126d5c4b6 | 7.8 kB |
| a301b075e711872897459caf16345775 | 11.4 kB |
| dada6b883316605aadbf70de7199bc42 | 38.7 kB |
| 4f3b8d50be3398b501a9d6fde14642f5 | 38.7 kB |
| 2ba8afc02698854d19772d41aa915d4d | 38.6 kB |
| ea0872f3d79e262bd288bb18c5377dce | 38.6 kB |
| 5a9df63fee0b101e4b6d1bea2d07d9d0 | 38.6 kB |
| b4ce038a9dbf5c3a63791aae4dfa8056 | 38.7 kB |