When Tools Overlook Domain Knowledge: An Empirical Study of Refactoring in Scientific Software
Abstract
Refactoring is a critical process for improving code quality, but anecdotal evidence has shown that refactoring in scientific software (Sci-SW) is not always feasible. The inherently exploratory nature of Sci-SW development, characterized by evolving requirements and limited adoption of traditional software engineering practices, could present significant challenges to refactoring. However, there is no systematic study exploring refactoring practices in scientific open-source software (Sci-OSS). To bridge this gap, we explore the effectiveness of three state-of-the-art refactoring detection tools, RefDiff (C), RefactoringMiner (Java), and PyRef (Python), at detecting refactorings in Sci-OSS. Our findings reveal that these tools have significant limitations, detecting fewer refactorings in Sci-OSS than in non-scientific OSS (Non-Sci-OSS). Through a mixed-method approach, we identified that 67.54% of undetected refactorings in Sci-OSS require domain knowledge. To complement our analysis of the refactoring code changes, we conducted surveys with 47 practitioners experienced in refactoring Sci-OSS and 14 follow-up interviews to gain deeper insights into the associated challenges. Our results revealed seven novel challenges for Sci-OSS refactoring, including a domain knowledge gap. These findings emphasize the necessity for specialized tools and strategies to support refactoring in Sci-OSS effectively.
Replication instructions
The project is written in Python 3.12.0.
The requirements file contains all the necessary packages to run the project.
To install the required packages, run the following command:
pip install -r requirements.txt
Once the required packages are installed, download the refactoring data from GitHub:
1. Add your GitHub API token in RQ1/scripts/mysettings.py.
2. Run the download_data_from_github.py file in the RQ1/scripts folder.
This should download all the required files from GitHub into the data folder.
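If you prefer not to write the token into a file, one option is to read it from the environment instead. This is a minimal sketch, not the package's own code: the helper load_github_token and the environment variable name GITHUB_TOKEN are assumptions for illustration, and the actual layout of mysettings.py may differ.

```python
import os


def load_github_token(env_var: str = "GITHUB_TOKEN") -> str:
    """Return the token stored in the given environment variable.

    The variable name is a hypothetical choice for this sketch, not a
    name defined by the replication package.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Set {env_var} before running the download script.")
    return token
```

The returned value could then be assigned to whichever setting mysettings.py expects.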
Once you have all the data, install the refactoring detection tools.
Installation instructions are available at:
- For PyRef: https://github.com/PyRef/PyRef
- For RefactoringMiner 3.0: https://github.com/tsantalis/RefactoringMiner
- For RefDiff 2.0: https://github.com/aserg-ufmg/RefDiff
Once the refactoring detection tools are installed, change the execution path in each tool's script file.
For example, change the PyRef execution path in the pyref_script.py file in the RQ1/scripts folder.
Once you have changed the execution paths, you should be able to run the refactoring detection tool on all the collected data.
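Before launching a long run, it may help to confirm that each configured execution path exists. The sketch below is an illustration only: the function missing_tool_paths and the example paths are hypothetical, and the variable names used inside the package's scripts are not shown here.

```python
from pathlib import Path


def missing_tool_paths(tool_paths: dict[str, str]) -> list[str]:
    """Return the names of tools whose configured path does not exist."""
    return [
        name
        for name, path in tool_paths.items()
        if not Path(path).expanduser().exists()
    ]


if __name__ == "__main__":
    # Hypothetical locations; replace them with your own install paths.
    configured = {
        "PyRef": "~/tools/PyRef",
        "RefactoringMiner": "~/tools/RefactoringMiner",
        "RefDiff": "~/tools/RefDiff",
    }
    for name in missing_tool_paths(configured):
        print(f"Path for {name} not found; check its script file.")
```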
Each detection tool creates a JSON file listing the detected refactorings for every refactoring instance, named in the format 'repo_name_issue_number'.
You can now run all files in the RQ1/scripts folder to replicate the results.
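The 'repo_name_issue_number' naming described above can be used to locate the output for a given instance. The following is a rough sketch under stated assumptions: the .json extension, the output directory, and a top-level JSON list are guesses for illustration, and the helper names result_path and load_detected are hypothetical. The actual schema depends on each detection tool.

```python
import json
from pathlib import Path


def result_path(out_dir: str, repo_name: str, issue_number: int) -> Path:
    """Build the expected output path (the .json extension is assumed)."""
    return Path(out_dir) / f"{repo_name}_{issue_number}.json"


def load_detected(out_dir: str, repo_name: str, issue_number: int) -> list:
    """Load the detected refactorings for one instance.

    Returns an empty list when no file exists. A non-list JSON document
    is wrapped in a list so callers can always iterate over the result.
    """
    path = result_path(out_dir, repo_name, issue_number)
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as fh:
        data = json.load(fh)
    return data if isinstance(data, list) else [data]
```

Calling len(load_detected(...)) then gives a quick per-instance count under these assumptions.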
Files
replication_package.zip (967.8 MB)
md5:5bb0eb7fab38695fa8681b19b2f6ee9e