There is a newer version of the record available.

Published December 2, 2025 | Version v3
Dataset Open

When Tools Overlook Domain Knowledge: An Empirical Study of Refactoring in Scientific Software

Authors/Creators

Description

Abstract

Refactoring is a critical process for improving code quality, but anecdotal evidence has shown that refactoring in scientific software (Sci-SW) is not always feasible. The inherently exploratory nature of Sci-SW development, characterized by evolving requirements and limited adoption of traditional software engineering practices, could present significant challenges to refactoring. However, there is no systematic study exploring refactoring practices in scientific open-source software (Sci-OSS). To bridge this gap, we explore the effectiveness of three state-of-the-art refactoring detection tools, RefDiff (C), RefactoringMiner (Java), and PyRef (Python), at detecting refactorings in Sci-OSS. Our findings reveal that these tools have significant limitations, detecting fewer refactorings in Sci-OSS than in non-scientific OSS (Non-Sci-OSS). Through a mixed-method approach, we identified that 67.54% of undetected refactorings in Sci-OSS require domain knowledge. To complement our analysis of the refactoring code changes, we conducted surveys with 47 practitioners experienced in refactoring Sci-OSS and 14 follow-up interviews to gain deeper insights into the associated challenges. Our results revealed seven novel challenges for Sci-OSS refactoring, including a domain knowledge gap. These findings emphasize the necessity for specialized tools and strategies to support refactoring in Sci-OSS effectively.


Replication instructions

The project is written in Python 3.12.0.

The requirements file contains all the necessary packages to run the project.
To install the required packages, run the following command:

pip install -r requirements.txt


Once the required packages are installed, download the refactoring data from GitHub:

    1. Add your GitHub API token in RQ1/scripts/mysettings.py
    2. Run the download_data_from_github.py file in the RQ1/scripts folder.

This should download all the required files from GitHub into the data folder.
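
For orientation, the sketch below shows how a script might use such a token when calling the GitHub REST API. It is an illustration only, not the package's code: the variable name GITHUB_TOKEN, the helper issue_request, and the example repository are assumptions, so check RQ1/scripts/mysettings.py for the name the scripts actually expect.

```python
import urllib.request

# Hypothetical name; the real variable in RQ1/scripts/mysettings.py may differ.
GITHUB_TOKEN = "<your-token>"


def issue_request(owner, repo, issue_number, token=GITHUB_TOKEN):
    """Build (but do not send) an authenticated request for one GitHub issue."""
    url = f"https://api.github.com/repos/{owner}/{repo}/issues/{issue_number}"
    headers = {
        "Accept": "application/vnd.github+json",
        "Authorization": f"Bearer {token}",
    }
    return urllib.request.Request(url, headers=headers)


if __name__ == "__main__":
    # Example repository chosen for illustration only.
    req = issue_request("octocat", "Hello-World", 1)
    print(req.full_url)
```

Keeping the token in a settings file that is never committed avoids leaking credentials when the scripts are shared.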

Once you have all the data, install the refactoring detection tools.
Installation instructions for each tool are available at:

    - For PyRef: https://github.com/PyRef/PyRef
    - For RefactoringMiner 3.0: https://github.com/tsantalis/RefactoringMiner
    - For RefDiff 2.0: https://github.com/aserg-ufmg/RefDiff

Once the refactoring detection tools are installed, change the execution path in each tool's script file.
For example, change the PyRef execution path in the pyref_script.py file in the RQ1/scripts folder.
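
As a rough sketch of what an execution path setting amounts to, the helper below checks that a configured path exists before building a command line for a Python-based tool. The names PYREF_ENTRY and build_tool_command, and the placeholder path, are hypothetical and are not taken from pyref_script.py; the tool-specific arguments are left to the caller and should come from each tool's own documentation.

```python
import sys
from pathlib import Path

# Hypothetical location; point this at your own PyRef checkout.
PYREF_ENTRY = Path("/path/to/PyRef/main.py")


def build_tool_command(entry_point, tool_args):
    """Return the argv list for a Python-based tool, failing early on a bad path."""
    entry_point = Path(entry_point)
    if not entry_point.is_file():
        raise FileNotFoundError(f"Execution path not found: {entry_point}")
    # Reuse the current interpreter so the tool runs in the same environment.
    return [sys.executable, str(entry_point), *tool_args]
```

The returned list can be passed to subprocess.run(cmd, check=True). Failing early on a missing path gives a clearer error than a failed tool invocation partway through a batch run.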

Once you have changed the execution paths, you should be able to run the refactoring detection tool on all the collected data.

Each detection tool creates one JSON file per refactoring instance, listing the refactorings it detected. The files are named in the format 'repo_name_issue_number'.
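
The output files can then be collected by name. The sketch below is one possible way to do this and rests on two assumptions not stated above: that the files carry a .json extension, and that the issue number is whatever follows the last underscore in the file name. The function names are hypothetical, and the structure of the JSON content is left uninterpreted because it depends on the tool.

```python
import json
from pathlib import Path


def split_output_name(path):
    """Split '<repo_name>_<issue_number>.json' into (repo_name, issue_number)."""
    # rpartition splits on the LAST underscore, so repo names may contain underscores.
    repo_name, _, issue_number = Path(path).stem.rpartition("_")
    return repo_name, issue_number


def load_results(results_dir):
    """Map (repo_name, issue_number) to the parsed JSON content of each output file."""
    results = {}
    for json_file in sorted(Path(results_dir).glob("*.json")):
        with json_file.open(encoding="utf-8") as fh:
            results[split_output_name(json_file)] = json.load(fh)
    return results
```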

You can now run all files in the RQ1/scripts folder to replicate the results.

 

Files

replication_package.zip

Files (967.8 MB)

md5:5bb0eb7fab38695fa8681b19b2f6ee9e
967.8 MB