There is a newer version of the record available.

Published August 14, 2025 | Version v2
Dataset Open

Replication Package for a Study of Software Refactorings in Real-World Open-Source Java Projects

  • 1. Beijing Institute of Technology
  • 2. Tianjin University
  • 3. University of Cincinnati

Description

This replication package accompanies a study on real-world refactorings in Java open-source projects. It contains:

  1. A list of predefined keywords, borrowed from previous studies, used for commit message comparison.

  2. A dataset of commits containing refactorings mined from six popular open-source Java projects: Spring Framework, Elasticsearch, Kafka, Hadoop, Tomcat, and JUnit4.

  3. A consolidated taxonomy of refactorings discovered across these projects.

The file keywords.csv lists the predefined keywords used for commit message filtering. The file master_replication_file.csv contains the refactoring-related commits, their associated refactoring types, and context-specific indicators. The file taxonomy.csv classifies the refactorings by edit type and specificity.
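
As a starting point for exploring the three CSV files, the following minimal sketch reports the column names and row count of each one. It assumes that each file starts with a header row and is UTF-8 encoded; neither is stated in this description, and the helper name summarize_csv is hypothetical.

```python
import csv
from pathlib import Path


def summarize_csv(path):
    """Return (column_names, row_count) for a CSV file.

    Assumption: the first row is a header row. The actual column
    names are not listed in the record description, so they are
    read from the file rather than hard-coded.
    """
    # "utf-8-sig" reads plain UTF-8 and also strips a leading BOM if present.
    with Path(path).open(newline="", encoding="utf-8-sig") as handle:
        reader = csv.DictReader(handle)
        row_count = sum(1 for _ in reader)
        columns = reader.fieldnames or []
    return list(columns), row_count


if __name__ == "__main__":
    # File names as given in the description; files are skipped if absent.
    for name in ("keywords.csv", "master_replication_file.csv", "taxonomy.csv"):
        if Path(name).exists():
            columns, rows = summarize_csv(name)
            print(f"{name}: {rows} rows, columns = {columns}")
```

Run from the directory that holds the downloaded files, this prints one summary line per file found.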

The package is intended to support reproducibility and facilitate further research on refactoring detection, categorization, and tool support. Instructions for reproducing the results, such as cloning the repositories and inspecting the commits, are included in the README file. Each commit can be inspected by running the command git show <commit-hash> from the corresponding project folder.
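
For inspecting many commits in a row, the git show step can be wrapped in a small script. The sketch below is one possible way to do this, not part of the package itself. The function names are hypothetical, and it assumes that git is installed, that the project has already been cloned, and that commit hashes are hexadecimal strings of 4 to 40 characters.

```python
import re
import subprocess

# Assumption: hashes are full or abbreviated SHA-1 values (hex, 4 to 40 chars).
HEX_HASH = re.compile(r"[0-9a-fA-F]{4,40}")


def git_show_command(commit_hash):
    """Build the argument list for `git show <commit-hash>`."""
    if not HEX_HASH.fullmatch(commit_hash):
        raise ValueError(f"not a plausible commit hash: {commit_hash!r}")
    return ["git", "show", commit_hash]


def show_commit(project_dir, commit_hash):
    """Run `git show` inside project_dir and return its output as text.

    Raises subprocess.CalledProcessError if git exits with an error,
    for example when the hash does not exist in that repository.
    """
    result = subprocess.run(
        git_show_command(commit_hash),
        cwd=project_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

A call such as show_commit("path/to/cloned/project", "<commit-hash>") then returns the commit message and diff; both arguments here are placeholders to be replaced with a real folder and a hash from the dataset.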

Files

Files (79.5 kB)

MD5 checksum                      Size
c06dfc8a5b544c7dbc8fb727eed6fed5  1.7 kB
77317dc119dd0978386b1ce7ddd6cc83  71.6 kB
42abac3c6db4d9d23d09a761452930dc  1.8 kB
31699503f0863947e0c493e7a894c1ba  4.4 kB
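
The MD5 checksums listed above can be used to confirm that a downloaded file is intact. The following is a minimal sketch using Python's standard hashlib module. Because the listing does not pair each checksum with a file name, it only checks whether a file's digest matches any of the four published values; the function names are hypothetical.

```python
import hashlib

# The four MD5 checksums published with this record.
PUBLISHED_MD5 = {
    "c06dfc8a5b544c7dbc8fb727eed6fed5",
    "77317dc119dd0978386b1ce7ddd6cc83",
    "42abac3c6db4d9d23d09a761452930dc",
    "31699503f0863947e0c493e7a894c1ba",
}


def md5_of(path, chunk_size=65536):
    """Return the hexadecimal MD5 digest of the file at path."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_checksum(path):
    """Return True if the file's MD5 digest is one of the published values."""
    return md5_of(path) in PUBLISHED_MD5
```

A file that returns False was either altered or truncated during download, or belongs to a different version of the record.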