Published October 3, 2024 | Version v4
Dataset
Restricted
Dataset of the paper "Towards Better Comprehension of Breaking Changes in the NPM Ecosystem"
Description
This is the dataset of the paper "Towards Better Comprehension of Breaking Changes in the NPM Ecosystem".
We describe the files in this replication package as follows. For each RQ, we present the dataset of the results of the RQ. For each result table, we provide both Excel and CSV file formats.
Scripts folder:
This folder contains the scripts for cloning repositories, obtaining breaking change commits and running test cases (in RQ1).
Collected_data folder:
This folder contains our collected breaking changes from 381 sampled NPM projects. The files include:
1) sampled_projects.txt, the 381 projects we sampled.
2) original_breaking_commits.{csv, xlsx}, the breaking commits obtained from the 381 projects (5,242 in total).
3) breaking_commits_after_removal.{csv, xlsx}, the breaking commits that remain after removing (1) commits with non-JavaScript source code changes and (2) commits with very long messages (over 10 lines). The detailed process is in Section 3.1 of the paper. There are 2,724 breaking changes in total.
RQ1 folder:
This folder contains the breaking changes used in RQ1 (Section 4.1).
1) documented_bc_can_be_detected.{csv, xlsx}, the breaking changes after removal. The “can_be_detected” column indicates whether a documented breaking change can be detected by test cases.
2) detected_bc_are_documented.{csv, xlsx}, the breaking changes sampled from all commits of the 381 projects. The “can_be_detected” column indicates whether a commit can be detected by test cases, and the “documented” column indicates whether a commit is a documented breaking change.
The script for running the test cases, “run_tests.py”, is in the “Scripts” folder.
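As an illustration of how the two columns in detected_bc_are_documented.csv could be combined, the following Python sketch cross-tabulates “can_be_detected” against “documented”. It is not part of the replication package. The way the flags are encoded (True/False, yes/no, 1/0) is an assumption, as are the “commit” column and the sample rows, which are made up so that the snippet runs without the dataset.

```python
import csv
import io
from collections import Counter

# Assumption: flags are stored as one of these strings (case-insensitive).
# Adjust this set after inspecting the actual CSV file.
TRUE_VALUES = {"1", "true", "yes", "y"}


def as_bool(value):
    """Interpret a CSV cell as a boolean flag under the assumption above."""
    return str(value).strip().lower() in TRUE_VALUES


def cross_tabulate(rows):
    """Count rows for each (can_be_detected, documented) combination."""
    counts = Counter()
    for row in rows:
        key = (as_bool(row["can_be_detected"]), as_bool(row["documented"]))
        counts[key] += 1
    return counts


def load_rows(path):
    """Read a CSV file from the package into a list of dictionaries."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


# Made-up rows for demonstration only; the "commit" column is hypothetical.
SAMPLE_CSV = """commit,can_be_detected,documented
aaa111,True,True
bbb222,True,False
ccc333,False,False
ddd444,True,False
"""

sample_rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
table = cross_tabulate(sample_rows)
for (detected, documented), n in sorted(table.items()):
    print(f"can_be_detected={detected} documented={documented}: {n}")
```

To run this on the real data, replace sample_rows with load_rows(...) pointing at the CSV file in the RQ1 folder, after checking that the flag encoding matches TRUE_VALUES.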
RQ2-3-4 folder:
This folder contains the breaking changes used in the analysis process of RQ2, RQ3 and RQ4. Compared to the breaking commits in the Collected_data folder, we remove the breaking changes that cannot be linked to reason information (detailed in Section 3.3 of the paper). The files include:
1) used_projects.{csv, xlsx}, the projects that contain breaking changes (131 in total).
2) analyzed_breaking_changes.{csv, xlsx}, the breaking changes analyzed in RQ2 to RQ4. All breaking changes are annotated: column “category” indicates the type of the breaking change (the classification process is detailed in Section 3.2), column “change_signature_type” is for RQ2 (Section 4.2), column “change_behavior_type” is for RQ3 (Section 4.3) and column “reason” is for the reasons behind breaking changes in RQ4 (Section 4.4).
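The annotation columns described above lend themselves to simple frequency counts. The Python sketch below shows one possible way to tally the values of a column such as “category” or “reason”. It is only an illustration, not the analysis used in the paper. The column names come from the description above, while the cell values in the sample (type_a, rename, reason_x and so on) are placeholders invented for the demonstration and do not reflect the real labels.

```python
import csv
import io
from collections import Counter


def count_by(rows, column):
    """Count non-empty values of one annotation column."""
    values = ((row.get(column) or "").strip() for row in rows)
    return Counter(value for value in values if value)


# Placeholder rows for demonstration only; real labels will differ.
SAMPLE_CSV = """category,change_signature_type,change_behavior_type,reason
type_a,rename,,reason_x
type_a,remove,,reason_y
type_b,,changed_output,reason_x
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))

category_counts = count_by(rows, "category")
reason_counts = count_by(rows, "reason")
behavior_counts = count_by(rows, "change_behavior_type")

# Print each distribution, most frequent value first.
for name, counts in [("category", category_counts), ("reason", reason_counts)]:
    print(name)
    for value, n in counts.most_common():
        print(f"  {value}: {n}")
```

Empty cells are skipped, on the assumption that a breaking change annotated for RQ2 may leave the RQ3 column blank and vice versa.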