What really changes when developers intend to improve their source code: A commit-level study of static metric value and static analysis warning changes
Authors/Creators
- 1. TU Clausthal
- 2. University of Goettingen
Description
This is the dataset for the publication "What really changes when developers intend to improve their source code: A commit-level study of static metric value and static analysis warning changes".
It contains a random sample of 2533 commits from 54 Java Apache open source projects, classified by two researchers as perfective, corrective, or other changes (manual_labels.csv). Moreover, we include static source code metrics and static analysis warnings for the 2533 changes in all_changes_gt.csv.gz.
In addition, we include the full dataset of 125482 commits in all_changes_sebert.csv.gz with all metrics and automatic labels for every commit that was not manually labeled. The automatic labels were provided by a fine-tuned transformer model (BERT) pre-trained exclusively on software engineering data.
We also provide the fine-tuned version of the pre-trained model in seBERT_fine_tuned_commit_intent.tar.gz, as well as a snapshot of the SmartSHARK MongoDB database used to gather the raw data in smartshark_emse.agz.
The model can be tested live on the website accompanying the publication.
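As a first look at the CSV files described above, the following minimal sketch reads a file's header and counts its data rows without loading it into memory. It assumes the files are comma-separated, UTF-8 encoded, and start with a header row; none of this is stated in the description, and the helper names `open_text` and `summarize_csv` are hypothetical.

```python
import csv
import gzip
from pathlib import Path


def open_text(path):
    """Open a plain or gzip-compressed CSV file in text mode.

    Assumption: the file is UTF-8 encoded (not confirmed by the dataset description).
    """
    path = Path(path)
    if path.suffix == ".gz":
        return gzip.open(path, mode="rt", encoding="utf-8", newline="")
    return open(path, mode="r", encoding="utf-8", newline="")


def summarize_csv(path):
    """Return (column_names, number_of_data_rows), streaming the file row by row.

    Assumption: the first row is a header and fields are comma-separated.
    """
    with open_text(path) as handle:
        reader = csv.reader(handle)
        header = next(reader, [])
        row_count = sum(1 for _ in reader)
    return header, row_count


if __name__ == "__main__":
    # File name taken from the description; adjust the path to the download location.
    labels_path = Path("manual_labels.csv")
    if labels_path.exists():
        columns, rows = summarize_csv(labels_path)
        print(f"{labels_path}: {rows} rows, columns: {columns}")
```

The same function works for the `.csv.gz` files, since the compression is detected from the file extension. If the assumptions hold, the row count for manual_labels.csv should correspond to the 2533 manually labeled commits.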
Files
(37.6 GB)
| MD5 checksum | Size |
|---|---|
| 6b5473cf189e7d8eff7e1fc1f1ec3db7 | 2.4 MB |
| c36f08553abc34f0f1ab932aae4444f5 | 115.4 MB |
| a099d942098227a1fc8127759e55850e | 156.2 kB |
| defacc192135c135caa4f170fe93b789 | 1.2 GB |
| 272671d77f7dfd48b4759a5f8757173c | 36.2 GB |
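Because the largest file is tens of gigabytes, it may be worth checking a download against the MD5 checksums listed for the files. The following is a minimal sketch using Python's standard `hashlib` module; the helper names `md5_of_file` and `matches_checksum` are hypothetical, and the caller has to supply which checksum belongs to which file.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file's MD5 digest with an expected value.

    The expected value may be given with or without a leading "md5:" prefix.
    """
    expected = expected.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

Reading in chunks keeps memory use constant regardless of file size. A call such as `matches_checksum(local_path, "md5:...")` returns `True` only if the downloaded file is byte-identical to the published one.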