Published September 8, 2021 | Version v2
Dataset Open

What really changes when developers intend to improve their source code: A commit-level study of static metric value and static analysis warning changes

  • 1. TU Clausthal
  • 2. University of Goettingen

Description

This is the dataset for the publication "What really changes when developers intend to improve their source code: A commit-level study of static metric value and static analysis warning changes".

It contains a random sample of 2533 commits from 54 Apache open-source Java projects, classified by two researchers into perfective, corrective, and other changes (manual_labels.csv). We also include static source code metrics and static analysis warnings for these 2533 changes in all_changes_gt.csv.gz.

In addition, we include the full dataset of 125482 commits in all_changes_sebert.csv.gz with all metrics and automatic labels for every commit that was not manually labeled. The automatic labels were provided by a fine-tuned transformer model (BERT) pre-trained exclusively on software engineering data.
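The CSV files can be read with the Python standard library alone. The sketch below streams rows from either a plain or a gzip-compressed CSV and tallies the values of one column. It assumes comma-separated files with a header row and UTF-8 encoding; the helper names `iter_rows` and `count_values` are illustrative, and the column names in the dataset are not documented here, so inspect the header before relying on any of them.

```python
import csv
import gzip
from collections import Counter
from pathlib import Path


def iter_rows(path):
    """Yield each CSV row as a dict; .gz files are decompressed on the fly."""
    path = Path(path)
    # Assumption: files ending in .gz are gzip-compressed, comma-separated CSVs.
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, mode="rt", encoding="utf-8", newline="") as handle:
        yield from csv.DictReader(handle)


def count_values(path, column):
    """Count how often each value occurs in the given column."""
    return Counter(row[column] for row in iter_rows(path))
```

For example, `next(iter_rows("manual_labels.csv")).keys()` shows the available columns, and `count_values("all_changes_sebert.csv.gz", "label")` would tally the labels if a column named `label` exists (a hypothetical name). Streaming row by row keeps memory use low for the full set of 125482 commits.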

We also provide the fine-tuned version of the pre-trained model in seBERT_fine_tuned_commit_intent.tar.gz, as well as a snapshot of the SmartSHARK MongoDB database used to gather the raw data in smartshark_emse.agz.
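Before unpacking the model archive, it can be useful to check what it contains. The following is a minimal sketch using the standard `tarfile` module; it assumes only that the file is a gzip-compressed tar archive, as the `.tar.gz` extension suggests. The internal layout of the archive is not described here, and `list_archive` is an illustrative helper name.

```python
import tarfile


def list_archive(path):
    """Return the member names of a gzip-compressed tar archive without extracting it."""
    with tarfile.open(path, mode="r:gz") as archive:
        return archive.getnames()
```

Calling `list_archive("seBERT_fine_tuned_commit_intent.tar.gz")` returns the stored paths, which shows where the model files will land once the archive is extracted.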

The model can be tested live on the website accompanying the publication.

Files (37.6 GB)

Checksum Size
md5:6b5473cf189e7d8eff7e1fc1f1ec3db7 2.4 MB
md5:c36f08553abc34f0f1ab932aae4444f5 115.4 MB
md5:a099d942098227a1fc8127759e55850e 156.2 kB
md5:defacc192135c135caa4f170fe93b789 1.2 GB
md5:272671d77f7dfd48b4759a5f8757173c 36.2 GB
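The MD5 checksums listed for the files can be used to confirm that a download completed intact, which matters most for the 36.2 GB database snapshot. The sketch below hashes a file in chunks, so even the largest file never has to fit in memory. The helper names `md5_of_file` and `matches_checksum` are illustrative.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file against a checksum given as 'md5:<hex>' or as bare hex."""
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected_hex
```

For example, `matches_checksum("smartshark_emse.agz", "md5:...")` returns `True` when the local file matches the checksum given for it. The listing shows checksums and sizes but not file names, so match each download to its entry by size.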