Published April 29, 2021 | Version v2
Dataset
Open
MCMD | Multi-programming-language Commit Message Dataset
Description
A large-scale commit message dataset that covers multiple programming languages and carries rich information.
This dataset was proposed in two papers: "On the Evaluation of Commit Message Generation Models: An Experimental Study", accepted to ICSME 2021, and "A large-scale empirical study of commit message generation: models, datasets and evaluation", accepted to EMSE 2022.
You are welcome to use our dataset, MCMD, and the evaluation scripts to test the performance of commit message generation models.
Citations for these two works can be found here.
Files
(50.2 GB)
Checksum | Size |
---|---|
md5:91846d5dcb6da057218c634df8c18a51 | 10.8 GB |
md5:03e24950dd7398aead3cae6dd23c4452 | 39.5 GB |
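Because the archives are large, it may be worth checking a download against the MD5 digests listed for the files before unpacking it. The following is a minimal sketch of one way to do that in Python; the function names `md5_of_file` and `matches_expected` and the dictionary labels are illustrative assumptions, not part of the dataset or its evaluation scripts, and the file names are not given on this page.

```python
import hashlib

# MD5 digests copied from the file listing on this page.
# The dictionary keys are hypothetical labels; the real file names are not shown here.
EXPECTED_MD5 = {
    "10.8 GB archive": "91846d5dcb6da057218c634df8c18a51",
    "39.5 GB archive": "03e24950dd7398aead3cae6dd23c4452",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks.

    Chunked reading keeps memory use flat, which matters for
    archives that are tens of gigabytes in size.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes (end of file).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_md5):
    """Return True if the file's MD5 digest equals the expected hex digest."""
    return md5_of_file(path) == expected_md5.lower()
```

A call such as `matches_expected("path/to/downloaded/archive", EXPECTED_MD5["10.8 GB archive"])` (with a placeholder path) would return `False` for a truncated or corrupted download, in which case the file should be fetched again.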