Published April 29, 2021 | Version v2
Dataset Open

MCMD | Multi-programming-language Commit Message Dataset

Creators

  • 1. Fudan University

Description

A large-scale commit message dataset covering multiple programming languages, with rich information.

This dataset was introduced in two papers: "On the Evaluation of Commit Message Generation Models: An Experimental Study", accepted to ICSME 2021, and "A large-scale empirical study of commit message generation: models, datasets and evaluation", accepted to EMSE 2022.

You are welcome to use our dataset, MCMD, and the evaluation scripts to test the performance of commit message generation models.

Citations for these two works can be found here.

Files

Total size: 50.2 GB

Size      MD5 checksum
10.8 GB   md5:91846d5dcb6da057218c634df8c18a51
39.5 GB   md5:03e24950dd7398aead3cae6dd23c4452
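
Because both archives are large, it can be worth checking a finished download against the MD5 value listed for it. The following is a minimal sketch in Python using only the standard library; the function names `md5_of_file` and `matches_listing` are hypothetical helpers written for this illustration and are not part of the MCMD evaluation scripts.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file.

    The file is read in chunks (1 MiB by default) so that a
    multi-gigabyte archive never has to fit in memory at once.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed):
    """Compare a file's digest with a listed value such as 'md5:<hex digest>'.

    The optional 'md5:' prefix is dropped and the comparison is
    case-insensitive.
    """
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

For example, `matches_listing(local_path, "md5:91846d5dcb6da057218c634df8c18a51")` should return `True` for an intact copy of the 10.8 GB file, where `local_path` stands for wherever the archive was saved (the file names are not shown in this listing, so none is assumed here).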