Published October 6, 2020 | Version v1 | Dataset | Open access
Dataset of 'Using Sequence-to-Sequence Learning for Repairing C Vulnerabilities'
- 1. KTH Royal Institute of Technology
- 2. Colorado State University
Description
This is the dataset we collected for the paper 'Using Sequence-to-Sequence Learning for Repairing C Vulnerabilities'. The paper describes how the dataset was collected. If you use the dataset, please cite 'Using Sequence-to-Sequence Learning for Repairing C Vulnerabilities'.
src-all.txt and tgt-all.txt contain the tokenized function pairs and are ready to be used as training data. The two files are line-aligned: line i of src-all.txt holds a function before a commit that was classified as a bug-fix commit, and line i of tgt-all.txt holds the same function after that commit.
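Because the two files are line-aligned, they can be read in lockstep. The following is a minimal Python sketch, assuming both files are UTF-8 encoded and that the tokens on each line are separated by whitespace; the function name `load_pairs` is hypothetical and not part of the dataset.

```python
from itertools import zip_longest
from typing import Iterator, List, Tuple


def load_pairs(src_path: str, tgt_path: str) -> Iterator[Tuple[List[str], List[str]]]:
    """Yield (before_tokens, after_tokens) for each aligned pair of lines.

    Assumes UTF-8 files with whitespace-separated tokens (an assumption,
    not a documented property of the dataset).
    """
    with open(src_path, encoding="utf-8") as src, open(tgt_path, encoding="utf-8") as tgt:
        # zip_longest pads the shorter file with None, so a line-count
        # mismatch is detected instead of silently truncating the data.
        for line_no, (before, after) in enumerate(zip_longest(src, tgt), start=1):
            if before is None or after is None:
                raise ValueError(f"line {line_no}: src and tgt files differ in length")
            # str.split() with no argument splits on any whitespace
            # and drops the trailing newline.
            yield before.split(), after.split()
```

For example, `for before, after in load_pairs("src-all.txt", "tgt-all.txt"): ...` iterates over the pairs without loading either file fully into memory.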
The two tar files contain the raw data that was used to generate both .txt files. Each archive contains the commits that were collected during the corresponding year.
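The archive file names and their internal layout are not given on this page, so the following Python sketch only lists the regular files inside a tar archive without extracting anything, as a cheap first look at the raw commit data. The helper name `list_archive_files` is hypothetical; mode `"r:*"` lets the standard `tarfile` module detect gzip, bz2, or xz compression automatically.

```python
import tarfile
from typing import List


def list_archive_files(archive_path: str) -> List[str]:
    """Return the names of regular files stored in a (possibly compressed) tar archive.

    Nothing is extracted to disk; only the archive's member index is read.
    """
    # "r:*" opens the archive for reading with transparent decompression.
    with tarfile.open(archive_path, mode="r:*") as archive:
        # Skip directories, symlinks, and other non-regular members.
        return [member.name for member in archive.getmembers() if member.isfile()]
```

Listing members first makes it possible to decide which parts of an archive to extract before unpacking it.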