Published October 6, 2020 | Version v1
Dataset Open

Dataset of 'Using Sequence-to-Sequence Learning for Repairing C Vulnerabilities'

  • 1. KTH Royal Institute of Technology
  • 2. Colorado State University

Description

This is the dataset we collected for the 'Using Sequence-to-Sequence Learning for Repairing C Vulnerabilities' paper. See the description in the paper for how the dataset was collected. Please cite 'Using Sequence-to-Sequence Learning for Repairing C Vulnerabilities' if you use the dataset.

src-all.txt and tgt-all.txt contain the tokenized function pairs and are ready to be used as training data. The two files are line-aligned: line n of src-all.txt holds a function before a commit that was classified as a bug fix commit, and line n of tgt-all.txt holds the same function after that commit.
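
Because the two txt files are paired line by line, they can be read in lockstep. The following is a minimal sketch, not part of the dataset's own tooling: the function name `iter_function_pairs` is hypothetical, and the UTF-8 decoding with replacement of undecodable bytes is an assumption about the file encoding. It streams the files rather than loading them, since each is about 2.1 GB.

```python
from itertools import zip_longest
from typing import Iterator, Tuple


def iter_function_pairs(src_path: str, tgt_path: str) -> Iterator[Tuple[str, str]]:
    """Yield (before, after) function pairs from two line-aligned files.

    Hypothetical helper: encoding and error handling are assumptions,
    not something the dataset description specifies.
    """
    with open(src_path, encoding="utf-8", errors="replace") as src, \
         open(tgt_path, encoding="utf-8", errors="replace") as tgt:
        # zip_longest pads the shorter file with None, which lets us
        # detect a line-count mismatch instead of silently truncating.
        for line_no, (before, after) in enumerate(zip_longest(src, tgt), start=1):
            if before is None or after is None:
                raise ValueError(f"files differ in length at line {line_no}")
            yield before.rstrip("\n"), after.rstrip("\n")
```

Calling `iter_function_pairs("src-all.txt", "tgt-all.txt")` returns a lazy iterator, so a few pairs can be inspected with `itertools.islice` without reading the whole files.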

The two tar files contain the raw data that was used to generate both txt files. Each tar file contains the commits that were collected during its respective year.

Files

Files (40.4 GB)

Checksum                               Size
md5:0367f65d0e2b61e117c966dfc888e77e   18.4 GB
md5:b5421630e008bdae46d4605869019e27   17.8 GB
md5:af223b983ea0b81791b27191578c4032   2.1 GB
md5:0ee999b031b2b0b1dcab6f0183708f89   2.1 GB
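
Given the file sizes, it may be worth checking a download against the md5 values listed above before using it. The sketch below uses Python's standard `hashlib`; the helper name `md5_of_file` and the 1 MiB chunk size are illustrative choices. The listing here does not show which checksum belongs to which file name, so that mapping has to be taken from the record itself.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks.

    Hypothetical helper: chunked reading keeps memory use flat even
    for the multi-gigabyte tar files.
    """
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel stops once read() returns empty bytes (EOF).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

A download is intact if the returned string equals the hex part of the corresponding `md5:` entry, for example `md5_of_file(path) == "0367f65d0e2b61e117c966dfc888e77e"` for the 18.4 GB file.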