Software vulnerability detection datasets (Function Level)
Description
This repository contains datasets for software vulnerability detection in CSV format, pre-processed according to the rules described in the ARES'25 paper.
Each source code sample includes the following 16 properties (with a few exceptions, such as CodeParrot and Vul4J, which contain only three fields, as also mentioned below):
index: index of code.
vul_code: the vulnerable code (function)
is_vulnerable: whether the code is vulnerable (TRUE) or a patch (FALSE).
programming_language: programming language of the code.
method_name: name of the method.
file_name: name of the file from which the source code was extracted.
repo_url: URL of the project repository.
repo_owner: owner of the repository.
committer: developer who pushed the commit.
committer_date: date when the commit was pushed.
commit_msg: the commit message.
cwe_id: if is_vulnerable==True, the CWE ID; otherwise None.
cwe_name: if is_vulnerable==True, the name of the corresponding CWE; otherwise None.
cwe_description: if is_vulnerable==True, the description of the corresponding CWE; otherwise None.
cwe_url: if is_vulnerable==True, the URL with more details on the corresponding CWE; otherwise None.
cve_id: if is_vulnerable==True, the CVE ID; otherwise None.
patch: the patched code (function)
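
As a rough illustration of how these fields might be consumed, the sketch below reads a CSV with Python's standard `csv` module and separates vulnerable samples from patches. The two sample rows, the placeholder `CWE-000`, and the helper name `load_rows` are invented for illustration and do not come from the dataset; only a subset of the columns listed above is shown, and the TRUE/FALSE spelling is assumed to match the field description.

```python
import csv
import io

# Hypothetical two-row sample using column names from the list above.
# The values (including the placeholder CWE-000) are made up for illustration.
SAMPLE_CSV = """index,vul_code,is_vulnerable,programming_language,cwe_id,patch
0,"int f() { return 1; }",TRUE,Java,CWE-000,"int f() { return 0; }"
1,"int g() { return 0; }",FALSE,Java,None,
"""


def load_rows(handle):
    """Read all rows and convert is_vulnerable from text to a Python bool."""
    rows = []
    for row in csv.DictReader(handle):
        # Accept "TRUE"/"True"/"true"; anything else is treated as False.
        row["is_vulnerable"] = row["is_vulnerable"].strip().lower() == "true"
        rows.append(row)
    return rows


rows = load_rows(io.StringIO(SAMPLE_CSV))
vulnerable = [r for r in rows if r["is_vulnerable"]]
print(len(rows), len(vulnerable))
```

For a real file, the same function could be called on `open(path, newline="", encoding="utf-8")`. Since `vul_code` and `patch` hold whole functions, raising the parser limit with `csv.field_size_limit` may be necessary; the encoding is an assumption here.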
Files
data_CodeParrotGit_detection_Java_complete.csv
Total size: 1.1 GB
| Name | Size |
|---|---|
| md5:e06f0ad0e9405a0ffb64b11e5f8c035a | 475.0 MB |
| md5:2362b8abd784d470be0437c7bf55f37d | 3.2 MB |
| md5:da4f26e452ad7b34169353106fdc3046 | 279.9 MB |
| md5:da4f26e452ad7b34169353106fdc3046 | 279.9 MB |
| md5:b2e90a75e27dd2f561959eb40bfef438 | 12.1 MB |
| md5:213d1e09901a6476c9c403bb91228493 | 1.9 MB |
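
The MD5 values listed above can be used to check a download for corruption. The following is a minimal sketch using Python's standard `hashlib`; the helper names `md5_of_file` and `matches_listed_md5` are hypothetical, and which checksum belongs to which file has to be taken from the repository page itself.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_md5(path, listed):
    """Compare a file against a listed value such as 'md5:e06f0a...'."""
    # Drop an optional 'md5:' prefix and normalise case before comparing.
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

Reading in chunks keeps memory use flat, which matters for the larger files (several hundred MB). A call such as `matches_listed_md5("downloaded.csv", "md5:e06f0ad0e9405a0ffb64b11e5f8c035a")` would return `True` only if the local file is byte-identical to the published one; the path in that call is a placeholder.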