There is a newer version of the record available.

Published June 5, 2025 | Version 1.0
Dataset Open

Software vulnerability detection datasets (Function Level)

  • 1. Luxembourg Institute of Science and Technology

Description

This repository contains pre-processed datasets for software vulnerability detection at the function level, in CSV format. The pre-processing follows the rules described in the ARES'25 paper.

Each source code sample includes the following 16 properties (with a few exceptions, such as CodeParrot and Vul4J, which contain only three fields, also mentioned below):

index: index of the code sample.

vul_code: the vulnerable code (function)

is_vulnerable: whether the code is vulnerable (TRUE) or a patch (FALSE).

programming_language: programming language of the code.

method_name: name of the method.

file_name: name of the file from which the source code was extracted.

repo_url: URL of the project repository.

repo_owner: owner of the repository.

committer: developer who pushed the commit.

committer_date: date when the commit was pushed.

commit_msg: the commit message.

cwe_id: If is_vulnerable==True, the CWE ID; otherwise None.

cwe_name: If is_vulnerable==True, the name of the corresponding CWE; otherwise None.

cwe_description: If is_vulnerable==True, the description of the corresponding CWE; otherwise None.

cwe_url: If is_vulnerable==True, the URL for more details on the corresponding CWE; otherwise None.

cve_id: If is_vulnerable==True, the CVE ID; otherwise None.

patch: the patched code (function)
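As a rough illustration of how the fields above might be consumed, the following Python sketch reads rows with the standard csv module and counts vulnerable samples per CWE. It is a minimal sketch under stated assumptions: the helper names (parse_flag, iter_samples, count_by_cwe) are hypothetical, the flag is assumed to be spelled TRUE/True/FALSE/False in the files, and the in-memory sample stands in for a real CSV file with only a subset of the columns.

```python
import csv
import io
import sys

# Function bodies can be long, so raise the csv module's per-field size limit.
# The cap at 2**31 - 1 avoids an OverflowError on platforms with a 32-bit C long.
csv.field_size_limit(min(sys.maxsize, 2**31 - 1))


def parse_flag(value):
    """Interpret is_vulnerable; TRUE/True/true are assumed spellings."""
    return value.strip().lower() == "true"


def iter_samples(handle):
    """Yield one dict per CSV row, with is_vulnerable converted to bool."""
    for row in csv.DictReader(handle):
        row["is_vulnerable"] = parse_flag(row.get("is_vulnerable", ""))
        yield row


def count_by_cwe(samples):
    """Count vulnerable samples per cwe_id (hypothetical helper)."""
    counts = {}
    for sample in samples:
        if sample["is_vulnerable"]:
            cwe = sample.get("cwe_id") or "unknown"
            counts[cwe] = counts.get(cwe, 0) + 1
    return counts


# Tiny in-memory stand-in for a dataset file; the values are made up.
SAMPLE = (
    "index,vul_code,is_vulnerable,cwe_id\n"
    '0,"int f() { return 1; }",TRUE,CWE-79\n'
    '1,"int g() { return 2; }",FALSE,None\n'
)

counts = count_by_cwe(iter_samples(io.StringIO(SAMPLE)))
print(counts)
```

For a real file, the same functions could be applied to a handle opened with open(path, newline="", encoding="utf-8"); the encoding is an assumption and may need adjusting.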

Files

data_CodeParrotGit_detection_Java_complete.csv

Files (1.1 GB)

Checksum                                Size
md5:e06f0ad0e9405a0ffb64b11e5f8c035a    475.0 MB
md5:2362b8abd784d470be0437c7bf55f37d    3.2 MB
md5:da4f26e452ad7b34169353106fdc3046    279.9 MB
md5:da4f26e452ad7b34169353106fdc3046    279.9 MB
md5:b2e90a75e27dd2f561959eb40bfef438    12.1 MB
md5:213d1e09901a6476c9c403bb91228493    1.9 MB
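The md5: values listed above can be used to check that a download completed intact. The sketch below shows one way to do this in Python with hashlib; the function names and the path argument are hypothetical, and the file is read in chunks because some of the files are several hundred megabytes.

```python
import hashlib
import io


def md5_of_stream(handle, chunk_size=1 << 20):
    """Return the hex MD5 digest of a binary stream, read in chunks."""
    digest = hashlib.md5()
    for chunk in iter(lambda: handle.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at path (hypothetical helper)."""
    with open(path, "rb") as handle:
        return md5_of_stream(handle, chunk_size)


def matches_listed_checksum(actual_hex, listed):
    """Compare a digest with a listed value such as 'md5:e06f0a...'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return actual_hex.lower() == expected


# Self-check on in-memory bytes rather than a real download.
demo_digest = md5_of_stream(io.BytesIO(b"example bytes"))
print(matches_listed_checksum(demo_digest, "md5:" + demo_digest))
```

With a downloaded file, the check would look like matches_listed_checksum(md5_of_file(path), "md5:...") using the value listed for that file.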