Published September 13, 2024 | Version 2.0 - Vul4J+
Dataset | Open access

Vul4J+: A Dataset of Vulnerabilities for Automated Vulnerability Repair

Description

Vul4J+ is a dataset of vulnerability fixes for automated vulnerability repair (AVR) in Java. Each entry represents a vulnerability affecting an open-source Java project and references the commit (revision) that contains both the code affected by the vulnerability and the version fixed by a human developer (the "left" and "right" sides of the commit). Each vulnerability comes with at least one "oracle" that demonstrates the presence of the vulnerability and can be used to validate the correctness of patches generated by AVR tools. An *oracle* can take one of the following forms:
- a vulnerability-witnessing test, i.e., a JUnit test case that fails on the vulnerable version of the code but passes on the patched version;
- a warning (report) raised by a static vulnerability analyzer, namely SpotBugs, that is present in the vulnerable version of the code but not in the patched version.
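Both oracle types reduce to simple decision rules over the two versions of the code. The Python sketch below illustrates those rules; it is not part of the Vul4J+ tooling. The function names are hypothetical, and the sketch assumes the JUnit outcomes and SpotBugs warnings have already been collected for each version, with each warning reduced to a comparable identifier (here, a hypothetical tuple of bug type, class, and method).

```python
from typing import AbstractSet, Hashable


def witnessing_test_confirms_fix(fails_on_vulnerable: bool, passes_on_patched: bool) -> bool:
    # A witnessing test supports a patch only if it fails on the vulnerable
    # version AND passes once the patch is applied.
    return fails_on_vulnerable and passes_on_patched


def static_oracle_confirms_fix(
    oracle_warning: Hashable,
    warnings_vulnerable: AbstractSet[Hashable],
    warnings_patched: AbstractSet[Hashable],
) -> bool:
    # The oracle warning must be reported on the vulnerable version and must
    # no longer be reported on the patched version.
    return oracle_warning in warnings_vulnerable and oracle_warning not in warnings_patched


# Hypothetical warning identifier: (bug type, class, method).
warning = ("EXAMPLE_BUG_TYPE", "org.example.Parser", "parse")
print(witnessing_test_confirms_fix(True, True))                # True
print(witnessing_test_confirms_fix(True, False))               # False
print(static_oracle_confirms_fix(warning, {warning}, set()))   # True
```

Keying warnings on bug type, class, and method rather than on line numbers is an assumption made here so that an identifier survives the line shifts a patch usually introduces; a real harness may need a more careful matching strategy.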

In essence, Vul4J+ is a cleaned-up and extended version of Vul4J containing:
- 106 known vulnerabilities with executable vulnerability-witnessing test cases in Docker containers, plus warnings (reports) from the SpotBugs static analyzer where SpotBugs reports any; of these:
  - 79 come from the original Vul4J;
  - 27 were collected by replicating the protocol used for the original Vul4J;
- 50 vulnerabilities stored in Docker containers with warnings (reports) from the SpotBugs static analyzer;
- 35 known vulnerabilities matched with vulnerability-witnessing test cases retrieved from projects in the wild.

In total, Vul4J+ points to 191 vulnerabilities, each with at least one vulnerability oracle.
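As a quick check of the figures in the list above, the subsets add up as follows. The snippet only restates the counts from this description; the dictionary labels are illustrative and are not field names from the dataset.

```python
# Figures restated from the description; labels are illustrative only.
test_backed = {"from original Vul4J": 79, "from replicated protocol": 27}

groups = {
    "witnessing tests in Docker (+ SpotBugs warnings if found)": sum(test_backed.values()),
    "SpotBugs warnings only, in Docker": 50,
    "witnessing tests from projects in the wild": 35,
}

total = sum(groups.values())
print(sum(test_backed.values()))  # 106
print(total)                      # 191
```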

Files

README.md

Files (26.3 MB)

- 23.9 kB (md5:7eb20e39fdc43d58a7a2e100b10023ce)
- 26.3 MB (md5:9978a129b77f01c16f294b32bf4b01bd)
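Each file on the record is listed with an MD5 checksum. A minimal sketch for checking a downloaded copy against its listed value, using only Python's standard `hashlib` and reading the file in chunks so the 26.3 MB file is never loaded into memory at once, might look like this. The function names are hypothetical helpers, not part of the Vul4J+ tooling.

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path: Path, listed: str) -> bool:
    """Compare a file with a checksum written as on the record, e.g. 'md5:<hex>'."""
    expected = listed[len("md5:"):] if listed.startswith("md5:") else listed
    return md5_of_file(path) == expected.strip().lower()
```

For example, `matches_listed_checksum(Path("downloaded_file"), "md5:9978a129b77f01c16f294b32bf4b01bd")` should return `True` for an intact copy of the 26.3 MB file, where `downloaded_file` is a placeholder for wherever that file was saved.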

Additional details

Funding

European Commission
Sec4AI4Sec - Cybersecurity for AI-Augmented Systems (grant 101120393)

Software

Repository URL: https://github.com/tuhh-softsec/vul4j
Programming languages: Python, Java
Development status: Active