Published August 5, 2025 | Version v3
Dataset Open

R2Vul - Data and Models

Authors/Creators

Description

This Zenodo repository contains the datasets and models for our paper "R2Vul: Learning to Reason about Software Vulnerabilities with Reinforcement Learning and Structured Reasoning Distillation". 

Datasets

raw_dataset.json - the raw dataset mined from NVD.

r2vul_dataset.zip - the dataset used for fine-tuning and testing.

external_java_test.zip - the external, manually annotated Java dataset (RQ2).

Models and Checkpoints

cls.zip - includes all model checkpoints fine-tuned using CLS.

sft.zip - includes all model checkpoints fine-tuned using SFT.

orpo.zip - includes all model checkpoints fine-tuned using ORPO (R2Vul).

MSIVD.zip - downloaded from https://zenodo.org/records/11403208 (codellama-13b - bigvul_expl).

VulLLM.zip - downloaded from https://zenodo.org/records/10677069 (codellama-13b-multi-r16-2048).

Files

Files (7.5 GB)

MD5 checksum                          Size
md5:fa581df60cb06684c566ec2c1d57c10f  3.6 GB
md5:7c228b034b6dcec77799fbb3cd09b970  112.4 kB
md5:73619f8f8a0512e98589cf32310b1d1c  596.7 MB
md5:7a9e23b4d12f307402418f1ef60b79cc  291.8 MB
md5:c566ca688c7b03966cd66b5ae37cae28  35.7 MB
md5:e56237083f3226f30498d1629329260b  1.3 GB
md5:3bdf5df0146bb30404e51ae691c7e305  1.6 GB
md5:0d2710d9c274985290c42391544c0ca1  97.2 MB
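
The md5 values listed above can be used to check that a download completed without corruption. The following is a minimal Python sketch, not part of the R2Vul release: the helper names `md5_of_file` and `verify_download` are hypothetical, and the file path passed in is assumed to be wherever the archive was saved locally. This listing does not pair each checksum with a file name, so take the expected value for a given file from the record page itself.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Chunked reads keep memory use flat even for multi-gigabyte archives.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected):
    """Compare a file's MD5 with an expected value such as 'md5:<hex>'."""
    # Accept the value with or without the 'md5:' prefix used in the listing.
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected_hex
```

For example, `verify_download("cls.zip", "md5:<value shown for cls.zip on the record page>")` returns `True` only when the local copy matches the published checksum. The check detects truncated or corrupted downloads; MD5 is not a guarantee against deliberate tampering.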