OSSVul - ReplicationPackage
Description
OSS Vulnerability Dataset and Model Evaluation Framework
This archive contains both datasets and experimental code used in a study on open-source software vulnerability detection. It integrates vulnerability information from the National Vulnerability Database (NVD) with software development artifacts extracted from GitHub and provides a unified framework for constructing datasets and evaluating multiple vulnerability detection models.
The archive provides the data processing pipeline, curated datasets, and experimental scripts used in the study. Vulnerability detection is performed at the sample level with results aggregated at the CVE level to reflect practical vulnerability identification scenarios.
Contents
- CVE Data Collection
- Model Data Collection
- Model Experiments
- CVE_data.xlsx
CVE Data Collection
This component includes scripts used to construct a unified CVE dataset. CVE records from 1999 to July 2024 were collected from the National Vulnerability Database (NVD) and consolidated into the file CVE_data.xlsx. References to GitHub artifacts, including commits, pull requests, and issues, were extracted from CVE entries and filtered to retain valid artifacts. Artifact creation timestamps and temporal metrics were computed for time-aware analysis.
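The reference-filtering step described above can be sketched as follows. This is a minimal illustration, not the archive's actual script: the function name `classify_reference`, the URL pattern, and the validity rules (a hex SHA of 7 to 40 characters for commits, a numeric identifier for pull requests and issues) are all assumptions.

```python
import re
from urllib.parse import urlparse

# Assumed URL shape: /<owner>/<repo>/<commit|pull|issues>/<identifier>
GITHUB_ARTIFACT = re.compile(
    r"^/(?P<owner>[^/]+)/(?P<repo>[^/]+)/(?P<kind>commit|pull|issues)/(?P<ident>[^/]+)"
)
COMMIT_SHA = re.compile(r"[0-9a-fA-F]{7,40}")


def classify_reference(url):
    """Return (kind, owner, repo, ident) for a GitHub artifact URL, else None."""
    parsed = urlparse(url)
    if parsed.netloc.lower() not in {"github.com", "www.github.com"}:
        return None
    match = GITHUB_ARTIFACT.match(parsed.path)
    if match is None:
        return None
    kind, ident = match["kind"], match["ident"]
    # Assumed validity rule: commits carry a hex SHA, pull requests and issues a number.
    if kind == "commit":
        if COMMIT_SHA.fullmatch(ident) is None:
            return None
    elif not ident.isdigit():
        return None
    return (kind, match["owner"], match["repo"], ident)
```

Applied to every reference URL of a CVE entry, a function like this keeps only links that point at a commit, pull request, or issue and discards everything else, such as advisory pages or links to other hosts.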
Model Data Collection
This component provides scripts for constructing model-specific inputs. Datasets were generated at the artifact level and include both vulnerable and non-vulnerable samples. Due to dataset size, intermediate CSV outputs were merged during preprocessing. Temporal ordering was preserved by splitting the data into RQ2 and RQ3 subsets, which are included among the experimental datasets.
Model Experiments
This component contains experimental code, configurations, and datasets used to evaluate the following vulnerability detection models:
- MemVul
- VulCurator
- PatchRNN
- LineVul
- DeepTraVul
Experiments are conducted independently for each model using a consistent evaluation protocol. All models operate at the sample level, and a CVE is considered vulnerable if at least one associated sample is predicted as vulnerable.
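The CVE-level aggregation rule stated above amounts to a logical OR over the sample-level predictions of each CVE. The sketch below illustrates that rule only; the function name `aggregate_to_cve` and the input format (pairs of CVE identifier and boolean prediction) are assumptions and may differ from the archive's scripts.

```python
from collections import defaultdict


def aggregate_to_cve(sample_predictions):
    """Collapse (cve_id, predicted_vulnerable) pairs into one label per CVE.

    A CVE is labelled vulnerable if at least one of its samples is
    predicted as vulnerable.
    """
    cve_labels = defaultdict(bool)  # every CVE starts as not vulnerable
    for cve_id, predicted_vulnerable in sample_predictions:
        cve_labels[cve_id] = cve_labels[cve_id] or bool(predicted_vulnerable)
    return dict(cve_labels)


# Hypothetical identifiers, for illustration only.
predictions = [("CVE-A", False), ("CVE-A", True), ("CVE-B", False)]
labels = aggregate_to_cve(predictions)
```

Here `labels["CVE-A"]` is `True` because one of its two samples was flagged, while `labels["CVE-B"]` stays `False`.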
Notes
All datasets and experimental scripts required to reproduce the reported results are included in this archive.
The CVE dataset is provided in the file CVE_data.xlsx.
This archive is intended to support reproducible research on software vulnerability detection.
Files
| Name | Size |
|---|---|
| OSSVul - ReplicationPackage.zip (md5:7c536586bb6cde63a66b0fb12d495820) | 1.9 GB |
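After download, the archive can be checked against the MD5 checksum published with this record. The following is a small sketch using Python's standard `hashlib`; the helper name `md5_of_file` and the local file path in the usage note are assumptions.

```python
import hashlib

# Checksum as published with this record.
EXPECTED_MD5 = "7c536586bb6cde63a66b0fb12d495820"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Assuming the archive was saved under its original name in the current directory, `md5_of_file("OSSVul - ReplicationPackage.zip") == EXPECTED_MD5` should evaluate to `True` for an intact download. Reading in chunks avoids loading the full 1.9 GB file into memory.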
Additional details
Dates
- Updated: 2025-12-18
Software
- Programming language: Python