Published May 17, 2025 | Version v9
Conference paper | Open access

Today's cat is tomorrow's dog: accounting for time-based changes in the labels of ML vulnerability detection approaches (Replication Package Part 1: NVD Vuldeepecker Dataset)

  • 1. University of Trento
  • 2. Vrije Universiteit Amsterdam

Description

The Replication Package of

"Today's cat is tomorrow's dog: accounting for time-based changes in the labels of ML vulnerability detection approaches"

Part 1 (NVD Vuldeepecker Dataset)

This repository includes the following zip files:
  1. Code.zip, which contains the code to replicate parts of this study:
    a. 1_generate_datasets implements our methodology to generate the datasets.
    b. 2_run_models runs the ML models used in the evaluation.
    c. 3_result_replication generates the charts presented in the paper from the ML evaluation results.
  2. Datasets.zip, which contains 2 folders:
    a. original datasets: 1 from NVD Vuldeepecker and 3 extracted from BigVul.
    b. NVD Vuldeepecker datasets: train, validation, and test sets for each observation time, extracted from the NVD Vuldeepecker dataset using our methodology.
  3. Pretrained-models.zip, which contains the models we generated during our evaluation (3 test results for each time point in the 2008-2019 timeline).
  4. Results.zip, which contains the results of our evaluation: the folder ALL contains the overall results, and the other folders contain the results by model.

UPDATED version 8
- added a GLOBAL_README.md that describes the 3 stages and how they connect to each other
- updated LineVul.ipynb: import AdamW from torch.optim instead of transformers
- updated README.md in Code2Vec with the Java prerequisites for running gradlew for astminer

UPDATED version 9
- updated CodeBert.ipynb: import AdamW from torch.optim instead of transformers
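Both notebook updates make the same change: the optimizer now comes from PyTorch rather than from transformers, whose own AdamW class is deprecated in recent transformers releases. A minimal sketch of the new import, assuming PyTorch is installed; the linear layer, the random batch, and the hyperparameters are hypothetical placeholders, not the model or settings used in LineVul.ipynb or CodeBert.ipynb.

```python
import torch
from torch.optim import AdamW  # previously: from transformers import AdamW

# Hypothetical stand-in for the model the notebook fine-tunes
model = torch.nn.Linear(768, 2)

# Hypothetical hyperparameters; the notebooks define the values actually used
optimizer = AdamW(model.parameters(), lr=2e-5, eps=1e-8)

# One illustrative optimisation step on a random batch
inputs = torch.randn(4, 768)
labels = torch.tensor([0, 1, 0, 1])
loss = torch.nn.functional.cross_entropy(model(inputs), labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Because torch.optim.AdamW accepts the usual parameters (lr, betas, eps, weight_decay), swapping the import is typically the only change needed where the notebook passed those arguments by keyword.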

Documentation

  1. INSTALL.pdf: how to install the code
  2. README.pdf: readme file
  3. REQUIREMENTS.pdf: hardware and software requirements
  4. STATUS.pdf: status of the artifact submission
  5. LICENSE.pdf: the license of this artifact
  6. PAPER.pdf: the camera-ready version of the paper

Please refer to the following repositories for the other datasets and pre-trained models:
- Part 2 LINUX : https://doi.org/10.5281/zenodo.10960662
- Part 3 OPENSSL : https://doi.org/10.5281/zenodo.10966117
- Part 4 POPPLER : https://doi.org/10.5281/zenodo.14713143

This work was partly funded by the EU under the H2020 Program AssureMOSS (Grant n. 952647) and the Horizon Europe Program Sec4AI4Sec (Grant n. 101120393), by the Italian Ministry of University and Research (MUR) under the P.N.R.R. – NextGenerationEU grant n. PE00000014 (SERICS subproject COVERT), and by the Dutch Research Council (NWO) under the grant NWA.1215.18.006 (Theseus) and grant KIC1.VE01.20.004 (HEWSTI).

Files

Total size: 31.7 GB (11 files, listed by MD5 checksum and size)

- md5:3d5903be490133fb9d053253bd49b1af, 73.3 MB
- md5:7ffb179c4eab9f4e4b4962d54ff2dd75, 27.2 MB
- md5:057d12d8811871afb466f38a12a79547, 196.5 kB
- md5:e0e70cefb02ddbef90f1cd1e36d32ea4, 79.2 kB
- md5:a8b4661df6898e8a14602202a2faf6d0, 4.5 MB
- md5:9d6aa73d70b390ef62551dc7ae793e92, 31.6 GB
- md5:9df90f1bbb221c710fb773eac0cd1e54, 1.2 kB
- md5:a85682a67548b11306555341e6b6a8a6, 82.5 kB
- md5:d9840d04aa595a30d0fff030a280e33e, 143.0 kB
- md5:f2e0c799b9d047fdb40a9023c47b09f4, 117.0 kB
- md5:ef03b78b116a4fe8c5af61e9a0eaae77, 74.5 kB
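Each download is published with an MD5 checksum, which is worth checking for archives of this size (one file is 31.6 GB). A minimal sketch using only Python's standard hashlib module; md5_of_file and verify are hypothetical helper names, not scripts shipped in Code.zip. Compare against the checksum Zenodo displays next to the file you actually downloaded.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, read in 1 MiB chunks so that
    multi-GB archives never have to fit in memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_md5):
    """Return True if the file's MD5 matches a checksum written as on the
    record page (with or without the leading 'md5:' label, any case)."""
    expected = expected_md5.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

For example, verify(path, "md5:…") with the path of a downloaded archive and its listed checksum returns False if the download was truncated or corrupted, in which case the file should be fetched again before unzipping.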

Additional details

Funding

European Commission
Sec4AI4Sec - Cybersecurity for AI-Augmented Systems 101120393
European Commission
AssureMOSS - Assurance and certification in secure Multi-party Open Software and Services. 952647
Dutch Research Council
Theseus NWA.1215.18.006
Dutch Research Council
HEWSTI KIC1.VE01.20.004

Dates

Updated
2025-05-17