VAPD

Anonymous

doi:10.5281/zenodo.15603773

Published June 5, 2025 | Version v5

Please see Artifact_Documentation__USENIX_Security_25.pdf for the artifact evaluation instructions.

Source Code

This is the source code repository for VAPD. The training and evaluation code for VAPD and the baseline models is located in the src directory. Pretrained model weight files are in the models directory. Training and testing data are in the data directory. The hidost directory contains our improved feature extraction method.
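Hidost represents each PDF by the structural paths it contains, that is, the sequences of dictionary keys leading from the document root to each object, and turns their presence into a fixed-length feature vector. The sketch below illustrates only that core idea. It is a hypothetical, simplified version that assumes an already-parsed PDF given as nested Python dicts and lists. The code in the hidost directory parses real files through poppler and includes our improvements, neither of which is reproduced here. The names `structural_paths` and `binary_features` are illustrative.

```python
from __future__ import annotations

from collections.abc import Iterator
from typing import Any


def structural_paths(node: Any, prefix: tuple[str, ...] = ()) -> Iterator[str]:
    """Yield the key path of every dictionary entry in a nested dict/list tree."""
    if isinstance(node, dict):
        for key, child in node.items():
            path = prefix + (key,)
            yield "/".join(path)
            yield from structural_paths(child, path)
    elif isinstance(node, list):
        # Array elements inherit their parent's path; indices are not part of it.
        for child in node:
            yield from structural_paths(child, prefix)
    # Leaf values (names, numbers, strings) add no further path components.


def binary_features(doc: dict, vocabulary: list[str]) -> list[int]:
    """Map a document to a 0/1 vector: 1 if the vocabulary path occurs in it."""
    present = set(structural_paths(doc))
    return [1 if path in present else 0 for path in vocabulary]


# Toy parsed document (hypothetical): two pages and a JavaScript OpenAction.
doc = {
    "Root": {
        "Type": "Catalog",
        "Pages": {"Kids": [{"Type": "Page"}, {"Type": "Page"}]},
        "OpenAction": {"S": "JavaScript", "JS": "app.alert(1)"},
    }
}
vocabulary = ["Root/Type", "Root/Pages/Kids/Type", "Root/OpenAction/JS", "Root/AcroForm"]
print(binary_features(doc, vocabulary))  # [1, 1, 1, 0]
```

Real PDF object graphs contain indirect references and cycles, which a production extractor must handle; the toy tree here has none.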

Datasets

We have three data sources:

Contagio: Contagio Dump
CIC-PDFMal2022: CIC PDFMal 2022
PDF corpus: PDF Corpus

The data directory contains the processed data and the original PDF files.

Environment Setup

Requires Python 3. All dependencies are listed in requirements.txt.

EvadeML requires the Cuckoo sandbox and must be run in a Python 2 environment.

For configuring the Cuckoo sandbox, refer to: Cuckoo Sandbox
For configuring EvadeML, refer to: EvadeML

Feature extraction requires a working Hidost installation, including a correctly configured poppler dependency. See: https://github.com/srndic/hidost

Adversarial Attack Testing

Adversarial attack tests are located in the adversarial attack directory, which includes three attacks:

MalGAN: An adversarial attack framework targeting the feature space. The original version is from Malware-GAN-attack. We reproduced it, adapted it to Python 3, and evaluated it on various baseline datasets.
EvadeML: An adversarial attack framework targeting the problem space. The original version is from EvadeML. Building on it, we implemented adaptive evolutionary attacks that are stronger than the original version.
Reverse Mimicry: The implementation script is located in adversarial attack/reverse_mimicry.
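Feature-space attacks in the MalGAN style preserve a sample's malicious functionality by only ever adding features, never removing them: the generator's output is binarized and merged into the original binary feature vector with an element-wise maximum (a logical OR). The function below is a hypothetical, plain-Python illustration of just that step. It assumes 0/1 feature vectors and a generator that emits per-feature probabilities; it is not code from Malware-GAN-attack or from our adaptation, where the generator is a trained neural network.

```python
from __future__ import annotations


def apply_additive_perturbation(
    malware: list[int], generator_out: list[float], threshold: float = 0.5
) -> list[int]:
    """Binarize generator output and OR it into the malware feature vector."""
    if len(malware) != len(generator_out):
        raise ValueError("feature vectors must have the same length")
    # max(m, o') keeps every original 1, so no existing feature is removed.
    return [max(m, 1 if o > threshold else 0) for m, o in zip(malware, generator_out)]


malware = [1, 0, 0, 1, 0]
generator_out = [0.1, 0.9, 0.4, 0.2, 0.7]  # hypothetical generator probabilities
adversarial = apply_additive_perturbation(malware, generator_out)
print(adversarial)  # [1, 1, 0, 1, 1]
```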

Evaluation

Evaluation code for VAPD and the baseline models is implemented in test.py.

usage: test.py [-h] --dataset DATASET --label LABEL --model MODEL --model_path MODEL_PATH

--dataset: Path to the evaluation dataset (D1, D2, or D3). Absolute paths are recommended.
--label: Path to the label file for the evaluation dataset (D1, D2, or D3). Absolute paths are recommended.
--model: Model to evaluate. Options: VAPD|KNN|AE|DeepSVDD|RTM. Support for the other models described in the paper is still being added.
--model_path: Path to the pretrained model weights. Absolute paths are recommended.

Example usage:
python test.py --dataset /home/user/VAPD/zenodo/data/D1_test.npy --label /home/user/VAPD/zenodo/data/D1_label.npy --model VAPD --model_path /home/user/VAPD/zenodo/models/vapd_weights.pkl
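When wrapping test.py in scripts, it can help to see the documented interface as code. The following is a minimal argparse sketch that reproduces the usage line and the model options listed above. It is an assumption about how the flags might be declared, not the actual source of test.py, and `build_parser` is a hypothetical name.

```python
import argparse

# Options documented for --model; other models from the paper are not yet supported.
MODELS = ["VAPD", "KNN", "AE", "DeepSVDD", "RTM"]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="test.py")
    parser.add_argument("--dataset", required=True, help="evaluation dataset (D1, D2, or D3)")
    parser.add_argument("--label", required=True, help="label file for the dataset")
    # metavar keeps the usage line as "--model MODEL" while still validating choices.
    parser.add_argument("--model", required=True, choices=MODELS, metavar="MODEL")
    parser.add_argument("--model_path", required=True, help="pretrained model weights")
    return parser


args = build_parser().parse_args([
    "--dataset", "/home/user/VAPD/zenodo/data/D1_test.npy",
    "--label", "/home/user/VAPD/zenodo/data/D1_label.npy",
    "--model", "VAPD",
    "--model_path", "/home/user/VAPD/zenodo/models/vapd_weights.pkl",
])
print(args.model, args.model_path)
```

Declaring `choices` makes argparse reject an unsupported model name with an error before any weights are loaded.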

The output includes accuracy (Acc), recall (Rec), precision (Prec), and F1-score for the dataset.
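For reference, the four reported metrics can be computed from true and predicted labels as shown below. This is a sketch, not the code in test.py. It assumes binary labels with malicious encoded as 1 and treated as the positive class, an encoding this README does not specify, and `binary_metrics` is a hypothetical helper.

```python
from __future__ import annotations

from collections.abc import Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> dict[str, float]:
    """Return Acc, Rec, Prec and F1 with `positive` as the positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    # Undefined ratios (empty denominators) are reported as 0.0.
    acc = correct / len(pairs) if len(pairs) else 0.0
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return {"Acc": acc, "Rec": rec, "Prec": prec, "F1": f1}


y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 1, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print({k: round(v, 3) for k, v in metrics.items()})
# {'Acc': 0.625, 'Rec': 0.75, 'Prec': 0.6, 'F1': 0.667}
```

With scikit-learn installed, `accuracy_score`, `precision_score`, `recall_score`, and `f1_score` from `sklearn.metrics` compute the same quantities; their default `pos_label=1` matches the assumption here.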

Files

vapd_src.zip (1.5 GB, md5: f3a809d31d62781c0137680388d1c4e0)