VAPD
Please see the Artifact_Documentation__USENIX_Security_25.pdf for the artifact evaluation.
Source Code
This is the source code repository for VAPD. The training and evaluation code for VAPD and the baseline models is located in the src directory. Pretrained model weight files are in the models directory. Training and testing data are in the data directory. The hidost directory contains our improved feature extraction method.
Datasets
We have three data sources:
- Contagio: Contagio Dump
- CIC-PDFMal2022: CIC PDFMal 2022
- PDF corpus: PDF Corpus
The data directory contains the processed data and the original PDF files.
Environment Setup
Requires Python 3. All dependencies are listed in requirements.txt.
EvadeML requires the Cuckoo sandbox and must be run in a Python 2 environment.
- For configuring the Cuckoo sandbox, refer to: Cuckoo Sandbox
- For configuring EvadeML, refer to: EvadeML
For feature extraction, Hidost needs to be set up properly. The poppler dependency must be correctly configured. See: https://github.com/srndic/hidost
Adversarial Attack Testing
Adversarial attack tests are located in the adversarial attack directory, which includes the following attacks:
- MalGAN: An adversarial attack framework targeting the feature space. The original version is from Malware-GAN-attack. We reproduced it, adapted it to Python 3, and evaluated it on various baseline datasets.
- EvadeML: An adversarial attack framework targeting the problem space. The original version is from EvadeML. We implemented adaptive evolutionary attacks on top of it, which strengthens the attack compared to the original version.
- Reverse Mimicry: The implemented script is located under adversarial attack/reverse_mimicry
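To illustrate what attacking in the feature space means: MalGAN, as originally proposed, perturbs a binary feature vector rather than the PDF file itself, and it only adds features so that the original malicious content is preserved. The following is a minimal sketch of that constraint alone, not the code in this repository; the function name and the toy vectors are hypothetical.

```python
from typing import List


def apply_feature_additions(malicious: List[int], proposed: List[int]) -> List[int]:
    """Combine two binary feature vectors with an element-wise OR.

    Every feature present in the original sample is kept, and features
    proposed by the generator are added. No feature is ever removed.
    """
    if len(malicious) != len(proposed):
        raise ValueError("feature vectors must have the same length")
    return [m | p for m, p in zip(malicious, proposed)]


# Toy example (hypothetical values, not taken from the datasets).
original = [1, 0, 1, 0, 0]
generator_output = [0, 1, 0, 0, 1]
adversarial = apply_feature_additions(original, generator_output)
print(adversarial)
```

A problem-space attack such as EvadeML differs in that it modifies the PDF file itself, so the result must still be a working file.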
Evaluation
Evaluation code for VAPD and the baseline models is implemented in test.py.
usage: test.py [-h] --dataset DATASET --label LABEL --model MODEL --model_path MODEL_PATH
- --dataset: Path to the evaluation dataset, such as D1, D2, or D3. Using absolute paths is recommended.
- --label: Label file path for the evaluation dataset (D1, D2, or D3). Using absolute paths is recommended.
- --model: Model to be tested. Options: VAPD|KNN|AE|DeepSVDD|RTM. Other models mentioned in the paper are still being updated and organized.
- --model_path: Path to the pretrained model weights. Using absolute paths is recommended.
# Example usage
# python test.py --dataset /home/user/VAPD/zenodo/data/D1_test.npy --label /home/user/VAPD/zenodo/data/D1_label.npy --model VAPD --model_path /home/user/VAPD/zenodo/models/vapd_weights.pkl
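The usage line above has the shape that Python's argparse module prints for four required options. As a rough sketch of such an interface (an assumption about how test.py might be structured, not its actual source), the parser could look like the following; restricting --model with choices and the placeholder paths are illustrative additions.

```python
import argparse

# Model names taken from the option list above.
MODELS = ["VAPD", "KNN", "AE", "DeepSVDD", "RTM"]


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented test.py options (sketch only)."""
    parser = argparse.ArgumentParser(prog="test.py")
    parser.add_argument("--dataset", required=True,
                        help="path to the evaluation dataset")
    parser.add_argument("--label", required=True,
                        help="path to the label file for the dataset")
    # Assumption: invalid model names are rejected at parse time.
    parser.add_argument("--model", required=True, choices=MODELS,
                        metavar="MODEL", help="model to be tested")
    parser.add_argument("--model_path", required=True,
                        help="path to the pretrained model weights")
    return parser


# Parse an explicit argument list; the paths here are placeholders.
args = build_parser().parse_args([
    "--dataset", "/path/to/D1_test.npy",
    "--label", "/path/to/D1_label.npy",
    "--model", "VAPD",
    "--model_path", "/path/to/vapd_weights.pkl",
])
print(args.model, args.dataset)
```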
The output includes accuracy (Acc), recall (Rec), precision (Prec), and F1-score for the dataset.
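For reference, the four reported metrics can be computed from the confusion counts as in the sketch below. It assumes binary labels where 1 marks a malicious sample (the positive class) and 0 a benign one; that encoding, the function name, and the toy labels are assumptions, and test.py may compute the metrics differently.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Return Acc, Rec, Prec, and F1 for binary labels (1 = positive class)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    total = tp + tn + fp + fn
    # Each ratio falls back to 0.0 when its denominator is zero.
    acc = (tp + tn) / total if total else 0.0
    rec = tp / (tp + fn) if (tp + fn) else 0.0
    prec = tp / (tp + fp) if (tp + fp) else 0.0
    f1 = 2 * prec * rec / (prec + rec) if (prec + rec) else 0.0
    return {"Acc": acc, "Rec": rec, "Prec": prec, "F1": f1}


# Toy labels (hypothetical): 3 true positives, 3 true negatives,
# 1 false positive, 1 false negative.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
print(binary_metrics(y_true, y_pred))
```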
Files
vapd_src.zip (1.5 GB)
md5:f3a809d31d62781c0137680388d1c4e0
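Because the archive is large, it can be worth checking the download against the MD5 checksum listed for vapd_src.zip before unpacking it. The following is a minimal sketch using only the Python standard library; the function names are illustrative, and the file is read in chunks so the 1.5 GB archive is never loaded into memory at once.

```python
import hashlib

# Checksum listed for vapd_src.zip in the file listing.
EXPECTED_MD5 = "f3a809d31d62781c0137680388d1c4e0"


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path: str, expected: str = EXPECTED_MD5) -> bool:
    """Return True if the file at `path` matches the expected MD5 digest."""
    return md5_of_file(path) == expected.lower()
```

Calling verify_archive with the local path of the downloaded archive returns True when the file is intact; a False result suggests an incomplete or corrupted download.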