supplemental materials
Description
The sheet file contains the features summarized from the literature.
This zip file contains essential materials to demonstrate the experimental results presented in the paper: "Killing Two Birds with One Stone: Malicious Package Detection in NPM and PyPI using a Single Model of Malicious Behavior Sequence"
The package's structure is shown below:
poisoning-dataset
This folder contains the dataset collected for model training. The samples were sourced from Backstabbers-Knife-Collection and Maloss-samples. Access to Backstabber's Knife Collection requires a request to the authors, while access to Maloss-samples can be obtained by filling out the Google Form provided by them.
10-fold-predict-result
This folder contains the prediction results of each fold for Effectiveness Evaluation (RQ1) in the paper. Each row includes two columns: package version name and label (0 for benign, 1 for malicious).
ablation-predict-result
This folder contains the prediction results of each fold for the Ablation Study (RQ3) in the paper. Each row includes two columns: package version name and label (0 for benign, 1 for malicious).
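As a minimal sketch of how one of these per-fold result files might be read, assuming each file is a headerless two-column CSV (package version name, label). The actual file names, delimiter, and header layout are not stated here, so `count_labels` and the sample rows are hypothetical:

```python
import csv
import io
from collections import Counter


def count_labels(lines):
    """Count benign (0) and malicious (1) labels in one fold's result rows.

    Assumes headerless rows of the form: package_version_name,label
    """
    counts = Counter()
    for row in csv.reader(lines):
        if len(row) < 2:
            continue  # skip blank or malformed rows
        counts[int(row[1])] += 1
    return counts


# Hypothetical rows standing in for one fold's file.
sample = io.StringIO("pkg-a-1.0.0,0\npkg-b-2.1.0,1\npkg-c-0.3.1,0\n")
fold_counts = count_labels(sample)
print(f"benign={fold_counts[0]} malicious={fold_counts[1]}")
```

For a real file, the `io.StringIO` object would be replaced by a handle from `open(path, newline="")`.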
real-world
This folder contains the code and data for the Real-World Usefulness Evaluation (RQ4) in the paper.
- monitor-result includes tables for each month's monitoring results. The CSV columns are as follows:
  Column    Description
  package   package version name
  positive  1 for flagged by cerebro, 0 for not flagged by cerebro
  TP        1 for considered as malicious package by manual inspection, 0 for not considered as malicious package
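The positive and TP columns are enough to compute how many flagged packages were confirmed by manual inspection. The following is a minimal sketch, assuming each monthly CSV has a header row with exactly the column names package, positive, and TP; the function name `monitoring_precision` and the sample rows are hypothetical and not taken from the real data:

```python
import csv
import io


def monitoring_precision(lines):
    """Return (flagged, confirmed, precision) for one month's CSV.

    Assumes a header row with the columns package, positive, TP.
    """
    flagged = confirmed = 0
    for row in csv.DictReader(lines):
        if int(row["positive"]) == 1:
            flagged += 1
            confirmed += int(row["TP"])
    precision = confirmed / flagged if flagged else 0.0
    return flagged, confirmed, precision


# Hypothetical rows, not taken from the real monitoring data.
sample = io.StringIO(
    "package,positive,TP\n"
    "pkg-a-1.0.0,1,1\n"
    "pkg-b-0.2.0,1,0\n"
    "pkg-c-3.1.4,0,0\n"
)
print(monitoring_precision(sample))
```

Here precision is the share of flagged package versions that manual inspection confirmed as malicious; rows with positive equal to 0 are ignored.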
- real-world-monitor contains Python code to download and collect newly uploaded package versions in npm and PyPI. The requirements for running the code are:
    feedparser==6.0.10
    lxml==4.9.1
    pandas==1.4.4
    Requests==2.31.0
- To download newly uploaded package versions during the last 5 minutes, run the following commands:
    # npm
    $ cd PATH/TO/real-world/pipeline_npm
    $ python pipeline.py
    # pypi
    $ cd PATH/TO/real-world/pipeline_pypi
    $ python pipeline.py
  If you want to change the 5-minute collection period, modify the minutes global variable in the Python file. For 24/7 collection, use a scheduling tool such as crontab on Linux.
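To illustrate what a collection window of this kind means, here is a minimal sketch that keeps only the package versions uploaded within the last `minutes` minutes. It is not the code in pipeline.py: the function `uploaded_recently`, the (name, upload time) pair format, and the sample entries are all assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone

minutes = 5  # collection window; mirrors the global variable described above


def uploaded_recently(entries, now, window_minutes=minutes):
    """Keep the names of (name, upload_time) pairs uploaded within the window."""
    cutoff = now - timedelta(minutes=window_minutes)
    return [name for name, uploaded in entries if cutoff <= uploaded <= now]


# Hypothetical entries with a fixed reference time so the result is reproducible.
now = datetime(2023, 1, 1, 12, 0, tzinfo=timezone.utc)
entries = [
    ("pkg-a-1.0.0", now - timedelta(minutes=2)),
    ("pkg-b-0.2.0", now - timedelta(minutes=7)),
]
print(uploaded_recently(entries, now))
```

If the script is scheduled to run repeatedly, the schedule interval should match the window so that consecutive runs neither skip nor re-collect uploads.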
- collect.ipynb is a Jupyter Notebook that concatenates the CSV files generated by cerebro for each download time period, and moves and unzips the packages flagged by cerebro.
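The concatenation step can be sketched as follows. This is not the notebook's actual code: it assumes all per-period CSV files share one header row, and the function `concat_csv` and the column names in the sample data are hypothetical.

```python
import csv
import io


def concat_csv(sources):
    """Concatenate CSV sources that share one header row.

    Returns (header, rows); the header is kept once and repeats are dropped.
    """
    header, rows = None, []
    for src in sources:
        reader = csv.reader(src)
        first = next(reader, None)
        if first is None:
            continue  # empty file
        if header is None:
            header = first
        elif first != header:
            raise ValueError("CSV files do not share the same header")
        rows.extend(reader)
    return header, rows


# Hypothetical per-period outputs; real files would be opened
# with open(path, newline="").
period_1 = io.StringIO("package,positive\npkg-a-1.0.0,1\n")
period_2 = io.StringIO("package,positive\npkg-b-0.2.0,0\n")
header, rows = concat_csv([period_1, period_2])
print(header, rows)
```

Raising on a header mismatch is a deliberate choice in this sketch, so that files with different column layouts are not silently merged.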