There is a newer version of the record available.

Published 2023 | Version v3
Other Restricted

supplemental materials

Authors/Creators

  • 1. anonymous

Description

The spreadsheet file contains the features summarized from the literature.

This zip file contains the essential materials for demonstrating the experimental results presented in the paper "Killing Two Birds with One Stone: Malicious Package Detection in NPM and PyPI using a Single Model of Malicious Behavior Sequence".

The package's structure is shown below:

poisoning-dataset

This folder contains the dataset collected for model training. The samples were sourced from Backstabbers-Knife-Collection and Maloss-samples. Access to Backstabbers-Knife-Collection must be requested from its authors, while access to Maloss-samples is granted after filling out the Google Form its authors provide.

10-fold-predict-result

This folder contains the prediction results of each fold for Effectiveness Evaluation (RQ1) in the paper. Each row includes two columns: package version name and label (0 for benign, 1 for malicious).
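As an illustration of the row format described above, the sketch below tallies the labels in one fold's prediction file. It assumes the file is a headerless CSV with the package version name in the first column and the label in the second. The file layout and the function name count_labels are assumptions, not taken from the archive.

```python
import csv
from collections import Counter


def count_labels(path):
    """Count how many rows carry label 0 (benign) and label 1 (malicious)."""
    counts = Counter()
    with open(path, newline="", encoding="utf-8") as handle:
        for row in csv.reader(handle):
            if len(row) < 2:
                continue  # skip blank or malformed rows
            counts[int(row[1])] += 1
    return counts
```

If the actual files carry a header row or use another delimiter, the reader setup would need to be adjusted accordingly.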

ablation-predict-result

This folder contains the prediction results of each fold for the Ablation Study (RQ3) in the paper. Each row includes two columns: package version name and label (0 for benign, 1 for malicious).

real-world

This folder contains the code and data for the Real-World Usefulness Evaluation (RQ4) in the paper.

  • monitor-result includes tables for each month's monitoring results. The CSV columns are as follows:

    Column    Description
    package   package version name
    positive  1 if flagged by cerebro, 0 otherwise
    TP        1 if manual inspection judged the package malicious, 0 otherwise
  • real-world-monitor contains Python code to download and collect newly uploaded package versions in npm and PyPI.

    The requirements for running the code are:

    feedparser==6.0.10
    lxml==4.9.1
    pandas==1.4.4
    Requests==2.31.0
    • To download newly uploaded package versions during the last 5 minutes, run the following commands:

      # npm
      $ cd PATH/TO/real-world/pipeline_npm
      $ python pipeline.py
      # pypi
      $ cd PATH/TO/real-world/pipeline_pypi
      $ python pipeline.py

      If you want to change the 5-minute collection period, modify the minutes global variable in the Python file. For 24/7 collection, use scheduling tools like crontab on Linux.

    • collect.ipynb is a Jupyter Notebook that concatenates the CSV files generated by cerebro for each download time period, and moves and unzips the packages flagged by cerebro.
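The positive and TP columns of the monitor-result tables lend themselves to a simple precision calculation. The following is a minimal sketch, not the authors' collect.ipynb: it assumes the monthly tables are CSV files with a header row containing package, positive, and TP, and the function names load_rows and precision are hypothetical.

```python
import csv
from pathlib import Path


def load_rows(csv_dir):
    """Concatenate the rows of every CSV file found in csv_dir."""
    rows = []
    for path in sorted(Path(csv_dir).glob("*.csv")):
        with path.open(newline="", encoding="utf-8") as handle:
            rows.extend(csv.DictReader(handle))
    return rows


def precision(rows):
    """Share of flagged package versions that manual inspection confirmed."""
    flagged = [row for row in rows if int(row["positive"]) == 1]
    if not flagged:
        return None  # nothing was flagged, so precision is undefined
    confirmed = sum(int(row["TP"]) for row in flagged)
    return confirmed / len(flagged)
```

Calling precision(load_rows("PATH/TO/real-world/monitor-result")) would then summarize all months at once, provided the directory holds the CSV files directly. The standard library csv module is used here to keep the sketch dependency-free. With pandas, which the requirements already list, the same concatenation could be written with pandas.concat.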

Files

Restricted

The record is publicly accessible, but files are restricted to users with access.