Published 2024 | Version 1.3
Software Open

WasmWalker: Path-based Code Representations for Improved WebAssembly Program Analysis

Authors/Creators

Description

This repository is the official replication package for the paper "WasmWalker: Path-based Code Representations for Improved WebAssembly Program Analysis". It contains the following components:

  • Dataset: The study uses WebAssembly (Wasm) binaries compiled by SnowWhite, from which we generated two path-based code representations. Processing these binaries with our pipeline produced a new dataset, which we used to train our models. This replication package includes both our new dataset and SnowWhite's dataset.
  • Pipeline: Our pipeline extracts path sequences from Wasm binaries. It is implemented in Rust and Python.
  • Data cleaning: These scripts split the dataset into different variants and create the different input sequences.
  • Training notebooks: We include two Jupyter notebooks: one trains a feed-forward neural network that creates code embeddings for method names, and the other trains seq2seq models on five different variants of input sequences.
  • Models: This section includes the weights of the seq2seq models trained using OpenNMT and of the feed-forward neural network used to generate the code embeddings.
  • Results: The log files in this section contain the evaluation results of our models, including prediction accuracy scores, BLEU scores, and other evaluation metrics.

    For more information, see README.md.

Files

wasm_walker_1_3.zip (4.8 GB)
md5:95531c8fcfda4fec2d71285d88f1818e
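
Because the archive is large, it can be worth checking the download against the MD5 checksum listed above before unpacking it. The following is a minimal sketch, not part of the replication package: the helper names `md5_of_file` and `verify_archive` are hypothetical, and the script assumes the archive was saved as wasm_walker_1_3.zip in the current directory.

```python
import hashlib
from pathlib import Path

# Checksum as listed on this record for wasm_walker_1_3.zip.
EXPECTED_MD5 = "95531c8fcfda4fec2d71285d88f1818e"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so the 4.8 GB archive is never held in memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path, expected=EXPECTED_MD5):
    """Return True when the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    # Assumed download location; adjust to wherever the archive was saved.
    archive = Path("wasm_walker_1_3.zip")
    if archive.exists():
        print("checksum OK" if verify_archive(archive) else "checksum MISMATCH")
    else:
        print(f"{archive} not found; download it first")
```

A mismatch usually indicates an incomplete or corrupted download, in which case fetching the archive again is the simplest fix.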