There is a newer version of the record available.

Published March 23, 2023 | Version v1
Software Open

WasmWalker: Path-based Code Representations for Improved WebAssembly Program Analysis

Authors/Creators

Description

This repository houses the official replication package for the paper titled "WasmWalker: Path-based Code Representations for Improved WebAssembly Program Analysis". The package contains the following components:

  • Dataset: The study builds on WebAssembly (Wasm) binaries compiled by SnowWhite. We processed these binaries with our pipeline to generate two path-based code representations, and the resulting new dataset is what we used to train our models. This replication package includes both our new dataset and SnowWhite's dataset.
  • Pipeline: Our pipeline extracts path sequences from Wasm binaries. It is implemented in Rust and Python.
  • Data cleaning: These scripts enable the splitting of the dataset into different variants and the creation of different input sequences.
  • Training notebooks: We have included two Jupyter notebooks, one for training a feed-forward neural network for creating code embeddings for method names, and the other for training seq2seq models with five different variants of input sequences.
  • Models: This section includes the weights of the seq2seq models trained using OpenNMT and of the feed-forward neural network used to generate the code embeddings.
  • Results: The log files in this section contain the evaluation results of our models, including prediction accuracy scores, BLEU scores, and other evaluation metrics.

    For more information, see README.md.

Files

Name: wasm_walker.zip
Size: 4.9 GB
Checksum: md5:b1a21bedd792d32c587a226aabd7477a
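
Because the archive is large, it can be worth checking the download against the MD5 checksum listed above before unpacking it. The following is a minimal sketch, not part of the replication package: the function names and the local path wasm_walker.zip in the current directory are assumptions made for illustration. It uses only Python's standard hashlib module and reads the file in chunks so the 4.9 GB archive never has to fit in memory.

```python
import hashlib
from pathlib import Path

# Checksum copied from the file listing of this record.
EXPECTED_MD5 = "b1a21bedd792d32c587a226aabd7477a"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected hex string."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    # Hypothetical download location; adjust to wherever the archive was saved.
    archive = Path("wasm_walker.zip")
    if archive.exists():
        print("checksum OK" if verify_archive(archive) else "checksum MISMATCH")
    else:
        print(f"{archive} not found; download it first")
```

A mismatch usually indicates an incomplete or corrupted download, or that the file belongs to a different version of the record, in which case the checksum on that version's page applies instead.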