Published 2024 | Version 1.3 | Software | Open
WasmWalker: Path-based Code Representations for Improved WebAssembly Program Analysis
Description
This repository houses the official replication package for the paper "WasmWalker: Path-based Code Representations for Improved WebAssembly Program Analysis". The package contains the following components:
- Dataset: The study starts from the WebAssembly (Wasm) binaries compiled by SnowWhite, from which we generated two path-based code representations. Processing these binaries with our pipeline produced a new dataset, which we used to train our models. This replication package includes both our new dataset and SnowWhite's dataset.
- Pipeline: The pipeline extracts path sequences from Wasm binaries. It is implemented in Rust and Python.
- Data cleaning: Scripts that split the dataset into different variants and create the different input sequences.
- Training notebooks: We have included two Jupyter notebooks, one for training a feed-forward neural network for creating code embeddings for method names, and the other for training seq2seq models with five different variants of input sequences.
- Models: This section includes the weights of the seq2seq models trained using OpenNMT and of the feed-forward neural network used to generate the code embeddings.
- Results: The log files in this section contain the evaluation results of our models, including prediction accuracy scores, BLEU scores, and other evaluation metrics.
For more information, see README.md.
Files
| Name | Size | MD5 |
|---|---|---|
| wasm_walker_1_3.zip | 4.8 GB | 95531c8fcfda4fec2d71285d88f1818e |