Static Stack-Preserving Intra-Procedural Slicing of WebAssembly Binaries

Quentin Stiévenart; David W. Binkley; Coen De Roover

doi:10.5281/zenodo.5821007

Published January 5, 2022 | Version 1.0

Dataset Open


1. Vrije Universiteit Brussel, Belgium
2. Loyola University Maryland, USA
3. Vrije Universiteit Brussel

# About this artifact
This artifact contains the implementation and the results of the evaluation of a
static slicer for WebAssembly described in the ICSE 2022 paper titled "Static
Stack-Preserving Intra-Procedural Slicing of WebAssembly Binaries".

The artifact contains a Docker image (`icse2022slicing.tar.xz`) with everything
necessary to reproduce our evaluation, as well as the actual data resulting
from our evaluation:
1. The implementation of our slicer (presented in Section 4.1) is included in
   the docker machine, and is available publicly here:
   https://github.com/acieroid/wassail/tree/icse2022
2. Test cases used for our evaluation of RQ1 are included in the docker machine
   and in the `rq1.tar.xz` archive.
3. The dataset used in RQ2, RQ3, and RQ4 is included in the docker machine.
4. The code needed to run our evaluation of RQ2, RQ3, and RQ4 is included in the
   docker machine.
5. The scripts used to generate the statistics and graphs that are included in
   the paper for RQ2, RQ3, and RQ4 are included in the docker machine and as the
   `*.py` files in this artifact.
6. The data of RQ5 that has been used in our manual investigation is included in
   the docker machine and in the `rq5.tar.xz` archive, along with
   `rq5-manual.txt` detailing our manual analysis findings.

# How to obtain it
Our artifact is available on Zenodo at the following URL: https://zenodo.org/record/5821007

# Setting up the Docker image
## Downloading The Artifact
The artifact is available at the following URL: https://zenodo.org/record/5821007

## Loading The Docker Image
Once the Docker image has been downloaded as `icse2022slicing.tar.xz`, it can be loaded into Docker as follows (this takes a few minutes):
```
docker import icse2022slicing.tar.xz
```
To simplify further commands, you can tag the image using the printed sha256 hash of the image: if the `docker import` command resulted in the hash `54aa9416a379a6c71b1c325985add8bf931752d754c8fb17872c05f4e4b52ea2`, you can run:
```
docker tag 54aa9416a379a6c71b1c325985add8bf931752d754c8fb17872c05f4e4b52ea2 wassail-eval
```
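
Instead of copying the hash by hand, the import and tagging steps can be scripted. The sketch below assumes, as in the example above, that `docker import` prints the new image ID as `sha256:<hex>` on standard output; `parse_image_id` and `import_and_tag` are hypothetical helper names. Note also that `docker import` accepts an optional repository name after the archive (e.g. `docker import icse2022slicing.tar.xz wassail-eval`), which tags the image directly.

```python
import re
import subprocess


def parse_image_id(output):
    """Extract the 64-hex-digit image ID from `docker import` output (assumed format: sha256:<hex>)."""
    match = re.search(r"sha256:([0-9a-f]{64})", output)
    if match is None:
        raise ValueError("no sha256 image ID found in docker output")
    return match.group(1)


def import_and_tag(archive="icse2022slicing.tar.xz", tag="wassail-eval"):
    """Run `docker import`, then `docker tag` the resulting image; returns the image ID."""
    result = subprocess.run(["docker", "import", archive],
                            capture_output=True, text=True, check=True)
    image_id = parse_image_id(result.stdout)
    subprocess.run(["docker", "tag", image_id, tag], check=True)
    return image_id
```

Calling `import_and_tag()` from the directory containing the archive performs both steps.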

Once the Docker image has been loaded, you can run the following commands to
obtain a shell in the appropriate environment:
```
docker volume create result
docker run -it -v result:/tmp/out/ wassail-eval bash
su - opam
```

# Reproducing results of RQ1
Our manual translations of the "classical" examples are included in the `rq1/`
directory (available in the docker image and in `rq1.tar.xz`). We
include the slices computed by our implementation in the `rq1/out/` directory.

A slice can be produced for each example in the Docker image as follows, where
the first argument is the program being sliced, the second is the index of the
function being sliced, the third is the slicing criterion (given as an
instruction index, where instructions are numbered from 1), and the last is the
output file for the slice:

```
cd rq1/
wassail slice scam-mug.wat 5 8 scam-mug-slice.wat
wassail slice montreal-boat.wat 5 19 montreal-boat-slice.wat
wassail slice word-count.wat 1 41 word-count-slice1.wat
wassail slice word-count.wat 1 43 word-count-slice2.wat
wassail slice word-count.wat 1 39 word-count-slice3.wat
wassail slice word-count.wat 1 45 word-count-slice4.wat
wassail slice word-count.wat 1 37 word-count-slice5.wat
wassail slice agrawal-fig-3.wat 3 38 agrawal-fig-3-slice.wat
wassail slice agrawal-fig-5.wat 3 37 agrawal-fig-5-slice.wat
```

The slice results can then be inspected manually, and compared with the original
version of the .wat program to see which instructions have been removed, or with
the expected solutions in the `out/` directory, e.g. by running:
```
diff word-count-slice1.wat out/word-count-slice1.wat
```
(`diff` produces no output if the slice matches the expected one.)
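
The commands above can also be batched. The following is a minimal sketch, assuming it is run from the `rq1/` directory of the Docker image with `wassail` on the `PATH`; the job list simply mirrors the commands above, and `slice_command` and `run_and_check` are hypothetical helper names. It compares each produced slice byte by byte with the expected one in `out/`, which is slightly stricter than `diff` but gives the same verdict for identical files.

```python
import filecmp
import subprocess
from pathlib import Path

# (program, function index, slicing criterion, output file), as in the commands above.
RQ1_JOBS = [
    ("scam-mug.wat", 5, 8, "scam-mug-slice.wat"),
    ("montreal-boat.wat", 5, 19, "montreal-boat-slice.wat"),
    ("word-count.wat", 1, 41, "word-count-slice1.wat"),
    ("word-count.wat", 1, 43, "word-count-slice2.wat"),
    ("word-count.wat", 1, 39, "word-count-slice3.wat"),
    ("word-count.wat", 1, 45, "word-count-slice4.wat"),
    ("word-count.wat", 1, 37, "word-count-slice5.wat"),
    ("agrawal-fig-3.wat", 3, 38, "agrawal-fig-3-slice.wat"),
    ("agrawal-fig-5.wat", 3, 37, "agrawal-fig-5-slice.wat"),
]


def slice_command(program, function_index, criterion, output):
    """Argument list for `wassail slice` (the criterion is a 1-based instruction index)."""
    return ["wassail", "slice", program, str(function_index), str(criterion), output]


def run_and_check(jobs=RQ1_JOBS, expected_dir="out"):
    """Compute every slice and return the outputs that differ from the expected ones."""
    mismatches = []
    for program, function_index, criterion, output in jobs:
        subprocess.run(slice_command(program, function_index, criterion, output), check=True)
        expected = Path(expected_dir) / output
        # shallow=False forces a content comparison instead of an os.stat() comparison.
        if not filecmp.cmp(output, expected, shallow=False):
            mismatches.append(output)
    return mismatches
```

An empty list returned by `run_and_check()` means that every slice matches its expected counterpart.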

# Reproducing results of RQ2, RQ3, and RQ4
For these RQs, we include the data resulting from our evaluation, but we also
allow reviewers to rerun the full evaluation if needed. However, such an
evaluation requires a heavy machine and takes a long time (4-5 days to run
to completion with a 4-hour timeout). In our case, we used a machine with 256
GB of RAM and a 64-core processor with HyperThreading enabled, allowing us to
run 128 slicing jobs in parallel.

## Running the Evaluation
Below, we explain how to run either the full evaluation or only a partial one.
Alternatively, one can skip these steps and reuse our raw evaluation results,
provided alongside this artifact (see "Skipping the Evaluation Run").

### Running the Full Evaluation
In order to reproduce our evaluation, you can run the following commands in the
docker image. It is recommended to run them in a tmux session if one wants to
inspect other elements in parallel (tmux is installed in the docker image). The
timeout (set to 4 hours per binary, like in the paper) can be decreased by
editing the `evaluate.sh` script (vim is installed in the docker image).

This is expected to take 2-3 days on a machine with 128 cores.
To produce only partial results, see the next section.

```
sudo chown opam:opam /tmp/out/
cd filtered
cat ../supported.txt | parallel --bar -j 128 sh ../evaluate.sh {}
```

The results are written to the `/tmp/out/` directory.
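
For readers less familiar with GNU parallel, the following sketch shows what the `parallel -j 128 sh ../evaluate.sh {}` invocation amounts to: one `evaluate.sh` run per line of `supported.txt`, with at most 128 running at once. It is an illustration, not part of the artifact; it assumes it is started from `filtered/`, skips empty lines, and `read_binaries`, `evaluate_command`, and `run_all` are hypothetical names.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def read_binaries(path):
    """Return the non-empty, stripped lines of a file such as supported.txt."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]


def evaluate_command(binary):
    """Command run for one binary (mirrors `sh ../evaluate.sh {}`)."""
    return ["sh", "../evaluate.sh", binary]


def run_all(binaries, jobs=128, make_command=evaluate_command):
    """Run one command per binary, at most `jobs` at a time.

    Returns a dict mapping each binary to the exit status of its command.
    Threads suffice here because each worker only waits on a child process.
    """
    def run_one(binary):
        return binary, subprocess.run(make_command(binary)).returncode

    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return dict(pool.map(run_one, binaries))
```

For example, `run_all(read_binaries("../supported.txt"), jobs=8)` runs the evaluation with eight parallel jobs; the per-binary timeout is still the one set in `evaluate.sh`.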

### Running a Partial Evaluation
If one has access to neither a high-end machine with 128 cores nor the time to
run the full evaluation, it is possible to produce partial results with the
following commands. They run the evaluation on the full dataset in a random
order, so the run can be stopped early to obtain a partial view of our
evaluation on a random subset of the data. To gather more data points, it is
also advisable to decrease the timeout in the `evaluate.sh` file, for example to
20 minutes by setting `TIMEOUT=20m` with `nano evaluate.sh`. The number of
slicing jobs running in parallel can also be decreased to match the number of
processors on the machine running the experiments (the `-j 128` argument in the
following command runs 128 parallel jobs).

```
sudo chown opam:opam /tmp/out/
cd filtered
shuf ../supported.txt | parallel --bar -j 128 sh ../evaluate.sh {}
```

The evaluation results will be stored in the `/tmp/out/` directory.
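
As an alternative to stopping the shuffled run by hand, one can fix the random subset up front so that a partial run is reproducible. The sketch below is a hypothetical helper, not part of the artifact: it writes `k` randomly chosen binaries, selected with a fixed seed, to a file (the name `subset.txt` is only an example), which can then replace `supported.txt` in the `parallel` command.

```python
import random


def sample_binaries(names, k, seed=0):
    """Return min(k, len(names)) distinct names drawn at random; same seed, same subset."""
    rng = random.Random(seed)
    return rng.sample(names, min(k, len(names)))


def write_subset(src="../supported.txt", dst="../subset.txt", k=500, seed=0):
    """Read the binary list from `src` and write a random subset of size k to `dst`."""
    with open(src) as f:
        names = [line.strip() for line in f if line.strip()]
    subset = sample_binaries(names, k, seed)
    with open(dst, "w") as f:
        f.write("\n".join(subset) + "\n")
    return subset
```

After calling `write_subset()` from `filtered/`, the partial evaluation becomes `cat ../subset.txt | parallel --bar -j 8 sh ../evaluate.sh {}`, where `-j 8` is an example value.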

### Skipping the Evaluation Run
Instead of rerunning the evaluation, one can rely on our full results included
in the `data.txt.xz` and `error.txt.xz` archives. These can be downloaded
from within the Docker machine and extracted into `/tmp/out/`:

```
sudo chown opam:opam /tmp/out/
cd /tmp/out/
wget https://zenodo.org/record/5821007/files/data.txt.xz
wget https://zenodo.org/record/5821007/files/error.txt.xz
unxz data.txt.xz
unxz error.txt.xz
```
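
Given the size of `data.txt.xz`, it may be worth checking the downloads against the MD5 checksums listed in the record's file table before extracting them. The sketch below is a hypothetical Python alternative to `unxz` that does both, reading the file in chunks; unlike `unxz`, it keeps the `.xz` archive.

```python
import hashlib
import lzma
import shutil

# Checksums as listed in the record's file table.
EXPECTED_MD5 = {
    "data.txt.xz": "43e3af377c1769083cb497e00adbf9bd",
    "error.txt.xz": "59ed0a3b812b67bcc79e2b3e6fb157f6",
}


def md5_of(path, chunk_size=1 << 20):
    """MD5 of a file, read in 1 MiB chunks so large archives need little memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_and_extract(archive, expected_md5=None):
    """Verify `archive` (if a checksum is given), then decompress it next to itself.

    Returns the path of the decompressed file, e.g. data.txt for data.txt.xz.
    """
    if expected_md5 is not None and md5_of(archive) != expected_md5:
        raise ValueError(f"checksum mismatch for {archive}")
    target = archive[: -len(".xz")] if archive.endswith(".xz") else archive + ".out"
    # Stream the decompression instead of loading the whole file at once.
    with lzma.open(archive, "rb") as src, open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target
```

Usage, from `/tmp/out/`: `verify_and_extract("data.txt.xz", EXPECTED_MD5["data.txt.xz"])`.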

## Processing the data

To process this data, we include several Python scripts, which should be run
with Python 3. These scripts require around 100 GB of RAM to load the full
dataset in memory. When running them in the Docker image, first run
`cd /tmp/out/ && cp /home/opam/*.py ./`.
- To count the number of functions sliced, run
  `cut -d, -f 1,2 data.txt | sort -u | wc -l`. This takes around 6 minutes on
  the full dataset.
- To count the total number of slices (successful and failed), run
  `wc -l data.txt error.txt`. This takes around 15 seconds.
- To count the number of errors encountered, run `wc -l error.txt`. This takes
  around 1 second.
- To produce the data and graphs regarding sizes and timing, run
  `python3 statistics-and-plots.py`. This outputs the statistics presented in
  the paper, along with Figure 2 (`rq2-sizes.pdf`) and Figure 3
  (`rq2-times.pdf`). This script takes around 35 minutes to run.
- To find the executable slices that are larger than the original programs, run
  `python3 larger-slices.py > larger.txt`. This script takes around 2h30 to
  run. It lists each such slice in `larger.txt` using the notation
  `filename function-sliced slicing-criterion`, from which the slice can be
  recomputed in the Docker image by running
  `wassail slice filename function-sliced slicing-criterion output.wat`. It
  also outputs statistics regarding these slices, which can be inspected by
  running `tail larger.txt`.
- To investigate slices that could not be computed, run:
```
sed -i error.txt -e 's/annotation,/annotation./'
python3 errors.py
```
This takes a few seconds and prints a summary of the errors encountered during
slicing; mapping them to the categories discussed in the paper requires some
manual sorting. The errors encountered and their root causes are summarized
below:

### Root Cause: Unsupported Usage of br_table
- Error: `(Failure"Invalid vstack when popping 2 values")`
- Error: `(Failure"Spec_inference.drop: not enough elements in stack")`
- Error: `(Failure"Spec_inference.take: not enough element in var list")`
- Error: `(Failure"unsupported in spec_inference: incompatible stack lengths (probably due to mismatches in br_table branches)")`

### Root Cause: Unreachable Code
- Error: `(Failure"Unsupported in slicing: cannot find an instruction. It probably is part of unreachable code.")`
- Error: `(Failure"bottom annotation")`
- Error: `(Failure"bottom annotation. this an unreachable instruction")`
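
The shell one-liners above and the manual sorting of errors can also be approximated in a streaming fashion, without loading the files in memory. This sketch is not part of the artifact: it assumes that the first two comma-separated fields of `data.txt` identify the sliced function (which is what `cut -d, -f 1,2` relies on), and that each line of `error.txt` contains one of the messages listed above. The root-cause table is copied from the lists above; the helper names are hypothetical.

```python
from collections import Counter

# Substrings of the error messages listed above, grouped by root cause.
ROOT_CAUSES = {
    "Unsupported usage of br_table": [
        "Invalid vstack when popping 2 values",
        "Spec_inference.drop: not enough elements in stack",
        "Spec_inference.take: not enough element in var list",
        "incompatible stack lengths",
    ],
    "Unreachable code": [
        "cannot find an instruction",
        "bottom annotation",
    ],
}


def count_sliced_functions(lines):
    """Distinct values of the first two fields, like `cut -d, -f 1,2 | sort -u | wc -l`."""
    return len({",".join(line.rstrip("\n").split(",", 2)[:2]) for line in lines})


def classify_error(line):
    """Return the root cause whose message occurs in `line`, or 'Other'."""
    for cause, messages in ROOT_CAUSES.items():
        if any(message in line for message in messages):
            return cause
    return "Other"


def tally_errors(lines):
    """Count error lines per root cause."""
    return Counter(classify_error(line) for line in lines)
```

For instance, `with open("error.txt") as f: print(tally_errors(f))` prints the number of errors per root cause; only the set of distinct function identifiers is kept in memory when counting `data.txt`.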

# RQ5: Comparison to Slicing C Programs
For this RQ, we include the following data in the `rq5.tar.xz` archive, and in the `rq5/` directory in the Docker image:
- The slicing subjects in their C and textual wasm form in `rq5/subjects/`
- The CodeSurfer slices in their C and textual wasm form in `rq5/codesurfer/`
- Our slices in their wasm form in `rq5/wasm-slices/`

As this RQ requires heavy manual comparison, we do not expect the reviewers to
reproduce all of our results. We include a summary of our manual investigation
in `rq5-manual.txt`. To validate these manual findings, one can inspect a
specific slice. For example, the following line in `rq5-manual.txt`:

```
adpcm_apl1_565_expr.c.wat INTERPROCEDURAL
```

can be validated as follows:
```
cd ~/
# This generates a trimmed down version of the CodeSurfer slice, only containing the function of interest
wassail count-in-slice rq5/codesurfer/adpcm_slices/adpcm_apl1_565_expr.c.wat slice.wat
# This compares the CodeSurfer slice with our slice
diff --side-by-side slice.wat rq5/adpcm_apl1_565_expr.c.wat
```

In this case, most of the extraneous instructions in the CodeSurfer slice
appear at the end of the function. This indicates that they are kept to
preserve interprocedural behavior, which corresponds to the `INTERPROCEDURAL`
tag in `rq5-manual.txt`.
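
To make the side-by-side inspection easier, one can list the instructions that only the CodeSurfer slice keeps, together with where they occur. The following is a hypothetical helper, not the procedure used for our manual analysis: it compares the two `.wat` files line by line with `difflib` (comparing whitespace-stripped lines is an assumption) and reports, for each extra line, its relative position in the CodeSurfer slice (0 = start, 1 = end). Extra lines clustered near 1 match the pattern described above for `INTERPROCEDURAL`.

```python
import difflib


def extra_instructions(codesurfer_lines, our_lines):
    """Return (relative_position, line) for lines present only in the CodeSurfer slice."""
    a = [line.strip() for line in codesurfer_lines]
    b = [line.strip() for line in our_lines]
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    extras = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        # "delete" and "replace" opcodes cover lines of `a` that have no match in `b`.
        if tag in ("delete", "replace"):
            for i in range(i1, i2):
                extras.append((i / max(len(a) - 1, 1), a[i]))
    return extras


def compare_files(codesurfer_path, our_path):
    """Read both files and return their extra_instructions."""
    with open(codesurfer_path) as f1, open(our_path) as f2:
        return extra_instructions(f1.readlines(), f2.readlines())
```

With the files from the example above: `compare_files("slice.wat", "rq5/adpcm_apl1_565_expr.c.wat")`.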


Files (10.5 GB)

| Name | Size | MD5 |
|------|------|-----|
| data.txt.xz | 8.6 GB | 43e3af377c1769083cb497e00adbf9bd |
| error.txt.xz | 202.3 kB | 59ed0a3b812b67bcc79e2b3e6fb157f6 |
| icse2022slicing.tar.xz | 2.0 GB | fbfb1ee9b7d483ecc456b65dc3d5bb22 |
| larger-slices.py | 1.5 kB | 89a9af38667e87ac64a99946cc015483 |
| rq1.tar.xz | 2.2 kB | 8f48cbcea49541718c6ef4f35adc2876 |
| rq5-manual.txt | 4.5 kB | 294d89a12b07bfebc275bb087d82463d |
| rq5.tar.xz | 3.4 MB | da3a2dd10badafd10505bf649c05ade9 |
| statistics-and-plots.py | 5.8 kB | 8de4d431cb9d3b5857cd03f818ad178e |

Additional details

Is supplement to: Conference paper: 10.1145/1122445.1122456 (DOI)

