# FuzzFactory: Artifact Evaluation

This document has the following sections:

- **Getting started** (~10 human-minutes): Make sure that the provided Docker container runs on your system.
- **Part one: Validating claims in the paper** (~30 human-minutes + ~10 compute-minutes): Analyze pre-baked results of the full experiments, which were generated on the authors' machine in roughly 2 compute-years. Run scripts to produce the figures used in the paper. You can also use these instructions on the results from part two to produce figures for your own fresh-baked experiments, which should approximate the figures in the paper (though that will take a bit longer).
- **Part two: Running fresh experiments** (~10 human-minutes + ~2 compute-hours): Run a short version of the experiments to quickly get a fresh-baked subset of the evaluation results. *Optional*: Add 10-12 compute-hours for a better quality approximation. The full evaluation takes ~2 compute-years.
- **Part three: Reuse beyond paper** (~10 human-minutes): Run FuzzFactory on a small standalone program, which serves as a demo for re-use in custom test targets.

## Getting started

### Requirements

* You will need **Docker** on your system. You can get Docker CE for Ubuntu here: https://docs.docker.com/install/linux/docker-ce/ubuntu. See links on the sidebar for installation on other platforms. You will need about 16GB of free disk space + about 8GB of free memory to run the container and experiments.

### Load image

To load the artifact on your system, run:

```
docker load -i fuzzfactory-artifact.tar.gz
```

### Run container

Run the following to start a container and get a shell in your terminal:

```
docker run --name fuzzfactory --cap-add SYS_PTRACE -it rohanpadhye/fuzzfactory-artifact
```

The remaining sections of this document assume that you are inside the container's shell, within the default directory `/fuzzfactory-artifact`. You can exit the shell via CTRL+C or CTRL+D or by typing `exit`. This will kill running processes, if any, but will preserve changed files. You can re-start an exited container with `docker start -i fuzzfactory`. Finally, you can clean up with `docker rm fuzzfactory`.

**Note**: The Docker container will need internet access to download sources of the benchmarks used in our evaluation. We do not distribute benchmark source code with our artifact.

### Container filesystem

The default directory in the container, `/fuzzfactory-artifact`, contains the following contents:

- `README.txt`: This file.
- `afl`: This is AFL v2.52b.
- `fuzzfactory`: This is FuzzFactory, our extension of AFL v2.52b that implements domain-specific fuzzing.
- `fuzzer-test-suite`: This is [Google's fuzzer-test-suite](https://github.com/google/fuzzer-test-suite), which we augmented with some extra build scripts to support validity fuzzing (domain: `valid`) and incremental fuzzing (domain: `diff`) as described in the paper.
- `scripts`: Contains various scripts used for running experiments and generating figures from the paper.
- `pre-baked`: Contains results of the experiments that were run on the authors' machine, which took almost 2 CPU-years to generate.
- `fresh-baked`: This will contain the results of the experiments that you run, after following Part Two.
- `demo`: Contains a toy program with instructions for how to compile with one or more domains implemented with FuzzFactory.
- `libarchive-2019-03-31`: A pre-built binary of `libarchive`'s March 2019 development version. This is used to validate the claim that we found a memory leak in the latest version at the time of paper submission.
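If you want a quick, optional sanity check before continuing, the commands below (not part of the artifact's scripts) confirm that the image loaded and that the expected directories are present. The `grep` pattern simply matches the image name given above.

```
# On the host: the loaded image should be listed.
docker images | grep fuzzfactory

# Inside the container's shell: should show the directories described above.
ls /fuzzfactory-artifact
```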
## Part One: Validating claims in paper

### Claims supported by the artifact

- Section 4: "our proposed framework enables us to easily guide fuzz testing towards domain-specific objectives without changing the underlying fuzzing algorithm"
  - The artifact contains source code that implements the six domains described in Section 4 (Tables 2-7). See section **Validation A** below.
  - The artifact allows replicating the experiments in Section 4. In particular, the artifact allows reproducing the plots in Figures 2, 3, 4, 6, 7, 8, and 9. See section **Validation B** below.
- Section 4.7: "In addition, our experiments with cmp-mem led to the discovery of a memory leak in libarchive." See section **Validation C** below.

### Claims not supported by the artifact

- "It took one of the authors of this paper between 1 hour to 2 days to implement each of these domains". Since we did not perform a user study, this claim cannot be validated. As part of our revision (and meeting the conditions of paper acceptance), we intend to replace this sentence and instead list the lines of code (LoC) for each of the six domains. The LoC is covered as part of **Validation A**.

### Validation A

This section validates the claim that the six domains described in the paper can be implemented without modifying the fuzzer's underlying search algorithm. For each domain, we only need to implement: (1) a compile-time LLVM instrumentation pass, which injects calls to FuzzFactory's API in the test program, and (2) a run-time library that is linked in with the instrumented test program, which allocates any globals and defines any functions required by the domain.

To validate this, please see the following files in `fuzzfactory/llvm_mode`; a short, optional browsing sketch follows the list. These files essentially inject calls to the FuzzFactory API (Fig. 10) via LLVM IR instrumentation or via calls to C macros defined in `fuzzfactory/include/waypoints.h`.

- Domain `slow`:
  - `waypoints-slow-pass.cc`: Implements domain `slow` described in Table 2.
  - `waypoints-slow-rt.c`: Allocates DSF map for `slow`.
- Domain `perf`:
  - `waypoints-perf-pass.cc`: Implements domain `perf` described in Table 3.
  - `waypoints-perf-rt.c`: Allocates DSF map for `perf`.
- Domain `mem`:
  - `waypoints-mem-pass.cc`: Implements domain `mem` described in Table 4.
  - `waypoints-mem-rt.c`: Allocates DSF map for `mem`.
- Domain `valid`:
  - `waypoints-valid-pass.cc`: Implements domain `valid` described in Table 5.
  - `waypoints-valid-rt.c`: Allocates DSF map for `valid` and defines the logic for when the argument to `ASSUME()` is `false`.
- Domain `cmp`:
  - `waypoints-cmp-pass.cc`: Implements domain `cmp` described in Table 6. Instead of performing the bit-counting logic within the instrumentation itself, we insert calls to integer-size-appropriate wrapper functions. For example, the instruction `a == b` in the test program (for 32-bit integers) is replaced with the expression `wrapcmp_eq32(a, b)`. Most of the code in the instrumentation pass deals with figuring out the correctly-typed wrapper function to call.
  - `waypoints-cmp-rt.c`: Allocates DSF map for `cmp` and defines all the `wrapcmp` functions that perform the common-bit-counting and update the DSF map accordingly. Most of the code in this file deals with bit-counting the correctly sized arguments.
- Domain `diff`:
  - `waypoints-diff-pass.cc`: Implements domain `diff` described in Table 7. A lot of this code deals with determining whether a given basic block covers a line number given in a separate config file, using LLVM's debug info.
  - `waypoints-diff-rt.c`: Allocates globals used by domain `diff`.
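If you want to skim one domain end-to-end, the `slow` domain is a convenient (optional) starting point, since its pass and runtime are among the smallest; the paths below are exactly those listed above.

```
# Optional: skim the two halves of the `slow` domain (LLVM pass + runtime),
# plus the C macros that the passes rely on.
less fuzzfactory/llvm_mode/waypoints-slow-pass.cc
less fuzzfactory/llvm_mode/waypoints-slow-rt.c
less fuzzfactory/include/waypoints.h
```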
As part of our conditional acceptance and proposed revision, we will be including LoC for each of these domains. The LoC can be verified via the following command:

```
$ wc -l fuzzfactory/llvm_mode/waypoints* | sort -n
  2 fuzzfactory/llvm_mode/waypoints-mem-rt.c
  2 fuzzfactory/llvm_mode/waypoints-perf-rt.c
  2 fuzzfactory/llvm_mode/waypoints-slow-rt.c
 10 fuzzfactory/llvm_mode/waypoints-valid-rt.c
 15 fuzzfactory/llvm_mode/waypoints-slow-pass.cc
 16 fuzzfactory/llvm_mode/waypoints-valid-pass.cc
 17 fuzzfactory/llvm_mode/waypoints-diff-rt.c
 17 fuzzfactory/llvm_mode/waypoints-perf-pass.cc
 32 fuzzfactory/llvm_mode/waypoints-mem-pass.cc
158 fuzzfactory/llvm_mode/waypoints-diff-pass.cc
163 fuzzfactory/llvm_mode/waypoints-cmp-pass.cc
295 fuzzfactory/llvm_mode/waypoints-cmp-rt.c
```

### Validation B

This section explains how to analyze the results in `pre-baked`, which have been provided with the artifact, to produce Figures 2-4 and 6-9 in the paper. You can follow the same steps with the results of your own `fresh-baked` experiments from Part Two as well.

For each figure we have provided a script `./scripts/generate_figure_X.sh`. This script:

1) Generates plottable CSVs from raw run data
2) Generates plots from the CSVs

If the plottable CSVs already exist, the script skips straight to step 2.

To generate the plots for the full evaluation used in the paper, run all of the following commands:

```
./scripts/compile_gcov_all.sh pre-baked
./scripts/generate_figure_2.sh pre-baked 12
./scripts/generate_figure_3.sh pre-baked 12
./scripts/generate_figure_4.sh pre-baked 12
./scripts/generate_figure_6.sh pre-baked 12
./scripts/generate_figure_7.sh pre-baked 12
./scripts/generate_figure_8.sh pre-baked 12
./scripts/generate_figure_9.sh pre-baked 12
```

These commands take as arguments the `RESULT_DIR` and the number of repetitions `REPS`. The above commands create plots in a sub-directory called `figures` inside the `pre-baked` results directory. You can do the same with `fresh-baked` results to see coverage plots for experiments that you run following the instructions in Part Two; just remember to provide the correct number of `REPS` (e.g. provide `1` if you run the short version of the experiments as suggested in Part Two).

Once you run the above commands, do `ls pre-baked/figures` to list the generated PDFs for the `pre-baked` results. You can copy the PDF files from the docker container to your host machine to open them in a PDF viewer. Assuming you started the container with `docker run --name fuzzfactory ...`, you can run the following command on your host:

```
docker cp fuzzfactory:fuzzfactory-artifact/pre-baked/figures/ .
```
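As noted above, the same scripts accept a `fresh-baked` results directory. For example, after a single-repetition run from Part Two, a sketch like the following regenerates all of the figures with `REPS=1`; we assume here that the gcov compilation step is also needed once per results directory, as it is for `pre-baked`.

```
# Sketch: generate figures from your own results (Part Two, one repetition).
./scripts/compile_gcov_all.sh fresh-baked   # assumed: needed once per results directory
for fig in 2 3 4 6 7 8 9; do
  ./scripts/generate_figure_${fig}.sh fresh-baked 1
done
ls fresh-baked/figures   # generated PDFs, by analogy with pre-baked/figures
```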
### Validation C

This section validates the claim that we found a memory leak in the March 2019 version of libarchive at the time of paper submission, using an input generated by `cmp-mem`. We have provided a pre-built version of `libarchive` at [commit 9112ff6 on March 31, 2019](https://github.com/libarchive/libarchive/commit/9112ff6c9242204a72e8ee756fd6346a4005111f), compiled using LeakSanitizer. We can run this on input `id:001914`[*] from the first repetition of `pre-baked/libarchive-2017-01-04-cmp-mem` as follows:

```
./libarchive-2019-03-31/libarchive-2019-03-31-fsanitize_fuzzer pre-baked/libarchive-2017-01-04-cmp-mem/results-1/queue/id\:001914*
```

The above command should raise a memory-leak error from the sanitizer. We reported this bug and it was fixed by the developers after the date of paper submission: [libarchive#1165](https://github.com/libarchive/libarchive/issues/1165).

[*] Note: We identified this input by running the latest version of `libarchive` at the time with all saved inputs. This is the first of several inputs that trigger the same leak. For those familiar with AFL terminology, the input `id:001914` is saved in the *queue* subdirectory instead of *crashes* because we did not compile with LeakSanitizer during fuzzing.

## Part Two: Running fresh experiments

The main evaluation of this paper involves experiments on **6 benchmark programs** with **11 configurations** (afl, afl-zero, cmp, cmp-zero, slow, perf, mem, cmp-mem, valid, afl-diff, diff) for a total of **65** benchmark-configuration combinations (boringssl does not have a validity fuzzing front-end). The experiments can be launched via `scripts/run_all.sh`, whose usage is as follows:

```
./scripts/run_all.sh RESULTS_DIR TIME REPS
```

Here `RESULTS_DIR` is the name of the directory where results will be saved, `TIME` is the duration of each fuzzing experiment (e.g. `30s` or `10m` or `3h`), and `REPS` is the number of repetitions to perform. The timeout for the incremental fuzzing experiment is the lesser of `TIME` or `5m`.

For the experiments in the paper, we ran with TIME=`24h` and REPS=`12`, which takes **650+ days** (almost 2 years). However, you can run a subset of the experiments to get quick results. For example, the following command will take about **2 compute-hours** to run one repetition of 1-minute fuzzing sessions across all configurations and benchmarks:

```
./scripts/run_all.sh fresh-baked 1m 1 # Takes about two hours to complete
```

The above command will save results in a directory named `fresh-baked`. Feel free to tweak options if you have more time on your hands. We've found that running experiments for `10m` each (about **10-12 hours** per repetition) can give reasonably interesting results that approximate the evaluation in the paper. Running just `1` repetition is fine: the plots will simply have no error bars.

In either case, running the above script produces `65` sub-directories in `fresh-baked`, with the naming convention `$BENCH-$CONFIG/results-$ID`, where:

- `BENCH` is one of `libpng-1.2.56`, `libjpeg-turbo-07-2017`, `libarchive-2017-01-04`, `vorbis-2017-12-11`, `boringssl-2016-02-12-server`, `libxml2-v2.9.2`.
- `CONFIG` is one of `afl`, `afl-zero`, `cmp`, `cmp-zero`, etc.
- `ID` is a number between 1 and `REPS`, inclusive.

For example, the directory `fresh-baked/libpng-1.2.56-mem/results-1` will contain the results for the first run of fuzzing `libpng` with the memory-allocation feedback. The pre-populated directory `pre-baked` is similar to `fresh-baked` but contains the results of our complete experiments (12 reps of 24 hours each). You can run the exact same scripts listed in Part One with `fresh-baked` to get plots for the experiments that you just ran using the commands above.

**Note**: If you generate plots for very short runs (e.g. 1 minute each), then the results will look quite different from the paper. The purpose of this section is simply to demonstrate how the fuzzing experiments can be launched from scratch.
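If you have more time, a middle ground between the 1-minute and 24-hour settings is the 10-minute configuration mentioned above. The sketch below is one way to run it and then spot-check the resulting directory layout; the example path is the one given above.

```
# Optional: one repetition of 10-minute sessions (roughly 10-12 compute-hours total).
./scripts/run_all.sh fresh-baked 10m 1

# Spot-check the output layout described above.
ls fresh-baked/                              # 65 $BENCH-$CONFIG sub-directories
ls fresh-baked/libpng-1.2.56-mem/results-1   # results of one run
```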
## Part Three: Reuse beyond paper

The directory `demo` contains a single-file test program (`demo.c`) to illustrate the use of FuzzFactory. Please switch to this directory for the remainder of this section:

```
cd demo
```

Background: This is how you would compile `demo.c` with regular AFL:

```
../afl/afl-clang-fast demo.c -o demo
```

This is how you would compile `demo.c` with FuzzFactory using the `mem` domain:

```
WAYPOINTS=mem ../fuzzfactory/afl-clang-fast demo.c -o demo
```

This is how you would compile `demo.c` with FuzzFactory using the `cmp` domain:

```
WAYPOINTS=cmp ../fuzzfactory/afl-clang-fast demo.c -o demo
```

This is how you would compile `demo.c` with FuzzFactory using the composition of the `cmp` and `mem` domains:

```
WAYPOINTS=cmp,mem ../fuzzfactory/afl-clang-fast demo.c -o demo
```

Now, let's fuzz the demo program using the seed file in the `seeds` subdirectory. The same command applies regardless of which domain was used to instrument the test program:

```
../fuzzfactory/afl-fuzz -p -i seeds -o results ./demo
```

If you fuzzed a program that has been instrumented with the `cmp`+`mem` domains, you will see the following in the AFL output before fuzzing starts:

```
[+] 2 domain-specific front-end configs received
DSF 0: Start=0x000000, End=0x010000, Size=65536, Cumulator=1
DSF 1: Start=0x010000, End=0x010400, Size=1024, Cumulator=1
```

This is an indication that the test program has registered two domain-specific feedback maps with FuzzFactory. The rest of the fuzzing session is similar to running [AFL as usual](http://lcamtuf.coredump.cx/afl). Press CTRL+C to stop fuzzing.

During fuzzing, the following log file is created with verbose output about domain-specific feedback: `results/fuzzfactory.log`
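While a session is running, you can follow the domain-specific feedback log from a second shell in the same container; afterwards, the inputs that the fuzzer saved are in the output directory's `queue` sub-directory (standard AFL layout, as referenced in Validation C). The commands below are optional conveniences, not part of the artifact's scripts.

```
# In a second shell, e.g. `docker exec -it fuzzfactory /bin/bash`,
# then `cd /fuzzfactory-artifact/demo`:
tail -f results/fuzzfactory.log   # verbose domain-specific feedback
ls results/queue | head           # inputs saved by the fuzzer so far
```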