# Mu2: Artifact Evaluation

This document has the following sections:

- **Getting started** (~5 human-minutes): Make sure that the provided Docker container runs on your system. Unzip the provided results.
- **Part one: Validating results in the paper** (~30 human-minutes + ~10 compute-minutes): Analyze pre-baked results of the full experiments, which were generated on the authors' machine in roughly 2.5 compute-years. Run scripts to produce the figures used in the paper. You can also use these instructions on the results from part two to produce figures for your own fresh-baked experiments (though that will take a bit longer).
- **Part two: Running fresh experiments** (~10 human-minutes + ~4 compute-hours): Run a short version of the experiments to quickly get a fresh-baked subset of the evaluation results. The full evaluation takes ~2.5 compute-years.
- **Part three: Reuse beyond paper** (~10 human-minutes): Run Mu2 on a small standalone program, which serves as a demo for re-use in custom test targets.

## Getting started

### Requirements

* You will need **Docker** on your system. You can get Docker CE for Ubuntu here: https://docs.docker.com/install/linux/docker-ce/ubuntu. See links on the sidebar for installation on other platforms.
* You will need about 40GB of free disk space and about 16GB of free memory to run the container and experiments.

### Load image

To load the artifact on your system, run:

```
docker load -i mu2-artifact.tar.gz
```

Then, unzip the compressed results in `pre-baked.zip`. This will require ~36GB of space.

```
unzip pre-baked.zip
```

These pre-baked results include logs of the fuzzing campaigns and mutation repro (explained in Part 3) used to generate all the results in the paper.

Create a new directory called `fresh-baked` to store new results.

```
mkdir fresh-baked
```

### Run container

Run the following to start a container, mount the `pre-baked` and `fresh-baked` directories, and get a shell in your terminal:

```
docker run --name mu2 -it --mount type=bind,source="$(pwd)"/pre-baked,target=/home/mu2-artifact/pre-baked --mount type=bind,source="$(pwd)"/fresh-baked,target=/home/mu2-artifact/fresh-baked vasumv:mu2-artifact
```

The remaining sections of this document assume that you are inside the container's shell, within the default directory `/mu2-artifact`. You can exit the shell via CTRL+C or CTRL+D or typing `exit`. This will kill running processes, if any, but will preserve changed files. You can re-start an exited container with `docker start -i mu2`. Finally, you can clean up with `docker rm mu2`.

### Container filesystem

The default directory in the container, `/mu2-artifact`, contains the following:

- `README.txt`: This file.
- `mu2`: This is our implementation of mutation-analysis-guided fuzzing, Mu2.
- `scripts`: Contains various scripts used for running experiments and generating figures from the paper.
- `pre-baked`: Contains results of the experiments that were run on the authors' machines, which took about 2.5 CPU-years to generate.
- `fresh-baked`: This will contain the results of the experiments that you run, after following Part Two.

## Detailed Instructions

### Part One: Validating results in paper

This section explains how to analyze the results in `pre-baked.zip`, which is provided with the artifact, to produce Figures 4-8 and Tables 2 and 3 in the paper. You can follow the same steps with the results of your own fresh-baked experiments from Part Two as well.

The script `./scripts/write_results_csv.py` will generate CSV data files in the `pre-baked/csv_results/` directory. To generate these CSVs, run the following:

```
python scripts/write_results_csv.py pre-baked 20
```

Once this script has finished running, we can generate the figures and tables.
We have provided a script `scripts/generate_figures_and_tables.py`. This script takes the results directory (e.g. `pre-baked`) and a figure or table name (e.g. `figure_4`) and reads the corresponding CSV file, `pre-baked/csv_results/figure_X_raw.csv` or `pre-baked/csv_results/table_X_raw.csv`.

To generate the plots for the full evaluation used in the paper, run all of the following commands:

```
python scripts/generate_figures_and_tables.py pre-baked figure_4
python scripts/generate_figures_and_tables.py pre-baked figure_5
python scripts/generate_figures_and_tables.py pre-baked figure_6
python scripts/generate_figures_and_tables.py pre-baked figure_7
python scripts/generate_figures_and_tables.py pre-baked figure_8
python scripts/generate_figures_and_tables.py pre-baked table_2
python scripts/generate_figures_and_tables.py pre-baked table_3
```

The above commands create plots and table CSVs in the directory `figures_and_tables` inside the `pre-baked` results directory. You can do the same with `fresh-baked` results to see plots and tables for experiments that you run by following the instructions in Part Two.

Once you have run the above commands, do `ls pre-baked/figures_and_tables` to list the generated PDFs/CSVs for the `pre-baked` results. You should be able to view them on your machine, as the `pre-baked` directory was mounted into the Docker container.

### Part Two: Running fresh experiments

The main evaluation of this paper involves experiments on **5 benchmark programs** with **9 configurations** (zest, mu2-default, mu2-split, mu2-random5, mu2random10, mu2random20, mu2leastexec5, mu2leastexec10, mu2leastexec20) for a total of **45 benchmark-configuration pairs**.

The experiments can be launched via `scripts/run_all.sh`, whose usage is as follows:

```
./scripts/run_all.sh TIME REPS
```

`TIME` is the duration of each fuzzing experiment (e.g. `30s` or `10m` or `3h`), and `REPS` is the number of repetitions to perform. The results will be populated in the `fresh-baked` directory.
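
The relationship between `TIME`, `REPS`, and total fuzzing time can be sketched in a few lines of Python. This is only an illustrative helper and not part of the artifact: the names `parse_duration` and `total_fuzzing_seconds` are hypothetical, and the calculation assumes every one of the 5 × 9 benchmark-configuration pairs fuzzes for exactly `TIME` per repetition. It does not account for any work that `run_all.sh` may do outside the fuzzing sessions, so actual compute time can be higher.

```python
import re

BENCHMARKS = 5       # benchmark programs in the evaluation
CONFIGURATIONS = 9   # fuzzing configurations per benchmark

# Seconds per unit for the duration suffixes shown in the usage text.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def parse_duration(text: str) -> int:
    """Convert a duration such as '30s', '10m' or '3h' into seconds."""
    match = re.fullmatch(r"(\d+)([smh])", text)
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]


def total_fuzzing_seconds(time: str, reps: int) -> int:
    """Fuzzing time summed over all benchmark-configuration pairs and reps."""
    return BENCHMARKS * CONFIGURATIONS * reps * parse_duration(time)


if __name__ == "__main__":
    # Full evaluation: 24-hour sessions, 20 repetitions, reported in days.
    print(total_fuzzing_seconds("24h", 20) / 86400)
    # Short run: 60-second sessions, 1 repetition, reported in minutes.
    print(total_fuzzing_seconds("60s", 1) / 60)
```

Under these assumptions, 24-hour sessions with 20 repetitions add up to 900 days of fuzzing, while 60-second sessions with one repetition add up to 45 minutes of fuzzing.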

For the experiments in the paper, we ran with TIME=`24h` and REPS=`20`, which takes **900 days** (almost 2.5 years). However, you can run a subset of the experiments to get quick results. For example, the following command will take about **4 compute-hours** to run one repetition of 1-minute fuzzing sessions across all configurations and benchmarks:

```
./scripts/run_all.sh 60s 1 # Takes about 4 hours to complete
```

The above command will save results in a directory named `fresh-baked`. The pre-populated directory `pre-baked` is similar to `fresh-baked` but contains the results of our complete experiments (20 reps of 24 hours each).

You can run the exact same scripts listed in Part One with `fresh-baked` to get plots for the experiments that you just ran using the commands above.

**Note**: If you generate plots for very short runs (e.g. 1 minute each), then the results will look quite different from the paper. The purpose of this section is simply to demonstrate how the fuzzing experiments can be launched from scratch.

### Part Three: Reuse beyond paper

The directory `mu2/examples/` contains test programs, including our benchmarks, to illustrate the use of Mu2. Please switch to this directory for the remainder of this section.

```
cd mu2/examples/
```

Now, let's fuzz the example TimSort sorting program `src/main/java/cmu/pasta/mu2/examples/sort/TimSort.java`. The corresponding differential mutation testing driver is located in `src/test/java/cmu/pasta/mu2/examples/sort/DiffTest.java` as the method `testTimSort`. We can run the following:

```
mvn mu2:diff -Dclass=cmu.pasta.mu2.examples.sort.DiffTest -Dmethod=testTimSort -Dincludes=cmu.pasta.mu2.examples.sort.TimSort -Dtime=1m
```

The `-Dclass` and `-Dmethod` arguments refer to the test class and `@DiffFuzz` annotated test method for the program we would like to fuzz. The `-Dincludes` argument refers to the classes that we would like to create program mutants from.
In this case, we specified the `TimSort` class. The `-Dtime` argument controls the duration of the fuzzing campaign.

Once this command has finished running, the fuzzing results will be located in `target/fuzz-results/cmu.pasta.mu2.examples.sort.DiffTest/testTimSort`. To reproduce the mutation score of the generated corpus, run the following:

```
mvn mu2:mutate -Dclass=cmu.pasta.mu2.examples.sort.DiffTest -Dmethod=testTimSort -Dincludes=cmu.pasta.mu2.examples.sort.TimSort -Dinput=target/fuzz-results/cmu.pasta.mu2.examples.sort.DiffTest/testTimSort/corpus
```
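
When adapting the two commands above to your own test target, it can help to build them from the same three values so that the class, method, and corpus path stay in sync. The following Python sketch does that. The helper names (`results_dir`, `diff_command`, `mutate_command`) are hypothetical and not part of Mu2, and the results path pattern `target/fuzz-results/<class>/<method>` is an assumption generalized from the single TimSort example shown here. Only the `-D` options that appear in this document are used.

```python
import shlex


def results_dir(test_class: str, method: str) -> str:
    """Results location, assuming the layout seen in the TimSort example."""
    return f"target/fuzz-results/{test_class}/{method}"


def diff_command(test_class: str, method: str, includes: str, time: str) -> list[str]:
    """Argument list for the fuzzing step (mvn mu2:diff)."""
    return [
        "mvn", "mu2:diff",
        f"-Dclass={test_class}",
        f"-Dmethod={method}",
        f"-Dincludes={includes}",
        f"-Dtime={time}",
    ]


def mutate_command(test_class: str, method: str, includes: str) -> list[str]:
    """Argument list for the mutation-score repro step (mvn mu2:mutate)."""
    corpus = f"{results_dir(test_class, method)}/corpus"
    return [
        "mvn", "mu2:mutate",
        f"-Dclass={test_class}",
        f"-Dmethod={method}",
        f"-Dincludes={includes}",
        f"-Dinput={corpus}",
    ]


if __name__ == "__main__":
    test_class = "cmu.pasta.mu2.examples.sort.DiffTest"
    method = "testTimSort"
    includes = "cmu.pasta.mu2.examples.sort.TimSort"
    # Print shell-ready command lines; pass the lists to subprocess.run
    # instead if you want to execute them from inside mu2/examples/.
    print(shlex.join(diff_command(test_class, method, includes, "1m")))
    print(shlex.join(mutate_command(test_class, method, includes)))
```

For the TimSort values, the printed lines match the two `mvn` commands given in this section.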