# Mu2: Artifact Evaluation

This document has the following sections:
- **Getting started** (~5 human-minutes): Make sure that the provided Docker container runs on your system. Unzip the provided results.
- **Part one: Validating results in the paper** (~30 human-minutes + ~10 compute-minutes): Analyze pre-baked results of the full experiments, which were generated on the authors' machine in roughly 2.5 compute-years. Run scripts to produce the figures used in the paper. You can also use these instructions on the results from part two to produce figures for your own fresh-baked experiments (though that will take a bit longer).
- **Part two: Running fresh experiments** (~10 human-minutes + ~4 compute-hours): Run a short version of the experiments to quickly get a fresh-baked subset of the evaluation results. The full evaluation takes ~2.5 compute-years.
- **Part three: Reuse beyond paper** (~10 human-minutes): Run Mu2 on a small standalone program, which serves as a demo for re-use in custom test targets. 


## Getting started

### Requirements

* You will need **Docker** on your system. You can get Docker CE for Ubuntu here: https://docs.docker.com/install/linux/docker-ce/ubuntu. See links on the sidebar for installation on other platforms.

* You will need about 40GB of free disk space and about 16GB of free memory to run the container and the experiments.

### Load image

To load the artifact on your system, run:

```
docker load -i mu2-artifact.tar.gz
```

Then, unzip the compressed results in `pre-baked.zip`. This will require ~36GB of space.
```
unzip pre-baked.zip
```
These pre-baked results include the logs of the fuzzing campaigns and of the mutation-score reproduction (explained in Part Three) that were used to generate all the results in the paper.

Create a new directory called `fresh-baked` to store new results.
```
mkdir fresh-baked
```

### Run container

Run the following to start a container with the `pre-baked` and `fresh-baked` directories mounted, and to get a shell in your terminal:

```
docker run --name mu2 -it \
  --mount type=bind,source="$(pwd)"/pre-baked,target=/home/mu2-artifact/pre-baked \
  --mount type=bind,source="$(pwd)"/fresh-baked,target=/home/mu2-artifact/fresh-baked \
  vasumv:mu2-artifact
```
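
Docker's `--mount type=bind` option reports an error when the source directory does not exist on the host, so it can help to confirm that both directories are present before starting the container. The following is a minimal sketch; `check_mount_dirs` is a hypothetical helper and not part of the artifact:

```shell
# Hypothetical pre-flight check (not shipped with the artifact):
# report every argument that is not an existing directory.
check_mount_dirs() {
  rc=0
  for dir in "$@"; do
    if [ ! -d "$dir" ]; then
      echo "missing directory: $dir" >&2
      rc=1
    fi
  done
  # Non-zero if at least one directory was missing.
  return "$rc"
}
```

Running `check_mount_dirs pre-baked fresh-baked` on the host, from the directory that holds both folders, prints any missing directory and returns a non-zero status if one is absent.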

The remaining sections of this document assume that you are inside the container's shell, within the default directory `/mu2-artifact`. You can exit the shell via CTRL+C or CTRL+D or typing `exit`. This will kill running processes, if any, but will preserve changed files. You can re-start an exited container with `docker start -i mu2`. Finally, you can clean up with `docker rm mu2`.

### Container filesystem

The default directory in the container, `/mu2-artifact`, contains the following contents:
- `README.txt`: This file.
- `mu2`: This is our implementation of mutation-analysis-guided fuzzing, Mu2.
- `scripts`: Contains various scripts used for running experiments and generating figures from the paper.
- `pre-baked`: Contains results of the experiments that were run on the authors' machines, which took about 2.5 CPU-years to generate. 
- `fresh-baked`: This will contain the results of the experiments that you run, after following Part Two.

## Detailed Instructions

### Part One: Validating results in paper

This section explains how to analyze the results in `pre-baked.zip`, which is provided with the artifact, to produce Figures 4-8 and Tables 2 and 3 in the paper. You can follow the same steps with the results of your own fresh-baked experiments in part two, as well.

The script `./scripts/write_results_csv.py` will generate CSV data files in the `pre-baked/csv_results/` directory. To generate these CSVs, run the following:
```
python scripts/write_results_csv.py pre-baked 20
```

Once this script has finished running, we can generate the figures and tables using the provided script `scripts/generate_figures_and_tables.py`.
This script takes the results directory (e.g. `pre-baked`) and a figure or table name (e.g. `figure_4`) and reads the corresponding CSV file, `pre-baked/csv_results/figure_X_raw.csv` or `pre-baked/csv_results/table_X_raw.csv`.

To generate the plots for the full evaluation used in the paper, run all of the following commands:

```
python scripts/generate_figures_and_tables.py pre-baked figure_4
python scripts/generate_figures_and_tables.py pre-baked figure_5
python scripts/generate_figures_and_tables.py pre-baked figure_6
python scripts/generate_figures_and_tables.py pre-baked figure_7
python scripts/generate_figures_and_tables.py pre-baked figure_8
python scripts/generate_figures_and_tables.py pre-baked table_2
python scripts/generate_figures_and_tables.py pre-baked table_3
```
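
The seven invocations above differ only in their final argument, so they can also be driven from a loop. The following is a minimal sketch and not part of the artifact: the function name `generate_all` and the `DRY_RUN` variable are hypothetical, and it assumes you are in the container's default directory so that the relative `scripts/` path resolves.

```shell
# Hypothetical helper (not shipped with the artifact): run the generation
# script once for each figure/table name listed in this section.
generate_all() {
  results_dir="$1"
  for name in figure_4 figure_5 figure_6 figure_7 figure_8 table_2 table_3; do
    if [ -n "${DRY_RUN:-}" ]; then
      # Only print the command, so the loop can be inspected without running it.
      echo "python scripts/generate_figures_and_tables.py $results_dir $name"
    else
      # Stop at the first failing invocation.
      python scripts/generate_figures_and_tables.py "$results_dir" "$name" || return 1
    fi
  done
}
```

For example, `generate_all pre-baked` regenerates everything for the paper results, while setting `DRY_RUN=1` first makes `generate_all fresh-baked` print the commands without running them.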

The above commands create plots and table CSVs in the `figures_and_tables` directory inside the `pre-baked` results directory. You can do the same with the `fresh-baked` results to see plots and tables for the experiments that you run by following the instructions in Part Two.

Once you have run the above commands, run `ls pre-baked/figures_and_tables` to list the generated PDFs/CSVs for the `pre-baked` results. You should be able to view them on your host machine, since the `pre-baked` directory is mounted into the Docker container.

### Part Two: Running fresh experiments

The main evaluation of this paper involves experiments on **5 benchmark programs** with **9 configurations** (zest, mu2-default, mu2-split, mu2-random5, mu2random10, mu2random20, mu2leastexec5, mu2leastexec10, mu2leastexec20), for a total of **45 benchmark-configuration pairs**.


The experiments can be launched via `scripts/run_all.sh`, whose usage is as follows:

```
./scripts/run_all.sh TIME REPS
```

`TIME` is the duration of each fuzzing experiment (e.g. `30s` or `10m` or `3h`), and `REPS` is the number of repetitions to perform. The results will be populated in the `fresh-baked` directory.

For the experiments in the paper, we ran with TIME=`24h` and REPS=`20`, which takes **900 compute-days** (almost 2.5 compute-years). However, you can run a subset of the experiments to get quick results. For example, the following command will take about **4 compute-hours** to run one repetition of 1-minute fuzzing sessions across all configurations and benchmarks:

```
./scripts/run_all.sh 60s 1  # Takes about 4 hours to complete
```

The above command will save results in a directory named `fresh-baked`.
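
The 900-day figure quoted for the paper's setup can be checked with simple arithmetic: 5 benchmarks times 9 configurations times 20 repetitions times 24 hours. The sketch below does this calculation for any `TIME` and `REPS`. The helper names `to_seconds` and `fuzzing_seconds` are hypothetical, and the parser assumes that `TIME` always has the form shown in the examples (an integer followed by `s`, `m`, or `h`).

```shell
BENCHMARKS=5   # benchmark programs in the evaluation
CONFIGS=9      # fuzzing configurations per benchmark

# Hypothetical helper: convert a TIME value such as 60s, 10m, or 24h to seconds.
to_seconds() {
  value="${1%[smh]}"        # numeric part, e.g. 24
  unit="${1#"$value"}"      # trailing unit, e.g. h
  case "$value" in
    ''|*[!0-9]*) echo "unsupported TIME: $1" >&2; return 1 ;;
  esac
  case "$unit" in
    s) echo "$value" ;;
    m) echo $(( value * 60 )) ;;
    h) echo $(( value * 3600 )) ;;
    *) echo "unsupported TIME: $1" >&2; return 1 ;;
  esac
}

# Hypothetical helper: total fuzzing seconds for "run_all.sh TIME REPS",
# counting only the fuzzing sessions themselves.
fuzzing_seconds() {
  per_session=$(to_seconds "$1") || return 1
  echo $(( BENCHMARKS * CONFIGS * per_session * $2 ))
}

echo "paper setup: $(( $(fuzzing_seconds 24h 20) / 86400 )) days"   # prints 900 days
echo "short run:   $(( $(fuzzing_seconds 60s 1) / 60 )) minutes"    # prints 45 minutes
```

This count covers the fuzzing sessions only. The 4-hour estimate for the `60s` run is larger than the 45 minutes computed here, so it evidently includes work beyond the fuzzing sessions themselves.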


The pre-populated directory `pre-baked` is similar to `fresh-baked` but contains the results of our complete experiments (20 reps of 24 hours each). You can run the exact same scripts listed in Part one with `fresh-baked` to get plots for the experiments that you just ran using the commands above. **Note**: If you generate plots for very short runs (e.g. 1 minute each), then the results will look quite different from the paper. The purpose of this section is simply to demonstrate how the fuzzing experiments can be launched from scratch.


### Part Three: Reuse beyond paper

The directory `mu2/examples/` contains test programs, including our benchmarks, to illustrate the use of Mu2. Please switch to this directory for the remainder of this section.
```
cd mu2/examples/
```

Now, let's fuzz the example TimSort sorting program `src/main/java/cmu/pasta/mu2/examples/sort/TimSort.java`. The corresponding 
differential mutation testing driver is located in `src/test/java/cmu/pasta/mu2/examples/sort/DiffTest.java` as the method `testTimSort`.

We can run the following:

```
mvn mu2:diff -Dclass=cmu.pasta.mu2.examples.sort.DiffTest -Dmethod=testTimSort -Dincludes=cmu.pasta.mu2.examples.sort.TimSort -Dtime=1m
```

The `-Dclass` and `-Dmethod` arguments refer to the test class and the `@DiffFuzz`-annotated test method for the program we would like to fuzz.
The `-Dincludes` argument refers to the classes that we would like to create program mutants from. In this case, we specified
the `TimSort` class. The `-Dtime` argument sets the duration of the fuzzing campaign.

Once this command has finished running, the fuzzing results will be located in `target/fuzz-results/cmu.pasta.mu2.examples.sort.DiffTest/testTimSort`.

To reproduce the mutation score of the generated corpus, run the following:
```
mvn mu2:mutate -Dclass=cmu.pasta.mu2.examples.sort.DiffTest -Dmethod=testTimSort -Dincludes=cmu.pasta.mu2.examples.sort.TimSort -Dinput=target/fuzz-results/cmu.pasta.mu2.examples.sort.DiffTest/testTimSort/corpus
```
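
The `mu2:diff` and `mu2:mutate` commands share the same class, method, and includes arguments, so both can be derived from one set of values when you adapt this to your own test target. The following is a minimal sketch: `mu2_commands` is a hypothetical helper, it only prints the commands, and it assumes that the `target/fuzz-results/<class>/<method>/corpus` layout seen in the TimSort example carries over to other test drivers.

```shell
# Hypothetical helper (not part of Mu2): print the fuzzing command and the
# matching mutation-score command for one test target.
# usage: mu2_commands TEST_CLASS TEST_METHOD INCLUDES TIME
mu2_commands() {
  # Assumed corpus location, generalized from the TimSort example.
  corpus="target/fuzz-results/$1/$2/corpus"
  echo "mvn mu2:diff -Dclass=$1 -Dmethod=$2 -Dincludes=$3 -Dtime=$4"
  echo "mvn mu2:mutate -Dclass=$1 -Dmethod=$2 -Dincludes=$3 -Dinput=$corpus"
}

# Reproduces the two TimSort commands from this section.
mu2_commands cmu.pasta.mu2.examples.sort.DiffTest testTimSort \
  cmu.pasta.mu2.examples.sort.TimSort 1m
```

Printing the commands rather than running them lets you check the derived corpus path before launching a campaign.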