# Artifact for the paper "ROSA: Finding Backdoors with Fuzzing" (ICSE'25)

## Table of contents
1. [Purpose](#purpose)
2. [Provenance](#provenance)
3. [Data](#data)
4. [Setup](#setup)
    1. [Hardware](#hardware)
    2. [Software](#software)
5. [Usage](#usage)
    1. [Introduction](#introduction)
    2. [Testing the artifact on a basic example](#testing-the-artifact-on-a-basic-example)
    3. [Guided partial reproduction](#guided-partial-reproduction)
    4. [Unguided partial reproduction (optional)](#unguided-partial-reproduction-optional)
    5. [Complete reproduction (optional)](#complete-reproduction-optional)
    6. [Reproduction of the Stringer tool evaluation (optional)](
       #reproduction-of-the-stringer-tool-evaluation-optional)
    7. [Discovering how ROSA and ROSARUM can be reused or repurposed](
       #discovering-how-rosa-and-rosarum-can-be-reused-or-repurposed)

## Purpose
This artifact enables evaluating ROSA, a new backdoor detection tool built on top of the AFL++
fuzzer (an automated test input generator), on the novel ROSARUM benchmark, which consists of 17
backdoors implanted in diverse programs. More precisely, this artifact makes it possible to
reproduce the evaluation results reported in our ICSE'25 paper about ROSA, as detailed in Table II
and Figure 2 of the paper.

We are applying for the _Available_, (_Functional_) and _Reusable_ badges, for the following
reasons:
- **Available**: the artifact satisfies the requirements of this badge, as it has been fully
  archived on the Zenodo public archival repository with an associated unique identifier:
  <https://zenodo.org/records/14724251> (DOI: <https://doi.org/10.5281/zenodo.14724251>).
- **(Functional &) Reusable**: the artifact satisfies the requirements of these badges, as it is:
  - _Documented (carefully)_: the artifact and underlying code are thoroughly documented, covering
    not only the reproduction of the results of the paper, but also complete guides to reusing or
    extending the ROSA toolchain and ROSARUM benchmark for other uses or purposes.
  - _Consistent_: the artifact allows recomputing the data discussed in the evaluation section of
    the paper from scratch and pretty-printing them as they appear in the paper. It offers both
    complete and partial reproduction options, to allow for partial verification of the paper's
    results in a reasonable amount of time.
  - _Complete_: all of the components developed to produce the paper's results are provided. This
    includes the full ROSA toolchain (along with its development history and documentation) as well
    as the full ROSARUM benchmark (along with its development history and documentation), with full
    instructions on how to use both of them (together or separately). We also include the Stringer
    tool from other authors, to which ROSA is compared in the paper. As Stringer has a proprietary
    dependency (IDA Pro), which could not be bundled in this artifact, instructions to install it
    and integrate it into Stringer are provided instead.
  - _Exercisable_: the artifact includes documented shell and Python scripts that can either completely
    or partially reproduce the tables and diagrams of the paper, such that a reviewer may directly
    compare them with the tables and diagrams shown in the paper. While these scripts can be easily
    adapted to other uses or purposes, we also provide detailed instructions for reusing and
    repurposing both the ROSA toolchain and ROSARUM benchmark.

## Provenance
This artifact can be obtained from Zenodo: <https://zenodo.org/records/14724251>.

The preprint of the associated ICSE'25 research paper can be found in this repository
([`preprint.pdf`](preprint.pdf)).

## Data
The ROSARUM benchmark used to evaluate the ROSA tool is one of the contributions of our ICSE'25
paper. ROSARUM has been bundled into this artifact and it can be seen as a backdoor dataset. A
detailed discussion of ROSARUM and how it was created can be found in the evaluation section of the
paper.

In a nutshell, ROSARUM was created in two parts:
- By reconstructing _authentic_ (real-world), publicly disclosed backdoors;
- By injecting _synthetic_ (created by us) backdoors in open-source programs.

The software licenses of the various components in the ROSARUM benchmark can be found at
<https://archive.softwareheritage.org/browse/content/sha1_git:dfffb14ea3592095de7ea01cbfc3dd4eb5287b08/?origin_url=https://github.com/binsec/rosarum&path=LICENSE>.

Some backdoors in ROSARUM may harm the system on which they are triggered (e.g., by encrypting
the user's home folder) and should thus only be deployed either in a disposable test
environment or in an altered, harmless version. Note that the scripts and Docker image
that enable reproducing the results of the paper in the artifact already take care of this
sanitization.

## Setup
### Hardware
**This artifact requires an `x86_64` machine**. Evaluating this artifact on an `ARM` machine (such
as an Apple Silicon Mac or Snapdragon PC) is not recommended. The recommended minimal configuration
to run the artifact is a recent Intel/AMD CPU with **at least 8 cores, 16 GiB of RAM and 35 GiB of
disk space (for partial reproduction) or 600 GiB of disk space (for complete reproduction)**. Complete
reproduction (i.e., reproducing the full set of experiments from the paper) takes about two months on
the recommended minimal configuration. In the absence of a higher-end configuration (i.e., a
massively parallel server), partial reproduction (i.e., reproducing only part of the experiments of
the paper, for instance only some ROSARUM backdoors, with fewer and shorter ROSA runs) is the
recommended way to review this artifact in a reasonable amount of time.
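
The thresholds above can be checked on the host before pulling the image. The following is a
minimal sketch and is not part of the artifact: the function names `check_requirements` and
`probe_host` are hypothetical, the RAM probe assumes a POSIX system, and the disk check only looks
at the free space of the current directory (which may differ from where Docker stores its images).

```python
import os
import platform
import shutil

# Thresholds taken from the recommended minimal configuration described above.
MIN_CORES = 8
MIN_RAM_GIB = 16
MIN_DISK_GIB_PARTIAL = 35
MIN_DISK_GIB_COMPLETE = 600


def check_requirements(arch, cores, ram_gib, disk_gib, complete=False):
    """Return a list of human-readable problems (empty if every check passes)."""
    problems = []
    if arch.lower() not in ("x86_64", "amd64"):
        problems.append(f"unsupported architecture: {arch} (x86_64 required)")
    if cores < MIN_CORES:
        problems.append(f"{cores} CPU cores found, at least {MIN_CORES} recommended")
    if ram_gib < MIN_RAM_GIB:
        problems.append(f"{ram_gib:.1f} GiB of RAM found, {MIN_RAM_GIB} GiB recommended")
    min_disk = MIN_DISK_GIB_COMPLETE if complete else MIN_DISK_GIB_PARTIAL
    if disk_gib < min_disk:
        problems.append(f"{disk_gib:.1f} GiB of free disk found, {min_disk} GiB recommended")
    return problems


def probe_host(path="."):
    """Collect the host characteristics needed by check_requirements()."""
    gib = 1024 ** 3
    try:
        ram_gib = os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE") / gib
    except (AttributeError, ValueError, OSError):
        ram_gib = float("inf")  # RAM size unknown: the RAM check is skipped
    return {
        "arch": platform.machine(),
        "cores": os.cpu_count() or 0,
        "ram_gib": ram_gib,
        "disk_gib": shutil.disk_usage(path).free / gib,
    }


if __name__ == "__main__":
    issues = check_requirements(**probe_host())
    print("\n".join(issues) if issues else "Host meets the recommended minimal configuration.")
```

Treat the output as a hint rather than a verdict: a machine sold with 16 GiB of RAM usually
reports slightly less usable memory, so a RAM warning close to the threshold can be ignored.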

### Software
While ROSA, ROSARUM and the artifact scripts are all Linux-native software (they have been tested
on Ubuntu Linux 24.04), the artifact can still be deployed on all major platforms using our
**Docker image** (powered either by [Docker Desktop](https://docs.docker.com/desktop/) or
[Docker Engine](https://docs.docker.com/engine/)).

The Docker image has been tested on Ubuntu Linux 24.04, Windows 10 Home 22H2 19045.5371 and macOS
Ventura 13.2.1 with Docker Engine version 27.3.1, build `ce12230`.

#### Installing the Docker image
The image can be loaded transparently via [Docker Hub](https://hub.docker.com/):
```console
$ docker pull plumtrie/rosa-icse25-artifact:0.1.1
```
As a more durable alternative, it can also be downloaded from the Zenodo archive and then loaded
locally. For example, using [curl](https://curl.se/):
```console
$ curl https://zenodo.org/records/14724300/files/rosa-icse25-artifact_0-1-1.tar | docker load
```

Once the image has been loaded, a container can be started using the following command:
```console
$ docker run -ti --rm -p 4000:4000 --name rosa-icse25-artifact plumtrie/rosa-icse25-artifact:0.1.1
```
This will immediately start an interactive session within the container, from which you can start
using the artifact.

You should also see the message `Go to http://localhost:4000 to see the ROSA documentation.`
Since the previous command binds host port 4000 (i.e., on your host machine) to guest port 4000
(i.e., inside the container), you can consult the documentation of ROSA with a web browser on
your host machine. This is entirely optional, but it might help you understand how ROSA works.

#### Uninstalling the Docker image
Once you are done using the artifact, you can dispose of it by simply removing the Docker image.
Make sure to exit any running container(s) associated with the artifact's Docker image, and then
run:
```console
$ docker rmi plumtrie/rosa-icse25-artifact:0.1.1
```

## Usage

### Introduction
The artifact can be used in three different ways, which we summarize here and then detail in the
rest of this README.

1. [_Testing the artifact on a basic example_](#testing-the-artifact-on-a-basic-example): enables
   quickly verifying that the artifact works on your machine;
2. _Reproducing the results of the paper_:
   - [_Guided partial reproduction_](#guided-partial-reproduction) (**requires hours of computation
     to reproduce some trends observed in the paper**): enables reproducing a subset of the results
     discussed in the paper in a reasonable amount of time (the reproduced subset is chosen by us,
     and precomputed data are used for the most time-consuming part of the experiments);
   - [_Unguided partial reproduction_](#unguided-partial-reproduction-optional) (**optional,
     requires days of computation to reproduce some trends observed in the paper**): enables
     reproducing a subset of the results discussed in the paper (the reproduced subset is chosen by
     the user, and no precomputed data are used);
   - [_Complete reproduction_](#complete-reproduction-optional) (**optional, requires months of
     computation or a massively parallel server**): enables reproducing all the results discussed
     in the paper;
   - [_Reproduction of the Stringer tool evaluation_](
     #reproduction-of-the-stringer-tool-evaluation-optional) (**optional, requires installing the
     IDA Pro proprietary dependency**): enables reproducing the results of evaluating the related
     Stringer tool (by other authors) over our ROSARUM benchmark;
3. [_Discovering how ROSA and ROSARUM can be reused or
   repurposed_](#discovering-how-rosa-and-rosarum-can-be-reused-or-repurposed).

**GENERAL NOTE**: if you see the following warning while reproducing our results, it just means
that the program containing the backdoor has crashed at some point, and you can safely ignore it:
> WARNING: a fuzzer has detected crashes. This is probably hindering backdoor detection!

### Testing the artifact on a basic example
Once an interactive session has been started within the container (see [_Installing the Docker image_](
#installing-the-docker-image)), you can start **a 1-minute ROSA detection campaign over one of the
backdoors in ROSARUM** by running the following command:
```console
$ /root/artifact/run-target.sh sudo-backdoored 1 1
```
You should see the boot sequence of ROSA, followed by ROSA's status screen as the detection
campaign is running. After about one minute, this should be followed by a series of other output
messages, as ROSA replays the performed campaign with different input parameters (parameter sweep
study). Finally, you should see the message `PDF compiled successfully and copied to:
/root/evaluation/sudo-backdoored.pdf`. This means that the results of the performed detection
campaign and parameter sweep study have been pretty-printed into a PDF file saved in the container.
To retrieve this PDF in a directory on your host machine, run the following command **from your
host machine**:
```console
$ docker cp rosa-icse25-artifact:/root/evaluation/sudo-backdoored.pdf .
```
The PDF should look something like this:

![The generated PDF for Sudo.](./images/sudo-backdoored-pdf.png)

It is perfectly fine if the numbers in the table or the lines in the graph are not exactly the
same. If you have made it this far without errors, then you have configured the artifact correctly.
If there were any errors along the way, please contact the authors.

### Guided partial reproduction
In our ICSE'25 paper, we perform an 8-hour ROSA detection campaign (each campaign uses 6 CPU cores)
for each of the 17 backdoors in the ROSARUM benchmark. To increase the statistical significance of
our results (as ROSA is based on fuzzing, which is stochastic), each campaign is repeated 10
times. A complete reproduction of this process (see [_Complete reproduction_](
#complete-reproduction-optional)) takes months unless the campaigns are massively parallelized.

**The goal of the proposed _Guided partial reproduction_ is to reproduce, within a few hours,
the trend observed in the paper for two of the ROSARUM backdoors**. More precisely,
_Guided partial reproduction_ only performs the following tasks:
- A single 30-minute ROSA detection campaign over the Sudo backdoor from ROSARUM, which is usually
  enough time to detect the backdoor;
- A complete set of detection campaigns (10 campaigns of 8 hours each) over the D-Link/thttpd
  backdoor from ROSARUM. Yet, the most time-consuming part of these campaigns (fuzzing with the
  standard AFL++ using 6 CPU cores) is not performed, but pre-computed fuzzing data (from the
  experiments discussed in the paper) are used instead. The results of the campaigns should thus
  be exactly the same as in the paper.

Once an interactive session has been started within the container (see
[_Installing the Docker image_](#installing-the-docker-image)), you can start the _Guided partial
reproduction_ by running the following command:
```console
$ /root/artifact/run-reduced-evaluation.sh
```
This should take a few hours to complete.
When finished, the last printed message should read `PDF compiled successfully and copied to:
/root/evaluation/reduced-evaluation.pdf`.

**From your host machine**, you should run the following command to retrieve the PDF from the
container:
```console
$ docker cp rosa-icse25-artifact:/root/evaluation/reduced-evaluation.pdf .
```
You can then compare the results in the generated PDF to Table II and Figure 2 in the paper. In
particular, verify the following elements:
- Table at the top of the PDF / Table II in the paper:
    - _Sudo_ line: you should generally observe results that are in line with the corresponding
      line in the paper. Yet, since a single and shorter campaign was performed compared to what
      was done in the paper, it remains _possible_ (although unlikely) that the backdoor will not
      have been discovered before the 30-minute timeout.
    - _D-Link/thttpd_ line: since the exact fuzzer-generated inputs from the paper were used, you
      should observe the exact same results as in the paper.
    - **NOTE**: unlike in the paper, the generated lines will not contain the comparison with
      Stringer. This is expected, and the reasons for it are described in [_Reproduction of the
      Stringer tool evaluation (optional)_](
      #reproduction-of-the-stringer-tool-evaluation-optional).
- Graph at the bottom of the PDF / Figure 2 in the paper:
    - You should observe the same trends as in the paper. Specifically, you should see the number
      of manually inspected inputs decrease and the number of failed runs increase as the duration
      of phase 1 increases. Of course, the graph will not be identical to the paper, as we are
      using a much smaller number of backdoors and campaigns.

### Unguided partial reproduction (optional)
**Unguided partial reproduction enables launching as many detection campaigns as you want, of any
custom length, over any of the backdoors in ROSARUM**, by running the following command **in the
container** with (1) the backdoor identifier, (2) the campaign length in minutes and (3) the number
of campaigns, as parameters:
```console
$ /root/artifact/run-target.sh BACKDOOR_ID MINUTES_PER_CAMPAIGN NUMBER_OF_CAMPAIGNS
```
For example, the same command was used earlier to test the artifact by launching a single 1-minute
campaign over the Sudo backdoor (see [_Testing the artifact on a basic example_](
#testing-the-artifact-on-a-basic-example)):
```console
$ /root/artifact/run-target.sh sudo-backdoored 1 1
```
The correspondence between the ROSARUM backdoor names from the paper and backdoor identifiers in
the artifact is the following:
- Belkin / httpd: `belkin-backdoored` (specialized seeds), `belkin_unoptimized-backdoored` (generic
  HTTP seeds)
- D-Link / thttpd: `dlink-backdoored`
- Linksys / scfgmgr: `scfgmgr-backdoored`
- Tenda / goahead: `tenda-backdoored`
- PHP: `php-backdoored`
- ProFTPD: `proftpd-backdoored`
- vsFTPd: `vsftpd-backdoored`
- sudo: `sudo-backdoored`
- libpng: `libpng-backdoored`
- libsndfile: `libsndfile-backdoored`
- libtiff: `libtiff-backdoored`
- libxml2: `libxml2-backdoored`
- Lua: `lua-backdoored`
- OpenSSL / bignum: `openssl-backdoored`
- PHP / unserialize: `php_unserialize-backdoored`
- Poppler: `poppler-backdoored`
- SQLite3: `sqlite3-backdoored`
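
To avoid typos in the identifier or in the numeric parameters, the command line can be assembled
programmatically. The sketch below is illustrative only: `run_target_command` and
`fuzzing_minutes` are hypothetical helper names, the requirement that both numbers be positive
integers is an assumption, and the time estimate assumes that campaigns run one after another
(it also ignores the analysis performed after fuzzing).

```python
import shlex

# Backdoor identifiers copied from the list above.
KNOWN_IDS = (
    "belkin-backdoored", "belkin_unoptimized-backdoored", "dlink-backdoored",
    "scfgmgr-backdoored", "tenda-backdoored", "php-backdoored",
    "proftpd-backdoored", "vsftpd-backdoored", "sudo-backdoored",
    "libpng-backdoored", "libsndfile-backdoored", "libtiff-backdoored",
    "libxml2-backdoored", "lua-backdoored", "openssl-backdoored",
    "php_unserialize-backdoored", "poppler-backdoored", "sqlite3-backdoored",
)

RUN_TARGET = "/root/artifact/run-target.sh"


def run_target_command(backdoor_id, minutes_per_campaign, number_of_campaigns):
    """Return the argument list for run-target.sh after basic validation."""
    if backdoor_id not in KNOWN_IDS:
        raise ValueError(f"unknown backdoor identifier: {backdoor_id}")
    if minutes_per_campaign < 1 or number_of_campaigns < 1:
        # Assumption: the script expects positive integers for both parameters.
        raise ValueError("campaign length and campaign count must be >= 1")
    return [RUN_TARGET, backdoor_id, str(minutes_per_campaign), str(number_of_campaigns)]


def fuzzing_minutes(minutes_per_campaign, number_of_campaigns):
    """Total fuzzing time, assuming the campaigns run sequentially."""
    return minutes_per_campaign * number_of_campaigns


if __name__ == "__main__":
    # Example: two 30-minute campaigns over the D-Link/thttpd backdoor.
    print(shlex.join(run_target_command("dlink-backdoored", 30, 2)))
    print(f"at least {fuzzing_minutes(30, 2)} minutes of fuzzing")
```

The printed command is meant to be pasted into the interactive session inside the container.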

When finished, the last printed message should read `PDF compiled successfully and copied to:
/root/evaluation/<backdoor-id>.pdf`.

**From your host machine**, you should run the following command to retrieve the PDF from the
container:
```console
$ docker cp rosa-icse25-artifact:/root/evaluation/<backdoor-id>.pdf .
```
You can then compare the results in the generated PDF to the corresponding line from Table II in
the paper.
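
When several reports have to be retrieved, the `<backdoor-id>` placeholder can be filled in
programmatically. This is a small sketch under the assumption that the container was started with
`--name rosa-icse25-artifact` as shown in the setup instructions; `docker_cp_command` is a
hypothetical helper name.

```python
import shlex

CONTAINER = "rosa-icse25-artifact"  # value passed to --name when starting the container
EVALUATION_DIR = "/root/evaluation"  # directory where the scripts store the generated PDFs


def docker_cp_command(report_name, destination="."):
    """Build the `docker cp` command that copies <report_name>.pdf to the host."""
    source = f"{CONTAINER}:{EVALUATION_DIR}/{report_name}.pdf"
    return ["docker", "cp", source, destination]


if __name__ == "__main__":
    # Print one command per report; run them from the host machine.
    for name in ("sudo-backdoored", "dlink-backdoored"):
        print(shlex.join(docker_cp_command(name)))
```

The same helper also covers the reports of the other reproduction modes, for instance by passing
`reduced-evaluation` or `full-evaluation` as the report name.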

**As ROSA is based on fuzzing, which is stochastic, you will never be able to reproduce exactly the
same numbers as reported in the paper. What should be observed, though, is that, with a
sufficiently large number and duration of campaigns, the obtained results converge towards the
conclusions made in the paper about the performance of ROSA**.

**NOTE**: unlike in the paper, the generated PDF will not contain the comparison with Stringer. This
is expected, and the reasons for it are described in [_Reproduction of the Stringer tool
evaluation (optional)_](#reproduction-of-the-stringer-tool-evaluation-optional).

### Complete reproduction (optional)
**Complete reproduction means running an 8-hour detection campaign 10 times, for each of the 17
ROSARUM backdoors (plus 1 additional run for Belkin/httpd with different seeds), as done in the
paper**, thus obtaining the complete Table II and Figure 2 from the paper. **It is not recommended
to run _Complete reproduction_ on a personal desktop or laptop machine** (where it will take months
to complete) but only on a sufficiently powerful cloud server, if you have one.
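
The order of magnitude of this duration follows directly from the figures quoted in this README.
The sketch below only accounts for fuzzing time; the parallel estimate assumes that campaigns are
independent and that each one keeps 6 CPU cores busy, and it says nothing about how
`run-full-evaluation.sh` actually schedules its work. The function names are hypothetical.

```python
import math

# Figures quoted in this README.
TARGETS = 17 + 1           # 17 backdoors, plus Belkin/httpd with different seeds
CAMPAIGNS_PER_TARGET = 10
HOURS_PER_CAMPAIGN = 8
CORES_PER_CAMPAIGN = 6


def sequential_days():
    """Fuzzing time in days when the campaigns run one after another."""
    return TARGETS * CAMPAIGNS_PER_TARGET * HOURS_PER_CAMPAIGN / 24


def parallel_days(total_cores):
    """Fuzzing time in days when as many campaigns as fit run side by side."""
    slots = total_cores // CORES_PER_CAMPAIGN
    if slots < 1:
        raise ValueError("at least 6 cores are needed to run one campaign")
    rounds = math.ceil(TARGETS * CAMPAIGNS_PER_TARGET / slots)
    return rounds * HOURS_PER_CAMPAIGN / 24


if __name__ == "__main__":
    print(sequential_days())   # 60.0
    print(parallel_days(96))   # 4.0
```

The sequential figure (18 targets, 10 campaigns each, 8 hours per campaign, i.e., 1440 hours)
matches the two-month estimate given for the recommended minimal configuration.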

In order to perform the complete reproduction, **in the container**, run:
```console
$ /root/artifact/run-full-evaluation.sh
```
When finished, the last printed message should read `PDF compiled successfully and copied to:
/root/evaluation/full-evaluation.pdf`.

**From your host machine**, you should run the following command to retrieve the PDF from the
container:
```console
$ docker cp rosa-icse25-artifact:/root/evaluation/full-evaluation.pdf .
```
You can then compare the results in the generated PDF to Table II and Figure 2 in the paper.

**As ROSA is based on fuzzing, which is stochastic, you will never be able to reproduce exactly the
same numbers as reported in the paper. What should be observed, though, is that the obtained
results lead to the conclusions made in the paper about the performance of ROSA**.

**NOTE**: unlike in the paper, the generated PDF will not contain the comparison with Stringer. This
is expected, and the reasons for it are described in [_Reproduction of the Stringer tool
evaluation (optional)_](#reproduction-of-the-stringer-tool-evaluation-optional).

### Reproduction of the Stringer tool evaluation (optional)
In our ICSE'25 paper, we compare ROSA to a state-of-the-art tool by other authors called
[_Stringer_](https://www.doi.org/10.1007/978-3-319-66399-9_28). While an implementation of Stringer
called [strngr](https://github.com/BaDSeED-SEC/strngr) does exist, it depends on [IDA Pro](
https://hex-rays.com/ida-pro), which is proprietary software and thus cannot be packaged in this
artifact. That being said, a version of strngr is still provided in the Docker container under
`/root/artifact/stringer/strngr/`, with patches to make it compatible with a more recent version of
IDA Pro (tested with IDA Pro 8.4).

If you have access to a recent version of IDA Pro, you can install it inside the Docker container.
Then, since strngr does not accept `x86_64` binaries, you need to cross-compile the ROSARUM
benchmark for an ARM-based architecture (e.g., `armhf`). There are instructions in the
[ROSARUM documentation](
https://archive.softwareheritage.org/browse/content/sha1_git:e511fc9a731fa89024dc8cebe2372fd3fa855f6c/?origin_url=https://github.com/binsec/rosarum&path=CONTRIBUTING.md)
that you can follow to do so. Once you have built the ROSARUM benchmark for an ARM-based
architecture, you can apply the suggested patches to strngr:
```console
$ cd /root/artifact/stringer/strngr && patch -p1 < ../patches/strngr-ida-pro-8-4.patch
```
Be advised that, if you have a more recent version of IDA Pro, you might need to provide your own
patches to make strngr work.

You can then run strngr like so:
```console
$ cd /root/artifact/stringer/strngr && cargo -q run -- --ida /path/to/idaq64 /path/to/backdoor
```
The above command should print out a report following the approach discussed in the Stringer paper.
Since there is no automatic way of evaluating this type of report, you need to do it manually.
During this manual analysis, you can consult the documentation of ROSARUM to verify if parts of
each backdoor were indeed found by strngr.
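
Since the reports have to be inspected by hand, it can help to store one report per binary. The
following sketch wraps the strngr invocation shown above; it assumes that strngr writes its report
to standard output, and the helper names (`strngr_command`, `save_report`) as well as the
placeholder paths are hypothetical.

```python
import shlex
import subprocess
from pathlib import Path

STRNGR_DIR = Path("/root/artifact/stringer/strngr")  # location given in this README


def strngr_command(ida_path, binary_path):
    """Build the strngr invocation shown above for one binary."""
    return ["cargo", "-q", "run", "--", "--ida", str(ida_path), str(binary_path)]


def save_report(ida_path, binary_path, report_dir):
    """Run strngr on one binary and store what it prints on standard output.

    Assumption: the report is written to stdout. Use absolute paths, since the
    command is executed from the strngr directory.
    """
    report_dir = Path(report_dir)
    report_dir.mkdir(parents=True, exist_ok=True)
    result = subprocess.run(
        strngr_command(ida_path, binary_path),
        cwd=STRNGR_DIR, capture_output=True, text=True, check=True,
    )
    report = report_dir / (Path(binary_path).name + ".strngr.txt")
    report.write_text(result.stdout)
    return report


if __name__ == "__main__":
    # Placeholder paths: replace them with your IDA Pro binary and ARM builds.
    ida = "/path/to/idaq64"
    for binary in ["/path/to/backdoor"]:
        print(shlex.join(strngr_command(ida, binary)))  # dry run: print only
```

Calling `save_report(ida, binary, "/root/evaluation/strngr")` instead of printing would run the
tool and keep each report in a separate text file for the manual analysis.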

### Discovering how ROSA and ROSARUM can be reused or repurposed
[ROSA](https://github.com/binsec/rosa) and [ROSARUM](https://github.com/binsec/rosarum) are live
projects and welcome new pull requests on GitHub!

For easy deployment and testing, separate Docker images for ROSA ([Docker Hub](
https://hub.docker.com/repository/docker/plumtrie/rosa/tags/0.5.1/sha256-780b5cf9fc69c12c8f643cc81d226116f79484ef6d863e69c269bd5da03d0628))
and ROSARUM ([Docker Hub](
https://hub.docker.com/repository/docker/plumtrie/rosarum/tags/0.2.0/sha256-16d084347ae936b54a6eb32b698661f0710a1e29e3a47900ebffa3fcd9600018))
are available (these are **not** required for the evaluation of this artifact).

#### Reusing or repurposing ROSA
ROSA can be reused to detect backdoors in essentially any fuzzable `x86`/`x86_64` binary program.
You can read [ROSA's documentation on configuring](
https://archive.softwareheritage.org/browse/content/sha1_git:160dca92f7be60ac5810eeaec9a03389a888a951/?origin_url=https://github.com/binsec/rosa&path=doc/src/config_guide.md)
it for new targets.

ROSA can also be extended in the following ways:
- By plugging a new fuzzer backend into it (see [ROSA's documentation on adding new fuzzers](
  https://archive.softwareheritage.org/browse/origin/content/?origin_url=https://github.com/binsec/rosa&path=doc/src/extensions/fuzzers.md));
- By customizing the metamorphic oracle (see [ROSA's documentation on extending the oracle](
  https://archive.softwareheritage.org/browse/content/sha1_git:94a3e62f8ca287fe7d889e2f0e1f7ae2796eeb4c/?origin_url=https://github.com/binsec/rosa&path=doc/src/extensions/oracle.md));
- By using different distance metrics to compare input-trace pairs (see [ROSA's documentation on
  adding new distance metrics](
  https://archive.softwareheritage.org/browse/content/sha1_git:c8ac40a6c54a8f4f4c533eea4730be28c31b615d/?origin_url=https://github.com/binsec/rosa&path=doc/src/extensions/distance_metrics.md)).

#### Reusing or repurposing ROSARUM
ROSARUM can be reused to benchmark any backdoor detection tool.
You can read [ROSARUM's documentation on using it](
https://archive.softwareheritage.org/browse/content/sha1_git:3390d96ceff2d9b3124839e78b5739c4218615c1/?origin_url=https://github.com/binsec/rosarum&path=README.md)
for evaluating new targets.

ROSARUM can also be extended by:
- Adding new (authentic or synthetic) backdoors to it;
- Porting it to new CPU architectures, if a new detection method depends on it.

To do so, consult the [ROSARUM contributing guide](
https://archive.softwareheritage.org/browse/content/sha1_git:e511fc9a731fa89024dc8cebe2372fd3fa855f6c/?origin_url=https://github.com/binsec/rosarum&path=CONTRIBUTING.md)
to learn more about these extensions.

#### Long-term archives
For archiving purposes, the code and documentation of ROSA and ROSARUM have been bundled into this
artifact, but separate archives are also available at
<https://archive.softwareheritage.org/browse/origin/directory/?origin_url=https://github.com/binsec/rosa>
and
<https://archive.softwareheritage.org/browse/origin/directory/?origin_url=https://github.com/binsec/rosarum>.
