# Artifact
This artifact contains the Tool `InstrSem` as well as code and detailed explanation to reproduce results in the paper which do not rely on special hardware.
Code and information for results relying on special hardware is also included but might require some additional tinkering depending on the setup.

## Setup
To setup the artifact, you need to have `docker` installed.
Further, your user must be part of the `docker` group or all scripts must be run as root (not recommended).
To setup the docker container required for evaluation, you can use the `setup_docker.sh` script:

```
bash setup_docker.sh
```

The docker container relies on QEMU to run InstrSem on different architectures.
No special hardware is necessary for the artifact evaluation.

If you want to evaluate case-studies of the paper that rely on specific hardware and you have this hardware available refer to the `Other Case Studies` section.
This artifact includes everything required to reverse-engineer GhostWrite, the undocumented Apple Mul53 extension, and undocumented P550 instructions if you have the required hardware.


## Basic Demonstration
We will start with a basic demonstration of InstrSem on different instructions of different ISAs.
For this, you can run the script `basic_demonstration.py` after setting up the docker container:

### Execution
```
python3 basic_demonstration.py
```

### Expected Output
The expected output is contained in `results/basic_demonstration.txt`.
Note that the actual output might slightly differ (e.g., order of registers for the encoding may be different).

### Explanation

The script uses InstrSem to find the semantics of Aarch64, RISC-V, Loongarch64, and x86-64 instructions.
For each instruction, the mnemonic will be printed, followed by the command used to reverse it with InstrSem.
Then, once InstrSem has terminated, the recovered semantics and information how registers and immediates are encoded are printed.

For instance, the following output is given for the Aarch64 `add x1, x2, x3` instruction:

```
 === add x1, x2, x3 === 
docker run --rm -i -v ./instr_reverser:/instr_reverser artifact_instrsem python3 main.py aarch64-socket 0x8b030041 0 32 2 500
x2 -> gpr_a: x2: 5|6|7|8|9
x1 -> gpr_b: x1: 0|1|2|3|4
x3 -> gpr_c: x3: 16|17|18|19|20

variable bits: [5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 16, 17, 18, 19, 20]
       31  30  29  28  27  26  25  24  23  22  21  20  19  18  17  16  15  14  13  12  11  10   9   8   7   6   5   4   3   2   1   0
        1   0   0   0   1   0   1   1   0   0   0                       0   0   0   0   0   0                                        
reg_a                                                                                           4   3   2   1   0                    
reg_b                                                                                                               4   3   2   1   0
reg_c                                               4   3   2   1   0                                                                
pc = (pc + 0x4)
pstate = 0x0
gpr_b = (gpr_a + gpr_c)
```

We can see that InstrSem figured out that register x2 is encoded using bits 5-9 of the instruction.
Similar, registers x1 and x3 are encoded using instruction bits 0-4 and 16-20 respectively.
All these instruction bits can be changed and the encoding semantics are still known.
Thus, these bits are thus listed as `variable bits`.

Below, InstrSem visualizes the encoding of the instruction by first displaying fixed bits (bits 10-15 and 21-31).
The fixed bits are followed by bits used to encode registers (reg\_a using bits 5-9, reg\_b using bits 0-5, and reg\_c using bits 16-20).

Then, the generalized semantics follows.
This instruction increases the pc by 4 (`pc = (pc + 0x4)`).
`pstate = 0x0` is an artifact of our Aarch64 runner and can be ignored.
Lastly `gpr_b = (gpr_a + gpr_c)` describes that the general purpose register that is encoded using the bits of `reg_b` is obtained by adding two other general purpose registers.

### Verified Claims
This basic demonstration should verify the following claims of the paper:
* InstrSem is generic and can recover and generalize instruction semantics across ISAs
* InstrSem can uncover the semantics of undocumented Loongarch64 vector instructions (the last Loongarch64 instruction is a SIMD vector addition)


## RISC-V Integer Instructions
To replicate the results of our evaluation on the RISC-V integer instructions (Table 3 in the paper), you can run the `eval_riscv64.py` script.
The script will output runtime relevant information (semantics of currently reversed instruction etc.) in `stderr` and create the table in markdown format in `stdout`.
We recommend running it the following way:

### Execution
```
python3 -u eval_riscv64.py > table_rv64.md
```

### Expected Output
It should output the semantics of instructions to stderr.
Once it is done executing, the table in `table_rv64.md` should roughly match Table 3 in the paper (also in `results/eval_riscv64.md`).
The column indicating success should be equivalent.
The column indicating runtime should be similar.
Because of some optimizations that were missing during our evaluation in the paper and differences in hardware, slightly different runtimes are possible.
However, we do not expect the runtimes to be significantly higher than the table in the paper unless you are runnning a weak CPU (the table in the paper was created using a `Ryzen 7 5700U`).

### Explanation
The script produces a table which contains one row for each tested instruction.
For all 64-bit RISC-V Integer instructions (except `jalr`, more on that later), we supply exactly one encoding to InstrSem and expect that it can recover and generalize the full semantics.
For `jalr` two candidates (one with odd and one with even immediate) are required for InstrSem to cover the whole semantics (for each, one bit of the immediate is not generalized).
The table contains the following columns:
* `Encoding`: The encoding that was supplied to InstrSem
* `Works`: Whether correcct semantics were obtained and generalized
* `Time`: Runtime of InstrSem
* `Bits`: Amount of bits in the instruction encoding that are variable, i.e., encode a register or immediate
* `Depth`: Parameter passed to InstrSem to limit the maximum size of syntax trees for semantics recovery
* `Samples`: Parameter passed to InstrSem to decide how many input-output samples are collected 

### Verified Claims
* InstrSem can recover 34 of 38 64-bit RISC-V integer instructions


## x86-64 Instructions

In the paper, we show that `InstrSem` is also applicable to complex ISAs like x86 by building a very basic x86-64 backend.
We use it on the 20 most common mnemonics we encounter in the libc.
To run the same evaluation, you can use the `eval_x86-64.py` script:

### Execution
```
python3 -u eval_x86-64.py > table_x86-64.md
```

### Expected Output
We expect the recovered table to match the table in `results/eval_x86-64.md` apart from minor runtime differences.

### Explanation
The table is created like the one for `eval_riscv64.py` and contains the same information.
However, the `Bits` column of the table is not correct for failing instructions.

### Verified Claims
* InstrSem is also applicable to complex ISAs like x86-64
* More concretly: InstrSem correctly recovers the semantics of 13 of 20 x86-64 instructions most commonly found in libc with a minimal x86-64 backend


## Full Loongarch Instruction Space

In the paper, we use InstrSem on the full loongarch64 encoding space on QEMU and on hardware (Loongson 3A5000 CPU).
This requires first creating a bitmap of all instructions that exist.
As this bitmap is slow to create and compresses well, it is included with the artifact.
Then, a random instruction from the encoding space is chosen and supplied to InstrSem.
If InstrSem is able to reverse-engineer and generalize it, all bits covered by the instruction encoding are cleared in the bitmap and the result is written to `instr_reverser/out` in a file named after the instruction encoding.

### Execution
You can run this tool using:

```
python3 eval_loongarch64.py
```

We encourage you to run this evaluation for 10h but shorter or longer runs are also possible.
When you are done, you can use CTRL+C to terminate the evaluation.
This might still lead to an active docker container which you can kill.
You can also interrupt the evaluation using CTRL+C (only shortly press it to raise a single KeyboardInterrupt).
Then when you later start the skript again, it will continue instead of starting from scratch.

### Expected Output
We expect the tool to create a multitude of files in the `out` directory.
Each file is named after the encoding that was supplied to InstrSem and contains the recovered instruction semantics.
We further expect that multiple instruction semantics are reported in stdout.
Additionally, whenever an instruction is found, the script prints the runtime (in seconds) and total number of recovered encodings to stdout.
If you run the evaluation for 10 hours, we expect at least 600 million instruction encodings to be covert.

### Explanation
The script runs InstrSem on random instructions of the Loongarch64 encoding space.
If semantics are found, they are printed to stdout and stored in a file in the `out` directory.

### Verified Claims
* InstrSem can be run on a full RISC instruction set
* The amount of covered encodings drastically increases due too InstrSem's generalization step


## QEMU Crashes
In the paper we claim that InstrSem finds QEMU crashes.
We added an additional script that triggers a QEMU crash:

### Execution
You can run InstrSem on the crashing instruction using:

```
python3 eval_qemu_crash.py
```

### Expected Output

We expect multiple QEMU crashes:

```
(...)
**
ERROR:../target/loongarch/tcg/insn_trans/trans_vec.c.inc:3574:vldi_get_value: code should not be reached
Bail out! ERROR:../target/loongarch/tcg/insn_trans/trans_vec.c.inc:3574:vldi_get_value: code should not be reached
(...)
```

Finally, InstrSem gives up on reversing because sample collection failed too many times:

```
(...)
Exception: too many sample collections failed!
```

### Explanation
The instruction encoding implemented in QEMU is too lenient and allows illegal variants of an instruction to decode.
Then, when the instruction is emulated, the illegally decoded instruction cannot be handled and an assertion is triggered.
The issues were reported to QEMU maintainers and are fixed on the `master` branch.

### Verified Claims
* InstrSem triggers QEMU crashes


## Other Case Studies
Below is information to run case studies that require specific hardware.
As this relies on specific hardware and some additional modifications might be required depending on the software stack, we do not expect artifact reviewers to run all of them and reproduce all of our results.
The above artifacts should already verify the main claims of our paper.
If you have access to the required hardware, feel free to run a subset of the case studies below.
As we cannot control or provide the full environment for the hardware, it might not be as straight-forward as the architecture independent case studies.

### Reversing GhostWrite
In out paper, we reverse-engineer the GhostWrite instruction encodings.
GhostWrite is an undocumented unprivileged instruction on T-Head C910 CPUs that writes directly to physical memory.
For this, we patch the riscv-64 runner to map parts `/dev/mem` to memory as an identity mapping (thus simulating physical memory).
This patch can be found in `ghostwrite-runner.patch`.
It has to be applied to `instr_reverser/backend/runner_socket.c`.
Then you can re-compile the riscv-64 runner using `make riscv64-socket` in `instr_reverser/backend`.
Note that the runner should now no longer be used for other instructions!
To not write to memory regions not contained in the identity mapping, we further added constraints to all riscv64 general purpose registers.
This can be done by changing line 19 of `instr_reverser/backend/riscv64.py` from:

```
            instr_rev.Register(REG_NAMES[i], 64, [], [(5, i)], encoding_group="gpr") for i in range(32) # general purpose registers
```

to
```
            instr_rev.Register(REG_NAMES[i], 64, [solver.create_less_constraint(REG_NAMES[i], 64, MAPPING_END - 0x4000), solver.create_greatereq_constraint(REG_NAMES[i], 64, MAPPING_START + 0x4000)], [(5, i)], encoding_group="gpr") for i in range(32) # general purpose registers
```

Where `MAPPING_START` and `MAPPING_END` should be replaced by start and end addresses of the physical memory mapping.
(See `runner_socket.c` after patching. You may need to adjust those ranges depending on your hardware)

With these patches, you can start the reverser using the remote runner (`riscv64-socket-remote`):

```
docker run --rm -it -p 13372:13372 -v ./instr_reverser:/instr_reverser artifact_instrsem python3 main.py riscv64-socket-remote 0b00010000000000001000000100100111 13372 32 2 300
```

(Encoding for GhostWrite: `0b00010000000000001000000100100111`)

This will listen on port 13372 for a runner to connect.

Then, you start the `instr_reverser/backend/runner_socket_riscv64` on the affected riscv machine by supplying the ip and port of the remote runner as integers (i.e., `192.168.178.2` should be supplied as `0xc0a8b202`).
This should connect the runner to the reverser and the reverser should start working.
In some cases, the runner might crash.
Thus, it can be helpful to start the runner in an infinite loop on the affected device.

If you also want InstrSem to output the set of bits that can be changed without modifying the instruction semantics, you have to uncomment line 616 in `instr_reverser/clusterer.py`:

```
        flippy_bits = self._cluster_alias(instruction, semantics, initial_state)
```

Note that, depending on the kernel version, the signal handling structs might differ.
If this is the case, deeper modifications to the runner code are required.

### Reversing Apple's Mul53 Extension
The Apple M1 and M2 CPUs contain an undocumented extension for 53-bit simd multiplication.
This can be reversed if a Apple M1 device running Asahi Linux is available.
For this, no patches to the runner are necessary.
You can use a similar remote runner setup as for GhostWrite.
Instead of `riscv64-socket-remote`, `aarch64-vector-socket-remote` should be used:

```
docker run --rm -it -p 13372:13372 -v ./instr_reverser:/instr_reverser artifact_instrsem python3 main.py aarch64-vector-socket-remote 0b00000000001000000000010000100010 13372 32 3 300
```

(Encoding for `mul53hi.2d`: `0b00000000001000000000010000100010`, encoding for `mul53lo.2d`: `0b00000000001000000000000000100010`)

The `instr_reverser/backend/runner_socket_aarch64-vector` binary should be executed on the affected machine the same as `instr_reverser/backend/runner_socket_riscv64` for reversing GhostWrite.

For `mul53hi.2d` we limit the vector registers to 53 bits.
This can be done by modifying line 21 of `instr_reverser/backend/aarch64.py` from:

```
                instr_rev.Register(f"v{i}", 16*8, [], [(5, i)], encoding_group="vec") for i in range(32)
```

to:

```
                instr_rev.Register(f"v{i}", 16*8, [solver.create_less_constraint(f"v{i}", 16*8, 1 << 53)], [(5, i)], encoding_group="vec") for i in range(32)
```

Note that, depending on the kernel version, the signal handling structs might differ.
If this is the case, deeper modifications to the runner code are required.


### Reversing SiFive P550 undocumented Instructions
The SiFive P550 CPU contains undocumented instructions including minimum or maximum instructions.
These can be reverse-engineered using InstrSem with the default riscv64 runner.

For this, the remote setup has to be used.
The reverser is executed using the following command:

```
docker run --rm -it -p 13372:13372 -v ./instr_reverser:/instr_reverser artifact_instrsem python3 main.py riscv64-socket-remote 0b01011100000100010001000110001011 13372 32 2 300
```

(encoding for signed maximum: `0b01011100000100010001000110001011`)

It will then listen on port 13372 for the runner which has to be started on the affected device and passed the ip and port of the reverser as integers (see Reversing GhostWrite for more information).

Note that, depending on the kernel version, the signal handling structs might differ.
If this is the case, deeper modifications to the runner code are required.
