# Formalisation of a new weak semantics for AuDaLa artifact - Reproducing Table 2

## Contents and Structure
* The **AlloyToCNF** folder contains the original Alloy files as used in the paper in the **AlloyToCNF/Alloy** subfolder. It furthermore contains a distribution of *Alloy 6* in the **AlloyToCNF/lib** folder, and the file *AlloyToCNF.jar*, which takes the Alloy files in the subfolder and creates CNF files from them, which are saved in the **AlloyToCNF/CNF** subfolder.
* The *z3* distribution used by the artifact to check the CNF files.
* A *Dockerfile*, from which a Docker image can be built.
* A bash-script *run.sh*, which is used to generate the information in Table 2 in the paper.
* A text file *log.txt*, which gives the output of a run of *run.sh* on the computer used to generate Table 2 for the paper. The computer uses an i9-19300K CPU and 32GB of RAM.
* A bash-script *entrypoint.sh*, used by the *Dockerfile* to correctly call *run.sh*.
* A license text file, granting permission to the reviewers to use, execute and modify all artifact files as found in the ZIP-file.
* A Docker image generated through the commands `docker build -t artifact .` and `docker save artifact -o artifact.tar`, built from the Dockerfile given in the artifact. It can be loaded using `docker load -i artifact.tar`.

## Context
The purpose of this artifact is to reproduce the results presented in Table 2 of the paper. Table 2 serves as an indication of the correctness of the mapping from the AuDaLa axiomatic semantics to Nvidia’s PTX architecture as stated in Theorem 3, by testing Theorem 3 for bounded pre-execution sizes 4-8. It does this for three configurations of the AuDaLa semantics: the Release/Acquire semantics, the Relaxed semantics, and a third configuration, Rlx*, which uses the Relaxed semantics with a small weakening as discussed in Section 4.4 of the paper. To establish the correctness of the mapping under these configurations, both the axiomatic semantics and the PTX semantics have been encoded in Alloy 6. This encoding contains two checks:
* The check “p_correct” is a sanity check, which checks, for a given size, whether a well-formed pre-execution (according to Definition 10 in the paper) of that size exists. It will not give a witness pre-execution; it only checks whether one exists.
* The check “mapping_correct” checks Theorem 3 for a configuration and a given size. That is to say, for the given size, if the AuDaLa pre-execution respecting the current configuration is well-formed and mapped to a legal PTX pre-execution, of which the witness is mapped back to AuDaLa, then this witness is a legal execution of the AuDaLa pre-execution. When checking this, we restrict ourselves to PTX without Fences and Barriers, as those cannot be found in the AuDaLa steps, which is the context of the axiomatic semantics.

Both of these Alloy checks are converted to CNF. Any CNF file for the “p_correct” run should be satisfiable by z3 (as that means a well-formed pre-execution exists). For the “mapping_correct” check, the CNF will be unsatisfiable if a counterexample for the mapping_correct predicate does not exist, and satisfiable if a counterexample can be found. Therefore, an unsatisfiable CNF is a positive result. Whenever z3 outputs that a CNF is satisfiable, it also gives a witness for that conclusion; as this witness is not used anywhere, we have hidden that output in the script.
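
The reading of the solver results described above can be summarised in a small shell helper. This is a minimal sketch and not part of the artifact: the function name `verdict` and the normalised result strings `sat` and `unsat` are assumptions made for illustration, and the actual output format of *run.sh* may differ.

```shell
# Hypothetical helper (not part of the artifact): interpret a solver result
# for one of the two checks. Arguments: check name, then "sat" or "unsat".
verdict() {
  case "$1:$2" in
    p_correct:sat)         echo "ok: a well-formed pre-execution exists" ;;
    p_correct:unsat)       echo "unexpected: no well-formed pre-execution exists" ;;
    mapping_correct:unsat) echo "T: no counterexample, Theorem 3 holds for this size" ;;
    mapping_correct:sat)   echo "F: counterexample found, Theorem 3 does not hold for this size" ;;
    *)                     echo "unknown check or result: $1 $2" >&2; return 1 ;;
  esac
}

# An unsatisfiable CNF for mapping_correct is the positive outcome.
verdict mapping_correct unsat
```

Note the asymmetry: `sat` is the expected outcome for “p_correct”, while `unsat` is the positive outcome for “mapping_correct”.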

## Installation and Running Instructions   
To run the experiments, first build the Docker image from the Dockerfile with `docker build -t artifact .` or load the provided Docker image using `docker load -i artifact.tar`. Depending on your computer, this may take between 30 seconds and a few minutes. Then, do one of the following:
* Run a reduced set of experiments using `docker run artifact 0`. This will only run the experiments for pre-execution sizes 4-6. This should take less than an hour on a decent computer or laptop.
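* Run the full set of experiments using `docker run artifact 1`. Note that this takes considerably longer: on our computer, the size-8 check for the Release/Acquire semantics alone took over six hours (see the table below).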
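
The two run modes can be wrapped so that the output is also kept in a file for later comparison with *log.txt*. This is a sketch under assumptions: the helper name `artifact_cmd`, the mode names `reduced` and `full`, and the file name `my_log.txt` are hypothetical; only the two `docker run` commands come from the instructions above.

```shell
# Hypothetical helper: print the docker command for a run mode.
artifact_cmd() {
  case "$1" in
    reduced) echo "docker run artifact 0" ;;  # pre-execution sizes 4-6
    full)    echo "docker run artifact 1" ;;  # full set of experiments
    *)       echo "usage: artifact_cmd reduced|full" >&2; return 1 ;;
  esac
}

# Show the command that would be executed for the reduced set.
artifact_cmd reduced

# To actually run it and keep a copy of the output (requires the image):
#   $(artifact_cmd reduced) | tee my_log.txt
```

Printing the command rather than executing it keeps the sketch safe to try on a machine where the image has not been built or loaded yet.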

The output will be shown on the command line; the satisfiability values and time values are used to create the output for Table 2 as shown below. The time values given as output depend on the computer the experiments ran on; for this reason, we have included a log file (*log.txt*) with the output for the full set of experiments as produced on the computer we ran it on.

We ran the experiments on WSL2 of a Windows 11 computer with an i9-19300K CPU and 32GB of RAM, resulting in the following version of Table 2, where T stands for True (Theorem 3 holds) and F stands for False (Theorem 3 does not hold), with times rounded to the nearest second:

| Pre-execution size | 4 | 5 | 6 | 7 | 8 |
| :--- | :---: | :---: | :---: | :---: | :---: |
|  Rel/Acq Semantics | T (0s) | T (6s) | T (1m4s) | T (16m59s) | T (6h19m53s) |  
| Relaxed Semantics | T (0s) | F (0s) | F (12s) | F (11s) | F (0s) |  
| Relaxed* Semantics | T (0s) | T (1s) | T (17s) | T (4m16s) | (1h24m47s) |
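
To compare timings from your own run against the table, it can help to convert the durations to seconds. The following is a small sketch; the function name `to_seconds` is hypothetical, and it assumes durations are written exactly as in the table (ending in `s`, no leading zeros, units in the order h, m, s).

```shell
# Hypothetical helper: convert a duration such as "1h24m47s" to seconds.
to_seconds() {
  # Rewrite the duration as an arithmetic expression, e.g. "1*3600+24*60+47".
  e=$(printf '%s\n' "$1" | sed -e 's/h/*3600+/' -e 's/m/*60+/' -e 's/s$//')
  echo $(( $e ))
}

to_seconds 6h19m53s   # prints 22793
```

A reduced run can then be compared against the entries for sizes 4-6 by converting both sets of timings and looking at their ratio.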


