by Tiago Cogumbreiro, Julien Lange, Dennis Liew Zhen Rong, and Hannah Zicarelli
This paper introduces Faial, a tool that guarantees data-race freedom (DRF) for CUDA kernels.
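To make the property concrete: a data race occurs when two threads can access the same memory location without an intervening barrier and at least one of the accesses is a write. The kernel below is a minimal sketch of ours (it is not part of the artifact's dataset) showing the kind of bug Faial detects:

```cuda
// Minimal sketch (not from the artifact's dataset) of a CUDA data race.
// Each thread writes its own slot of x and then reads its neighbour's
// slot. Without a barrier between the two accesses, the read of
// x[tid + 1] by thread tid races with the write by thread tid + 1.
__global__ void neighbour_read(int *x, int *out, int n) {
  int tid = threadIdx.x;
  if (tid < n)
    x[tid] = tid;            // write phase: own slot only
  // __syncthreads();        // restoring this barrier makes the kernel DRF
  if (tid + 1 < n)
    out[tid] = x[tid + 1];   // read phase: neighbour's slot
}
```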
The structure of this container (also available on GitLab):

/benchmark/
: Scripts and templates comprising the benchmarking framework.

/datasets/
: Datasets, scripts, and experimental evaluation results.

datasets/correctness/
: Data for Claim 1.

datasets/micro-benchmarks/
: Data for Claim 2.

datasets/gpuverify-cav14/
: Data for Claim 3.

/faial-coq/
: Mechanized proofs for all theoretical results.

/source/
: Source code for the Faial verification pipeline.

/tools/
: Binaries (ubuntu-amd64) for all verification tools used.

First, install and start Docker.
The Docker image is available both as a compressed tar archive and online. Choose one of the two following methods. In both cases, the container will load an interactive terminal session at directory /artifact; the environment variable FAIAL_HOME points to this location.
Ensure you are in the root of this artifact. You should see the compressed tar archive artifact-354.tar.bz2.
To load the Docker container from this archive, run:
$ docker load < artifact-354.tar.bz2
To enter the container with an interactive terminal session, run:
$ docker run -it -p 8000:8000 faial-cav21
To download the image from the web, run:
$ docker pull registry.gitlab.com/umb-svl/faial-artifact-cav21/artifact:latest
To enter the container with an interactive terminal session, run:
$ docker run -it -p 8000:8000 registry.gitlab.com/umb-svl/faial-artifact-cav21/artifact:latest
Mechanized proofs supporting theoretical results are available locally at faial-coq/
and online at GitLab.
To check the proofs, run make:
$ cd $FAIAL_HOME/faial-coq
$ make clean  # Clean any already compiled proofs (optional)
$ make        # Check that all proofs compile
File _CoqProject lists all files that will be compiled, and thus whose proofs will be checked. Below we list the file, line number, and name of each definition/theorem; e.g., Main.v:619 theorem drf corresponds to file faial-coq/src/Main.v, line number 619, and theorem drf. For your convenience, we also provide a hyperlink to each file in our GitLab repository (branch cav21).
- Main.v:619 theorem drf
- Compositionality.v:472 corollary compositionality
- Main.v:619 theorem drf
- NExp.v:18 inductive nexp
- BExp.v:14 inductive bexp
- ULang.v:24 inductive inst
- AExp.v:9 Access; $P$: list access_val; $H$: list (list access_val)
- VHist.v:52 fixpoint v_app
- VHist.v:46 fixpoint v_seq
- $\mathcal U$: ULang.v:24 inductive Run
- WLang.v:41 inductive w_inst; see example in Main.v:699 definition Prog1
- $\mathcal W$: WLang.v:104 inductive WRun
- fun x y => not (AExp.access_safe x y)
- Hist.v:20 definition Safe
- Hist.v:25 definition MSafeStrong
- ALang.v:25 inductive n_inst
- ALang.v:53 fixpoint n_seq, and ALang.v:60 definition p_seq
- Align.v:16 fixpoint align; see example in Main.v:707 definition AProg1
- TLang.v:25 inductive inst
- Util.v:190 definition prod
- TLang.v:50 inductive Run
- Sequentialize.v:32 definition trace
- Main.v:20 definition split; see example in Main.v:726 definition SProg1
Notes on differences between the Coq formalism and the paper:

- Main.v theorem drf is more general than Theorems 1 and 3 (in the paper): it establishes that split(align(P)) is DRF iff P is DRF.
- ULang.v is more expressive than the language of protocols defined in Fig 2. The former includes conditionals, while the latter does not.
- Each operational semantics is given in terms of a relation (IPairIn); there is one such relation per language, i.e., ULang, WLang, and ALang. IPairIn abstracts away the finer details of each operational semantics, greatly simplifying the proof. To this end, in our Coq formalism we only give the semantics of aligned protocols in terms of IPairIn. Additionally, in the Coq formalism, we only give the operational semantics of well-formed protocols $\mathcal W$ and of symbolic traces $\mathcal T$.
- The split function converts ALang.n_inst into a list of PhaseSplit.phase, and each PhaseSplit.phase is then converted into a symbolic trace TLang.inst. We give a full example of its usage in Main.v:707 definition AProg1.
- The corresponding implementation files are source/faial/src/phasealign.ml and source/faial/src/phasesplit.ml.
This section contains instructions on generating the data used in the paper.
The CSV data, logs, and plots used in the paper are already included in each of the respective directories. Rerunning the experiment will overwrite the results.
To visualise the generated data, the Docker container includes an HTTP server exposing $FAIAL_HOME on port 8000. To access the data, ensure the container is running and open the following URL in your favourite browser (on the host machine): localhost:8000.
See Accessing Experimental Results for more details.
Warning: Rerunning the experiment will overwrite the bundled logs/figures that support the paper with your own logs/figures! Reverting to the original logs/figures is possible via a backup copy of:
- datasets/correctness/results
- datasets/micro-benchmarks/results
- datasets/gpuverify-cav14/results
Expected runtime of this experiment: ~20 minutes.
This section details our experimental dataset, results, and procedure related to Table 1 in Claim 1: Correctness. This experiment requires manual processing! While we provide scripts to generate the data, verifying its correctness requires manual examination.
All files relating to Claim 1: Correctness are stored in the datasets/correctness
directory.
$ cd $FAIAL_HOME/datasets/correctness
The dataset for Table 1 is split into Tests 1-5. Test 1 (one test per tool) can be found at {TOOL}/real-world/transposeDiagonal.cu, e.g., faial/real-world/transposeDiagonal.cu. Tests 2-5 (one test per tool) can be found at {TOOL}/synthetic/{TEST}.cu, e.g., gklee/synthetic/last-iter.cu. Each test has a DRF version and a racy version, distinguishable by filename: for instance, {TOOL}/synthetic/last-iter-drf.cu is DRF and {TOOL}/synthetic/last-iter.cu is racy.
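To give a flavour of such a pair, here is a hedged sketch of ours (the actual dataset kernels differ; consult the directories above): the DRF variant separates conflicting loop iterations with barriers that the racy variant omits.

```cuda
// Hypothetical sketch of a racy/DRF test pair; the real kernels live in
// {TOOL}/synthetic/. With both barriers present the kernel is DRF; a
// racy variant omits one of them.
__global__ void loop_pair(int *x, int *y) {
  int tid = threadIdx.x;
  for (int i = 0; i < 10; i++) {
    x[tid] = i;              // each thread writes its own slot
    __syncthreads();         // (1) writes finish before neighbour reads
    if (tid + 1 < blockDim.x)
      y[tid] = x[tid + 1];   // read the neighbour's slot
    __syncthreads();         // (2) reads finish before the next iteration's writes
  }
}
```

Omitting barrier (1) makes the read race with the write in the same iteration; omitting barrier (2) makes it race with the write of the next iteration.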
Automatic scripts are provided to rerun the tools against the dataset:
$ python3 run.py --tool faial # runtime: ~5s
$ python3 run.py --tool gpuverify # runtime: ~50s
$ python3 run.py --tool pug # runtime: ~3s
$ python3 run.py --tool gklee # runtime: ~7m /!\ WARNING THIS MAY CRASH DUE TO GKLEE
$ python3 run.py --tool sesa # runtime: ~12m /!\ WARNING THIS MAY CRASH DUE TO SESA
The above commands generate logs and a timings-{TOOL}.csv for each tool. This data contains tool exit statuses, time and memory usage, paths to tool-specific kernels, paths to tool logs, and the DRF/racy verdicts parsed from the logs.
A script is provided to generate a table with the data generated above:
$ python3 table.py
This table shows the results from the timing CSVs in a prettier format. The following is the output of the table script we observed in our experiment.
example expected faial gpuverify pug gklee sesa
------------------------ ---------- ------- ----------- ----- -------------- --------------
transposeDiagonal racy racy racy drf timeout timeout
transposeDiagonal-drf drf drf racy drf timeout timeout
first-iter racy racy racy racy timeout timeout
first-iter-drf drf drf racy racy timeout timeout
last-iter racy racy racy racy timeout timeout
last-iter-drf drf drf racy drf timeout timeout
last-iter-first-iter racy racy racy racy timeout timeout
last-iter-first-iter-drf drf drf racy racy timeout timeout
read-index-racy racy racy racy racy no race alarms no race alarms
read-index drf racy drf racy no race alarms no race alarms
Note that while this table displays some information regarding raciness, the validity of these results must be checked manually, as we explain below.
To count and verify the correctness of data-races, the logs must be manually examined for each racy result. The objective of this manual analysis is to count the number of data-races reported and to determine whether the error traces raised by the tools accurately reflect real data-races. All information related to the race is considered, e.g., the state of local and global variables, the types of accesses (read/write), and the source code line numbers of the accesses.
For DRF test components, it is only necessary to count the reported races as they can be assumed invalid. For racy test components, it is additionally necessary to verify the correctness of each data-race.
We include a file with this analysis for each racy tool log in our results. Each analysis file is a .txt file corresponding to the .log file that holds the tool output. For example, a data-race reported by Faial in faial/synthetic/read-index-racy-1.log is analysed in faial/synthetic/read-index-racy-1.txt.
To verify data-races in tool logs, a working understanding of the data-races in question is helpful. The paper provides context for these races through respective access memory protocols:
Test 1 is a running example in Section 1 and a simplified protocol is shown in Listing 2.3. Additionally, Appendix A Examples 3 and 5 show tool analyses of this test.
Tests 2-5 are discussed in Claim: Correctness in Section 6. Protocols for these tests are shown in Figures 5-6.
Expected runtime of this experiment: ~1 hour (with --repeat 1) or ~5 hours (with --repeat 5).
This section details our experimental dataset, results, and procedure related to Figure 8 in Claim 2: Scalability.
All files relating to Claim 2: Scalability are stored in the micro-benchmarks
directory.
$ cd $FAIAL_HOME/datasets/micro-benchmarks
Tool-specific versions of the synthetic dataset used for this experiment are stored in directories named after each tool. To run the tools against the dataset:
$ python3 run.py --repeat 5 --tool faial # runtime: ~27m
$ python3 run.py --repeat 5 --tool pug # runtime: ~17m
$ python3 run.py --repeat 5 --tool sesa # runtime: ~7m /!\ WARNING THIS MAY CRASH DUE TO SESA
$ python3 run.py --repeat 5 --tool gklee # runtime: ~7m /!\ WARNING THIS MAY CRASH DUE TO GKLEE
$ python3 run.py --repeat 5 --tool gpuverify # runtime: ~4hr
The above commands were used to produce the results in the paper; we ran all tools 5 times on all problems. They generate a timings-{TOOL}.csv for each tool, containing tool exit statuses, time and memory usage, paths to tool-specific kernels, paths to tool logs, and the DRF/racy verdicts parsed from the logs.
The --repeat 5
option specifies the number of times the experiment will run.
Specifying the --problem {accs,barriers,ifs,nested-loops,nested-loops-sync}
option runs a subset of the synthetic protocols. By default, the entire set will be used.
The --tool={faial,gpuverify,pug,sesa,gklee} option specifies the tools to run. By default, only Faial is run. To repeat the experiment on all tools only once:
$ python3 run.py --repeat 1 --tool={faial,gpuverify,pug,sesa,gklee} # runtime: ~1hr
Specifying the --dry-run option prints the command for each tool against each problem without running it.
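For intuition about the --problem families listed above, the following is a hypothetical rendering of ours of what a kernel in the nested-loops-sync family might look like; the actual generated kernels are in the tool-specific directories and scale parameters such as loop-nesting depth and the number of accesses and barriers.

```cuda
// Hypothetical sketch (not a generated benchmark) of a nested-loops-sync
// style problem at depth 2: barriers inside a loop nest, with each
// thread touching only its own slot (so the kernel is DRF).
__global__ void nested_loops_sync(int *x) {
  int tid = threadIdx.x;
  for (int i = 0; i < 8; i++) {      // outer loop
    for (int j = 0; j < 8; j++) {    // inner loop: depth is the scaled parameter
      x[tid] += i + j;               // access own slot only
      __syncthreads();               // synchronisation inside the nest
    }
  }
}
```

Deeper nests and more accesses per phase make the verification problem harder, which is the scaling behaviour Figure 8 reports.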
To generate the graph as in the paper, run the following command:
$ python3 ../../benchmark/benchmark-graph.py -mb
The generated graphs are named Micro-benchmark-time-1-50.pdf and Micro-benchmark-memory-1-50.pdf.
Expected runtime of this experiment: ~40 minutes (with --repeat 1) or ~3.5 hours (with --repeat 5).
This section details our experimental dataset, results, and procedure related to Figure 9 in Claim 3: Real-world usability.
All files relating to Claim 3: Real-world usability are stored in the gpuverify-cav14 directory.
$ cd $FAIAL_HOME/datasets/gpuverify-cav14
Tool-specific versions of the dataset used for this experiment are found in directories named after each tool. To run each tool against the dataset:
$ python3 run.py --repeat 5 --tool faial # runtime: ~9m
$ python3 run.py --repeat 5 --tool pug # runtime: ~3m
$ python3 run.py --repeat 5 --tool gpuverify # runtime: ~3hr
The commands above were used to produce the results in the paper; we ran all tools 5 times on all kernels. They generate a timings-{TOOL}.csv for each tool, containing tool exit statuses, time and memory usage, paths to tool-specific kernels, paths to tool logs, and the DRF/racy verdicts parsed from the logs.
The --repeat 5
option specifies the number of times the experiment will be repeated. To repeat the experiment on all tools only once:
$ python3 run.py --repeat 1 --tool={faial,gpuverify,pug} # runtime: ~40m
The --tool={faial,gpuverify,pug} option specifies the tools to run. By default, only Faial is run.
Lastly, to generate the graph as in the paper, run the following command:
$ python3 ../../benchmark/benchmark-graph.py -rw
The three pie charts for Faial, GPUVerify, and PUG are named faial-stats.pdf, gpuverify-stats.pdf, and pug-stats.pdf, respectively. The generated scatter graph is named time-relation-faial-scatter.pdf.
The Docker container includes an HTTP server exposing $FAIAL_HOME on port 8000. This enables you to download logs, plots, and other files from inside the container. To access experimental results, ensure the container is running and navigate to: localhost:8000
Claim | Path | See |
---|---|---|
3.1 | (steps 3 and 4) | Table 1 |
3.2 | datasets/micro-benchmarks/Micro-benchmark-time-1-50.pdf | Fig 8 (lhs) |
3.2 | datasets/micro-benchmarks/Micro-benchmark-memory-1-50.pdf | Fig 8 (rhs) |
3.3 | datasets/gpuverify-cav14/faial-stats.pdf | Fig 9.a |
3.3 | datasets/gpuverify-cav14/gpuverify-stats.pdf | Fig 9.b |
3.3 | datasets/gpuverify-cav14/pug-stats.pdf | Fig 9.c |
3.3 | datasets/gpuverify-cav14/time-relation-faial-scatter.pdf | Fig 9.d |
Optional documentation of our kernel generation and benchmarking framework is provided in FRAMEWORK.md. Details include experiment configuration file parameters and the generation of tool-specific kernels from tool-agnostic templates.
This section covers reproducing the Docker container and building Faial from source.
To reproduce the Docker container, first install and start Docker.
Ensure you are in the root of the artifact. You should see the file Dockerfile.
To build the image, run:
$ docker build --tag faial-cav21 .
To save the image, run:
$ docker save faial-cav21 | bzip2 > artifact-354.tar.bz2
To reproduce this environment natively without Docker, follow along with the commands run by the provided Dockerfile. This is known to work on Ubuntu 20.04; other systems will require adapting the package names and commands to those provided by your system.
The source for Faial is split across three repositories: faial, faial-infer, and c-to-json. Each repository is available online and is also included with this artifact in directory source/. Note that the source used for the version of Faial in this artifact is located in branches named cav21.
See the Faial README for instructions on building from scratch.
We additionally provide prebuilt Linux binaries for Faial in /tools/.
As a next step, you may want to view our tutorial on using Faial to verify your own CUDA programs! This may be found locally at source/faial/tutorial/
or online in the Faial source repository.
Additionally, you can manually run a single kernel from Claim 3's CAV14 dataset by directly calling faial on the kernel with the --parse-gv-args option. For example:
$ cd $FAIAL_HOME/datasets/gpuverify-cav14/
$ faial --parse-gv-args faial/CUDA20/scan/best/kernel.cu
Program is data-race free!
The text editors vim and nano are included in the container so you may alter kernels and verify them; a starting point is sketched below.
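As a starting point, you could save the following toy kernel (our own sketch; it is not a file shipped with the artifact, and the filename is arbitrary) and run faial on it as in the example above:

```cuda
// Toy kernel of ours for experimentation (not shipped with the artifact):
// a block-level array reversal. As written it is racy: thread tid reads
// s[n - 1 - tid] while the mirrored thread may still be writing that slot.
// Uncommenting the barrier makes the kernel DRF; try both versions and
// compare Faial's reports.
__global__ void reverse(int *s, int *out, int n) {
  int tid = threadIdx.x;
  if (tid < n)
    s[tid] = tid;              // initialise own slot
  // __syncthreads();          // uncomment to separate writes from reads
  if (tid < n)
    out[tid] = s[n - 1 - tid]; // read the mirrored slot
}
```

Flipping the barrier on and off is a quick way to see both a race report and a data-race freedom verdict from the same kernel. Please enjoy exploring verification with Faial.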