Checking Data-Race Freedom of GPU Kernels, Compositionally

CAV’21 Artifact

by Tiago Cogumbreiro, Julien Lange, Dennis Liew Zhen Rong, and Hannah Zicarelli

This paper introduces Faial, a tool that checks data-race freedom (DRF) of CUDA kernels.

The artifact submission contains:

The structure of this document:

  1. Setting up the Container
  2. Proofs
  3. Reproducing Experimental Results
    1. Claim 1: Correctness
    2. Claim 2: Scalability
    3. Claim 3: Real-world usability
  4. Accessing Experimental Results
  5. Kernel Generation Framework
  6. Rebuilding this Artifact
    1. Building with Docker
    2. Building Faial
  7. Tutorial on Faial Usage

The structure of this container (also available on GitLab):

1. Setting up the Container

First, install and start Docker.

The Docker image is available both as a compressed tar archive and online. Choose one of the two following methods. In both cases, the container starts an interactive terminal session in the directory /artifact; the environment variable FAIAL_HOME points to this location.

Loading the Docker image from a tar archive

  1. Ensure you are in the root of this artifact. You should see the compressed tar archive artifact-354.tar.bz2.

  2. To load the Docker container from this archive, run:

    $ docker load < artifact-354.tar.bz2
  3. To enter the container with an interactive terminal session, run:

    $ docker run -it -p 8000:8000 faial-cav21

Loading the Docker image from the online registry

  1. To download the image from the web, run:

    $ docker pull registry.gitlab.com/umb-svl/faial-artifact-cav21/artifact:latest
  2. To enter the container with an interactive terminal session, run:

    $ docker run -it -p 8000:8000 registry.gitlab.com/umb-svl/faial-artifact-cav21/artifact:latest

2. Proofs

Mechanized proofs supporting theoretical results are available locally at faial-coq/ and online at GitLab.

To check the proofs, run make:

$ cd $FAIAL_HOME/faial-coq
$ make clean # Clean any already compiled proofs (optional)
$ make       # Compile all files, checking the proofs

The file _CoqProject lists all files that will be compiled and thus have their proofs checked. Below we list the file, line number, and name of each definition/theorem; e.g., Main.v:619 theorem drf corresponds to the theorem drf on line 619 of file faial-coq/src/Main.v. For your convenience, we also provide a hyperlink to each file in our GitLab repository (branch cav21).
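For example, to locate one of these statements without leaving the container, you can search the Coq sources with grep. This is only a convenience sketch; the exact output depends on how the statement is declared in the source file:

$ grep -n "drf" $FAIAL_HOME/faial-coq/src/Main.v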

Results

Figure 2

Figure 3

Figure 4

Noteworthy differences between the paper and the Coq mechanization

OCaml implementation

3. Reproducing Experimental Results

This section contains instructions on generating the data used in the paper.

The CSV data, logs, and plots used in the paper are already included in each of the respective directories. Rerunning the experiment will overwrite the results.

To visualise the generated data, the Docker container includes an HTTP server exposing $FAIAL_HOME on port 8000. To access the data, ensure the container is running and open the following URL in your favourite browser (on the host machine): localhost:8000.

See Accessing Experimental Results for more details.

Warning: Rerunning the experiment will overwrite the bundled logs/figures that support the paper with your own logs/figures! You can revert to the original logs/figures via a backup copy of:

3.1 Claim 1: Correctness

Expected runtime of this experiment: ~20 minutes.

This section details our experimental dataset, results, and procedure related to Table 1 in Claim 1: Correctness. This experiment requires manual processing! While we provide scripts to generate data, verifying the correctness of data requires manual examination.

  1. All files relating to Claim 1: Correctness are stored in the datasets/correctness directory.

    $ cd $FAIAL_HOME/datasets/correctness
    
  2. The dataset for Table 1 is split into Tests 1-5. Test 1 can be found at {TOOL}/real-world/transposeDiagonal.cu (one copy per tool), e.g., faial/real-world/transposeDiagonal.cu. Tests 2-5 can be found at {TOOL}/synthetic/{TEST}.cu (one copy per tool), e.g., gklee/synthetic/last-iter.cu. Each test has a DRF version and a racy version, which are distinguished by the filename. For instance, {TOOL}/synthetic/last-iter-drf.cu is DRF and {TOOL}/synthetic/last-iter.cu is racy.

    Automatic scripts are provided to rerun the tools against the dataset:

    $ python3 run.py --tool faial      # runtime: ~5s
    $ python3 run.py --tool gpuverify  # runtime: ~50s
    $ python3 run.py --tool pug        # runtime: ~3s
    $ python3 run.py --tool gklee      # runtime: ~7m  /!\ WARNING THIS MAY CRASH DUE TO GKLEE
    $ python3 run.py --tool sesa       # runtime: ~12m /!\ WARNING THIS MAY CRASH DUE TO SESA
    

    The above commands will generate logs and a timings-{TOOL}.csv file for each tool. This file contains tool exit statuses, running time and memory usage, paths to tool-specific kernels, paths to tool logs, and the DRF/racy results obtained by parsing the logs.

  3. A script is provided to generate a table with the data generated above:

    $ python3 table.py
    

    This table presents the results from the timing CSVs in a more readable format. The following is the output of the table script as observed in our experiment.

    example                   expected    faial    gpuverify    pug    gklee           sesa
    ------------------------  ----------  -------  -----------  -----  --------------  --------------
    transposeDiagonal         racy        racy     racy         drf    timeout         timeout
    transposeDiagonal-drf     drf         drf      racy         drf    timeout         timeout
    first-iter                racy        racy     racy         racy   timeout         timeout
    first-iter-drf            drf         drf      racy         racy   timeout         timeout
    last-iter                 racy        racy     racy         racy   timeout         timeout
    last-iter-drf             drf         drf      racy         drf    timeout         timeout
    last-iter-first-iter      racy        racy     racy         racy   timeout         timeout
    last-iter-first-iter-drf  drf         drf      racy         racy   timeout         timeout
    read-index-racy           racy        racy     racy         racy   no race alarms  no race alarms
    read-index                drf         racy     drf          racy   no race alarms  no race alarms
    

    Note that while this table displays some information regarding raciness, these results must be validated manually, as we explain below.

  4. To count and verify the correctness of data-races, the logs must be manually examined for each racy result. The objective of this manual analysis is to count the number of data-races reported and to determine whether the error traces raised by the tools reflect real data-races. All information provided about a race is considered, e.g., the state of local and global variables, the types of accesses (read/write), and the source code line numbers of the accesses.

    For DRF test components, it is only necessary to count the reported races, as they can be assumed to be invalid. For racy test components, it is additionally necessary to verify the correctness of each reported data-race.

    We include a file with this analysis for each racy tool log in our results. Each analysis file is a .txt file corresponding to the .log file with the tool output. For example, a data-race is reported by Faial in faial/synthetic/read-index-racy-1.log, and we provide an analysis of this race in faial/synthetic/read-index-racy-1.txt.

    To verify the data-races in the tool logs, a working understanding of the data-races in question is helpful. The paper provides context for these races through the respective memory access protocols:

3.2 Claim 2: Scalability

Expected runtime of this experiment: ~1 hour (with --repeat 1) or ~5 hours (with --repeat 5).

This section details our experimental dataset, results, and procedure related to Figure 8 in Claim 2: Scalability.

  1. All files relating to Claim 2: Scalability are stored in the datasets/micro-benchmarks directory.

    $ cd $FAIAL_HOME/datasets/micro-benchmarks
    
  2. Tool-specific versions of the synthetic dataset used for this experiment are stored in directories named after their respective tools. To run the tools against the dataset:

    $ python3 run.py --repeat 5 --tool faial      # runtime: ~27m
    $ python3 run.py --repeat 5 --tool pug        # runtime: ~17m
    $ python3 run.py --repeat 5 --tool sesa       # runtime: ~7m  /!\ WARNING THIS MAY CRASH DUE TO SESA
    $ python3 run.py --repeat 5 --tool gklee      # runtime: ~7m  /!\ WARNING THIS MAY CRASH DUE TO GKLEE
    $ python3 run.py --repeat 5 --tool gpuverify  # runtime: ~4hr
    

    The above commands were used to produce the results in the paper; we ran all tools 5 times on each problem. They generate a timings-{TOOL}.csv file for each tool, containing tool exit statuses, running time and memory usage, paths to tool-specific kernels, paths to tool logs, and the DRF/racy results obtained by parsing the logs (a quick way to inspect these CSVs is sketched after this list).

  3. To generate the graphs as in the paper, run the following command:

    $ python3 ../../benchmark/benchmark-graph.py -mb
    

    The generated graphs are named Micro-benchmark-time-1-50.pdf and Micro-benchmark-memory-1-50.pdf.
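To take a quick look at the raw timing data produced in step 2, standard shell tools suffice. A minimal sketch, assuming you have already generated timings-faial.csv in this directory:

$ head -n 3 timings-faial.csv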

3.3 Claim 3: Real-world usability

Expected runtime of this experiment: ~40 minutes (with --repeat 1) or ~3.5 hours (with --repeat 5).

This section details our experimental dataset, results, and procedure related to Figure 9 in Claim 3: Real-world usability.

  1. All files relating to Claim 3: Real-world usability are stored in the datasets/gpuverify-cav14 directory.

    $ cd $FAIAL_HOME/datasets/gpuverify-cav14
    
  2. Tool-specific versions of the dataset used for this experiment are found in directories named after their respective tools. To run each tool against the dataset:

    $ python3 run.py --repeat 5 --tool faial      # runtime: ~9m
    $ python3 run.py --repeat 5 --tool pug        # runtime: ~3m
    $ python3 run.py --repeat 5 --tool gpuverify  # runtime: ~3hr
    

    The commands above were used to produce the results in the paper; we ran all tools 5 times on each kernel. They generate a timings-{TOOL}.csv file for each tool, containing tool exit statuses, running time and memory usage, paths to tool-specific kernels, paths to tool logs, and the DRF/racy results obtained by parsing the logs.

  3. Lastly, to generate the graphs as in the paper, run the following command:

    $ python3 ../../benchmark/benchmark-graph.py -rw
    

    The 3 pie charts for Faial, GPUVerify, and PUG are named faial-stats.pdf, gpuverify-stats.pdf, and pug-stats.pdf respectively. The generated scatter graph is named time-relation-faial-scatter.pdf.

4. Accessing Experimental Results

The Docker container includes an HTTP server exposing $FAIAL_HOME on port 8000, which lets you download logs, plots, and other files from inside the container. To access the experimental results, ensure the container is running and navigate to: localhost:8000

Claim  Path                                                        See
-----  ----------------------------------------------------------  -----------
3.1    (steps 3 and 4)                                             Table 1
3.2    datasets/micro-benchmarks/Micro-benchmark-time-1-50.pdf     Fig 8 (lhs)
3.2    datasets/micro-benchmarks/Micro-benchmark-memory-1-50.pdf   Fig 8 (rhs)
3.3    datasets/gpuverify-cav14/faial-stats.pdf                    Fig 9.a
3.3    datasets/gpuverify-cav14/gpuverify-stats.pdf                Fig 9.b
3.3    datasets/gpuverify-cav14/pug-stats.pdf                      Fig 9.c
3.3    datasets/gpuverify-cav14/time-relation-faial-scatter.pdf    Fig 9.d
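For example, to download one of the plots above onto the host machine through the HTTP server, something like the following should work (a sketch; it assumes wget is installed on the host):

$ wget http://localhost:8000/datasets/gpuverify-cav14/faial-stats.pdf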

5. Kernel Generation Framework

Optional documentation of our kernel generation and benchmarking framework is provided in FRAMEWORK.md. Details include experiment configuration file parameters and the generation of tool-specific kernels from tool-agnostic templates.

6. Rebuilding this Artifact

This section covers reproducing the Docker container and building Faial from source.

6.1 Building with Docker

To reproduce the Docker container, first install and start Docker.

Building the tar archive

  1. Ensure you are in the root of the artifact. You should see the file Dockerfile.

  2. To build the image, run:

    $ docker build --tag faial-cav21 . 
  3. To save the image, run:

    $ docker save faial-cav21 | bzip2 > artifact-354.tar.bz2

Building without Docker

To reproduce this environment natively without Docker, follow along with the commands run by the provided Dockerfile. This is known to work on Ubuntu 20.04; other systems will require adapting the package names and commands to those provided by your system.
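For instance, one quick way to list the commands the Dockerfile runs is to filter its instructions with grep (a sketch; the set of instruction keywords in the actual Dockerfile may differ from this filter):

$ grep -E '^(RUN|ENV|WORKDIR)' Dockerfile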

6.2 Building Faial

The source for Faial is split across three repositories: faial, faial-infer, and c-to-json. Each repository is both available online and included with this artifact in directory source/. Note that the source used for the version of Faial in this artifact is located in branches named cav21.

See the Faial README for instructions on building from scratch.

Prebuilt Linux Binaries

We additionally provide prebuilt Linux binaries for Faial:

7. Tutorial on Faial Usage

As a next step, you may want to view our tutorial on using Faial to verify your own CUDA programs! This may be found locally at source/faial/tutorial/ or online in the Faial source repository.

Additionally, you can manually run a single kernel from Claim 3's CAV14 dataset by directly calling faial on it with the --parse-gv-args option. For example:

$ cd $FAIAL_HOME/datasets/gpuverify-cav14/
$ faial --parse-gv-args faial/CUDA20/scan/best/kernel.cu
  Program is data-race free!

The text editors vim and nano are included in the container so you may alter kernels and verify them. Please enjoy exploring verification with Faial.
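For example, a hypothetical edit-and-verify loop on the kernel used above might look like this (the output of the second command will depend on the edits you make):

$ nano faial/CUDA20/scan/best/kernel.cu                   # alter the kernel
$ faial --parse-gv-args faial/CUDA20/scan/best/kernel.cu  # verify it again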