# Certifying Derivation of State Machines from Coroutines


Artifact associated with POPL22 submission "Certifying Derivation of State Machines from Coroutines".

It is currently under review following https://popl22.sigplan.org/track/POPL-2022-artifact-evaluation and should be read in combination with the submitted paper.

# Claims

0. Motivating claims about state of practice in network-protocol state-machine implementation are not supported by the artifact, with the exception that we include a copy of Mozilla NSS and ocaml-tls in the home directory of the virtual machine. These implementations are discussed in the introduction of our paper.
1. Certified compilation: Section 3.3 page 13 line 591 states "Our system automatically proves equivalence between a source program and its compiled version. Moreover, the proof is constructed as we compile a program". Section 3.4 presents a concrete example.
2. Bisimulation: Section 3.3 page 12 line 573 states "These rules are essentially a simplification of the classic technique of bisimulation for our setting. In fact, we proved equivalence with the more standard definition of bisimulation, after expressing our source and target semantics in labeled-transition-system style". The corresponding code is in `~/coroutines/src/ClConv.v` theorem `equiv_is_bisimulate`.
3. Single-client performance: Section 4.3 line 948 states "with one client thread [..] our derived implementation comes within 50% of the performance of either of the more established alternatives."
4. Performance in comparison to concurrent implementations: Section 4.3 line 960 states "Unsurprisingly, the other implementations with their multicore execution perform several times better than we do, though again it seems we are within the window where an especially paranoid user might prefer our proved server under moderate load." The corresponding Fig. 9. shows Warp handling 5x more requests per second than our server, and with 3.5x better latency.
5. Section 4 line 697 claims that our TLS library implements "just a large enough subset of TLS that we can test with standard Web browsers".

Claims 1, 2, and 5 receive their own sections of this README, in that order. Instructions for performance benchmarking are combined in one section.

The Coq development is available at https://github.com/mit-plv/certifying-derivation-of-state-machines-from-coroutines under the MIT license.

# Setup instructions

We provide a 16GB disk image in .vdi format that boots into a Linux terminal environment accessible over SSH (or graphically, if you so wish).
You can get VirtualBox from your operating system's repositories or virtualbox.org.
Here are instructions for using VirtualBox from the command line:

```sh
cd . # navigate to the directory containing coroutines.vdi and coroutines.vbox
VBoxManage registervm "$(realpath coroutines.vbox)"
VBoxManage startvm coroutines
```

Wait until a login screen appears; the VM window can then be hidden. You should be able to connect to the virtual machine using `ssh -p 10022 artifact@127.0.0.1`. Alternatively, you can use the graphical interface.

We provide the three commonly used Coq frontends:

- `vim ~/coroutines/src/ClConv.v` should load Coqtail automatically when a relevant keybinding is used; `\ c l` evaluates up to the cursor.
- `emacs ~/coroutines/src/ClConv.v` should load Proof General automatically when a `.v` file is opened; `Ctrl+Enter` evaluates up to the cursor.
- `coqide ~/coroutines/src/ClConv.v` should be usable from the graphical interface; `Ctrl + RightArrow` evaluates up to the cursor.

Please use your preferred Coq frontend to step through the first lemma in `~/coroutines/src/ClConv.v` to ensure that it is working properly. Please also follow the "Interoperability testing" section below to ensure that the extracted Haskell code works as expected.
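If you would rather check the file in batch mode, here is a minimal sketch; it assumes a coq_makefile-style Makefile with per-file `.vo` targets in `~/coroutines/src`, so adjust the target if your checkout builds differently.

```sh
# Sketch only: rebuild ClConv.vo non-interactively, assuming a coq_makefile-style
# Makefile with per-file .vo targets is present in ~/coroutines/src.
cd ~/coroutines/src
make ClConv.vo
```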

# Evaluation instructions

## Certified compilation

Attention: compiling TLS.v requires 3 hours of compute time and 37GB of RAM, which is above the limits of the submitted VirtualBox configuration. If you wish to verify it, you can increase the RAM limit under VirtualBox -> Machine -> Settings -> System -> Motherboard. You do not need to compile the file to evaluate the artifact; the build outputs are included.
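The same change can be made from the host's command line; the memory size below is only a suggestion that leaves headroom above the 37GB peak, and the VM must be powered off before the setting can be changed.

```sh
# Raise the VM memory limit via VBoxManage (run on the host, with the VM powered off).
VBoxManage controlvm coroutines poweroff        # skip if the VM is not currently running
VBoxManage modifyvm coroutines --memory 40960   # size in MB; pick what your host can spare
VBoxManage startvm coroutines
```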

`~/coroutines/src/ClConv.v` around line 1701 contains lemma `ex_coroutine2_derive`, which shows how our proof-producing compiler is used. The key line is `derive_coro tt` -- it invokes the compiler to prove the goal set up by the preceding steps. Executing `Print Assumptions ex_coroutine2_derive` after the `Defined` should produce `Closed under the global context`. The meaning of the predicate `equiv` is confirmed in the next subsection.

## Bisimulation

1. `~/coroutines/src/ClConv.v` around line 85 should contain `Record Bisimulation`; the predicate defined there should be recognizable as the standard definition of bisimulation for labeled transition systems.
2. The previous inductive definitions in the same section package high-level code (coroutines encoded using free monads) and low-level code (dependently typed state machines) as labeled transition systems; these appear in the statement of the bisimulation theorem.
3. Finally, theorem `equiv_is_bisimulate` around line 266 states that our compiler's `equiv` is equivalent to the standard notion of bisimulation.
4. Please evaluate to the end of the section (the line after `End Effect.`) and execute `Print Assumptions equiv_is_bisimulate`. The output should confirm that only the standard axioms `functional_extensionality_dep` and `eq_rect_eq` were used. (A batch-mode alternative covering both this theorem and `ex_coroutine2_derive` is sketched below.)
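As a non-interactive alternative, the following sketch runs both `Print Assumptions` queries in batch mode. It assumes `ClConv.vo` has already been built in `~/coroutines/src` under the default (empty) logical prefix; adjust the `Require` line if the project maps the library elsewhere.

```sh
# Batch-mode sketch: print the assumptions of both results without opening an IDE.
cd ~/coroutines/src
printf 'Require Import ClConv.\nPrint Assumptions ex_coroutine2_derive.\nPrint Assumptions equiv_is_bisimulate.\n' | coqtop
# Expected: "Closed under the global context" for ex_coroutine2_derive, and only
# functional_extensionality_dep and eq_rect_eq listed for equiv_is_bisimulate.
```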

## Performance Experiments

The scripts `~/coroutines/server/bench.sh`, `~/coroutines/warpserver/bench.sh`, and `~/coroutines/nginxserver/bench.sh` write performance results to the files `/home/artifact/coroutines/server/bench-coroutines-c1.txt`, `/home/artifact/coroutines/server/bench-coroutines-c40.txt`, `/home/artifact/coroutines/warpserver/bench-warp-c1.txt`, `/home/artifact/coroutines/warpserver/bench-warp-c40.txt`, `/home/artifact/coroutines/nginxserver/bench-nginx-c1.txt`, and `/home/artifact/coroutines/nginxserver/bench-nginx-c40.txt`. Each script should take about a minute to run; they cannot be run concurrently because the servers listen on the same port. We encourage the artifact evaluators to confirm that these scripts launch the wrk benchmark tool against the corresponding server implementations found in `/home/artifact/coroutines/server/app/Main.hs`, `/home/artifact/coroutines/warpserver/main.hs`, and `/etc/nginx/nginx.conf`. Further, please check that the TLS server file matches the one that was extracted from Coq: `diff -u ~/coroutines/TLS.hs ~/coroutines/server/src/TLS.hs`. (The latter has more imports and a slightly less eager type-error handler, but no code changes.)
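For convenience, the whole sequence can be run back to back:

```sh
# Run the three benchmark scripts sequentially (the servers bind the same port,
# so the scripts must not run concurrently), then compare the extracted TLS sources.
~/coroutines/server/bench.sh
~/coroutines/warpserver/bench.sh
~/coroutines/nginxserver/bench.sh
diff -u ~/coroutines/TLS.hs ~/coroutines/server/src/TLS.hs
```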

### Observed Performance in the VM

We observe that nginx in the VM runs faster than in our presubmission testing, whereas warp and our implementation exhibit absolute performance similar to that reported in our paper. We also observed variance of tens of percent between the two VM hosts we tried. Here are the numbers from an ultraportable laptop with an Intel Broadwell i7 processor, in the order the bars are presented in Fig. 9.

```
artifact@artifact:~$ find ~/coroutines -name '*-c1.txt' | xargs grep Latency | cut -d' ' -f-9
/home/artifact/coroutines/server/bench-coroutines-c1.txt:    Latency   347.78us
/home/artifact/coroutines/warpserver/bench-warp-c1.txt:    Latency   128.59us
/home/artifact/coroutines/nginxserver/bench-nginx-c1.txt:    Latency    73.54us
artifact@artifact:~$ find ~/coroutines -name '*-c40.txt' | xargs grep Latency | cut -d' ' -f-10
/home/artifact/coroutines/server/bench-coroutines-c40.txt:    Latency     7.89ms
/home/artifact/coroutines/warpserver/bench-warp-c40.txt:    Latency     2.23ms
/home/artifact/coroutines/nginxserver/bench-nginx-c40.txt:    Latency     1.13ms

artifact@artifact:~$ find ~/coroutines -name '*-c1.txt' | xargs grep Requests/ # SINGLE-THREADED THROUGHPUT -- THIS IS THE MOST IMPORTANT BENCHMARK
/home/artifact/coroutines/server/bench-coroutines-c1.txt:Requests/sec:   5410.86
/home/artifact/coroutines/warpserver/bench-warp-c1.txt:Requests/sec:   9895.07
/home/artifact/coroutines/nginxserver/bench-nginx-c1.txt:Requests/sec:  13347.46

artifact@artifact:~$ find ~/coroutines -name '*-c40.txt' | xargs grep Requests/
/home/artifact/coroutines/server/bench-coroutines-c40.txt:Requests/sec:   5156.72
/home/artifact/coroutines/warpserver/bench-warp-c40.txt:Requests/sec:  16339.43
/home/artifact/coroutines/nginxserver/bench-nginx-c40.txt:Requests/sec:  34450.72
```

## Interoperability testing

First, start our demo HTTPS server:

`( cd ~/coroutines/server && /usr/bin/time stack run server.crt server.pem )`

Then load it using headless Chrome:

`/opt/google/chrome/chrome --disable-gpu --headless --dump-dom https://localhost:4433`

The printed line should contain "Hello!" along with some HTML tags.

Alternatively, open the Chrome web browser from the graphical interface of the VM and navigate to https://localhost:4433/, which should display "Hello!".

Or using curl:

`curl --tlsv1.3 --cacert ~/coroutines/server/server.crt https://localhost:4433/`

Port 4433 is also forwarded to the VM host in the VirtualBox configuration, so you can also test the server with your own browser. However, please note that the server uses a self-signed certificate for "localhost", so you will likely need to click through several security warnings. Certificate-related warnings or errors are expected in such usage and should not be taken as a negative indication about the quality of the server implementation.
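For a quick check from the host (assuming the forward maps to the same port number there), curl can be told to skip verification of the self-signed certificate:

```sh
# From the VM host: -k accepts the self-signed certificate without verification;
# alternatively, copy server.crt out of the VM and pass it with --cacert as above.
curl --tlsv1.3 -k https://localhost:4433/
```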

Here are some errors you might encounter when running the server.

- `server-exe: Network.Socket.bind: resource busy (Address already in use)` -- a server is already running on port 4433, try killing nginx, hs-exe or server-exe.
- `server-exe: thread blocked indefinitely in an MVar operation` -- this is an implementation bug we did not anticipate during development.
- `server-exe: threadWait: invalid argument (Bad file descriptor)` and `server-exe: Network.Socket.recvBuf: resource vanished (Connection reset by peer)` -- we believe these refer to situations where the client connection was closed in the middle of a request.

# Artifact contents

The main Coq development is located in `~/coroutines/src/`, and the Haskell wrappers are in `~/coroutines/server`, including our copy of hs-tls with some internal APIs exposed at `~/coroutines/server/tls-1.5.3/`. The compiler is located in `~/coroutines/src/ClConv.v`, and the TLS case study is in `~/coroutines/src/TLS.v` culminating in `main_loop_derive`. Again, that file takes 3h and 37GB of RAM to process. Definition `doHandshake` is probably the most instructive to read to understand how TLS is implemented in our library; definition `readWrite` is the record-layer wrapper. `Parameter` directives in `TLS.v` are filled in with appropriate pure functions from `hs-tls` using `Extraction` directives at the end of the file. The main entry point to the compiler is `Ltac derive_coro`.
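To locate the definitions mentioned above quickly (line numbers may drift between versions):

```sh
# Quick orientation: find the compiler entry point and the main TLS definitions.
grep -n 'Ltac derive_coro' ~/coroutines/src/ClConv.v
grep -n -E 'main_loop_derive|doHandshake|readWrite' ~/coroutines/src/TLS.v | head
```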

Coq is installed through OPAM, and Haskell is installed through Stack. Recent versions of curl and wrk are located in `~/.local/bin/`.
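To confirm the toolchain inside the VM (exact versions are whatever the image ships):

```sh
# Report tool versions; wrk prints its version with -v/--version.
coqc --version
stack --version
~/.local/bin/curl --version | head -n1
~/.local/bin/wrk --version
```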
