Artifact for the OOPSLA'20 paper "Regex Matching with Counting-Set Automata"
Creators
- Brno University of Technology
- Microsoft Research
Description
This is an artifact for the paper "Regex Matching with Counting-Set Automata" at OOPSLA'20.
This artifact is intended to be run in the virtual machine "Artifact Evaluation VM - Ubuntu 18.04 LTS", available at http://doi.org/10.5281/zenodo.2759473. The recommended virtualization software is VirtualBox (we used version 6.1.12).
Please make sure to have at least 30 GiB of disk space available for the VM (the disk image will grow automatically). Be warned that running the (full) experiments on 1 CPU may take on the order of tens of hours and may cause your computer (in particular a laptop) to get hot (and possibly overheat and shut down).
Note: see the file ~/howto_vbox_shared_folder.txt for how to set up a shared folder between the host and the guest OS (it is simple). A shared folder makes transferring files to/from the VM easier.
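For reference, mounting a VirtualBox shared folder from inside the guest typically looks like the following (the share name and mount point are placeholders, the VirtualBox Guest Additions are assumed to be installed, and the note file above remains the authoritative source):
sudo mkdir -p /mnt/shared                       # create a mount point (the path is our choice)
sudo mount -t vboxsf <share-name> /mnt/shared   # <share-name> is whatever you configured in VirtualBox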
Getting Started
Preparing VM
- Download the VM from http://doi.org/10.5281/zenodo.2759473 and import it into VirtualBox. We recommend allocating at least 8 GiB of memory per CPU (4 GiB might also work, though some experiments may then terminate early due to running out of memory). If you allocate more CPUs, the benchmarks will run in parallel. It is also a good idea not to do other demanding work on your host OS while the experiments are running, otherwise the two OSes will be fighting over RAM. (If you prefer the command line over the GUI, see the VBoxManage sketch after this list.)
- Start the VM, open Terminal (in the left bar), enable the network connection, and download the artifact zip file.
OR:
- Start the VM, open Terminal (in the left bar), and mount the shared folder according to ~/howto_vbox_shared_folder.txt.
- Copy the artifact zip file from the shared folder to $HOME.
Then run the following:
unzip <artifact>.zip
cd <artifact>/
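If you prefer the command line over the VirtualBox GUI, the VM's memory and CPU allocation can also be adjusted with VBoxManage on the host; the VM name and values below are only an example, and the VM must be powered off when you run the command:
VBoxManage modifyvm "Artifact Evaluation VM - Ubuntu 18.04 LTS" --memory 8192 --cpus 1   # memory is in MB; use the name of your imported VM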
Installing Packages
Go to the root directory of the artifact and run
sudo ./install_requirements.sh
(the sudo password is "ae")
Take a walk (~20 minutes).
There might be some issues reported while installing some packages (some nasty stuff happens due to the need to update libc). These issues should not matter, as the installed tools remain usable.
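As an optional sanity check that the main tools were installed, each of the following should print a version string (the exact versions do not matter):
python3 --version
R --version
pandoc --version
mono --version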
Preparing the Benchmarks
Download the dataset from https://doi.org/10.5281/zenodo.3974360, unzip it, and copy it to the right location (you may need to enable the network connection first):
wget 'https://zenodo.org/record/3974360/files/benchmark-cnt-set-automata.zip?download=1' -O benchmark-cnt-set-automata.zip
unzip benchmark-cnt-set-automata.zip
mv benchmark-cnt-set-automata/bench/* run/
Kicking the Tires
The following sequence of commands checks that everything is working, runs a small subset of the experiments, and generates a preliminary report.
cd run/
./make_short.sh (prepares short version of experiments)
./run_short_benchmarks.sh
...
(take a walk ~20 mins)
...
./run_short_processing.sh
cd ../results
./generate-report.R
firefox results.html
You should see a web page with incomplete results of the experiments (consider increasing the resolution of the VM).
Step by Step Instructions
Running the Full Experiments
cd run/
./run_benchmarks.sh
Take a long walk (possibly a trip to Paris or any other place you have always wanted to visit --- depending on your setup, this may take a few tens of hours, so you may even manage to leave the quarantine before the experiments finish ;-) --- seriously, it might take two or three days; you can, however, save the state of the VM and restore it later to continue with the experiments). To obtain partial results faster, you can decrease the timeout in run/run_benchmarks.sh or remove some lines from the run/bench-*.txt files.
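For instance, to locate the timeout setting and to shorten one of the benchmark files (the grep pattern and the file name are only guesses; inspect the script and pick the file you actually want to trim):
grep -n -i timeout run/run_benchmarks.sh    # find where the timeout is set (assuming it is referred to as "timeout")
sed -i.bak '101,$d' run/bench-example.txt   # keep only the first 100 benchmarks; the file name is illustrative, a .bak backup is kept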
Processing the Results of Experiments
Before viewing the results, we recommend changing the resolution of the VM to a higher one.
(in run/)
./run_processing.sh
cd ../results/
./generate-report.R
firefox results.html
Supported Claims
The artifact reproduces the following parts of the paper:
- Fig. 5
- Table 1
Since the machine running the artifact will most probably differ from the one we used for our experiments, exact times, numbers of timeouts, etc. will differ, but the trends should stay the same.
Extra Notes
Installing Outside of the Provided VM
It should not be difficult to set up the environment on a Linux OS reasonably close to the one in the referenced VM. The needed Linux packages are
python3
R
pandoc
libre2-dev
grep
mono (version at least 5.*)
Python packages:
pyyaml
tabulate
R packages:
rmarkdown
knitr
ggplot2
ggExtra
gridExtra
pastecs
You can follow the commands in the installation script to see what needs to be done.
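On a Debian/Ubuntu-like system, the installation might look roughly as follows; the package names (in particular r-base and mono-complete) are our guesses, and install_requirements.sh remains the authoritative reference:
sudo apt-get install python3 python3-pip r-base pandoc libre2-dev grep mono-complete   # package names are assumptions
pip3 install pyyaml tabulate
R -e 'install.packages(c("rmarkdown", "knitr", "ggplot2", "ggExtra", "gridExtra", "pastecs"), repos="https://cloud.r-project.org")'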
Running Other Experiments
The experiments to run are stored in the run/bench-*.txt files, in a CSV-like format pattern;input-file, where pattern can use escape characters as used in CSVs (compatible with Python's csv module). If you have a file FILE with your own benchmarks, you can run the following command in the run/ directory:
cat FILE | ./pycobench -t TIMEOUT -o OUTPUT pattern_match.yaml
where TIMEOUT is the timeout (in seconds) and OUTPUT is a file that logs the results of the experiments. See ./pycobench -h for more details. By default, ./pycobench runs every benchmark (i.e., every line in FILE) with all regex matchers defined in run/pattern_match.yaml (the default definition runs them in the mode where they count the number of matching lines).
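For example (the patterns, input file, and timeout below are made up), a small benchmark file can be created and run as follows; the second line of the file shows how a pattern containing ';' is quoted, as in Python's csv module:
cd run/
cat > my-bench.txt <<'EOF'
(ab){1,5}c;inputs/sample.txt
"a;b{2,4}";inputs/sample.txt
EOF
cat my-bench.txt | ./pycobench -t 60 -o my-results.out pattern_match.yaml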
When the command finishes, you need to process the output to collect the runtimes and the numbers of matches into a format with a single line per benchmark, using the following command:
cat OUTPUT | ./san_output.sh | ./proc_results.py > results.csv
You can import the resulting CSV file into a spreadsheet editor. Note that there might be some problems with delimiters (such as ";" occurring in the regexes), so you may first want to sanitize the CSV and get rid of the regexes using the ./sanitize-csv.py script.
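Continuing the example above (the file names are again just illustrative, and the ./sanitize-csv.py invocation assumes the script filters stdin to stdout; check its usage if that is not the case):
cat my-results.out | ./san_output.sh | ./proc_results.py > my-results.csv
./sanitize-csv.py < my-results.csv > my-results-clean.csv   # invocation is an assumption; run the script without arguments to see its usage if this fails
libreoffice --calc my-results-clean.csv                     # open in a spreadsheet, if LibreOffice is available in the VM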
Files
480.zip (292.9 MB, md5:0c3ea085bd409b83afab27a8adaff045)