Artifact for the paper "Regex Matching with Counting-Set Automata" (OOPSLA'20)
This is the artifact for the paper "Regex Matching with Counting-Set Automata", published at OOPSLA'20.
This artifact is intended to be run on the virtual machine Artifact Evaluation VM - Ubuntu 18.04 LTS, available at http://doi.org/10.5281/zenodo.2759473. The recommended virtualization software is VirtualBox (we used version 6.1.12).
Please make sure to have at least 30 GiB allocated for the VM on your computer (the disc image will grow automatically). Let us warn you that running the (full) experiments on 1 CPU may take on the order of tens of hours and may cause your computer (in particular a laptop) to get hot (possibly overheat and turn off).
Note: see the file ~/howto_vbox_shared_folder.txt on how to set up a shared folder between the host and the guest OS (it is simple). It can make transferring files from/to the VM easier.
Copy the artifact archive <artifact>.zip into $HOME. Then run the following:
unzip <artifact>.zip
cd <artifact>/
Go to the root directory of the artifact and run the installation script.
sudo password is "
Take a walk (~20 minutes).
There might be some issues reported when installing some packages (some nasty stuff happens due to the need to update libc). These issues should not matter, since the installed tools can still be used.
Preparing the Benchmarks
Download the dataset from https://doi.org/10.5281/zenodo.3974360, unzip it, and copy it to the right location (you may need to enable the network connection in the VM).
wget 'https://zenodo.org/record/3974360/files/benchmark-cnt-set-automata.zip?download=1' -O benchmark-cnt-set-automata.zip
unzip benchmark-cnt-set-automata.zip
mv benchmark-cnt-set-automata/bench/* run/
Kicking the Tires
The following sequence of commands checks that everything is working, runs a small subset of the experiments, and generates a preliminary report.
cd run/
./make_short.sh            (prepares a short version of the experiments)
./run_short_benchmarks.sh  ... (take a walk, ~20 mins) ...
./run_short_processing.sh
cd ../results
./generate-report.R
firefox results.html
You should see a web page with incomplete results of the experiments (consider increasing the resolution of the VM).
Step by Step Instructions
Running the Full Experiments
cd run/
./run_benchmarks.sh
Take a long walk (possibly a trip to Paris or any other place that you have always wanted to visit). This may take a few tens of hours, based on your setup, so you may even manage to leave the quarantine before the experiments finish ;-). Seriously, it might take two or three days; you can, however, save the state of the VM and restore it later to continue with the experiments. To obtain partial results faster, you can change the timeout in run/run_benchmarks.sh or remove some lines from the run/bench-*.txt files.
Processing the Results of Experiments
Before viewing the results, we recommend changing the resolution of the VM to a higher one.
(in run/)
./run_processing.sh
cd ../results/
./generate-report.R
firefox results.html
The artifact reproduces the following parts of the paper:
Since the machine running the artifact will most probably differ from the one we used to run the experiments, the exact times, numbers of timeouts, etc. will differ, but the trends should stay the same.
Installing Outside of the Provided VM
It should not be difficult to set up the environment on a Linux OS reasonably close to the one in the referenced VM. The needed Linux packages are:

python3 R pandoc libre2-dev grep mono (version at least 5.*)

together with the following R packages:

rmarkdown knitr ggplot2 ggExtra gridExtra pastecs
You can follow the commands in the installation script to see what needs to be done.
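The installation steps can be approximated as follows. This is a hedged sketch only: the Debian/Ubuntu package names (r-base, mono-complete) are assumptions, not taken from the artifact's installation script, which remains the authoritative reference.

```shell
# Sketch, assuming Debian/Ubuntu package naming; the actual installation
# script may differ (e.g., in how it obtains a recent enough Mono).
sudo apt-get update
sudo apt-get install -y python3 r-base pandoc libre2-dev grep mono-complete

# R packages, with names taken from the list above.
Rscript -e 'install.packages(c("rmarkdown", "knitr", "ggplot2", "ggExtra", "gridExtra", "pastecs"))'
```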
Running Other Experiments
The experiments to run are stored in the run/bench-*.txt files, in a CSV-like format. The pattern can use escape characters as used in CSVs (compatible with Python's csv module). If you have a file FILE with your own benchmarks, you can run the following command in the run/ directory:

cat FILE | ./pycobench -t TIMEOUT -o OUTPUT pattern_match.yaml
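The CSV-style escaping can be checked directly with Python's csv module; the rows below are hypothetical patterns for illustration, not taken from the bench files.

```python
import csv
import io

# Hypothetical patterns: a quoted field may contain the delimiter,
# and a doubled quote inside a quoted field encodes a literal quote.
data = 'a{2,5}b\n"x;y{1,3}"\n"say ""hi"""\n'
rows = list(csv.reader(io.StringIO(data), delimiter=';'))
print(rows)  # [['a{2,5}b'], ['x;y{1,3}'], ['say "hi"']]
```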
Here, TIMEOUT is the timeout (in seconds) and OUTPUT is a file into which the results of the experiments are logged. See ./pycobench -h for more details.
By default, ./pycobench runs every benchmark (i.e., every line in FILE) with all regex matchers defined in run/pattern_match.yaml (the default definition runs them in the mode where they count the number of matching lines).
When the command finishes, you need to process the output to collect the runtimes and the numbers of matches into a format with a single line for every benchmark, using the following command:
cat OUTPUT | ./san_output.sh | ./proc_results.py > results.csv
You can import the resulting CSV file into a spreadsheet editor. Note that there might be some problems with delimiters (such as ";" occurring inside the regexes), so you might first consider sanitizing the CSV to get rid of the regexes.
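One way to do such a sanitization is sketched below. It assumes, hypothetically, that the regex sits in the first column; check the actual layout of your results.csv and adjust the column index accordingly.

```python
import csv
import io

# Hypothetical input rows: the first column holds the regex, which may
# itself contain the ';' delimiter (and is therefore quoted).
inp = '"a{2};b";10;20\nplain;5;7\n'

rows = csv.reader(io.StringIO(inp), delimiter=';')
out = io.StringIO()
writer = csv.writer(out, delimiter=';')
for row in rows:
    # Drop the regex column so delimiters inside regexes cannot
    # confuse a spreadsheet import.
    writer.writerow(row[1:])
print(out.getvalue())  # 10;20 and 5;7, one benchmark per line
```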