Software Open Access

Artifact for the OOPSLA'20 paper "Regex Matching with Counting-Set Automata"

Lukáš Holík; Ondřej Lengál; Olli Saarikivi; Lenka Turoňová; Margus Veanes; Tomáš Vojnar


JSON Export

{
  "files": [
    {
      "links": {
        "self": "https://zenodo.org/api/files/3e44af1c-7611-4867-99ae-f2fa0ba30967/480.zip"
      }, 
      "checksum": "md5:0c3ea085bd409b83afab27a8adaff045", 
      "bucket": "3e44af1c-7611-4867-99ae-f2fa0ba30967", 
      "key": "480.zip", 
      "type": "zip", 
      "size": 292856502
    }
  ], 
  "owners": [
    53936
  ], 
  "doi": "10.5281/zenodo.3975566", 
  "stats": {
    "version_unique_downloads": 38.0, 
    "unique_views": 180.0, 
    "views": 210.0, 
    "version_views": 210.0, 
    "unique_downloads": 38.0, 
    "version_unique_views": 180.0, 
    "volume": 12007116582.0, 
    "version_downloads": 41.0, 
    "downloads": 41.0, 
    "version_volume": 12007116582.0
  }, 
  "links": {
    "doi": "https://doi.org/10.5281/zenodo.3975566", 
    "conceptdoi": "https://doi.org/10.5281/zenodo.3975565", 
    "bucket": "https://zenodo.org/api/files/3e44af1c-7611-4867-99ae-f2fa0ba30967", 
    "conceptbadge": "https://zenodo.org/badge/doi/10.5281/zenodo.3975565.svg", 
    "html": "https://zenodo.org/record/3975566", 
    "latest_html": "https://zenodo.org/record/3975566", 
    "badge": "https://zenodo.org/badge/doi/10.5281/zenodo.3975566.svg", 
    "latest": "https://zenodo.org/api/records/3975566"
  }, 
  "conceptdoi": "10.5281/zenodo.3975565", 
  "created": "2020-08-07T10:46:15.665808+00:00", 
  "updated": "2020-08-07T12:59:22.896671+00:00", 
  "conceptrecid": "3975565", 
  "revision": 2, 
  "id": 3975566, 
  "metadata": {
    "access_right_category": "success", 
    "doi": "10.5281/zenodo.3975566", 
    "description": "<p><strong>Artifact for the paper &quot;Regex Matching with Counting-Set Automata&quot; (OOPSLA&#39;20)</strong></p>\n\n<p>This is an artifact for the paper &quot;Regex Matching with Counting-Set Automata&quot; at OOPSLA&#39;20.</p>\n\n<p>This artifact is intended to be run on the virtual machine Artifact Evaluation VM - Ubuntu 18.04 LTS available at&nbsp;<a href=\"http://doi.org/10.5281/zenodo.2759473\">http://doi.org/10.5281/zenodo.2759473</a>. The recommended virtualization software is VirtualBox (we used version 6.1.12).</p>\n\n<p>Please make sure to have at least 30 GiB allocated on your computer for the VM (the disk image will grow automatically). Be warned that running the (full) experiments on 1 CPU may take on the order of tens of hours and may cause your computer (in particular a laptop) to get hot (possibly overheat and turn off).</p>\n\n<p>Note: see the file&nbsp;<code>~/howto_vbox_shared_folder.txt</code>&nbsp;on how to set up a shared folder between the host and the guest OS (it is simple). 
It can make transferring files to and from the VM easier.</p>\n\n<p><strong>Getting Started</strong></p>\n\n<p><strong>Preparing VM</strong></p>\n\n<ol>\n\t<li>Download the VM from&nbsp;<a href=\"http://doi.org/10.5281/zenodo.2759473\">http://doi.org/10.5281/zenodo.2759473</a>&nbsp;and import it into VirtualBox. We recommend at least 8 GiB of memory per CPU (4 GiB might also work, though some experiments may terminate sooner due to out-of-memory errors). If you allocate more CPUs, the benchmarks will run in parallel. It is also a good idea not to do other demanding things on your host OS while the experiments are running, otherwise the OSes will be fighting for RAM.</li>\n\t<li>Start the VM, open Terminal (in the left bar), enable network connection, and download the artifact zip file.</li>\n</ol>\n\n<p>OR:</p>\n\n<ol>\n\t<li>Start the VM, open Terminal (in the left bar) and mount the shared folder according to&nbsp;<code>~/howto_vbox_shared_folder.txt</code>.</li>\n\t<li>Copy the artifact zip file from the shared folder to&nbsp;<code>$HOME</code>. Then run the following:</li>\n</ol>\n\n<pre><code>unzip &lt;artifact&gt;.zip\ncd &lt;artifact&gt;/\n</code></pre>\n\n<p><strong>Installing Packages</strong></p>\n\n<p>Go to the root directory of the artifact and run</p>\n\n<pre><code>sudo ./install_requirements.sh\n</code></pre>\n\n<p>(the&nbsp;<code>sudo</code>&nbsp;password is &quot;<code>ae</code>&quot;)</p>\n\n<p>Take a walk (~20 minutes).</p>\n\n<p>There might be some issues reported with installing some packages (some nasty stuff happens due to the need to update&nbsp;<code>libc</code>). 
The issues should not matter, since the installed tools can be used.</p>\n\n<p><strong>Preparing the Benchmarks</strong></p>\n\n<p>Download the dataset from&nbsp;<a href=\"https://doi.org/10.5281/zenodo.3974360\">https://doi.org/10.5281/zenodo.3974360</a>, unzip it, and move it to the right location (you may need to enable network connection).</p>\n\n<pre><code>wget 'https://zenodo.org/record/3974360/files/benchmark-cnt-set-automata.zip?download=1' -O benchmark-cnt-set-automata.zip\nunzip benchmark-cnt-set-automata.zip\nmv benchmark-cnt-set-automata/bench/* run/\n</code></pre>\n\n<p><strong>Kicking the Tires</strong></p>\n\n<p>The following sequence of commands should check that everything is working, run a small subset of experiments, and generate a preliminary report.</p>\n\n<pre><code>cd run/\n./make_short.sh               # prepares a short version of the experiments\n./run_short_benchmarks.sh\n# ... take a walk (~20 mins) ...\n./run_short_processing.sh\ncd ../results\n./generate-report.R\nfirefox results.html\n</code></pre>\n\n<p>You should see a web page with incomplete results of the experiments (consider increasing the resolution of the VM).</p>\n\n<p><strong>Step by Step Instructions</strong></p>\n\n<p><strong>Running the Full Experiments</strong></p>\n\n<pre><code>cd run/\n./run_benchmarks.sh\n</code></pre>\n\n<p>Take a long walk (possibly a trip to Paris or any other place that you have always wanted to visit --- this may take a few tens of hours, based on your setup, so you may even manage to leave the quarantine before the experiments finish ;-) --- seriously, it might take two or three days; you can, however, save the state of the VM and restore it later to continue with the experiments). 
To obtain partial results faster, you can lower the timeout in&nbsp;<code>run/run_benchmarks.sh</code>&nbsp;or remove some lines from&nbsp;<code>run/bench-*.txt</code>.</p>\n\n<p><strong>Processing the Results of Experiments</strong></p>\n\n<p>Before viewing the results, we recommend changing the resolution of the VM to a higher one.</p>\n\n<pre><code># (in run/)\n./run_processing.sh\n\ncd ../results/\n./generate-report.R\nfirefox results.html\n</code></pre>\n\n<p><strong>Supported Claims</strong></p>\n\n<p>The artifact reproduces the following parts of the paper:</p>\n\n<ol>\n\t<li>Fig. 5</li>\n\t<li>Table 1</li>\n</ol>\n\n<p>Since the machine running the artifact will most probably differ from the one we used to run the experiments, exact times, numbers of timeouts, etc. will most probably differ, but the trends should stay the same.</p>\n\n<p><strong>Extra Notes</strong></p>\n\n<p><strong>Installing Outside of the Provided VM</strong></p>\n\n<p>It should not be difficult to set up the environment on a Linux OS reasonably close to the one in the referenced VM. The needed Linux packages are</p>\n\n<pre><code>python3\nR\npandoc\nlibre2-dev\ngrep\nmono (version at least 5.*)\n</code></pre>\n\n<p>Python packages:</p>\n\n<pre><code>pyyaml\ntabulate\n</code></pre>\n\n<p>R packages:</p>\n\n<pre><code>rmarkdown\nknitr\nggplot2\nggExtra\ngridExtra\npastecs\n</code></pre>\n\n<p>You can follow the commands in the&nbsp;<a href=\"https://github.com/VeriFIT/csa-oopsla20-artifact/blob/master/install_requirements.sh\">installation script</a>&nbsp;to see what needs to be done.</p>\n\n<p><strong>Running Other Experiments</strong></p>\n\n<p>The experiments to run are stored in the&nbsp;<code>run/bench-*.txt</code>&nbsp;files, in a CSV-like format&nbsp;<code>pattern;input-file</code>&nbsp;where&nbsp;<code>pattern</code>&nbsp;can use escape characters as used in CSVs (compatible with Python&#39;s&nbsp;<code>csv</code>&nbsp;module). 
If you have a file&nbsp;<code>FILE</code>&nbsp;with your own benchmarks, you can run the following command in the&nbsp;<code>run/</code>&nbsp;directory:</p>\n\n<pre><code>cat FILE | ./pycobench -t TIMEOUT -o OUTPUT pattern_match.yaml\n</code></pre>\n\n<p>where&nbsp;<code>TIMEOUT</code>&nbsp;is the timeout (in seconds) and&nbsp;<code>OUTPUT</code>&nbsp;is a file that logs results of experiments. See&nbsp;<code>./pycobench -h</code>&nbsp;for more details.&nbsp;<code>./pycobench</code>&nbsp;by default runs every benchmark (i.e., a line in&nbsp;<code>FILE</code>) with all regex matchers as defined in&nbsp;<code>run/pattern_match.yaml</code>&nbsp;(the default definition runs them in the mode where they count the number of matching lines).</p>\n\n<p>When the command finishes, you need to process the output to collect the runtimes and numbers of matches into a format with a single line for every benchmark, using the following command:</p>\n\n<pre><code>cat OUTPUT | ./san_output.sh | ./proc_results.py &gt; results.csv\n</code></pre>\n\n<p>You can import the resulting CSV file into a spreadsheet editor. Note that there might be some problems with delimiters (such as &quot;;&quot; in the regexes), so you might first consider sanitizing the CSV to get rid of regexes using the&nbsp;<code>./sanitize-csv.py</code>&nbsp;script.</p>", 
    "language": "eng", 
    "title": "Artifact for the OOPSLA'20 paper \"Regex Matching with Counting-Set Automata\"", 
    "license": {
      "id": "MIT"
    }, 
    "relations": {
      "version": [
        {
          "count": 1, 
          "index": 0, 
          "parent": {
            "pid_type": "recid", 
            "pid_value": "3975565"
          }, 
          "is_last": true, 
          "last_child": {
            "pid_type": "recid", 
            "pid_value": "3975566"
          }
        }
      ]
    }, 
    "keywords": [
      "regular expression", 
      "repetition operator", 
      "finite automaton", 
      "symbolic automaton", 
      "counting-set automaton"
    ], 
    "publication_date": "2020-08-07", 
    "creators": [
      {
        "orcid": "0000-0001-6957-1651", 
        "affiliation": "Brno University of Technology", 
        "name": "Luk\u00e1\u0161 Hol\u00edk"
      }, 
      {
        "orcid": "0000-0002-3038-5875", 
        "affiliation": "Brno University of Technology", 
        "name": "Ond\u0159ej Leng\u00e1l"
      }, 
      {
        "orcid": "0000-0001-7596-4734", 
        "affiliation": "Microsoft Research", 
        "name": "Olli Saarikivi"
      }, 
      {
        "affiliation": "Brno University of Technology", 
        "name": "Lenka Turo\u0148ov\u00e1"
      }, 
      {
        "affiliation": "Microsoft Research", 
        "name": "Margus Veanes"
      }, 
      {
        "orcid": "0000-0002-2746-8792", 
        "affiliation": "Brno University of Technology", 
        "name": "Tom\u00e1\u0161 Vojnar"
      }
    ], 
    "meeting": {
      "acronym": "OOPSLA'20", 
      "url": "https://2020.splashcon.org/track/splash-2020-oopsla", 
      "dates": "15-20 November 2020", 
      "place": "Chicago", 
      "title": "Object-Oriented Programming, Systems, Languages, and Applications 2020"
    }, 
    "access_right": "open", 
    "resource_type": {
      "type": "software", 
      "title": "Software"
    }, 
    "related_identifiers": [
      {
        "scheme": "doi", 
        "identifier": "10.5281/zenodo.3975565", 
        "relation": "isVersionOf"
      }
    ]
  }
}
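
The description's "Running Other Experiments" section says that benchmark files use a CSV-like `pattern;input-file` format compatible with Python's `csv` module. The following is a minimal sketch of reading such a file, not code from the artifact. It assumes the default `csv` quoting rules (double quotes around fields that contain the delimiter), which the record does not spell out; the function name `read_benchmarks` and the sample patterns and paths are hypothetical.

```python
import csv
import io


def read_benchmarks(text):
    """Parse `pattern;input-file` lines into (pattern, input_file) pairs.

    Assumption: fields follow Python's default csv quoting rules, with ';'
    as the delimiter. The artifact's own scripts may be configured differently.
    """
    reader = csv.reader(io.StringIO(text), delimiter=";")
    benchmarks = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        if len(row) != 2:
            raise ValueError(f"expected 2 fields, got {len(row)}: {row!r}")
        benchmarks.append((row[0], row[1]))
    return benchmarks


# Hypothetical sample: the second pattern contains ';' and is therefore quoted.
sample = 'a{5,10}b;inputs/log.txt\n"x;y{3}";inputs/other.txt\n'
parsed = read_benchmarks(sample)
for pattern, input_file in parsed:
    print(pattern, "->", input_file)
```

Quoting a pattern that contains `;` keeps it in a single field, which is the same delimiter issue the description warns about when importing `results.csv` into a spreadsheet editor.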