Published December 20, 2020 | Version v1
Software | Open Access

Timely Reporting of Heavy Hitters Using External Memory

  • Lawrence Berkeley Lab / UC Berkeley

Description

LERT reproducibility experiment list

 

Paper: Timely Reporting of Heavy Hitters Using External Memory (https://dl.acm.org/doi/pdf/10.1145/3318464.3380598)

 

Download the compressed zip file, which contains the source code, the LaTeX source for building the paper, and the necessary scripts.
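A minimal sketch of unpacking the archive (the file name "LERT.zip" is a placeholder; substitute the name of the downloaded file):

  unzip LERT.zip    # extracts the two folders described below
  ls
  # LERT-src  SIGMOD20-paper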

There are two folders in the submission:

  1. LERT-src: contains the source code for all the data structures and a script with the commands to run the benchmarks. The benchmarks dump their results into text files in the "sigmod20_raw" directory, which is initially empty and gets populated as the benchmarks run.

  2. SIGMOD20-paper: contains the tex files that generate paper.pdf. The tex files use the data produced by the benchmarks in the "sigmod20_raw" directory to create the graphs in the paper. There is also a "data" directory holding all the results from the published version of the paper. To skip running some or all of the experiments, copy the corresponding data files into the "sigmod20_raw" directory and build the paper with the original results.

 

Note: "sigmod20_raw_tested" and "sigmod20_figs_tested" contain the output from test runs of the scripts. These folders can be renamed to "sigmod20_raw" and "sigmod20_figs" to build the paper from them, as sketched below.
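A minimal sketch of the renaming, run from whichever directory contains these folders (move any existing "sigmod20_raw"/"sigmod20_figs" out of the way first):

  mv sigmod20_raw_tested  sigmod20_raw
  mv sigmod20_figs_tested sigmod20_figs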

Requirements:

  1. OpenSSL 1.0.2g (1 Mar 2016) or newer.

  2. CPU: Intel 4th generation (Haswell) or newer.

  3. Flash drive (SSD) with > 64 GB of free space.

  4. RAM > 64 GB.

  5. 64 CPU cores.

  6. cgroups

    1. We have included an installation script (limitMem.sh) to install and set up a cgroup profile. The cgroup profile is used for the on-SSD experiments; a sketch of a similar profile appears after this list.
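For reference, a minimal sketch of a cgroup-v1 memory profile along the lines of what limitMem.sh sets up (the group name "lert" and the 8 GB cap are illustrative, and the benchmark binary is a placeholder; consult the script for the actual configuration):

  # create a memory cgroup (cgcreate/cgexec come from the cgroup-tools package)
  sudo cgcreate -g memory:lert
  # cap the group's memory at 8 GB so the data structure spills to SSD
  echo $((8 * 1024 ** 3)) | sudo tee /sys/fs/cgroup/memory/lert/memory.limit_in_bytes
  # run an on-SSD benchmark inside the capped group (placeholder binary name)
  sudo cgexec -g memory:lert ./benchmark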

Instructions to build the paper:

  1. ./limitMem.sh : to set up the cgroup profile.

  2. cd LERT-src

  3. ./run_experiments.sh

  4. cd ../SIGMOD20-paper

  5. make : to build the paper using the results generated by the benchmarks.
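For convenience, the same steps as a single shell session, assuming limitMem.sh sits alongside the two folders (adjust paths if your layout differs):

  ./limitMem.sh                            # set up the cgroup profile
  (cd LERT-src && ./run_experiments.sh)    # run all benchmarks; some take hours to days
  (cd SIGMOD20-paper && make)              # build paper.pdf from the generated results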


 

The paper contains graphs for the following benchmarks. The bash script contains functions that perform these experiments and plot the output. These functions can also be executed one at a time, in order; a sketch of invoking them individually appears after the list.

 

  1. Generate data

    1. This function generates the input datasets for the experiments in the paper.

    2. It uses the Firehose code base to generate a simulated cyber stream.

    3. The files expected upon successful completion are listed in the bash script.

    4. Requires ~64 GB of RAM and 4+ hours.

  2. Validate count and time stretch

    1. Corresponds to Figure 1 in the paper.

    2. Requires ~64 GB of RAM and 3+ hours.

  3. Validate time stretch for arbitrary datasets

    1. Corresponds to Figure 2a in the paper.

    2. Requires ~64 GB of RAM and 2+ hours.

  4. Validate count stretch for the buffer strategy

    1. Corresponds to Figure 2b in the paper.

    2. Requires ~64 GB of RAM and 2+ hours.

  5. Validate time and count stretch with lifetime

    1. Corresponds to Figure 3 in the paper.

    2. Requires ~64 GB of RAM and 2+ hours.

  6. Compute I/O and throughput

    1. Corresponds to Figure 4 and Figure 5a in the paper.

    2. Requires ~2 GB of RAM and < 1 hour.

  7. Scalability experiment

    1. Corresponds to Figure 5b in the paper.

    2. Requires ~2 GB of RAM and more than a day.

  8. Instantaneous throughput

    1. Corresponds to Figure 5c in the paper.

    2. Requires ~2 GB of RAM and < 1 hour.
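If the steps in run_experiments.sh are defined as ordinary shell functions, they can be invoked individually. A minimal sketch, using hypothetical function names (check the script for the real ones, and comment out any top-level calls before sourcing so that sourcing does not kick off a full run):

  source ./run_experiments.sh    # load the functions into the current shell
  generate_data                  # step 1 (hypothetical name): ~64 GB RAM, 4+ hours
  validate_count_time_stretch    # step 2 (hypothetical name): produces Figure 1 data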

 

Note: Figure 4 compares the theoretical and empirical I/O of the different LERTs. The theoretical read/write is computed from the number of shuffle-merges performed while inserting observations from the stream. The empirical read/write is measured using iotop. Only the empirical I/O appears in the output plot.
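A minimal sketch of sampling empirical read/write with iotop in batch mode (the log file name is illustrative; run it alongside the benchmark and stop it with Ctrl-C):

  # -b: batch output, -o: only processes doing I/O, -t: add timestamps,
  # -d 1: sample once per second
  sudo iotop -b -o -t -d 1 > io_samples.log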

 

Note: Figure 5c was created by loading the instantaneous-throughput numbers into Gnumeric. The Gnumeric spreadsheet is included in the repo so that the instantaneous-throughput numbers generated during reproduction can be compared against it. The two output files produced by the experiment are "Instantaneous_throughput_1C_1T.txt" and "Instantaneous_throughput_1024C_4T.txt". We do not have a way to automate this process.
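Although the Gnumeric step is manual, a quick visual comparison of the two output files can be scripted. A minimal sketch, assuming gnuplot is installed and each file holds one throughput value per line (adjust the plot command if the column layout differs):

  gnuplot -e "set terminal png; set output 'inst_throughput.png'; \
    plot 'Instantaneous_throughput_1C_1T.txt' with lines title '1C 1T', \
         'Instantaneous_throughput_1024C_4T.txt' with lines title '1024C 4T'"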

Files

  211.7 MB (md5:dd4eb9b8cc5d539d421d66737d43058a)