Server-side I/O request arrival traces
Dataset generated for the "On server-side file access pattern matching" paper (Boito et al., HPCS 2019).
The traces were obtained following the methodology described in the paper. In addition to the two data sets discussed in the paper, we are also making available an extra data set of server traces.
Traces from I/O nodes
- IOnode_traces/output/commands lists the commands used to generate the traces. Each test is identified by a label, and the test_info.csv file contains the mapping of labels to access patterns. Some files mention experiments with 8 I/O nodes, but those runs were removed from the data set because they contained errors.
- IOnode_traces/output contains .map files that detail the mapping of clients to I/O nodes for each experiment, and .out files, which contain the output of the benchmark.
- IOnode_traces/ contains one folder per experiment. Inside each experiment folder there is one folder per I/O node, and inside these folders there are trace files for the read and write portions of the experiment. Due to a mistake in the integration between IOFSL and AGIOS, read requests appear as "W" and writes as "R". Once this inversion is accounted for when processing the traces, it has no impact on results.
- pattern_length.csv contains the average pattern length for each experiment and operation (average number of requests per second), obtained with the get_pattern_length.py script.
Each line of a trace looks like this:
277004729325 00000000eaffffffffffff1f729db77200000000000000000000000000000000 W 0 262144
The first number is an internal timestamp in nanoseconds, the second value is the file handle, and the third is the type of the request (inverted, "W" for reads and "R" for writes). The last two numbers give the request offset and size in bytes, respectively.
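The fields above can be read with a few lines of code. The following is a minimal sketch (the function and field names are our own, not part of the dataset); note how it undoes the inverted operation labels:

```python
def parse_ionode_line(line):
    """Parse one I/O node trace line into a dict.

    Line format: <timestamp ns> <file handle> <type> <offset> <size>.
    """
    timestamp, handle, rw, offset, size = line.split()
    return {
        "timestamp_ns": int(timestamp),
        "handle": handle,
        # The labels in the trace are inverted: "W" marks reads, "R" marks writes.
        "operation": "read" if rw == "W" else "write",
        "offset": int(offset),
        "size": int(size),
    }

line = ("277004729325 "
        "00000000eaffffffffffff1f729db77200000000000000000000000000000000 "
        "W 0 262144")
print(parse_ionode_line(line))
```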
Traces from parallel file system data servers
These traces are inside the server_traces/ folder. Each experiment has two concurrent applications, "app1" and "app2", and its traces are inside a folder named accordingly:
NOOP_app1_(identification of app1)_app2_(identification of app2)_(repetition)_pvfstrace/
Each application is identified by:
(contig/noncontig)_(number and size of requests per process)_(number of processes)_(number of client machines)_(nto1/nton regarding the number of files)
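A folder name following this scheme can be split programmatically. A minimal sketch, assuming the repetition field is numeric and the application identifications never contain the literal substrings "_app1_" or "_app2_"; the example folder name, including its "256x32768" requests field, is hypothetical:

```python
import re

# Split an experiment folder name into app1 id, app2 id, and repetition.
FOLDER_RE = re.compile(
    r"NOOP_app1_(?P<app1>.+)_app2_(?P<app2>.+)_(?P<rep>\d+)_pvfstrace/?$"
)

def parse_experiment_folder(name):
    m = FOLDER_RE.match(name)
    if m is None:
        raise ValueError(f"unrecognized folder name: {name}")
    return m.groupdict()

# Hypothetical example; check the actual folder names in server_traces/.
example = ("NOOP_app1_contig_256x32768_16_4_nto1"
           "_app2_noncontig_256x32768_16_4_nton_1_pvfstrace")
print(parse_experiment_folder(example))
```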
Inside each folder there are eight trace files, two per data server, one for the read portion and another for the write portion. Each line looks like this:
[D 02:54:58.386900] REQ SCHED SCHEDULING, handle: 5764607523034231596, queue_element: 0x2a11360, type: 0, offset: 458752, len: 32768
The part between brackets is a timestamp, "handle" gives the file handle, "type" is 0 for reads and 1 for writes, and "offset" and "len" (length) are in bytes.
- server_traces/pattern_length.csv contains the average pattern length for each experiment and operation, obtained with the server_traces/count_pattern_length.py script.
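These lines can also be parsed and aggregated with a short script. The sketch below uses our own function and field names, and its per-second averaging only approximates what server_traces/count_pattern_length.py computes:

```python
import re
from collections import Counter

# Matches one data server trace line, e.g.:
# [D 02:54:58.386900] REQ SCHED SCHEDULING, handle: ..., queue_element: ...,
#   type: 0, offset: 458752, len: 32768
LINE_RE = re.compile(
    r"\[D (?P<ts>\d{2}:\d{2}:\d{2}\.\d+)\] REQ SCHED SCHEDULING, "
    r"handle: (?P<handle>\d+), queue_element: \S+, "
    r"type: (?P<type>[01]), offset: (?P<offset>\d+), len: (?P<len>\d+)"
)

def parse_server_line(line):
    m = LINE_RE.match(line)
    if m is None:
        return None
    d = m.groupdict()
    return {
        "timestamp": d["ts"],
        "handle": int(d["handle"]),
        "operation": "read" if d["type"] == "0" else "write",
        "offset": int(d["offset"]),
        "len": int(d["len"]),
    }

def pattern_length(lines):
    """Average number of requests per second over a trace (approximation)."""
    per_second = Counter()
    for line in lines:
        req = parse_server_line(line)
        if req is not None:
            # Group by the HH:MM:SS part of the timestamp.
            per_second[req["timestamp"].split(".")[0]] += 1
    return sum(per_second.values()) / len(per_second) if per_second else 0.0
```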
Extra traces from data servers
These traces were not used for the paper because we do not have performance measurements for them under different scheduling policies, so it would not be possible to estimate the results of using the pattern matching approach to select scheduling policies. Still, we share them in the extra_server_traces/ folder in the hope they will be useful. They were obtained in the same experimental campaign as the other data server traces, and have the same format. The difference is that these traces are for single-application scenarios.