Published April 9, 2019 | Version v1
Dataset Open

Server-side I/O request arrival traces

  • 1. Inria Grenoble / LIG, France
  • 2. Federal University of Rio Grande do Sul, Brazil

Description

Dataset generated for the "On server-side file access pattern matching" paper (Boito et al., HPCS 2019).

The traces were obtained following the methodology described in the paper. In addition to the two data sets discussed in the paper, we are also making available an extra data set of server traces.

Traces from I/O nodes

  • IOnode_traces/output/commands has the list of commands used to generate them. Each test is identified by a label, and the test_info.csv file contains the mapping of labels to access patterns. Some files include information about experiments with 8 I/O nodes, but these were removed from the data set because they had some errors.
  • IOnode_traces/output contains .map files that detail the mapping of clients to I/O nodes for each experiment, and .out files, which contain the output of the benchmark.
  • IOnode_traces/ contains one folder per experiment. Inside this folder, there is one folder per I/O node, and inside these folders there are trace files for the read and write portions of the experiments. Due to a mistake during the integration between IOFSL and AGIOS, read requests appear as "W" and writes as "R". Once this inversion is accounted for when processing the traces, it has no impact on the results.
  • pattern_length.csv contains the average pattern length for each experiment and operation (average number of requests per second), obtained with the get_pattern_length.py script.
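As an illustration of the "average number of requests per second" metric, the sketch below divides the number of requests by the time elapsed between the first and the last one. This is only one plausible reading of the metric and is not necessarily what get_pattern_length.py does; the function name and the choice of nanosecond timestamps as input (as in the I/O node traces) are assumptions made for this example.

```python
from typing import Iterable


def requests_per_second(timestamps_ns: Iterable[int]) -> float:
    """Average request rate over the span covered by the timestamps.

    Hypothetical helper: total number of requests divided by the elapsed
    time (in seconds) between the earliest and the latest request.
    """
    ts = sorted(timestamps_ns)
    if len(ts) < 2:
        raise ValueError("need at least two requests to measure a rate")
    elapsed_s = (ts[-1] - ts[0]) / 1e9  # nanoseconds to seconds
    if elapsed_s == 0:
        raise ValueError("all requests share the same timestamp")
    return len(ts) / elapsed_s


# Five made-up requests spread over two seconds.
example_rate = requests_per_second(
    [0, 250_000_000, 500_000_000, 1_000_000_000, 2_000_000_000]
)
print(example_rate)
```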

Each line of a trace looks like this:

277004729325 00000000eaffffffffffff1f729db77200000000000000000000000000000000 W 0 262144

The first number is an internal timestamp in nanoseconds, the second value is the file handle, and the third is the type of the request (inverted, "W" for reads and "R" for writes). The last two numbers give the request offset and size in bytes, respectively.
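A minimal sketch of how one such line could be parsed in Python, undoing the "W"/"R" inversion along the way. The names IONodeRequest and parse_ionode_line are made up for this example and do not come from the paper's code.

```python
from typing import NamedTuple


class IONodeRequest(NamedTuple):
    timestamp_ns: int  # internal timestamp, in nanoseconds
    handle: str        # file handle, kept as the raw hexadecimal string
    op: str            # "read" or "write", after undoing the inversion
    offset: int        # request offset, in bytes
    size: int          # request size, in bytes


# The traces label reads as "W" and writes as "R", so the mapping is inverted.
_OP = {"W": "read", "R": "write"}


def parse_ionode_line(line: str) -> IONodeRequest:
    """Split a whitespace-separated trace line into its five fields."""
    ts, handle, kind, offset, size = line.split()
    return IONodeRequest(int(ts), handle, _OP[kind], int(offset), int(size))


req = parse_ionode_line(
    "277004729325 "
    "00000000eaffffffffffff1f729db77200000000000000000000000000000000 "
    "W 0 262144"
)
print(req.op, req.offset, req.size)
```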

Traces from parallel file system data servers

These traces are inside the server_traces/ folder. Each experiment has two concurrent applications, "app1" and "app2", and the traces of each experiment are inside a folder named as follows:

NOOP_app1_(identification of app1)_app2_(identification of app2)_(repetition)_pvfstrace/

Each application is identified by:

(contig/noncontig)_(number and size of requests per process)_(number of processes)_(number of client machines)_(nto1/nton regarding the number of files)
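The naming scheme above can be split back into its parts with a sketch like the following. It assumes that the repetition field contains no underscore and that the "number and size of requests" field is whatever lies between the first field and the last three; the folder name used in the example is invented for illustration and may not match the actual spelling of the fields in the data set.

```python
import re
from typing import NamedTuple, Tuple


class AppId(NamedTuple):
    layout: str     # "contig" or "noncontig"
    requests: str   # number and size of requests per process, kept as raw text
    processes: str  # number of processes
    clients: str    # number of client machines
    files: str      # "nto1" or "nton"


_FOLDER = re.compile(
    r"^NOOP_app1_(?P<app1>.+)_app2_(?P<app2>.+)_(?P<rep>[^_]+)_pvfstrace/?$"
)


def parse_app(ident: str) -> AppId:
    """Split an application identification into its five fields."""
    parts = ident.split("_")
    if len(parts) < 5:
        raise ValueError(f"unexpected application identification: {ident!r}")
    # The first field and the last three are fixed; the rest is the request spec.
    return AppId(parts[0], "_".join(parts[1:-3]), parts[-3], parts[-2], parts[-1])


def parse_folder(name: str) -> Tuple[AppId, AppId, str]:
    """Return (app1, app2, repetition) for an experiment folder name."""
    m = _FOLDER.match(name)
    if m is None:
        raise ValueError(f"unexpected folder name: {name!r}")
    return parse_app(m["app1"]), parse_app(m["app2"]), m["rep"]


# Hypothetical folder name, made up for this example.
app1, app2, repetition = parse_folder(
    "NOOP_app1_contig_1x32KB_64_8_nto1_app2_noncontig_4x256KB_32_4_nton_1_pvfstrace"
)
print(app1, app2, repetition)
```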

Inside each folder there are eight trace files, two per data server, one for the read portion and another for the write portion. Each line looks like this:

[D 02:54:58.386900] REQ SCHED SCHEDULING, handle: 5764607523034231596, queue_element: 0x2a11360, type: 0, offset: 458752, len: 32768

The part between [] is a timestamp, "handle" gives the file handle, "type" is 0 for reads and 1 for writes, and "offset" and "len" (length) are given in bytes.
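A possible way to extract these fields with a regular expression is sketched below. The names ServerRequest and parse_server_line are made up for this example, and the pattern is written against the single sample line shown above, so it assumes all request lines follow the same layout; lines that do not match are skipped by returning None.

```python
import re
from datetime import datetime, time
from typing import NamedTuple, Optional


class ServerRequest(NamedTuple):
    timestamp: time  # wall-clock time taken from the part between []
    handle: int      # file handle
    op: str          # "read" (type 0) or "write" (type 1)
    offset: int      # in bytes
    length: int      # in bytes


_LINE = re.compile(
    r"\[\w+ (?P<ts>\d{2}:\d{2}:\d{2}\.\d+)\].*?"
    r"handle: (?P<handle>\d+),.*?"
    r"type: (?P<type>[01]), offset: (?P<offset>\d+), len: (?P<len>\d+)"
)


def parse_server_line(line: str) -> Optional[ServerRequest]:
    """Parse one data server trace line, or return None if it does not match."""
    m = _LINE.search(line)
    if m is None:
        return None
    ts = datetime.strptime(m["ts"], "%H:%M:%S.%f").time()
    op = "read" if m["type"] == "0" else "write"
    return ServerRequest(ts, int(m["handle"]), op, int(m["offset"]), int(m["len"]))


server_req = parse_server_line(
    "[D 02:54:58.386900] REQ SCHED SCHEDULING, handle: 5764607523034231596, "
    "queue_element: 0x2a11360, type: 0, offset: 458752, len: 32768"
)
print(server_req)
```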

  • server_traces/pattern_length.csv contains the average pattern length for each experiment and operation, obtained with the server_traces/count_pattern_length.py script.

Extra traces from data servers

These traces were not used for the paper because we do not have performance measurements for them with different scheduling policies, so it would not be possible to estimate the results of using the pattern matching approach to select scheduling policies. Still, we share them in the extra_server_traces/ folder in the hope that they will be useful. They were obtained in the same experimental campaign as the other data server traces, and have the same format. The difference is that these traces are for single-application scenarios.

Notes

The source code used in the paper to handle these trace files is available in a git repository: https://gitlab.inria.fr/frzanonb/apmatching

Files

Files (703.7 MB)

  • 229.9 MB, md5:c1ee68dad7d660973204117a8f1669bf
  • 2.0 kB, md5:cb2493da792f566fc1293baa05e1f102
  • 297.9 MB, md5:67c9180793d09ec5edee499e52966772
  • 85.6 kB, md5:a40d4e9548d1956afaaf186ddae17b10
  • 175.9 MB, md5:d199e9bd6110d51c33811eea8e06147e
  • 5.4 kB, md5:bb78b77261823e9f8a00d562d6c54e91

Additional details

Funding

DAMA – Extreme-Scale Data Management 800144
European Commission