Dataset Open Access
Dataset generated for the "On server-side file access pattern matching" paper (Boito et al., HPCS 2019).
The traces were obtained following the methodology described in the paper. In addition to the two data sets discussed in the paper, we are also making available an extra data set of server traces.
Traces from I/O nodes
Each line of a trace looks like this:
277004729325 00000000eaffffffffffff1f729db77200000000000000000000000000000000 W 0 262144
The first number is an internal timestamp in nanoseconds, the second value is the file handle, and the third is the type of the request (inverted, "W" for reads and "R" for writes). The last two numbers give the request offset and size in bytes, respectively.
Traces from parallel file sytem data servers
These traces are inside the server_traces/ folder. Each experiment has two concurrent applications, "app1" and "app2", and its traces are inside a folder named accordingly:
NOOP\_app1\_(identification of app1)\_app2\_(identification of app2)\_(repetition)\_pvfstrace/
Each application is identified by:
(contig/noncontig)\_(number and size of requests per process)\_(number of processes)\_(number of client machines)\_(nto1/nton regarding the number of files)
Inside each folder there are eight trace files, two per data server, one for the read portion and another for the write portion. Each line looks like this:
[D 02:54:58.386900] REQ SCHED SCHEDULING, handle: 5764607523034231596, queue_element: 0x2a11360, type: 0, offset: 458752, len: 32768
The part between  is a timestamp, "handle" gives the file handle, "type" is 0 for reads and 1 for writes, "offset" and "len" (length) are in bytes.
Extra traces from data servers
These traces were not used for the paper because we do not have performance measurements for them with different scheduling policies, so it would not be possible to estimate the results of using the pattern matching approach to select scheduling policies. Still, we share them in the extra_server_traces/ folder in the hope they will be useful. They were obtained in the same experimental campaign than the other data server traces, and have the same format. The difference is that these traces are for single-application scenarios.