Published November 3, 2019 | Version v1
Dataset | Open access

Jagged arrays in ROOT TTree, Parquet, and Avro

Authors/Creators

  • 1. Princeton University

Description

This is a synthetic dataset of random numbers in variable-length, nested data structures in three file formats: ROOT TTree, Parquet, and Avro. There are four levels of depth:

  • jagged0: not nested; just a flat array of numbers
  • jagged1: an array of lists of numbers
  • jagged2: an array of lists of lists of numbers
  • jagged3: an array of lists of lists of lists of numbers
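To make the four depths concrete, here is a minimal sketch of what one entry at each depth looks like, and how nested lists can be flattened into per-level offsets plus a flat content array. This offsets representation is similar in spirit to how Apache Arrow and Awkward Array store lists. It is not necessarily how ROOT, Parquet (which uses repetition/definition levels), or Avro lay out bytes on disk. The helper names (`random_entry`, `to_offsets`, `from_offsets`) and the uniform list-length distribution are assumptions for illustration, not the scripts that generated this dataset.

```python
import random

# Hypothetical generator: list lengths are drawn uniformly from 0..max_len.
# The record does not state which length distribution the real files use.
def random_entry(depth, max_len, rng):
    """One entry of a jagged<depth> array: a float, or lists nested `depth` levels deep."""
    if depth == 0:
        return rng.random()
    return [random_entry(depth - 1, max_len, rng) for _ in range(rng.randint(0, max_len))]

def to_offsets(entries, depth):
    """Flatten `depth` levels of nesting into (per-level offsets, flat content)."""
    offsets_per_level = []
    current = entries
    for _ in range(depth):
        offsets = [0]
        flat = []
        for sublist in current:
            flat.extend(sublist)
            offsets.append(len(flat))  # end position of this sublist in `flat`
        offsets_per_level.append(offsets)
        current = flat
    return offsets_per_level, current

def from_offsets(offsets_per_level, content):
    """Rebuild the nested lists, starting from the innermost level."""
    current = list(content)
    for offsets in reversed(offsets_per_level):
        current = [current[start:stop] for start, stop in zip(offsets[:-1], offsets[1:])]
    return current

# Example: offsets for a tiny jagged2 array (three entries, the middle one empty).
offsets, content = to_offsets([[[1.0], [2.0, 3.0]], [], [[4.0]]], 2)
print(offsets)  # [[0, 2, 2, 3], [0, 1, 3, 4]]
print(content)  # [1.0, 2.0, 3.0, 4.0]
```

A jagged0 array needs no offsets at all, and each additional level of nesting adds one offsets array, which is one reason deeper structures cost more to read.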

The TBasket sizes of the TTree files and the row group sizes of the Parquet files were made identical, so that read performance can be compared meaningfully. All of the files are compressed with ZLIB level 9.
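As a rough illustration of chunked compression, the sketch below splits a column of random float64 values into fixed-size chunks (standing in for TBaskets or row groups) and compresses each chunk independently with zlib at level 9. The chunk size of 1,000 entries and the little-endian packing are hypothetical choices; the record does not give the actual basket or row-group size, and each real format adds its own framing and byte order around the compressed data.

```python
import random
import struct
import zlib

def compress_in_chunks(values, chunk_size, level=9):
    """Pack float64 values into chunks of `chunk_size` entries; zlib-compress each chunk."""
    chunks = []
    for start in range(0, len(values), chunk_size):
        part = values[start:start + chunk_size]
        raw = struct.pack(f"<{len(part)}d", *part)  # little-endian float64 (an assumption)
        chunks.append(zlib.compress(raw, level))
    return chunks

def decompress_chunks(chunks):
    """Inverse of compress_in_chunks: decompress each chunk and unpack its float64 values."""
    values = []
    for blob in chunks:
        raw = zlib.decompress(blob)
        values.extend(struct.unpack(f"<{len(raw) // 8}d", raw))
    return values

rng = random.Random(2019)
values = [rng.random() for _ in range(10_000)]
chunks = compress_in_chunks(values, chunk_size=1_000)  # hypothetical chunk size
print(len(chunks), "chunks,", sum(len(c) for c in chunks), "bytes compressed vs", 8 * len(values), "raw")
```

Because every chunk is compressed independently, a reader can decompress only the chunks it needs; keeping the chunk sizes equal across formats keeps that unit of work comparable.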

This dataset was first used in a performance study at CHEP 2019:

It has since been used in other studies, such as this one at CHEP 2021:

and this one at ACAT 2022:

It has become a standard performance benchmark.

The scripts used to create this synthetic dataset are in this repository directory (PR #19).

Just one file, zlib9-jagged0.avro, had to be excluded to fit in this Zenodo record, but it is the easiest one to reconstruct from the others.

Files

Files (46.1 GB)

  • md5:8c002a5c8df8a85a1a3917b19261475f (4.0 GB)
  • md5:534ed8e127582d1c438ca2af015d05be (4.0 GB)
  • md5:a2e34acd7aeaa5b96b6fcbef0c5d066d (4.2 GB)
  • md5:e8b3bf1b85b2af6490b614490b330ca3 (4.0 GB)
  • md5:23cc14ed87e81859428a49a9ae38669e (4.6 GB)
  • md5:22aad359ec78567bb436d738fa2ea6bb (4.2 GB)
  • md5:5628bccebf25c007b81ead45e58d989d (4.1 GB)
  • md5:5b2ff11e195cc2a8d36fd388fc2eb1be (4.4 GB)
  • md5:742c3312b50bfb6a904a92a2215b4758 (4.3 GB)
  • md5:bdc7aa88b6fd93277375cb9363a9fb2a (4.1 GB)
  • md5:e4dc0ecff8e5791a2b4b821bdbad0867 (4.3 GB)
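Since each file is around 4 GB, it may be worth checking downloads against the MD5 checksums listed for them. The sketch below streams a file through Python's `hashlib` in fixed-size blocks so that it never holds the whole file in memory. The function names and the 1 MiB block size are illustrative choices, not part of this record.

```python
import hashlib

def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in `chunk_size`-byte blocks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()

def verify(path, expected):
    """Compare a file against a checksum written as 'md5:<hex>' (the prefix is optional)."""
    expected_hex = expected.split(":", 1)[1] if expected.startswith("md5:") else expected
    return md5_of_file(path) == expected_hex.lower()
```

Usage: call `verify(path, checksum)` with the local path of a downloaded file and the `md5:...` string shown next to that file on the record page; it returns `True` only if the digests match.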

Additional details

Funding

U.S. National Science Foundation
S2I2: Institute for Research and Innovation in Software for High Energy Physics (IRIS-HEP), award 1836650