Jagged arrays in ROOT TTree, Parquet, and Avro
Description
This is a synthetic dataset of random numbers in variable-length, nested data structures in three file formats: ROOT TTree, Parquet, and Avro. There are four levels of depth:
- jagged0: not nested; just a flat array of numbers
- jagged1: an array of lists of numbers
- jagged2: an array of lists of lists of numbers
- jagged3: an array of lists of lists of lists of numbers
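To make the four nesting levels concrete, here is a minimal pure-Python sketch that builds small arrays with the same shapes. It is an illustration only, not the script used to produce the dataset: the helper names `make_jagged` and `nesting_depth`, the Gaussian values, and the uniform list lengths between 0 and `max_length` are all assumptions.

```python
import random


def make_jagged(depth, num_entries, max_length=3, rng=None):
    """Build a list of `num_entries` entries, each nested `depth` levels deep.

    depth=0 gives a flat list of numbers (like jagged0), depth=1 gives a
    list of lists of numbers (like jagged1), and so on. The value and
    list-length distributions are assumptions made for illustration.
    """
    if rng is None:
        rng = random.Random(12345)

    def make_item(level):
        if level == 0:
            return rng.gauss(0.0, 1.0)
        # Each list gets a random length, possibly zero.
        return [make_item(level - 1) for _ in range(rng.randint(0, max_length))]

    return [make_item(depth) for _ in range(num_entries)]


def nesting_depth(item):
    """Count the list levels inside one entry (0 for a bare number)."""
    if not isinstance(item, list):
        return 0
    return 1 + max((nesting_depth(x) for x in item), default=0)


if __name__ == "__main__":
    for depth in range(4):
        sample = make_jagged(depth, num_entries=5)
        print(f"jagged{depth}:", sample)
```

Because inner lists may be empty, an individual entry can look shallower than its nominal depth; `nesting_depth` therefore gives an upper bound check per entry rather than the declared type of the array.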
The TBasket sizes of the TTree files and the row group sizes of the Parquet files were made identical, so that performance can be meaningfully compared across formats. All of the files are compressed with ZLIB level 9.
This dataset was first used in a performance study at CHEP 2019. It has since been used in other studies, including one at CHEP 2021 and one at ACAT 2022:
- presentation page
- preprint (will be published)
It has become a standard performance benchmark.
The scripts that were used to create this synthetic dataset are in this repository directory, PR #19.
Just one file, zlib9-jagged0.avro, had to be excluded to fit in this Zenodo record, but it is the easiest one to reconstruct from the others.
Files
Total size: 46.1 GB

| Checksum | Size |
|---|---|
| md5:8c002a5c8df8a85a1a3917b19261475f | 4.0 GB |
| md5:534ed8e127582d1c438ca2af015d05be | 4.0 GB |
| md5:a2e34acd7aeaa5b96b6fcbef0c5d066d | 4.2 GB |
| md5:e8b3bf1b85b2af6490b614490b330ca3 | 4.0 GB |
| md5:23cc14ed87e81859428a49a9ae38669e | 4.6 GB |
| md5:22aad359ec78567bb436d738fa2ea6bb | 4.2 GB |
| md5:5628bccebf25c007b81ead45e58d989d | 4.1 GB |
| md5:5b2ff11e195cc2a8d36fd388fc2eb1be | 4.4 GB |
| md5:742c3312b50bfb6a904a92a2215b4758 | 4.3 GB |
| md5:bdc7aa88b6fd93277375cb9363a9fb2a | 4.1 GB |
| md5:e4dc0ecff8e5791a2b4b821bdbad0867 | 4.3 GB |
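Since each file is several gigabytes, it can be worth checking a download against the md5 checksums listed above. The following is a minimal sketch using only the Python standard library; the helper names `md5_of_file` and `matches_checksum` and the 1 MiB chunk size are assumptions chosen for illustration, and the path you pass in is whatever name you saved the file under.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks.

    Reading in chunks keeps memory use small for multi-gigabyte files.
    """
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, listed):
    """Compare a file against a listed checksum such as 'md5:8c00...'.

    The optional 'md5:' prefix is stripped before comparing.
    """
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

A call such as `matches_checksum(local_path, "md5:...")` returns `True` when the file on disk matches the listed value, where `local_path` and the checksum string are supplied by you from the listing.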
Additional details
Funding
- U.S. National Science Foundation
- S2I2: Institute for Research and Innovation in Software for High Energy Physics (IRIS-HEP) 1836650