Pythia8 and Herwig7 Boosted Top & QCD Jet datasets
Description
A dataset of labeled top and QCD jets, generated using both Pythia8 and Herwig.
There are 20 files: 10 generated using Pythia and 10 generated using Herwig (with the prefix `HERWIG`). Each file contains 100k top jets and 100k QCD jets, for a total of 2M events for Pythia and 2M events for Herwig (4M total). Each file contains two arrays:
- X: shape (200000, M, 4). A set of 100k top jets and 100k QCD jets, where M is the maximum particle multiplicity among the jets in that file (jets with fewer particles are padded with zero-particles). The features of each particle are its pT, rapidity, azimuthal angle, and PDG ID.
- y: shape (200000,). An array of labels for the jets, where QCD is 0 and top is 1.
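As an illustration of the array layout described above, the following minimal sketch masks out the zero-padded particles and computes per-jet multiplicity and scalar pT sum. It assumes that padded entries have pT exactly equal to 0; the helper names and the tiny synthetic array are hypothetical stand-ins, not part of the dataset or of any package.

```python
import numpy as np

def particle_mask(X):
    """Boolean mask of real particles, assuming padded rows have pT == 0."""
    return X[..., 0] > 0

def multiplicity(X):
    """Number of real (non-padded) particles in each jet."""
    return particle_mask(X).sum(axis=1)

def scalar_pt_sum(X):
    """Scalar sum of particle pT per jet (padding contributes zero)."""
    return X[..., 0].sum(axis=1)

# Synthetic stand-in with the same layout as X: (n_jets, M, 4),
# features ordered as (pT, rapidity, azimuthal angle, PDG ID).
X_demo = np.zeros((2, 3, 4))
X_demo[0, 0] = [300.0, 0.1, 1.0, 211]
X_demo[0, 1] = [220.0, -0.2, 1.1, 22]
X_demo[1, 0] = [510.0, 0.0, 2.0, -211]
y_demo = np.array([1, 0])  # 1 = top, 0 = QCD

print(multiplicity(X_demo), scalar_pt_sum(X_demo))
```

The same functions should apply unchanged to a real X array, since they only rely on the (jets, particles, features) axis order.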
The Pythia samples are generated using Pythia 8.331. The top events are generated using the processes `Top:gg2ttbar` and `Top:qqbar2ttbar`, and the W's are forced to decay hadronically. The QCD events are generated using `HardQCD:all`.
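For orientation, the process switches named above could be collected into Pythia settings strings roughly as follows. This is only a sketch: the two `24:` lines are one common way to force hadronic W decays and are an assumption, not the authors' confirmed configuration, and the function name `pythia_settings` is hypothetical. Beam energy and phase-space cuts are not given in the description and are deliberately left out.

```python
def pythia_settings(sample):
    """Return Pythia 8 settings strings for the 'top' or 'qcd' sample."""
    if sample == "top":
        return [
            "Top:gg2ttbar = on",
            "Top:qqbar2ttbar = on",
            # Assumption: force W -> quarks by disabling all W decay
            # channels, then re-enabling those containing d, u, s, c, or b.
            "24:onMode = off",
            "24:onIfAny = 1 2 3 4 5",
        ]
    if sample == "qcd":
        return ["HardQCD:all = on"]
    raise ValueError(f"unknown sample: {sample!r}")

# Print the lines in the form used by a Pythia command file.
for line in pythia_settings("top"):
    print(line)
```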
The Herwig samples are generated using Herwig 7.3.0. The top events are generated using `MEHeavyQuark`, and leptonic decays of the W's are discarded. The QCD events are generated using `MEQCD2to2`.
For both datasets, jets are clustered with FastJet 3.3.0 using the anti-kt algorithm with R = 0.8. For top jets, a hard top parton is required to exist within the jet cone. We select jets with a pT between 500 and 550 GeV and a pseudorapidity less than 2.5. If multiple jets in an event meet these criteria, one jet is chosen at random.
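The kinematic selection and random tie-breaking described above can be sketched as follows. This is an illustrative reimplementation, not the authors' code: it assumes the pseudorapidity cut is applied to the absolute value, treats the pT window as inclusive, and omits the top-parton matching step. The function name `select_jet` and its arguments are hypothetical.

```python
import numpy as np

def select_jet(jet_pt, jet_eta, rng, pt_min=500.0, pt_max=550.0, eta_max=2.5):
    """Return the index of one randomly chosen passing jet, or None."""
    jet_pt = np.asarray(jet_pt, dtype=float)
    jet_eta = np.asarray(jet_eta, dtype=float)
    # Assumption: inclusive pT window (GeV) and a cut on |eta|.
    passing = np.flatnonzero(
        (jet_pt >= pt_min) & (jet_pt <= pt_max) & (np.abs(jet_eta) < eta_max)
    )
    if passing.size == 0:
        return None  # no jet in this event meets the criteria
    # If several jets pass, pick one uniformly at random.
    return int(rng.choice(passing))

rng = np.random.default_rng(0)
# Only the first jet passes: the second fails the eta cut, the third the pT cut.
print(select_jet([520.0, 530.0, 400.0], [0.1, 3.0, 0.2], rng))
```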
Usage
This dataset can be downloaded automatically using the ParticleLoader Python package. The loader downloads the files to a specified cache directory and reads from the cache if the files already exist.
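from particleloader import load

# Change this to a working directory on your machine!
# (Named cache_dir rather than dir to avoid shadowing Python's built-in dir().)
cache_dir = "~/.ParticleLoader"

# Number of jets to load from each generator
N = 100000

# Pythia is loaded by default; pass generator="herwig" for the Herwig samples
X_pythia, y_pythia = load("topqcd_jets", N, cache_dir=cache_dir)
X_herwig, y_herwig = load("topqcd_jets", N, cache_dir=cache_dir, generator="herwig")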
WARNING: A similar dataset exists for quark/gluon tagging. However, as these events were generated using different versions of Pythia and Herwig, these datasets should not be mixed.
Files
(12.8 GB)
| Checksum | Size |
|---|---|
| md5:8305392fb9d9439892f2c6889a915f7c | 640.8 MB |
| md5:b40f00a218a10f05381cacf5c18b5f32 | 640.8 MB |
| md5:f5a48cfbe662cd4acd8c88aa79e9716a | 640.8 MB |
| md5:9874e1d71835e3398697f5c85ec87d53 | 640.8 MB |
| md5:56cf814a6eef31a5e5849d3ed50bd704 | 640.8 MB |
| md5:900e552c871a08c496be6ae942e86e33 | 640.8 MB |
| md5:9cf33a722f10f3c8aa534d46b9f62b7e | 640.8 MB |
| md5:2790665b1f6446e2ac1e478d144f0f94 | 640.8 MB |
| md5:ac2d4776ea69f19a3a20af829b58dfdf | 640.8 MB |
| md5:26e8b3360f425b5f113e656bf1d2f128 | 640.8 MB |
| md5:d0b38093b83dfecde6ae5fcd684432fc | 640.8 MB |
| md5:9bf9a008074286ee6ba255af766c6c2c | 640.8 MB |
| md5:b20fc1cf560d868e4dc82f069c9d6fe6 | 640.8 MB |
| md5:3c1fcabe3eeb683cf62a61c293d00209 | 640.8 MB |
| md5:2721a2955d2c30bdf366b227c4065e54 | 640.8 MB |
| md5:df1b6e4a018648dcc99c5a8d172b8790 | 640.8 MB |
| md5:df3496b3486995a72ca0b54df43d338e | 640.8 MB |
| md5:90f2c40989ef6b055cffa5fc1bf439f3 | 640.8 MB |
| md5:1813136c6a50bf55c5abd4cffe1491d1 | 640.8 MB |
| md5:4f036e3576aa0f60dd0d3e88982bb6f2 | 640.8 MB |
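If a file is downloaded manually rather than through ParticleLoader, its integrity can be checked against the MD5 checksum listed for it. The sketch below uses only the Python standard library; the helper names `md5sum` and `verify` are hypothetical, and the path passed in is whatever local file name the download was saved under.

```python
import hashlib
from pathlib import Path

def md5sum(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify(path, expected):
    """Compare a file's MD5 against a listed value such as 'md5:8305...'."""
    # Accept the checksum with or without the 'md5:' prefix.
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5sum(Path(path)) == expected_hex
```

Reading in chunks keeps memory use small, which matters for files of roughly 640 MB each.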