Published March 31, 2026 | Version v1
Dataset Open

TAMM Simulated di-Higgs Dataset

  • 1. International Center for Advanced Studies
  • 2. National University of General San Martín
  • 3. Consejo Nacional de Investigaciones Científicas y Técnicas
  • 4. Massachusetts Institute of Technology
  • 5. The NSF AI Institute for Artificial Intelligence and Fundamental Interactions

Description

Simulated signal (di-Higgs to 4b) and background (other 4b) jets used in the paper "Many Wrongs Make a Right: Leveraging Biased Simulations Towards Unbiased Parameter Inference". The "_sd" files correspond to the TDs described in the paper, and the "_ssd" files are zipped archives containing the 500 MSD populations.

The jets are generated using ${\tt MadGraph5\_aMC@NLO}$, and Higgs decay is performed with ${\tt MadSpin}$. We use ${\tt Pythia~8}$ for parton showering and hadronization, followed by ${\tt Delphes 3}$ for fast detector simulation. We use a modified CMS card for the detector simulation, reconstructing jets with the anti-$k_T$ algorithm with $R=0.8$ and demanding $p_{T_j} > 8\;{\rm GeV}$.

To make the background generation more efficient, it is biased by requiring each $b \bar{b}$ pair to have an invariant mass greater than $90$ GeV.

Each of the jets is then subjected to further kinematic cuts, $p_T>25$ GeV and $|\eta|<2.5$, and events with at least four surviving jets are selected. Higgs candidate dijets are then constructed by minimizing the metric

$\chi^2 = \frac{(m_1-125\,\mathrm{GeV})^2}{\sigma^2} + \frac{(m_2-125\,\mathrm{GeV})^2}{\sigma^2}$

as a function of the dijet masses $m_1$ and $m_2$. The event is accepted if both dijet masses are within $[100,150]$ GeV, and the dataset consists of these dijet masses for each accepted event.
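The selection and pairing steps described above can be sketched in Python as follows. This is only an illustrative sketch, not the code from the paper's repository. The function names, the jet tuple layout `(pt, eta, phi, mass)`, the choice of the four highest-$p_T$ jets, and the value of `SIGMA` are assumptions made here; the description does not state $\sigma$ or which four jets are paired when more than four survive. Because both terms of $\chi^2$ share the same $\sigma$, the chosen pairing does not depend on its value.

```python
import math

M_HIGGS = 125.0  # GeV, reference mass in the chi^2 metric
SIGMA = 10.0     # GeV, hypothetical value; not given in the description


def four_vector(pt, eta, phi, mass):
    """Convert (pt, eta, phi, mass) to a Cartesian (E, px, py, pz) tuple."""
    px = pt * math.cos(phi)
    py = pt * math.sin(phi)
    pz = pt * math.sinh(eta)
    energy = math.sqrt(px * px + py * py + pz * pz + mass * mass)
    return (energy, px, py, pz)


def invariant_mass(p, q):
    """Invariant mass of the sum of two four-vectors."""
    energy, px, py, pz = (a + b for a, b in zip(p, q))
    m_squared = energy * energy - px * px - py * py - pz * pz
    return math.sqrt(max(m_squared, 0.0))  # guard against rounding below zero


def chi2(m1, m2, sigma=SIGMA):
    """Pairing metric: squared distance of both dijet masses from 125 GeV."""
    return ((m1 - M_HIGGS) ** 2 + (m2 - M_HIGGS) ** 2) / sigma ** 2


def select_event(jets):
    """Return the dijet masses (m1, m2) of an accepted event, else None.

    jets is a list of (pt, eta, phi, mass) tuples in GeV and radians.
    """
    # Kinematic cuts: pT > 25 GeV and |eta| < 2.5.
    good = [j for j in jets if j[0] > 25.0 and abs(j[1]) < 2.5]
    if len(good) < 4:
        return None
    # Assumption: pair the four highest-pT surviving jets.
    good = sorted(good, key=lambda j: j[0], reverse=True)[:4]
    p = [four_vector(*j) for j in good]
    # Four jets can be split into two dijets in exactly three ways.
    pairings = [((0, 1), (2, 3)), ((0, 2), (1, 3)), ((0, 3), (1, 2))]
    candidates = [
        (invariant_mass(p[a], p[b]), invariant_mass(p[c], p[d]))
        for (a, b), (c, d) in pairings
    ]
    m1, m2 = min(candidates, key=lambda masses: chi2(*masses))
    # Accept only if both dijet masses lie in [100, 150] GeV.
    if 100.0 <= m1 <= 150.0 and 100.0 <= m2 <= 150.0:
        return (m1, m2)
    return None
```

The order in which the two masses are returned simply follows the pairing list; the ordering convention used in the released files is not specified here.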

The TD and MSD datasets differ through the Jet Energy Scale (JES) in the detector simulation, with the TD setting the CMS card JES parameters to their default values and the MSDs modifying these parameters in the manner described in the article.

The code used to produce the datasets can be found in the GitHub repository associated with the paper. The file results.tar.gz contains the results for Bᴀʏᴇꜱɪᴀɴ Tᴏᴘɪᴄ Mᴏᴅᴇʟɪɴɢ, to be used with the code in the GitHub repository.

Files

Files (6.5 GB)

Checksum Size
md5:309f53c74af0498ec14f3af153d6d879 2.8 GB
md5:8658eacecbc7f8b63f08ce57a99cddda 40.5 MB
md5:8db884c65dacd70d456bda3b81bf3e1f 7.1 MB
md5:4bd4c64319ce3399fe8d73fc583a008f 491.5 MB
md5:071dfdd52f71f6c1443c4ba9f9c4d218 3.1 GB
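
The md5 checksums listed above can be used to check that a download completed correctly. A minimal Python sketch follows, using only the standard library; the helper names are made up for illustration, and the local file path must be supplied by the user since the file names are not shown in this listing.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the md5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Chunked reads keep memory use low for the multi-GB archives.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed):
    """Compare a local file against a listed value such as 'md5:309f...'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

For example, `matches_listing(local_path, "md5:309f53c74af0498ec14f3af153d6d879")` would be expected to return `True` for an intact copy of the 2.8 GB file, where `local_path` is wherever that file was saved.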

Additional details

Software