Published May 8, 2026 | Version v1
Dataset | Open

SAGEBench: Simulated Bruker timsTOF .d fixtures and ground-truth FDR/TPR benchmark for the Sage DDA search engine

Authors/Creators

Description

SAGEBench provides simulated Bruker .d fixtures and a benchmark harness for the Sage DDA database-search engine.

It serves two purposes:

  1. CI-grade Bruker test data for Sage so regressions in the timsTOF code path get caught before release (motivated by lazear/sage#228).
  2. A larger ground-truth-backed evaluation set so anyone can compute true FDR / TPR for Sage on simulated DDA data, in the spirit of timsim-bench for DIA.

All datasets were generated with TimSim, and the ground truth is exact: every injected peptide is recorded in the synthetic_data.db file stored alongside each .d.
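As a minimal sketch of what "true FDR / TPR" means against an exact ground truth, assume the search output and the ground truth have each been reduced to a set of peptide sequences. The function name and the toy sequences below are hypothetical, and this is not the SAGEBench harness itself:

```python
def true_fdr_tpr(reported, truth):
    """Return (FDR, TPR) for reported peptides scored against ground truth.

    FDR = false positives / all reported identifications
    TPR = true positives  / all peptides present in the ground truth
    """
    reported = set(reported)
    truth = set(truth)
    true_pos = len(reported & truth)    # reported and really injected
    false_pos = len(reported - truth)   # reported but never injected
    fdr = false_pos / len(reported) if reported else 0.0
    tpr = true_pos / len(truth) if truth else 0.0
    return fdr, tpr


# Toy data (hypothetical sequences, for illustration only).
truth = {"PEPTIDEK", "SAMPLER", "ELVISLIVESK", "ANOTHERK"}
reported = {"PEPTIDEK", "SAMPLER", "DECOYLIKEK"}

fdr, tpr = true_fdr_tpr(reported, truth)
print(f"FDR={fdr:.3f} TPR={tpr:.3f}")  # FDR=0.333 TPR=0.500
```

In practice the reported set would be the identifications that pass a chosen q-value threshold, so the true FDR can be compared with the FDR the search engine claims at that threshold.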

Files in this record:

  • sagebench-ci-smoke.tar.gz (~457 MiB): two 5-min HeLa .d files, seed CSV, configs, regeneration script. Drop-in CI fixture.
  • sagebench-hela-150k-g30m.tar.gz (~3.6 GiB): HeLa, 150 000 peptides, 30-min gradient (replicate 001).
  • sagebench-hla-10k-g40.tar.gz (~2.6 GiB): HLA Thunder, 10 000 peptides, 40-min gradient, 3 replicates.
  • sagebench-hla-100k-g3600.tar.gz (~6.6 GiB): HLA Thunder, 100 000 peptides, 60-min gradient, 3 replicates.
  • sagebench-results.tar.gz (~288 KiB): first-run report (REPORT.html, RESULTS.md, eval CSVs) against Sage 0.15.0-beta.2.

Each archive contains its own README.md with usage instructions. The SAGEBench repository (github.com/theGreatHerrLebert/SAGEBench) hosts the harness used to score search-engine output against the recorded ground truth.
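After download, an archive can be unpacked and its fixtures located with a few lines of standard-library Python. This is a sketch under assumptions: the helper names are hypothetical, and the layout (a synthetic_data.db in the same directory as each .d folder) is inferred from the description above rather than from the archives. Each archive's README.md remains the authoritative guide.

```python
import tarfile
from pathlib import Path


def extract_archive(archive, dest):
    """Unpack a .tar.gz archive into dest and return dest as a Path."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            # Safer extraction on Python versions that support filters.
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return dest


def find_fixtures(root):
    """Return (d_dir, db_path_or_None) for every .d directory under root.

    Assumption: the ground-truth database sits next to the .d folder.
    """
    pairs = []
    for d_dir in sorted(Path(root).rglob("*.d")):
        if not d_dir.is_dir():
            continue
        db = d_dir.parent / "synthetic_data.db"
        pairs.append((d_dir, db if db.is_file() else None))
    return pairs
```

For example, `find_fixtures(extract_archive("sagebench-ci-smoke.tar.gz", "fixtures"))` would list each .d directory together with its ground-truth database, or `None` where the assumed layout does not hold.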

Files (13.9 GB)

  • md5:f2c9bcf435f80e0d23cbd74675d2e220 (478.5 MB)
  • md5:d6315e9b65ab3266c780cd20a56c412e (3.8 GB)
  • md5:82a92e4525f8e0ecfb8651fd6bb15403 (7.0 GB)
  • md5:7a75077c27813cf584de52ae353c1a7e (2.7 GB)
  • md5:6263dbae22bd79da4c281af0044aa271 (294.3 kB)
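The MD5 checksums can be used to confirm that a download completed intact. The following is a minimal sketch using only the Python standard library; the function names are hypothetical. The listing above does not say which checksum belongs to which archive, so the pairing should be taken from the record page itself.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record(path, listed):
    """Compare a file against a checksum written as 'md5:<hex>' or '<hex>'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

Reading in chunks keeps memory use flat even for the multi-gigabyte archives. A call such as `matches_record(downloaded_path, "md5:<checksum from the record>")` returns `True` only when the file on disk matches.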

Additional details