Published May 4, 2026 | Version v1
Dataset Open

WhyLab Gemini 2.5 Flash Docker Validation (Honest Null Result)

Authors/Creators

  • 1. Anonymous

Description

[Metadata anonymized 2026-05-18 for privacy/blind-review hygiene. Permanent deletion requested via Zenodo Support. Original creator credit retained in private deposit history.]

402-episode (67 problems × 3 seeds × 2 conditions) Docker ground-truth validation of WhyLab causal-audit C2 vs baseline on Gemini 2.5 Flash. Published as an honest null result (adaptive C2 did not outperform fixed C2 on the SWE-bench slice) to support calibration claims rather than overclaim. Companion to [redacted venue] WhyLab [redacted venue]. Hugging Face: neogenesislab/whylab-gemini-2-5-docker-validation.
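
The episode count follows from a full crossing of problems, seeds, and conditions. A minimal sketch of that grid, assuming every combination was run exactly once; the condition labels and integer problem/seed indices are placeholders, not identifiers taken from the dataset:

```python
from itertools import product

N_PROBLEMS = 67   # from the description
N_SEEDS = 3       # from the description
# Placeholder labels (assumption); the dataset may name its two conditions differently.
CONDITIONS = ("baseline", "c2")

# One episode per (problem, seed, condition) combination.
episodes = list(product(range(N_PROBLEMS), range(N_SEEDS), CONDITIONS))

print(len(episodes))  # 67 * 3 * 2
```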

Files

metadata.json

Files (963 Bytes)

Name Size
metadata.json 963 Bytes
md5:695c810a304ef48b507f53e134e922d5
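
A downloaded copy of the file can be checked against the MD5 digest and size listed above. A minimal sketch using only the Python standard library; the helper names `md5_of` and `verify` and the local path `metadata.json` are assumptions for illustration, not part of the deposit:

```python
import hashlib
from pathlib import Path

EXPECTED_MD5 = "695c810a304ef48b507f53e134e922d5"  # from the file listing
EXPECTED_SIZE = 963                                # bytes, from the file listing


def md5_of(data: bytes) -> str:
    """Return the hex MD5 digest of a byte string."""
    return hashlib.md5(data).hexdigest()


def verify(path, expected_md5=EXPECTED_MD5, expected_size=EXPECTED_SIZE) -> bool:
    """Check that the file at `path` matches the expected size and MD5 digest."""
    data = Path(path).read_bytes()
    return len(data) == expected_size and md5_of(data) == expected_md5


if __name__ == "__main__":
    local_copy = Path("metadata.json")  # hypothetical download location
    if local_copy.exists():
        print("checksum OK" if verify(local_copy) else "checksum MISMATCH")
```

MD5 is adequate here for detecting a corrupted or truncated download; it is not a guarantee against deliberate tampering.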