Published May 4, 2026
| Version v1
Dataset
Open
WhyLab Gemini 2.5 Flash Docker Validation (Honest Null Result
Description
[Metadata anonymized 2026-05-18 for privacy/blind-review hygiene. Permanent deletion requested via Zenodo Support. Original creator credit retained in private deposit history.]
402-episode (67 problems × 3 seeds × 2 conditions) Docker ground-truth validation of WhyLab causal-audit C2 vs baseline on Gemini 2.5 Flash. Published as an honest null result — adaptive C2 did not outperform fixed C2 on the SWE-bench slice — to support calibration claims rather than overclaim. Companion to [redacted venue] WhyLab [redacted venue]. Hugging Face: neogenesislab/whylab-gemini-2-5-docker-validation.
Files
metadata.json
Files
(963 Bytes)
| Name | Size | Download all |
|---|---|---|
|
md5:695c810a304ef48b507f53e134e922d5
|
963 Bytes | Preview Download |