Published June 3, 2026
| Version 1.0
Dataset
Open
Replication materials for "Benchmarking Frontier Large Language Models on Consumer-Facing Cost Questions: Price-Range Width, Output Consistency, and a Structured-Engine Baseline"
Authors/Creators
Description
Anonymized data and reproduction code for the study. Includes the 40-question cross-sectional pass, the four deep repeated questions, the twenty-question repeated set for both general-purpose models, and a self-contained, standard-library-only script (score_reproduce.py) that recomputes every reported figure: price-range width medians and distribution, the over-charge marker counts, and the sample standard deviation across repeated runs. No network access or API keys are required. See README.md to run.
Files
benchmark_results_anonymized-4.csv
Files
(914.6 kB)
| Name | Size | Download all |
|---|---|---|
|
md5:77e5c94ba14b8ca1c18e73f0b4016604
|
301.9 kB | Preview Download |
|
md5:7d750a3a87ad95d4ad813aba15d38fe7
|
2.5 kB | Preview Download |
|
md5:1c6fda7c5bfce41ae82d45e2de3585a9
|
465.4 kB | Preview Download |
|
md5:32e53a5a7925b4182e2a3c4236e22132
|
138.8 kB | Preview Download |
|
md5:76f3df11a0a8a98dd58934d5510e76fe
|
6.0 kB | Download |