Published June 3, 2026 | Version 1.0
Dataset Open

Replication materials for "Benchmarking Frontier Large Language Models on Consumer-Facing Cost Questions: Price-Range Width, Output Consistency, and a Structured-Engine Baseline"

Authors/Creators

Description

Anonymized data and reproduction code for the study. Includes the 40-question cross-sectional pass, the four deep repeated questions, the twenty-question repeated set for both general-purpose models, and a self-contained, standard-library-only script (score_reproduce.py) that recomputes every reported figure: price-range width medians and distribution, the over-charge marker counts, and the sample standard deviation across repeated runs. No network access or API keys are required. See README.md to run.

Files

benchmark_results_anonymized-4.csv

Files (914.6 kB)

Name Size Download all
md5:77e5c94ba14b8ca1c18e73f0b4016604
301.9 kB Preview Download
md5:7d750a3a87ad95d4ad813aba15d38fe7
2.5 kB Preview Download
md5:1c6fda7c5bfce41ae82d45e2de3585a9
465.4 kB Preview Download
md5:32e53a5a7925b4182e2a3c4236e22132
138.8 kB Preview Download
md5:76f3df11a0a8a98dd58934d5510e76fe
6.0 kB Download