Replication materials for "Benchmarking Frontier Large Language Models on Consumer-Facing Cost Questions: Price-Range Width, Output Consistency, and a Structured-Engine Baseline"

Oga, Toshikatsu

doi:10.5281/zenodo.20522847

Published June 3, 2026 | Version 1.0

Resource type: Dataset | Access: Open

Anonymized data and reproduction code for the study. Includes the 40-question cross-sectional pass, the four deep repeated questions, the twenty-question repeated set for both general-purpose models, and a self-contained, standard-library-only script (score_reproduce.py) that recomputes every reported figure: price-range width medians and distribution, the over-charge marker counts, and the sample standard deviation across repeated runs. No network access or API keys are required. See README.md to run.
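As a rough illustration of the statistics named above, the following standard-library-only sketch computes a median price-range width and a sample standard deviation across repeated runs. It is not taken from score_reproduce.py: the function names and the (low, high) input layout are assumptions made for illustration, and the actual CSV columns may differ.

```python
import statistics

# Hypothetical helpers: names and input layout are illustrative assumptions,
# not the interface of score_reproduce.py.

def range_widths(price_ranges):
    """Return the width (high - low) of each (low, high) price range."""
    return [high - low for low, high in price_ranges]

def median_width(price_ranges):
    """Median of the price-range widths."""
    return statistics.median(range_widths(price_ranges))

def sample_sd(values):
    """Sample standard deviation (n - 1 denominator) across repeated runs."""
    return statistics.stdev(values)

# Made-up numbers, used only to show the calls.
example_ranges = [(100, 150), (200, 400), (50, 60)]
example_runs = [10, 12, 14]

print(median_width(example_ranges))
print(sample_sd(example_runs))
```

statistics.stdev uses the n - 1 denominator, which matches the "sample standard deviation" wording in the description; statistics.pstdev would give the population version instead.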

Files (914.6 kB)

Name	MD5	Size
benchmark_results_anonymized-4.csv	77e5c94ba14b8ca1c18e73f0b4016604	301.9 kB
README.md	7d750a3a87ad95d4ad813aba15d38fe7	2.5 kB
repeat20_anonymized.csv	1c6fda7c5bfce41ae82d45e2de3585a9	465.4 kB
repeat_trials_anonymized.csv	32e53a5a7925b4182e2a3c4236e22132	138.8 kB
score_reproduce.py	76f3df11a0a8a98dd58934d5510e76fe	6.0 kB
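The MD5 checksums in the listing can be used to confirm that downloaded files are intact before running the reproduction script. The sketch below is one way to do that with the Python standard library; the helper names (md5_of_file, verify_files) and the assumption that all files sit in one local directory are illustrative and not part of the deposit.

```python
import hashlib
import os

# Checksums copied from the file listing on this record.
EXPECTED_MD5 = {
    "benchmark_results_anonymized-4.csv": "77e5c94ba14b8ca1c18e73f0b4016604",
    "README.md": "7d750a3a87ad95d4ad813aba15d38fe7",
    "repeat20_anonymized.csv": "1c6fda7c5bfce41ae82d45e2de3585a9",
    "repeat_trials_anonymized.csv": "32e53a5a7925b4182e2a3c4236e22132",
    "score_reproduce.py": "76f3df11a0a8a98dd58934d5510e76fe",
}

def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_files(expected, directory="."):
    """Map each file name to True if present and matching, else False."""
    results = {}
    for name, checksum in expected.items():
        path = os.path.join(directory, name)
        results[name] = os.path.isfile(path) and md5_of_file(path) == checksum
    return results
```

Calling verify_files(EXPECTED_MD5, "path/to/download") returns a dictionary keyed by file name; any False entry indicates a missing or altered file. MD5 is used here only because it is the checksum the record publishes; it detects accidental corruption, not deliberate tampering.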

	All versions	This version
Views	10	10
Downloads	3	3
Data volume	905.7 kB	905.7 kB
