Behavioral Disclosure in LLM-Mediated Bilateral Trade: A Theoretical Framework with Empirical Calibration in Hotel Dynamic Pricing
Authors/Creators
- 1. International Center for Computational Engineering
- 2. Agel AI
Description
This paper develops a theoretical framework for bilateral bargaining mediated by large language models (LLMs) and supplements it with the largest published cross-model empirical study of LLM-mediated bilateral trade to date. The Myerson–Satterthwaite (1983) impossibility theorem rules out bilateral-trade mechanisms between strategic agents that are simultaneously efficient, incentive-compatible, individually rational, and budget-balanced. We introduce a disclosure-rate parameter α and derive closed-form efficiency curves under three hypothesized behavioral modes (binary, continuous, noisy), interpolating between the Chatterjee–Samuelson Bayes–Nash second-best efficiency (≈0.844) and the first-best; a numerical sketch of the binary-mode curve follows the findings list below. The framework is then tested empirically across five experimental phases on ten frontier LLMs (including Claude Opus 4.7, Claude Sonnet 4.6, GPT-5.5, Gemini 3 Flash, DeepSeek V4 Pro, Grok 4.3, Kimi, Qwen, and Gemma) accessed through OpenRouter, totaling approximately 4,320 dialogues and roughly $70 in API spend. Key empirical findings (combined n = 60 per cell):
- Phase 1 (one-shot disclosure): Nine of ten models systematically refuse to disclose reservation values in 60–98% of trials, falsifying the binary/continuous predictions of the framework.
- Phase 2 (multi-turn K=5, abstract domain): Cross-model heterogeneity under an identical protocol is overwhelming: Gemini-Flash 0.924, Claude-Sonnet 0.907, GPT-5.5 0.667, DeepSeek 0.293, Grok 0.168, and Claude-Opus exactly 0/60 (Wilson 95% CI [0%, 6%]; see the interval and test sketches after this list). A Pearson chi-square test against trade-rate homogeneity yields χ² = 61.19, p = 6.9 × 10⁻¹².
- Phase 4 (asymmetric framing): Role asymmetry partially unblocks structural refusal. Claude-Sonnet reaches 0.994 (95% CI [0.977, 1.000]), cleanly excluding the CS bound; Grok triples to 0.619; Claude-Opus partially recovers to 0.367.
- Phase 5 (real hotel B2B in EUR, HJB-derived costs): Claude-Sonnet reaches 0.998 (95% CI [0.996, 1.000]), cleanly excluding the naive posted-price baseline of 0.931, the strongest "LLM beats posted-price" result. GPT-5.5 collapses from 0.667 in the abstract domain to 0.165 in domain (Fisher p = 6.1 × 10⁻⁷). The cross-model omnibus test yields χ² = 76.69, p = 1.6 × 10⁻¹⁶.
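The paper derives the closed-form efficiency curves; as a minimal numerical sketch, the snippet below assumes the binary disclosure mode mixes first-best trade (probability α) with the Chatterjee–Samuelson linear equilibrium (probability 1 - α) under uniform [0, 1] priors, which recovers the quoted second-best value 9/64 ÷ 1/6 = 27/32 ≈ 0.844 at α = 0. The mixture form is an illustrative assumption, not the paper's exact derivation.

```python
# Sketch of the binary-mode efficiency curve E(alpha) under uniform [0,1] priors.
# Assumption (not from the paper): binary mode = first-best with prob. alpha,
# Chatterjee-Samuelson linear equilibrium with prob. 1 - alpha.
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
v = rng.uniform(size=n)   # buyer valuations
c = rng.uniform(size=n)   # seller costs

first_best = np.maximum(v - c, 0).mean()             # E[(v-c)+] = 1/6
# CS linear equilibrium: trade occurs iff v >= c + 1/4
cs_gains = np.where(v >= c + 0.25, v - c, 0).mean()  # = 9/64
cs_efficiency = cs_gains / first_best                # = 27/32 ~ 0.84375
print(f"CS second-best efficiency: {cs_efficiency:.4f}")

def binary_mode_efficiency(alpha: float) -> float:
    """Linear interpolation between the CS second-best and the first-best."""
    return alpha * 1.0 + (1 - alpha) * cs_efficiency

for a in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"alpha={a:.2f} -> efficiency {binary_mode_efficiency(a):.4f}")
```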
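The Wilson interval quoted for Claude-Opus's 0/60 Phase 2 outcome can be checked directly with the standard score-interval formula; a self-contained sketch:

```python
# Wilson score interval for k successes in n trials (95% by default).
from math import sqrt

def wilson_ci(k: int, n: int, z: float = 1.959964) -> tuple[float, float]:
    p = k / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, center - half), min(1.0, center + half)

# Claude-Opus in Phase 2: 0 trades out of 60
lo, hi = wilson_ci(0, 60)
print(f"Wilson 95% CI: [{lo:.1%}, {hi:.1%}]")  # -> [0.0%, 6.0%]
```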
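The omnibus and pairwise comparisons use standard contingency-table tests. The sketch below shows the scipy calls on counts rounded from the reported rates out of n = 60; since the paper's exact per-cell trade counts are not given here, these counts are illustrative stand-ins and the resulting statistics will not match the reported χ² = 61.19 or Fisher p = 6.1 × 10⁻⁷ exactly.

```python
# Homogeneity and pairwise tests on illustrative counts (rates x 60, rounded).
import numpy as np
from scipy.stats import chi2_contingency, fisher_exact

# Stand-in Phase 2 trade counts out of 60 per model, rounded from the reported
# figures (Gemini, Sonnet, GPT, DeepSeek, Grok, Opus); the paper's counts differ.
trades = np.array([55, 54, 40, 18, 10, 0])
table = np.stack([trades, 60 - trades], axis=1)   # successes vs. failures
chi2, p, dof, _ = chi2_contingency(table)
print(f"chi2={chi2:.2f}, dof={dof}, p={p:.2e}")

# Pairwise Fisher exact test, e.g. GPT-5.5 abstract vs. in-domain, using a
# 2x2 rounded from the reported 0.667 vs. 0.165 rates at n=60 each.
_, p_fisher = fisher_exact([[40, 20], [10, 50]])
print(f"Fisher exact p={p_fisher:.2e}")
```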
Central thesis: Model identity dominates protocol design in LLM bargaining. The same protocol run on sibling models produces 0.91 versus 0.00 efficiency. Mechanism design for LLM-mediated bilateral trade must therefore be model-aware, and pre-deployment screening must include domain-specific testing; abstract-benchmark performance does not transfer.
Files
paper_Behavioral_Disclosure_in_LLM_Mediated_Bilateral_Trade.pdf
Additional details
Software
- Programming language: Python