Published May 7, 2026 | Version 1.0.0 (initial release)
Publication · Open Access

Behavioral Disclosure in LLM-Mediated Bilateral Trade: A Theoretical Framework with Empirical Calibration in Hotel Dynamic Pricing

Authors/Creators

  • 1. International Center for Computational Engineering
  • 2. Agel AI

Description

This paper develops a theoretical framework for bilateral bargaining mediated by large language models (LLMs) and supplements it with the largest published cross-model empirical study of LLM-mediated bilateral trade to date. The Myerson–Satterthwaite (1983) impossibility theorem rules out mechanisms for bilateral trade between strategic agents that are simultaneously efficient, incentive-compatible, individually rational, and budget-balanced. We introduce a disclosure-rate parameter α and derive closed-form efficiency curves under three hypothesised behavioural modes (binary, continuous, noisy), interpolating between the Chatterjee–Samuelson Bayes–Nash second-best (≈0.844) and the first-best. The framework is then tested empirically across five experimental phases on ten frontier LLMs (Claude Opus 4.7, Claude Sonnet 4.6, GPT-5.5, Gemini 3 Flash, DeepSeek V4 Pro, Grok 4.3, Kimi, Qwen, Gemma) accessed through OpenRouter, totaling approximately 4,320 dialogues and roughly $70 in API spend. Key empirical findings (combined n=60 per cell):

  - Phase 1 (one-shot disclosure): Nine of ten models systematically refuse to disclose reservation values in 60–98% of trials, falsifying the binary and continuous predictions of the framework.

  - Phase 2 (multi-turn K=5, abstract domain): Cross-model heterogeneity is overwhelming under an identical protocol: Gemini-Flash 0.924, Claude-Sonnet 0.907, GPT-5.5 0.667, DeepSeek 0.293, Grok 0.168, and Claude-Opus exactly 0/60 (Wilson 95% CI [0%, 6%]). Pearson chi-square test of trade-rate homogeneity: χ² = 61.19, p = 6.9 × 10⁻¹².

  - Phase 4 (asymmetric framing): Role asymmetry partially unblocks structural refusal: Claude-Sonnet reaches 0.994 (95% CI [0.977, 1.000], cleanly excluding the Chatterjee–Samuelson bound), Grok triples to 0.619, and Claude-Opus partially recovers to 0.367.

  - Phase 5 (real hotel B2B in EUR, HJB-derived costs): Claude-Sonnet 0.998 (95% CI [0.996, 1.000]) cleanly excludes the naive posted-price baseline of 0.931, the strongest "LLM beats posted-price" result. GPT-5.5 collapses from 0.667 in the abstract domain to 0.165 in the hotel domain (Fisher exact p = 6.1 × 10⁻⁷). Cross-model omnibus χ² = 76.69, p = 1.6 × 10⁻¹⁶.
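The Wilson 95% interval quoted for the 0/60 Claude-Opus cell can be reproduced with the standard Wilson score formula; a minimal self-contained sketch (the function name `wilson_ci` is ours, not taken from the paper's code):

```python
import math

def wilson_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score confidence interval for a binomial proportion."""
    p_hat = successes / n
    denom = 1 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    margin = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n))
    return max(0.0, center - margin), min(1.0, center + margin)

# Claude-Opus in Phase 2: 0 trades in 60 dialogues
lo, hi = wilson_ci(0, 60)
print(f"[{lo:.1%}, {hi:.1%}]")  # → [0.0%, 6.0%]
```

Unlike the naive Wald interval, which degenerates to [0, 0] at zero successes, the Wilson interval keeps a nonzero upper bound, which is why it is the appropriate choice for the all-refusal cell.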

 Central thesis: Model identity dominates protocol design in LLM bargaining. The same protocol run on sibling models produces 0.91 versus 0.00 efficiency. Mechanism design for LLM-mediated bilateral trade must therefore be model-aware, and pre-deployment screening must include domain-specific testing: abstract-benchmark performance does not transfer.
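The ≈0.844 second-best benchmark that the framework interpolates from is the classic efficiency of the linear equilibrium of the Chatterjee–Samuelson split-the-difference double auction with uniform [0,1] valuations, where trade occurs iff the buyer's value exceeds the seller's cost by at least 1/4; the exact value is 27/32 = 0.84375. A quick Monte Carlo check of that constant (a pure-Python sketch, not code from the paper):

```python
import random

def cs_efficiency(n_draws: int = 1_000_000, seed: int = 1) -> float:
    """Estimate realized gains of the Chatterjee-Samuelson linear
    equilibrium relative to first-best, with buyer value b and
    seller cost s drawn i.i.d. from U(0, 1)."""
    rng = random.Random(seed)
    gains_cs = gains_fb = 0.0
    for _ in range(n_draws):
        b, s = rng.random(), rng.random()
        if b > s:
            gains_fb += b - s   # first-best: trade whenever there is surplus
        if b >= s + 0.25:
            gains_cs += b - s   # linear equilibrium: trade iff b >= s + 1/4
    return gains_cs / gains_fb

eff = cs_efficiency()
print(round(eff, 3))  # ≈ 0.844 (exact second-best: 27/32 = 0.84375)
```

Efficiency figures such as Claude-Sonnet's 0.994 and 0.998 exceed this bound precisely because disclosure moves the interaction outside the Bayes–Nash no-disclosure setting the bound is derived for.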

 

Files (899.4 kB)

paper_Behavioral_Disclosure_in_LLM_Mediated_Bilateral_Trade.pdf

  • md5:f58dfe130a4565c8ce21ac79dd343808 — 114.1 kB
  • md5:b9d8b047cf8528a66af0f6ce69da0be0 — 459.6 kB
  • md5:0104b8b65139eca7d976df433794dcb7 — 325.7 kB

Additional details

Software

Programming language
Python