Published March 4, 2026 | Version 1.0.0
Dataset | Open access

Pact Benchmark: ICPC World Finals — Contract-First Multi-Agent vs Single-Agent Code Generation

Authors/Creators

  • Cage & Mirror Press

Description

Benchmark comparing Pact, a contract-first multi-agent framework, against Claude Code on 5 ICPC World Finals competitive programming problems (212 test cases in total). All conditions use Claude Opus 4.6. Pact passes all 212 test cases (100%), compared with 167/212 (79%) for Claude Code in single-shot mode and 196/212 (92%) for Claude Code in iterative mode.

The dataset includes test data, baseline results, the full Pact state for both conditions (research and base), and reproduction scripts.

The decisive problem is Trailing Digits (2020 World Finals): even with 5 retry iterations, Claude Code passes only 31 of the problem's 47 test cases, because its naive algorithm times out. Pact's interview and decomposition phases force an upfront mathematical analysis, which produces the correct O(log n) approach on the first attempt.
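The headline figures can be re-derived from the raw counts above. The short Python sketch below uses only the numbers stated in this description. It assumes that the 31/47 Trailing Digits score belongs to the iterative Claude Code condition, and the condition labels are illustrative rather than names taken from the pact-bench scripts. Under that assumption, the check shows that all 16 of the iterative condition's failures come from Trailing Digits.

```python
# Counts as stated in the description: 212 test cases across 5 problems.
TOTAL = 212
results = {
    "Pact": 212,
    "Claude Code (single-shot)": 167,
    "Claude Code (iterative)": 196,
}

# Trailing Digits: 47 test cases; 31 passed by Claude Code with 5 retries
# (assumed here to be the iterative condition).
TD_CASES = 47
TD_PASSED_ITERATIVE = 31


def pass_rate(passed: int, total: int = TOTAL) -> int:
    """Percentage of test cases passed, rounded to the nearest whole percent."""
    return round(100 * passed / total)


def failures(passed: int, total: int = TOTAL) -> int:
    """Number of test cases not passed."""
    return total - passed


for name, passed in results.items():
    print(f"{name}: {passed}/{TOTAL} = {pass_rate(passed)}%, "
          f"{failures(passed)} failures")

# Compare all iterative failures with those on Trailing Digits alone.
iterative_failures = failures(results["Claude Code (iterative)"])
td_failures = failures(TD_PASSED_ITERATIVE, TD_CASES)
print(f"Iterative failures: {iterative_failures}, "
      f"of which Trailing Digits: {td_failures}")
```

The single-shot condition fails 45 cases in total. The description does not break those failures down by problem, so the sketch does not attempt to.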

Files

Files (36.9 MB)

md5:9452285c460280e27ad5112aba6788ea (36.9 MB)
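A downloaded copy can be checked against the listed md5 checksum before use. The following is a minimal sketch that relies only on Python's standard `hashlib` and `pathlib`. The archive's file name is not shown on this page, so any path you pass in is your own, and the helper names `md5_of` and `verify` are hypothetical.

```python
import hashlib
from pathlib import Path

# Checksum listed on this record page.
EXPECTED_MD5 = "9452285c460280e27ad5112aba6788ea"


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex md5 digest of a file, read in 1 MiB chunks to bound memory use."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str = EXPECTED_MD5) -> bool:
    """True if the file's md5 matches the expected digest (case-insensitive)."""
    return md5_of(path) == expected.lower()
```

For example, `verify(Path("pact-bench-archive"))` returns `True` only for an intact download. The file name in that call is a placeholder, not the archive's actual name.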

Additional details

Related works

Is part of
Software: https://github.com/jmcentire/pact
Is supplement to
Software: https://github.com/jmcentire/pact-bench