First Documented Proof of Cross-Vendor AI Collaboration on a Benchmark: Multi-AI Consensus Achieves Best-Ever 50% on IMO 2025
Contributors
Contact person:
Description
First documented instance of multiple frontier AI systems from different vendors (Claude, GPT-4, Grok, Gemini, DeepSeek, Kimi) collaborating in real time to solve mathematical olympiad problems. The group achieved 50% accuracy (3/6 problems correct) on IMO 2025, an 18.4 percentage point improvement over the Gemini baseline. Notably, Gemini alone solved 0/6 problems in our trials; all three correct answers emerged from cross-AI collaboration and consensus voting. This work demonstrates that multi-vendor AI collaboration can exceed individual model performance on the hardest mathematical reasoning benchmarks, and it introduces a novel "Family Game Night" protocol for fallback reasoning when primary models fail.
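To make the idea of consensus voting concrete, here is a minimal Python sketch of a plurality vote over the final answers returned by several models. This record does not describe the actual voting rule or the "Family Game Night" protocol, so everything below is an assumption for illustration: the function names `consensus_answer` and `accuracy`, the `min_votes` threshold, and the tie-handling rule are hypothetical and are not taken from the archived files.

```python
from collections import Counter
from typing import Optional


def consensus_answer(
    answers: dict[str, Optional[str]], min_votes: int = 2
) -> Optional[str]:
    """Return the plurality answer, or None when there is no consensus.

    `answers` maps a model name to its final answer (None if the model
    produced no answer). The threshold and tie rule are assumptions.
    """
    # Count only the models that actually returned an answer.
    votes = Counter(a.strip() for a in answers.values() if a is not None)
    if not votes:
        return None

    ranked = votes.most_common(2)
    top_answer, top_count = ranked[0]

    # Treat a tie for first place as "no consensus".
    if len(ranked) > 1 and ranked[1][1] == top_count:
        return None

    # Require at least `min_votes` agreeing models.
    return top_answer if top_count >= min_votes else None


def accuracy(correct: int, total: int) -> float:
    """Percentage of problems answered correctly."""
    return 100.0 * correct / total


if __name__ == "__main__":
    # Hypothetical answers for a single problem (not real trial data).
    sample = {"model_a": "42", "model_b": "42", "model_c": "7", "model_d": None}
    print(consensus_answer(sample))
    print(accuracy(3, 6))
```

With this helper, the reported score of 3 correct out of 6 problems corresponds to `accuracy(3, 6)`, that is, 50%. A real pipeline would also need answer normalization and proof checking, which this sketch leaves out.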
Files
| Name | Size |
|---|---|
| files.zip (md5:18cae9fb57af2f2707479e983e9faded) | 25.1 kB |
Additional details
Dates
- Issued: 2025-11-29

HyperNet