mcpbr: Benchmarking Model Context Protocol Servers on Software Engineering Tasks
Description
The Model Context Protocol (MCP) lets developers expose tools and data sources to LLM-based agents through a standardized interface. Despite rapid ecosystem growth, no methodology exists for evaluating whether a given MCP server improves agent task completion. We present mcpbr, an open-source benchmark runner that isolates the effect of MCP tool augmentation through paired comparison experiments. We evaluate a code graph analysis MCP server on all 500 tasks from SWE-bench Verified using Claude Sonnet as the base agent. MCP augmentation reduced the resolution rate by 14.9% relative (from 49.8% to 42.4%) while improving efficiency: 42.3% fewer tool calls, 14.0% fewer tokens, and 15.2% lower cost. Per-repository analysis shows the effect varies across codebases, with the server helping on 1 of 12 repositories and hurting on 10. We analyze this efficiency-resolution tradeoff and show that MCP tools alter the agent's exploration strategy, trading general-purpose search for opinionated shortcuts that can narrow the solution space.
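The headline numbers above are relative changes between the paired baseline and MCP-augmented runs. A minimal sketch of that arithmetic (the helper function and variable names are illustrative, not part of mcpbr's actual API; the values are taken from the abstract):

```python
def relative_change(baseline: float, treatment: float) -> float:
    """Relative change (%) of the MCP-augmented run vs. the baseline run."""
    return (treatment - baseline) / baseline * 100

# Resolution rate on SWE-bench Verified (500 tasks), per the abstract.
baseline_resolved = 0.498  # baseline agent
mcp_resolved = 0.424       # MCP-augmented agent

print(f"resolution: {relative_change(baseline_resolved, mcp_resolved):+.1f}%")
# → resolution: -14.9%
```

The same formula reproduces the efficiency figures (tool calls, tokens, cost) when applied to those paired measurements.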
Files

| Name | Size | MD5 |
|---|---|---|
| main.pdf | 287.9 kB | de1efde19dc2e67e714b2269035d4d41 |
Additional details
Software
- Repository URL
- https://github.com/greynewell/mcpbr
- Programming language
- Python
- Development Status
- Active