There is a newer version of the record available.

Published April 27, 2026 | Version 1.1.0
Preprint Open

Sverklo: A Local-First Code Intelligence MCP Server and a Cross-Repository Software Engineering Benchmark

  • 1. Independent Researcher

Description

v1.0.1: Corrected author email; added LaTeX source bundle for full reproducibility. 

v1.1.0: Adds bench:swe cross-repository results across 5 OSS repos (§V.D, Table III): 38 of 65 questions answered with perfect recall, 66.2% mean recall. Includes failure-pattern analysis.

Code-intelligence retrieval has become a primary bottleneck for AI coding agents on real-world repositories, yet existing Model Context Protocol (MCP) servers force a choice among three options: sending source to a hosted service (Greptile, Cursor, Sourcegraph Cody), depending on language-specific servers (Serena via the Language Server Protocol), or falling back to flat lexical search (ripgrep, ctags). We present Sverklo, an open-source code-intelligence server that combines three components: incremental tree-sitter parsing; a hybrid retriever that fuses BM25, dense embeddings (MiniLM-L6, 384-d), and PageRank over a symbol-and-import graph via channelized Reciprocal Rank Fusion; and a bi-temporal memory store pinned to git commits. All computation runs locally; the package installs in one command and indexes 175 TypeScript files in approximately three seconds. We further introduce bench:swe, a reproducible cross-repository evaluation harness spanning 65 hand-curated research questions across five popular open-source projects (Express, NestJS, Vite, Prisma, FastAPI), and bench:primitives, a deterministic 60-task suite measuring recall, precision, and token efficiency on symbol lookup, reference finding, dependency analysis, and dead-code detection. On bench:primitives, Sverklo achieves an aggregate F1 of 0.58 while consuming an average of 255 input tokens per task: 65% fewer tokens than a tuned grep baseline (0.67 F1, 731 tokens) and 98% fewer than naive grep (0.35 F1, 15,814 tokens), at an aggregate F1 below tuned grep's and above naive grep's. Sverklo wins decisively on definition lookup and dependency analysis but loses on reference finding and dead-code detection; the negative results are reported transparently. The system, harness, and 75 ground-truth files are released under the MIT license at https://github.com/sverklo/sverklo to enable third-party reproduction and comparison.
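
To make the fusion step in the description concrete, the following is a minimal sketch of weighted Reciprocal Rank Fusion over several ranked lists. It is not taken from the Sverklo codebase: the names `Channel` and `fuseRankings`, the per-channel `weight` field, and the constant `k = 60` (a conventional choice for RRF) are all assumptions made for illustration, and the record does not specify what "channelized" means beyond fusing the BM25, dense-embedding, and PageRank rankings.

```typescript
// Illustrative sketch only: names, weights, and k are assumptions,
// not the Sverklo implementation.

interface Channel {
  name: string;      // hypothetical channel label, e.g. "bm25", "dense", "pagerank"
  ranking: string[]; // item IDs ordered best-first
  weight: number;    // assumed per-channel weight
}

// Each channel contributes weight / (k + rank) for every item it ranks,
// where rank is 1-based. Items missing from a channel contribute nothing.
function fuseRankings(channels: Channel[], k: number = 60): [string, number][] {
  const scores = new Map<string, number>();
  for (const { ranking, weight } of channels) {
    ranking.forEach((id, index) => {
      const rank = index + 1;
      scores.set(id, (scores.get(id) ?? 0) + weight / (k + rank));
    });
  }
  // Sort by fused score, highest first; break ties by ID for determinism.
  return Array.from(scores.entries()).sort(
    (a, b) => b[1] - a[1] || a[0].localeCompare(b[0])
  );
}

// Toy input: three channels ranking four hypothetical files.
const fused = fuseRankings([
  { name: "bm25", ranking: ["a.ts", "b.ts", "c.ts"], weight: 1 },
  { name: "dense", ranking: ["b.ts", "a.ts", "d.ts"], weight: 1 },
  { name: "pagerank", ranking: ["b.ts", "c.ts", "a.ts"], weight: 1 },
]);

console.log(fused.map(([id]) => id)); // [ 'b.ts', 'a.ts', 'c.ts', 'd.ts' ]
```

Because the fused score depends only on ranks, the channels do not need comparable score scales, which is one reason rank fusion is a common way to combine lexical, embedding, and graph signals.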

Files

paper.pdf

Files (280.7 kB)

Name, Size
md5:6d8d119f7d461051858e79a0b977d8be, 253.8 kB
md5:cec1c137b7b586ef853b6844250c39a9, 26.9 kB

Additional details

Related works

Documents
Software: https://github.com/sverklo/sverklo (URL)
Is supplemented by
Other: https://www.sverklo.com (URL)

Software

Repository URL
https://github.com/sverklo/sverklo
Programming language
TypeScript, JavaScript
Development Status
Active