AriaSQL: Production SQL Agent for 100+ Table Databases with SQLAS Evaluation
Description
Enterprise databases routinely contain hundreds to thousands of tables, yet existing Text-to-SQL systems were designed for
academic benchmarks with small schemas (Spider: avg 5.3 tables/DB). Injecting a full 200-table schema into a single LLM
prompt requires ~41,300 tokens per query, a 21x cost multiplier that is economically infeasible at production scale.
Equally critical, no standardised evaluation framework exists for SQL agents: existing frameworks measure only binary
execution accuracy, leaving safety, quality, and schema retrieval unmeasured.
We present two complementary systems. AriaSQL is a production full-stack SQL agent (ReactJS + FastAPI) that handles 100+
table schemas through a four-layer adaptive retrieval pipeline (BM25 sparse retrieval, dense embedding retrieval,
Reciprocal Rank Fusion, and FK-graph expansion), reducing prompt token usage by 95.3% versus full-schema injection. SQLAS
is the first standardised evaluation framework for SQL agents, providing SQL-specific metrics no existing framework
offers: execution accuracy, schema retrieval F1, a three-dimension AND-logic verdict (correctness/quality/safety), and 15
named failure categories with actionable remediation hints.
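The fusion and expansion steps of the retrieval pipeline can be illustrated with a minimal sketch. It is not AriaSQL's implementation: the function names, the RRF constant `k=60` (a common default), the `top_n` cutoff, and the single-hop expansion are all assumptions for illustration, and the BM25 and dense rankings are taken as already-computed lists of table names.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of table names with Reciprocal Rank Fusion.

    k=60 is a commonly used constant, assumed here; the paper's value is not stated.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, table in enumerate(ranking, start=1):
            scores[table] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by name for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))


def expand_with_foreign_keys(tables, fk_graph, hops=1):
    """Add tables reachable through foreign-key edges within `hops` steps.

    fk_graph is a hypothetical adjacency mapping: table -> related tables.
    """
    selected = list(tables)
    seen = set(tables)
    frontier = list(tables)
    for _ in range(hops):
        next_frontier = []
        for table in frontier:
            for neighbor in fk_graph.get(table, ()):
                if neighbor not in seen:
                    seen.add(neighbor)
                    selected.append(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return selected


def retrieve_schema(bm25_ranking, dense_ranking, fk_graph, top_n=2):
    """Fuse sparse and dense rankings, keep top_n, then expand along FK edges."""
    fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
    return expand_with_foreign_keys(fused[:top_n], fk_graph)


if __name__ == "__main__":
    bm25 = ["orders", "customers", "invoices"]
    dense = ["orders", "products", "customers"]
    fks = {"orders": ["customers", "order_items"]}
    print(retrieve_schema(bm25, dense, fks))
```

Only the tables returned by `retrieve_schema` would be placed in the prompt, which is how a pipeline of this shape keeps token usage well below full-schema injection.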
On LargeSchemaEval (50/100/200 table scales), AriaSQL achieves 72.5% execution accuracy at 107 tables with a 76.0% PASS
rate (LLM-judge evaluation, quality mean 0.893). On the BIRD dev set (238 questions, zero-shot), AriaSQL achieves 42.4%
execution accuracy with 86.0% schema retrieval F1, matching the GPT-4o zero-shot baseline without BIRD-specific
fine-tuning. All 3,686 evaluated queries were read-only, giving 100% read-only compliance across all retrieval modes.
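Two of the SQLAS metrics named above, schema retrieval F1 and the three-dimension AND-logic verdict, can be sketched as follows. This is an illustrative reading of the description, not the SQLAS API: the function names and the `quality_threshold` of 0.7 are hypothetical, and F1 is computed set-wise over table names.

```python
def schema_retrieval_f1(retrieved, gold):
    """Set-based F1 between retrieved tables and the tables the gold SQL uses."""
    retrieved, gold = set(retrieved), set(gold)
    true_positives = len(retrieved & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(retrieved)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


def and_logic_verdict(correct, quality, safe, quality_threshold=0.7):
    """PASS only if all three dimensions pass; the threshold is an assumed value."""
    passed = correct and quality >= quality_threshold and safe
    return "PASS" if passed else "FAIL"


if __name__ == "__main__":
    print(schema_retrieval_f1(["orders", "customers", "invoices"], ["orders", "customers"]))
    print(and_logic_verdict(correct=True, quality=0.893, safe=True))
    print(and_logic_verdict(correct=True, quality=0.893, safe=False))
```

Under AND logic a query that returns the right rows still fails if it is unsafe or low quality, which is why a PASS rate and an execution accuracy can differ for the same run.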
Files
| Name | Size |
|---|---|
| ICDE2027_AriaSQL_SQLAS.pdf (md5:75027721c5ecbd71c76df7a7bf56d4cd) | 254.7 kB |