There is a newer version of the record available.

Published April 29, 2026 | Version v1
Dataset Open

SigMap Benchmark Suite: 240-Repository Large-Scale AI Context Extraction Dataset

Authors/Creators

Description

The SigMap Benchmark Suite presents a comprehensive evaluation of AI context extraction across 240 diverse open-source repositories spanning 30+ programming languages. This dataset comprises 1,775 benchmark operations capturing token reduction metrics, execution performance, and code complexity analysis.

Key Features:
• 240 repositories across 30+ languages
• 1,775 benchmark operations (5 modes per repository)
• 50+ metadata fields per repository
• 96.2% average token reduction
• Complete reproducibility package
• 4 data export formats (CSV, JSON, JSONL, SQL)
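The record does not spell out how the token reduction figure is computed. A minimal sketch, assuming it is the percentage of tokens removed relative to the original repository and that the average is an unweighted mean over repositories; the function name, field names, and sample numbers below are hypothetical and not taken from the dataset:

```python
from statistics import mean


def token_reduction(original_tokens: int, extracted_tokens: int) -> float:
    """Return the percentage of tokens removed by extraction (assumed definition)."""
    if original_tokens <= 0:
        raise ValueError("original_tokens must be positive")
    return 100.0 * (1 - extracted_tokens / original_tokens)


# Hypothetical rows for illustration only; these are not values from the dataset.
rows = [
    {"repo": "example/alpha", "original_tokens": 100_000, "extracted_tokens": 4_000},
    {"repo": "example/beta", "original_tokens": 50_000, "extracted_tokens": 1_500},
]

per_repo = [token_reduction(r["original_tokens"], r["extracted_tokens"]) for r in rows]
average = mean(per_repo)  # unweighted mean across repositories (an assumption)
print(f"average token reduction: {average:.1f}%")
```

An unweighted mean treats every repository equally; a token-weighted aggregate would give large repositories more influence and can produce a different figure.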

The dataset enables analysis of language-specific context extraction patterns, monorepo complexity, domain-specific compression characteristics, and AI context optimization strategies.
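As one illustration of language-specific analysis, the sketch below groups rows of a CSV export by language and averages a reduction column. The column names (`repo`, `language`, `token_reduction_pct`) and the inline sample rows are assumptions made for the example; the actual export schema may differ.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical sample in the assumed CSV layout; not data from the record.
SAMPLE_CSV = """repo,language,token_reduction_pct
example/alpha,Python,96.0
example/beta,Python,98.0
example/gamma,Rust,94.5
"""


def mean_reduction_by_language(csv_text: str) -> dict[str, float]:
    """Average the reduction percentage per language, sorted by language name."""
    groups: dict[str, list[float]] = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[row["language"]].append(float(row["token_reduction_pct"]))
    return {lang: mean(values) for lang, values in sorted(groups.items())}


by_language = mean_reduction_by_language(SAMPLE_CSV)
for lang, avg in by_language.items():
    print(f"{lang}: {avg:.1f}%")
```

To run this against a real export, read the downloaded CSV file into a string and adjust the column names to match its header row.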

Complete methodology, reproducibility materials, and scripts are included.

Files

Dataset_Paper.md

Files (943.0 kB)

MD5 checksum, Size
md5:5677bb3c8910e4e17e334b8963a70846, 11.5 kB
md5:c3b94888c5d5f9f7269cfc22b4fcc987, 22.5 kB
md5:8048853b5bfc8a651dea096fe8634f1c, 26.6 kB
md5:722afa7bf9d2644db3aba029ea223d17, 7.1 kB
md5:9d535e3f62d5b1996a0252cc419e6c0e, 9.8 kB
md5:d9c290d9cfa425ac6a9ee81e8534cdd1, 8.6 kB
md5:52b703452f77fbdec4c1bc55cb6c90b6, 4.2 kB
md5:000391ac0333c148347697d3e50a6103, 17.3 kB
md5:6cbeb79cd6377d928c2228887a741691, 2.9 kB
md5:9ac1a2702a53bb230f53412c4505286b, 16.8 kB
md5:0eeca47f09ee3f1a2797956b75dcb870, 863 Bytes
md5:44a7051a72199e44c855b9321d19edde, 4.7 kB
md5:e9cda9c43bc7e1714f9f1a26061cd360, 5.7 kB
md5:82ab7f6fc295c77d9aed2bd7311ea88b, 1.4 kB
md5:6e5d9be419f4fed3cbd719d86641e904, 6.4 kB
md5:6da468de0d3f7f41b717240cb5576fad, 27.2 kB
md5:2626b56ba054f518408bcf58c07bb75c, 1.4 kB
md5:4cef45ed34154262a462041de8dc8f67, 50.2 kB
md5:295003109c0f5af355810ec3fd79bc1d, 350.5 kB
md5:d6b9853b4fa6fc685e31a9b56d2d6400, 277.8 kB
md5:82018927f57a3af509a148882fdb4f97, 89.6 kB
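The MD5 values listed for each file can be used to confirm that a download arrived intact. A minimal sketch using Python's standard `hashlib`; the helper names `md5_of` and `matches_listing` are hypothetical, and the demo runs on a throwaway file because the file names are not shown in the listing above:

```python
import hashlib
import tempfile
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path: Path, listed: str) -> bool:
    """Compare a file's MD5 with a listed value such as 'md5:<hex digest>'."""
    expected = listed.strip().lower().removeprefix("md5:")
    return md5_of(path) == expected


# Demo on a temporary file whose content (b"hello") has a well-known MD5 digest.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.bin"
    sample.write_bytes(b"hello")
    demo_ok = matches_listing(sample, "md5:5d41402abc4b2a76b9719d911017c592")

print("checksum matches:", demo_ok)
```

For a real download, pass the path of the saved file together with the corresponding `md5:` value from the listing. MD5 is adequate for detecting accidental corruption but is not a defense against deliberate tampering.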