Published May 19, 2026 | Version v1.0.0
Software Open

Mark1999/latent-structure-benchmark: v1.0.0 — Initial public release

Authors/Creators

Description

Initial public release of the Latent Structure Benchmark (LSB).

LSB applies Cultural Domain Analysis (CDA) elicitation protocols — free listing, pile sorting, pile interview — to large language models as if they were informants. It surfaces the corpus lens: the latent categorical structure of a training corpus, refracted through training and alignment, made visible by structured elicitation.
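As a rough illustration of how pile-sort data is commonly summarized in Cultural Domain Analysis, the sketch below aggregates several sorts into pairwise co-occurrence proportions (the share of sorts in which two items land in the same pile). It is a generic sketch, not LSB's actual pipeline: the function name `pile_sort_similarity`, the input layout, and the example items are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def pile_sort_similarity(sorts):
    """Return the share of sorts in which each item pair shares a pile.

    `sorts` is a list of pile sorts; each sort is a list of piles, and
    each pile is a list of item names. (Hypothetical layout, chosen for
    this example only.)
    """
    counts = Counter()
    for piles in sorts:
        for pile in piles:
            # Sort the items so each pair has one canonical (a, b) key.
            for a, b in combinations(sorted(set(pile)), 2):
                counts[(a, b)] += 1
    total = len(sorts)
    return {pair: n / total for pair, n in counts.items()}


# Two made-up sorts of the same four items, e.g. from two elicitation runs.
example_sorts = [
    [["apple", "pear"], ["carrot", "leek"]],
    [["apple", "pear", "carrot"], ["leek"]],
]
similarity = pile_sort_similarity(example_sorts)
```

Pairs that never share a pile are simply absent from the result, so a caller can treat a missing key as a similarity of zero.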

LSB is not a capability benchmark, not a leaderboard, and not a ranking.

This release includes:

  • The open-data bundle (CC0 1.0 Universal, 1.55 GB): https://huggingface.co/datasets/AILLM1999/latent-structure-benchmark
  • The reproducible build script (scripts/build_db.py) and full data dictionary (docs/DATA_DICTIONARY.md)
  • The dashboard at https://cogstructurelab.com
  • Every method-defining document under docs/ and ARCHITECTURE.md

Files (6.2 MB)

Name: Mark1999/latent-structure-benchmark-v1.0.0.zip
Size: 6.2 MB
Checksum: md5:cfe642424e81eac7d6315fc75172d2f4
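The MD5 checksum listed for the archive can be used to check that a download arrived intact. The following is a minimal sketch using only the Python standard library; the helper names `md5_of_file` and `verify_archive` and the local file path in the usage comment are assumptions, not part of the release.

```python
import hashlib

# Checksum as published on this record for the v1.0.0 archive.
EXPECTED_MD5 = "cfe642424e81eac7d6315fc75172d2f4"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path, expected=EXPECTED_MD5):
    """Return True if the file at `path` matches the expected MD5."""
    return md5_of_file(path) == expected.lower()


# Usage (the local path is hypothetical; adjust to where the file was saved):
# ok = verify_archive("latent-structure-benchmark-v1.0.0.zip")
# print("checksum matches" if ok else "checksum mismatch")
```

MD5 is adequate for detecting accidental corruption in transit; it is not a defense against deliberate tampering.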

Additional details

Related works