The QuoteSweep Stated-Appetite Corpus, v1.0
Description
A public dataset of the stated-appetite materials published by 509 U.S. commercial Property & Casualty (P&C) insurance carriers, normalized into a single machine-readable JSON schema. It is the companion dataset to the working paper *Observed Appetite: A Computational Framework for Measuring Commercial Insurance Carrier Underwriting Behavior at Distribution Scale* (Shrestha, 2026).
What this is
For each of 509 commercial P&C carriers, the corpus captures the carrier's publicly available appetite documentation – either a carrier-published PDF appetite guide (n=201) or a carrier-website appetite-page text scrape (n=308) – and parses it into a uniform JSON schema covering industry classes, lines of business, state availability, size thresholds, exclusions, and underwriting notes.
The corpus is the empirical substrate for two studies reported in §5 of the paper:
- Analysis B (granularity gap) – the structural granularity of stated appetite across six coding dimensions (industry, state, size, exclusions, interactions, dates).
- Analysis A (inter-source agreement) – the within-carrier agreement between a carrier's published PDF and its own appetite web page on which NAICS-2 sectors the carrier writes.
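To make the Analysis B dimensions concrete, here is a minimal Python sketch of how three granularity flags might be computed from normalized records: industry × state × size interactions, six-digit NAICS resolution, and revenue thresholds. The record shape and every field name (`appetite_classes`, `naics`, `lines_of_business`, `revenue_min`, `revenue_max`, `interaction_rules`, `states`, `size`) are hypothetical stand-ins, not the corpus's documented schema (see `corpus-schema.md` and `codebook.md`). The units of analysis are also assumptions, except where the findings state them: carriers for interactions, line-of-business rows for revenue thresholds.

```python
from typing import Any

# Hypothetical record shape: every field name below is an illustrative assumption.
Record = dict[str, Any]


def max_naics_digits(record: Record) -> int:
    """Length of the longest NAICS code the carrier annotates (0 if none)."""
    codes = [str(c["naics"]) for c in record.get("appetite_classes", []) if c.get("naics")]
    return max((len(code) for code in codes), default=0)


def has_revenue_threshold(lob: Record) -> bool:
    """True if a line-of-business row states a minimum or maximum revenue."""
    return lob.get("revenue_min") is not None or lob.get("revenue_max") is not None


def has_three_way_interaction(record: Record) -> bool:
    """True if any rule conditions jointly on industry, state, and size."""
    return any(
        bool(rule.get("naics") and rule.get("states") and rule.get("size"))
        for rule in record.get("interaction_rules", [])
    )


def summarize(records: list[Record]) -> dict[str, float]:
    """Carrier-level shares for interactions and NAICS-6; row-level share for revenue."""
    if not records:
        raise ValueError("need at least one carrier record")
    n = len(records)
    lobs = [lob for r in records for lob in r.get("lines_of_business", [])]
    revenue_hits = sum(has_revenue_threshold(lob) for lob in lobs)
    return {
        "interaction_share": sum(has_three_way_interaction(r) for r in records) / n,
        "naics6_share": sum(max_naics_digits(r) >= 6 for r in records) / n,
        "revenue_threshold_share": revenue_hits / len(lobs) if lobs else 0.0,
    }
```

The interaction flag requires all three conditions in the same rule. A carrier that lists states and sizes only in separate rules therefore does not count as disclosing an interaction.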
Sample and scope
- 509 carriers, 2,031 line-of-business rows, 9,526 appetite class rows
- Scrape window: 2026-03-23 to 2026-04-06 (15 days)
- Coverage: U.S. commercial P&C carriers with any publicly available appetite documentation
Headline findings derived from this corpus
- Only 2.2% of carriers publicly disclose any industry × state × size interaction (95% CI 1.0–3.5%)
- Only 1.2% annotate appetite at six-digit NAICS resolution
- Only 4.7% of line-of-business commitments disclose a revenue threshold
- PDF appetite guides assert 2.14× as many NAICS-2 sector inclusions as the same carriers' own appetite web pages
- Cohen's κ = +0.25 [95% CI 0.22, 0.28] between same-carrier PDF and web-page sources on NAICS-2 sector availability ("fair agreement" under Landis & Koch, 1977; n = 189 carriers, 3,780 cells)
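The agreement statistic and the inclusion ratio above can be sketched in outline with a short pure-Python example. It assumes each carrier's PDF and web-page codings reduce to sets of NAICS-2 sectors. It pools every carrier × sector cell into a single 2×2 comparison, which is what the 3,780-cell count (189 carriers × the 20 NAICS-2 sectors) suggests. It also treats the 2.14× figure as a pooled ratio. The input format (`pairs`) and the helper names are hypothetical, and the released `compute_analysis_a_v2.py` may differ in details.

```python
# The 20 NAICS 2-digit sectors; ranges such as 31-33 count as one sector.
NAICS2_SECTORS = [
    "11", "21", "22", "23", "31-33", "42", "44-45", "48-49", "51", "52",
    "53", "54", "55", "56", "61", "62", "71", "72", "81", "92",
]


def pooled_cells(pairs: dict[str, tuple[set[str], set[str]]]) -> tuple[list[int], list[int]]:
    """Flatten {carrier: (pdf_sectors, web_sectors)} into aligned carrier x sector 0/1 cells."""
    pdf_cells: list[int] = []
    web_cells: list[int] = []
    for pdf_sectors, web_sectors in pairs.values():
        for sector in NAICS2_SECTORS:
            pdf_cells.append(int(sector in pdf_sectors))
            web_cells.append(int(sector in web_sectors))
    return pdf_cells, web_cells


def cohens_kappa(a: list[int], b: list[int]) -> float:
    """Cohen's kappa for two binary codings of the same cells."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty codings of equal length")
    n = len(a)
    p_obs = sum(x == y for x, y in zip(a, b)) / n   # observed agreement
    p_a, p_b = sum(a) / n, sum(b) / n               # marginal inclusion rates
    p_exp = p_a * p_b + (1 - p_a) * (1 - p_b)       # agreement expected by chance
    if p_exp == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_obs - p_exp) / (1 - p_exp)


def inclusion_ratio(pdf_cells: list[int], web_cells: list[int]) -> float:
    """Pooled PDF inclusions divided by pooled web-page inclusions."""
    return sum(pdf_cells) / sum(web_cells)


def landis_koch(kappa: float) -> str:
    """Verbal agreement band from Landis & Koch (1977)."""
    if kappa < 0:
        return "poor"
    for upper, label in ((0.20, "slight"), (0.40, "fair"), (0.60, "moderate"), (0.80, "substantial")):
        if kappa <= upper:
            return label
    return "almost perfect"
```

Pooling treats the 20 sector cells within a carrier as independent. An interval that respects this clustering would resample whole carriers rather than individual cells.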
Contents
The `appetite-corpus-v1.zip` archive preserves a nested directory structure. Top-level files contain the dataset (`carriers.json`, `sources.csv`, `codebook.md`, `corpus-schema.md`); the `reproduce/` subdirectory contains the reproducibility scripts and analytical audit trail (`compute_analysis_b_v2.py`, `compute_analysis_a_v2.py`, headline JSONs, raw input vectors, and verbatim D5 evidence quotes). Random seed for all bootstraps: `20260518`. See the standalone `README.md` for the full file inventory.
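The stated seed implies seeded bootstrap intervals. Below is a minimal percentile-bootstrap sketch for a carrier-level disclosure share, using Python's standard-library `random.Random` seeded with `20260518`. The resampling scheme, the 10,000 replicates, and the percentile method are assumptions. The released scripts may also use a different random generator (for example NumPy's), so this sketch is not expected to reproduce the published intervals exactly.

```python
import random

SEED = 20260518  # bootstrap seed stated in the record


def bootstrap_ci(
    flags: list[int], n_boot: int = 10_000, alpha: float = 0.05, seed: int = SEED
) -> tuple[float, float]:
    """Percentile bootstrap CI for the share of 1s, resampling units with replacement."""
    if not flags or n_boot < 1:
        raise ValueError("need non-empty flags and at least one replicate")
    rng = random.Random(seed)  # stdlib Mersenne Twister, not NumPy's generator
    n = len(flags)
    # Each replicate redraws n units with replacement and records the share of 1s.
    stats = sorted(sum(rng.choices(flags, k=n)) / n for _ in range(n_boot))
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[max(int((1 - alpha / 2) * n_boot) - 1, 0)]
    return lo, hi
```

For example, 11 of 509 carriers is the only count that rounds to the 2.2% interaction rate. Calling `bootstrap_ci([1] * 11 + [0] * 498)` therefore gives an interval for that share under these assumptions.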
Limitations
The 509-carrier sample consists entirely of carriers that publish *any* appetite documentation; carriers absent from this corpus are absent because they publish nothing locatable. Those absent carriers publicly disclose none of the coded features, so headline disclosure rates should be interpreted as ceilings on the U.S. P&C population, not central estimates. Coding was performed by a single LLM-assisted agent in one session; verbatim evidence quotes are included in `reproduce/analysis-b-coding-v2.csv` to enable second-coder validation.
Citation
Shrestha, A. (2026). *The QuoteSweep Stated-Appetite Corpus, v1.0* [Data set]. Zenodo. https://doi.org/10.5281/zenodo.20280436
Files
Total: 437.7 kB across two files, including `appetite-corpus-v1.zip`

| Size | MD5 checksum |
|---|---|
| 430.1 kB | `5b1c4170552eddecb58bef7f4a00f746` |
| 7.6 kB | `2517cc06ae6f315aef8d60f34204634e` |
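A download can be checked against the listed checksums before you unpack it. This is a minimal sketch using Python's `hashlib`. The path argument is wherever you saved the file, and `matches_listed_checksum` is a hypothetical helper that reports whether the file's digest equals either of the two listed values.

```python
import hashlib
from pathlib import Path

# MD5 checksums listed for this record's two files (430.1 kB and 7.6 kB).
LISTED_MD5 = {
    "5b1c4170552eddecb58bef7f4a00f746",
    "2517cc06ae6f315aef8d60f34204634e",
}


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex MD5 digest of a file, read in chunks so it need not fit in memory."""
    h = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def matches_listed_checksum(path: Path) -> bool:
    """True if the file's MD5 equals one of the record's listed checksums."""
    return md5_of(path) in LISTED_MD5
```

A mismatch usually means a truncated or corrupted download. Re-download the file before treating it as a different version.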
Additional details
Dates
- Collected: 2026-03-23/2026-04-06 (scrape window for carrier appetite source documents: PDF guides and appetite-page web text)