Published May 19, 2026 | Version 1.0
Dataset Open

The QuoteSweep Stated-Appetite Corpus, v1.0

Description

A public dataset of publicly available stated-appetite materials from 509 U.S. commercial Property & Casualty (P&C) insurance carriers, normalized into a single machine-readable JSON schema. Companion to the working paper *Observed Appetite: A Computational Framework for Measuring Commercial Insurance Carrier Underwriting Behavior at Distribution Scale* (Shrestha, 2026).

What this is

For each of 509 commercial P&C carriers, the corpus captures the carrier's publicly available appetite documentation – either a carrier-published PDF appetite guide (n=201) or a carrier-website appetite-page text scrape (n=308) – and parses it into a uniform JSON schema covering industry classes, lines of business, state availability, size thresholds, exclusions, and underwriting notes.
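As a starting point for exploring the normalized records, the sketch below summarizes a parsed JSON document without assuming any field names. The function `summarize_records` and the toy records are hypothetical illustrations; the authoritative field definitions are those in `corpus-schema.md` and `codebook.md`, and the real layout of `carriers.json` may differ (for example, the records may be nested under a top-level key).

```python
import json
from collections import Counter
from typing import Any, Dict


def summarize_records(data: Any) -> Dict[str, Any]:
    """Report container type, record count, and how often each key appears.

    Works on a list of records or a dict mapping some ID to a record.
    No field names are assumed; keys are discovered from the data itself.
    """
    if isinstance(data, dict):
        records = list(data.values())
    elif isinstance(data, list):
        records = data
    else:
        raise TypeError("expected a JSON array or object at the top level")

    key_counts: Counter = Counter()
    for record in records:
        if isinstance(record, dict):
            key_counts.update(record.keys())

    return {
        "container": type(data).__name__,
        "n_records": len(records),
        "key_counts": dict(key_counts),
    }


def summarize_file(path: str) -> Dict[str, Any]:
    """Load a JSON file from disk and summarize it."""
    with open(path, encoding="utf-8") as handle:
        return summarize_records(json.load(handle))


# Toy records with made-up keys, used only to show the output shape.
toy = [{"name": "Carrier A", "states": ["TX"]}, {"name": "Carrier B"}]
summary = summarize_records(toy)
```

Running `summarize_file("carriers.json")` on the extracted archive would show which keys are present on every record and which are sparse, which is a useful check before writing schema-specific code.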

The corpus is the empirical substrate for two studies reported in §5 of the paper:

  • Analysis B (granularity gap) – the structural granularity of stated appetite across six coding dimensions (industry, state, size, exclusions, interactions, dates).
  • Analysis A (inter-source agreement) – the within-carrier agreement between a carrier's published PDF and its own appetite web page on which NAICS-2 sectors the carrier writes.

Sample and scope

  • 509 carriers, 2,031 line-of-business rows, 9,526 appetite class rows
  • Scrape window: 2026-03-23 to 2026-04-06 (15 days)
  • Coverage: U.S. commercial P&C carriers with any publicly available appetite documentation

Headline findings derived from this corpus

  • Only 2.2% of carriers publicly disclose any industry × state × size interaction (95% CI 1.0–3.5%)
  • Only 1.2% annotate appetite at six-digit NAICS resolution
  • Only 4.7% of line-of-business commitments disclose a revenue threshold
  • Same-carrier PDF guides assert 2.14× more sector inclusions than the carrier's own appetite web page
  • Cohen's κ = +0.25 [95% CI 0.22, 0.28] between same-carrier PDF and web-page sources on NAICS-2 sector availability ("fair agreement" under Landis & Koch, 1977; n = 189 carriers, 3,780 cells)
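To make the agreement statistic concrete, here is a minimal sketch of Cohen's κ for two binary codings of the same cells, together with the ratio of inclusions asserted by each source. It is not the code in `compute_analysis_a_v2.py`; the function name and the toy vectors are illustrative assumptions. The cell layout is also an assumption: the reported 3,780 cells would be consistent with 189 carriers × 20 NAICS-2 sectors, with each cell recording whether a source lists the sector as in appetite.

```python
from typing import Sequence


def cohens_kappa(a: Sequence[int], b: Sequence[int]) -> float:
    """Cohen's kappa for two binary (0/1) codings of the same cells."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(a)

    # Observed agreement: share of cells where the two sources match.
    p_observed = sum(1 for x, y in zip(a, b) if x == y) / n

    # Chance agreement implied by each source's marginal inclusion rate.
    rate_a = sum(a) / n
    rate_b = sum(b) / n
    p_chance = rate_a * rate_b + (1 - rate_a) * (1 - rate_b)

    if p_chance == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_chance) / (1 - p_chance)


# Toy data (not from the corpus): 1 = sector listed as in appetite.
pdf_cells = [1, 1, 1, 1, 0, 0, 1, 0, 1, 0]
web_cells = [1, 0, 1, 0, 0, 0, 0, 0, 1, 0]

kappa = cohens_kappa(pdf_cells, web_cells)
inclusion_ratio = sum(pdf_cells) / sum(web_cells)
```

In the toy data the PDF vector asserts twice as many inclusions as the web vector, which pulls κ down even though the two agree on 7 of 10 cells. An asymmetry of this kind is what the 2.14× figure above describes.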

Contents

The `appetite-corpus-v1.zip` archive preserves a nested directory structure. Top-level files contain the dataset (`carriers.json`, `sources.csv`, `codebook.md`, `corpus-schema.md`); the `reproduce/` subdirectory contains the reproducibility scripts and analytical audit trail (`compute_analysis_b_v2.py`, `compute_analysis_a_v2.py`, headline JSONs, raw input vectors, and verbatim D5 evidence quotes). Random seed for all bootstraps: `20260518`. See the standalone `README.md` for the full file inventory.
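For readers unfamiliar with seeded bootstraps, the following sketch shows a percentile bootstrap confidence interval for a disclosure rate, seeded with the value stated above. It is an assumption that the `reproduce/` scripts use a percentile interval or Python's `random` module; a different resampling scheme or random number generator will not give identical numbers even with the same seed. The function name, replicate count, and toy data are hypothetical.

```python
import random
from typing import List, Tuple


def bootstrap_proportion_ci(
    flags: List[int],
    n_boot: int = 2000,
    alpha: float = 0.05,
    seed: int = 20260518,
) -> Tuple[float, float]:
    """Percentile bootstrap CI for the mean of a 0/1 vector."""
    if not flags:
        raise ValueError("flags must be non-empty")
    rng = random.Random(seed)  # local generator, so results are repeatable
    n = len(flags)

    # Resample carriers with replacement and record each replicate's rate.
    replicates = sorted(
        sum(rng.choices(flags, k=n)) / n for _ in range(n_boot)
    )

    # Take the alpha/2 and 1 - alpha/2 empirical percentiles.
    lower_index = int(n_boot * alpha / 2)
    upper_index = n_boot - 1 - lower_index
    return replicates[lower_index], replicates[upper_index]


# Toy data (not from the corpus): 5 disclosing carriers out of 200.
toy_flags = [1] * 5 + [0] * 195
point_estimate = sum(toy_flags) / len(toy_flags)
ci_low, ci_high = bootstrap_proportion_ci(toy_flags)
```

Because the generator is created inside the function from a fixed seed, repeated calls return the same interval, which is the property that makes a published seed useful for an audit trail.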

Limitations

The 509-carrier sample includes only carriers that publish *any* appetite documentation; carriers missing from the corpus are those for which no published appetite material could be located. Headline disclosure rates should therefore be read as upper bounds on the rates for the full U.S. commercial P&C population, not as central estimates. Coding was performed by a single LLM-assisted agent in one session; verbatim evidence quotes are included in `reproduce/analysis-b-coding-v2.csv` to enable second-coder validation.

Citation

Shrestha, A. (2026). *The QuoteSweep Stated-Appetite Corpus, v1.0* [Data set]. Zenodo. https://doi.org/10.5281/zenodo.20280436

Files

appetite-corpus-v1.zip

Files (437.7 kB)

  • 430.1 kB, md5:5b1c4170552eddecb58bef7f4a00f746
  • 7.6 kB, md5:2517cc06ae6f315aef8d60f34204634e
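The MD5 checksums listed for the files can be used to confirm that a download is intact. The sketch below uses only Python's standard `hashlib` (and `str.removeprefix`, so Python 3.9 or later); the function names are illustrative. The listing does not say which file each checksum belongs to, so pairing a checksum with a local file is left to the reader.

```python
import hashlib


def md5_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5(path: str, expected: str) -> bool:
    """Compare a file's MD5 with an expected value such as 'md5:5b1c...'."""
    expected_hex = expected.strip().lower().removeprefix("md5:")
    return md5_of(path) == expected_hex
```

For example, `verify_md5("appetite-corpus-v1.zip", "md5:5b1c4170552eddecb58bef7f4a00f746")` returns `True` only if that checksum belongs to the archive and the local copy is byte-identical to the published one.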

Additional details

Dates

Collected: 2026-03-23/2026-04-06
Scrape window for carrier appetite source documents (PDF guides and appetite-page web text)