The QuoteSweep Stated-Appetite Corpus, v1.0

Shrestha, Ankurman (Project leader)

doi:10.5281/zenodo.20280436

Published May 19, 2026 | Version 1.0

Dataset Open

A public dataset of 509 U.S. commercial Property & Casualty (P&C) insurance carriers' publicly available stated-appetite materials, normalized into a single machine-readable JSON schema. Companion to the working paper *Observed Appetite: A Computational Framework for Measuring Commercial Insurance Carrier Underwriting Behavior at Distribution Scale* (Shrestha, 2026).

What this is

For each of 509 commercial P&C carriers, the corpus captures the carrier's publicly available appetite documentation – either a carrier-published PDF appetite guide (n=201) or a carrier-website appetite-page text scrape (n=308) – and parses it into a uniform JSON schema covering industry classes, lines of business, state availability, size thresholds, exclusions, and underwriting notes.

The corpus is the empirical substrate for two studies reported in §5 of the paper:

Analysis B (granularity gap) – the structural granularity of stated appetite across six coding dimensions (industry, state, size, exclusions, interactions, dates).
Analysis A (inter-source agreement) – the within-carrier agreement between a carrier's published PDF and its own appetite web page on which NAICS-2 sectors the carrier writes.

Sample and scope

509 carriers, 2,031 line-of-business rows, 9,526 appetite class rows
Scrape window: 2026-03-23 to 2026-04-06 (15 days)
Coverage: U.S. commercial P&C carriers with any publicly available appetite documentation

Headline findings derived from this corpus

Only 2.2% of carriers publicly disclose any industry × state × size interaction (95% CI 1.0–3.5%)
Only 1.2% annotate appetite at six-digit NAICS resolution
Only 4.7% of line-of-business commitments disclose a revenue threshold
Same-carrier PDF guides assert 2.14× more sector inclusions than the carrier's own appetite web page
Cohen's κ = +0.25 [95% CI 0.22, 0.28] between same-carrier PDF and web-page sources on NAICS-2 sector availability ("fair agreement" under Landis & Koch, 1977; n = 189 carriers, 3,780 cells)
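
For readers less familiar with the agreement statistic, Cohen's κ compares observed agreement p_o with the agreement expected by chance p_e, as κ = (p_o − p_e) / (1 − p_e). The following is a minimal sketch on toy data, assuming each carrier-by-sector cell is coded 1 when a source lists the sector and 0 otherwise. The function name and the two vectors are hypothetical; this is not the code in `reproduce/compute_analysis_a_v2.py`.

```python
from typing import Sequence


def cohens_kappa(a: Sequence[int], b: Sequence[int]) -> float:
    """Cohen's kappa for two equal-length binary (0/1) label vectors."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(a)
    # Observed agreement: share of cells where both sources give the same label.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement, from each source's marginal rate of coding a cell as 1.
    rate_a = sum(a) / n
    rate_b = sum(b) / n
    p_e = rate_a * rate_b + (1 - rate_a) * (1 - rate_b)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


# Toy data only: ten hypothetical carrier-by-sector cells, not corpus values.
pdf_cells = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
web_cells = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
kappa = cohens_kappa(pdf_cells, web_cells)
print(f"kappa = {kappa:.3f}")
```

In this toy case the sources agree on 6 of 10 cells (p_o = 0.60) against a chance level of p_e = 0.48, giving κ of roughly 0.23, which falls in the same "fair agreement" band as the reported value.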

Contents

The `appetite-corpus-v1.zip` archive preserves a nested directory structure. Top-level files contain the dataset (`carriers.json`, `sources.csv`, `codebook.md`, `corpus-schema.md`); the `reproduce/` subdirectory contains the reproducibility scripts and analytical audit trail (`compute_analysis_b_v2.py`, `compute_analysis_a_v2.py`, headline JSONs, raw input vectors, and verbatim D5 evidence quotes). Random seed for all bootstraps: `20260518`. See the standalone `README.md` for the full file inventory.
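
As an illustration of how a fixed seed makes a bootstrap interval repeatable, here is a minimal sketch of a percentile bootstrap for a disclosure rate. It assumes carriers are resampled with replacement and that 11 of 509 carriers disclose the feature; that count is back-calculated from the 2.2% headline, not stated in this record. The function name, the resample count, and the percentile method are assumptions, and the scripts in `reproduce/` may differ, so the output is not expected to match the published interval exactly.

```python
import random


def bootstrap_proportion_ci(flags, n_boot=2000, alpha=0.05, seed=20260518):
    """Percentile bootstrap CI for the mean of a 0/1 vector (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed, so repeated calls give identical results
    n = len(flags)
    stats = []
    for _ in range(n_boot):
        # Resample carriers with replacement and recompute the proportion.
        resample = rng.choices(flags, k=n)
        stats.append(sum(resample) / n)
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Assumed counts: 11 disclosing carriers out of 509 (inferred, not from the record).
flags = [1] * 11 + [0] * 498
point = sum(flags) / len(flags)
lo, hi = bootstrap_proportion_ci(flags)
print(f"point = {point:.3%}, 95% CI = [{lo:.3%}, {hi:.3%}]")
```

Because the generator is seeded inside the function, calling it twice on the same input returns the same interval.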

Limitations

The 509-carrier sample over-represents carriers that publish *any* appetite documentation: a carrier is absent from this corpus only because it publishes nothing locatable. Headline disclosure rates should therefore be read as upper bounds on the rates for the full U.S. commercial P&C population, not as central estimates. Coding was performed by one LLM-assisted agent in a single session; verbatim evidence quotes are included in `reproduce/analysis-b-coding-v2.csv` to enable second-coder validation.

Citation

Shrestha, A. (2026). *The QuoteSweep Stated-Appetite Corpus, v1.0* [Data set]. Zenodo. https://doi.org/10.5281/zenodo.20280436

Files

Files (437.7 kB)

Name	MD5	Size
appetite-corpus-v1.zip	5b1c4170552eddecb58bef7f4a00f746	430.1 kB
README.md	2517cc06ae6f315aef8d60f34204634e	7.6 kB
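
The MD5 checksums listed for the two files can be used to confirm that a download is intact. The following is a minimal sketch, assuming both files were saved under their original names in one local directory; the helper names `md5_of` and `verify` are hypothetical.

```python
import hashlib
from pathlib import Path

# Checksums as listed on this record.
EXPECTED_MD5 = {
    "appetite-corpus-v1.zip": "5b1c4170552eddecb58bef7f4a00f746",
    "README.md": "2517cc06ae6f315aef8d60f34204634e",
}


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(download_dir):
    """Map each expected file name to True if it exists and its MD5 matches."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(download_dir) / name
        results[name] = path.is_file() and md5_of(path) == expected
    return results
```

Calling `verify(".")` from the download directory returns a dictionary with one boolean per file; a `False` entry means the file is missing or its contents differ from what was published.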

Additional details

Collected: 2026-03-23/2026-04-06

Scrape window for carrier appetite source documents (PDF guides and appetite-page web text)
