Provenance Erasure Rate: A Compression-Survival Metric for Attribution Loss in AI-Composed Search Outputs
Description
Research note and metric proposal. AI retrieval systems increasingly compose answers from human-authored sources. This paper introduces Provenance Erasure Rate (PER) as a metric measuring the proportion of source-dependent claims in an AI-composed output that are presented without explicit attribution. PER does not ask whether an output is true; it asks whether the sources that made the output possible remain visible inside the composition.
A motivating case study documents a Google AI Overview that constructed a false biography of a living author from real fragments of the author's published poetry: every fragment survived compression, but its provenance and meaning did not. PER for this output = 1.0 (total provenance erasure).
PER is formalized with claim-grain weighting, distinguished from citation precision/recall and AIS-style support metrics (Rashkin et al. 2023; Gao et al. 2023; Liu et al. 2023), and interpreted as an economic signal: a rate at which compositional authority migrates from named sources to system-level synthesis. The paper proposes PER as a candidate indicator for attribution-layer governance, labor accounting, and retrieval transparency.
PER is orthogonal to content-preservation metrics (ROUGE, BERTScore) and complementary to existing citation evaluation frameworks. It measures the attribution gap — the space between what the system uses and what it credits.
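The core computation can be sketched as a weighted fraction. This is an illustrative reading of the definition above, not the paper's formalization: the claim-grain weights and the binary attribution judgment are assumptions for the sake of the example.

```python
def provenance_erasure_rate(claims):
    """Hedged sketch of PER with claim-grain weighting.

    claims: iterable of (weight, attributed) pairs, one per
    source-dependent claim in the AI-composed output, where
    weight is an assumed claim-grain weight and attributed is
    True when the claim carries an explicit source citation.

    Returns the weighted proportion of source-dependent claims
    presented without explicit attribution (0.0 to 1.0).
    """
    total = sum(w for w, _ in claims)
    if total == 0:
        # No source-dependent claims: nothing to erase.
        return 0.0
    erased = sum(w for w, attributed in claims if not attributed)
    return erased / total

# The case-study output: every source-dependent claim unattributed.
print(provenance_erasure_rate([(1.0, False), (2.0, False), (0.5, False)]))  # -> 1.0
```

Under this reading, the motivating case study scores 1.0 because no claim retains a citation, regardless of how faithfully the fragments themselves were preserved.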
The metric emerges from the Semantic Economy framework (DOI: 10.5281/zenodo.18320411) but can be used independently of that framework. A validation agenda is outlined.
Files
| Name | Size |
|---|---|
| Provenance_Erasure_Rate_v1.0.md (md5:d851e83a60d7ddb62613e929d3a7ab9e) | 24.4 kB |
Additional details
Subjects
- Artificial intelligence
- http://id.loc.gov/authorities/subjects/sh85008180
- Information retrieval
- http://id.loc.gov/authorities/subjects/sh85066148