Published May 3, 2026 | Version v1
Working paper Open

Provenance Erasure Rate: A Compression-Survival Metric for Attribution Loss in AI-Composed Search Outputs

Authors/Creators

  • 1. Semantic Economy Institute · Crimson Hexagonal Archive

Description

Research note and metric proposal. AI retrieval systems increasingly compose answers from human-authored sources. This paper introduces Provenance Erasure Rate (PER) as a metric measuring the proportion of source-dependent claims in an AI-composed output that are presented without explicit attribution. PER does not ask whether an output is true; it asks whether the sources that made the output possible remain visible inside the composition.
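The definition above can be sketched as a simple ratio. This is a minimal illustration, not the paper's formalization: it assumes each claim in an output has already been labeled (for instance, by a human annotator) as source-dependent or not, and as explicitly attributed or not. The names `Claim` and `provenance_erasure_rate` are hypothetical, and treating PER as undefined when no claim is source-dependent is an assumption made here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    """One claim in an AI-composed output (hypothetical annotation schema)."""
    text: str
    source_dependent: bool  # does the claim rely on a specific source?
    attributed: bool        # is that source explicitly credited in the output?


def provenance_erasure_rate(claims):
    """Share of source-dependent claims presented without attribution."""
    dependent = [c for c in claims if c.source_dependent]
    if not dependent:
        # Assumption: with no source-dependent claims the ratio has no meaning.
        raise ValueError("PER is undefined when no claim is source-dependent")
    unattributed = sum(1 for c in dependent if not c.attributed)
    return unattributed / len(dependent)


# Illustrative labels only; these are not data from the paper.
example = [
    Claim("fragment A", source_dependent=True, attributed=False),
    Claim("fragment B", source_dependent=True, attributed=True),
    Claim("general knowledge", source_dependent=False, attributed=False),
    Claim("fragment C", source_dependent=True, attributed=False),
]
per_example = provenance_erasure_rate(example)  # 2 of 3 dependent claims uncredited
```

Claims that do not depend on a source are excluded from the denominator, so an output scores 1.0 only when every source-dependent claim goes uncredited.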

A motivating case study documents a Google AI Overview that constructed a false biography of a living author from real fragments of the author's published poetry: every fragment survived compression, but its provenance and meaning did not. The PER for this output is 1.0 (total provenance erasure).

PER is formalized with claim-grain weighting, distinguished from citation precision/recall and AIS-style support metrics (Rashkin et al. 2023; Gao et al. 2023; Liu et al. 2023), and interpreted as an economic signal: a rate at which compositional authority migrates from named sources to system-level synthesis. The paper proposes PER as a candidate indicator for attribution-layer governance, labor accounting, and retrieval transparency.
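This abstract does not spell out the claim-grain weighting scheme, so the following is only one plausible reading: each source-dependent claim carries a non-negative weight, and PER becomes the unattributed share of total weight. The function name `weighted_per`, the `(weight, attributed)` pair layout, and the example weights are all assumptions for illustration.

```python
def weighted_per(claims):
    """Weighted share of unattributed claims.

    `claims` is an iterable of (weight, attributed) pairs, one per
    source-dependent claim. This layout is a hypothetical convention.
    """
    total = 0.0
    erased = 0.0
    for weight, attributed in claims:
        if weight < 0:
            raise ValueError("weights must be non-negative")
        total += weight
        if not attributed:
            erased += weight
    if total == 0:
        # Assumption: the ratio is left undefined when there is no weight at all.
        raise ValueError("weighted PER is undefined when total weight is zero")
    return erased / total


# Illustrative weights only; not taken from the paper.
claims = [(3.0, False), (1.0, True), (1.0, False)]
score = weighted_per(claims)  # (3.0 + 1.0) / 5.0
```

With equal weights this reduces to the plain proportion of unattributed source-dependent claims.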

PER is orthogonal to content-preservation metrics (ROUGE, BERTScore) and complementary to existing citation evaluation frameworks. It measures the attribution gap: the space between what the system uses and what it credits.

The metric emerges from the Semantic Economy framework (DOI: 10.5281/zenodo.18320411) but can be used independently of that framework. A validation agenda is outlined.

Files

Provenance_Erasure_Rate_v1.0.md

Size: 24.4 kB
md5:d851e83a60d7ddb62613e929d3a7ab9e

Additional details