Ixnos: A Deterministic Fragment Identifier Protocol for AI Training Data
Authors/Creators
Description
Report series: Ixnos Research Reports
Report number: IRR-002
This paper introduces Ixnos, a protocol for assigning deterministic, content-addressable identifiers to sub-document fragments of machine learning training data. Current training data documentation practices operate at the corpus or dataset level, making it impossible to detect fine-grained overlap between training sets and evaluation benchmarks, reproduce exact dataset compositions, or support audit workflows related to data governance regulations such as the EU AI Act.
Ixnos addresses these limitations by defining a fragment identification primitive — the Ixnos Fragment Identifier (IFI) — based on canonicalized content hashing with explicit segmentation profiles. Dataset Recipes and Provenance Manifests are defined as composable protocol layers enabling deterministic corpus fingerprinting, overlap detection, and reproducible dataset composition.
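As a rough illustration of what canonicalized content hashing under a named segmentation profile could look like, the sketch below derives identifiers for paragraph-level or line-level fragments. It is not the IFI specification. The canonicalization steps (Unicode NFC, whitespace collapsing), the profile names `paragraph` and `line`, the `ifi:<profile>:<digest>` string layout, and the choice of SHA-256 are all assumptions made for this example.

```python
import hashlib
import re
import unicodedata


def canonicalize(text: str) -> str:
    """Assumed canonical form: Unicode NFC, runs of whitespace collapsed."""
    normalized = unicodedata.normalize("NFC", text)
    return " ".join(normalized.split())


def segment(text: str, profile: str) -> list[str]:
    """Split a document into fragments under a hypothetical profile name."""
    if profile == "paragraph":
        parts = re.split(r"\n\s*\n", text)  # blank-line separated blocks
    elif profile == "line":
        parts = text.splitlines()
    else:
        raise ValueError(f"unknown segmentation profile: {profile}")
    return [p for p in parts if p.strip()]


def fragment_ids(text: str, profile: str) -> list[str]:
    """Return one deterministic identifier per fragment of the document."""
    ids = []
    for fragment in segment(text, profile):
        digest = hashlib.sha256(canonicalize(fragment).encode("utf-8")).hexdigest()
        # Hypothetical identifier layout; the real IFI format may differ.
        ids.append(f"ifi:{profile}:{digest}")
    return ids


if __name__ == "__main__":
    for ifi in fragment_ids("Hello   world.\n\nSecond  para.", "paragraph"):
        print(ifi)
```

Because the identifier depends only on the canonical content and the profile, two copies of a paragraph that differ in spacing or line wrapping map to the same identifier, which is the property the overlap-detection and fingerprinting layers rely on.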
A minimal overlap-detection experiment and a larger-scale feasibility test demonstrate that IFI-based indexing correctly detects contamination at sub-document granularity and performs constant-time lookup on indexed corpora. Ixnos is proposed as a narrow infrastructure primitive for training data traceability rather than as a causal attribution or semantic similarity system.
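The overlap check described above can be pictured as a hash-index lookup. The following is a minimal sketch under stated assumptions, not the experiment from the report: fragments are blank-line separated paragraphs, canonicalization is whitespace collapsing only, and the helper names (`fragment_id`, `build_index`, `find_overlap`) and sample data are hypothetical. The lookup is constant-time in the expected-case sense of a Python `dict`.

```python
import hashlib


def fragment_id(fragment: str) -> str:
    """Hash a whitespace-collapsed fragment (assumed canonicalization)."""
    canonical = " ".join(fragment.split())
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def paragraphs(text: str) -> list[str]:
    """Assumed segmentation: paragraphs separated by blank lines."""
    return [p for p in text.split("\n\n") if p.strip()]


def build_index(documents: dict[str, str]) -> dict[str, set[str]]:
    """Map each fragment identifier to the training documents containing it."""
    index: dict[str, set[str]] = {}
    for doc_id, text in documents.items():
        for para in paragraphs(text):
            index.setdefault(fragment_id(para), set()).add(doc_id)
    return index


def find_overlap(index: dict[str, set[str]],
                 benchmark: dict[str, str]) -> dict[str, set[str]]:
    """Return benchmark items whose fragments also occur in the index."""
    hits: dict[str, set[str]] = {}
    for item_id, text in benchmark.items():
        for para in paragraphs(text):
            sources = index.get(fragment_id(para))  # expected O(1) lookup
            if sources:
                hits.setdefault(item_id, set()).update(sources)
    return hits


if __name__ == "__main__":
    training = {"doc1": "Intro text.\n\nWhat is 2+2? The answer is 4.\n\nOutro."}
    benchmark = {"q1": "What is  2+2?  The answer is 4.", "q2": "Unrelated question."}
    print(find_overlap(build_index(training), benchmark))
```

This only finds exact matches after canonicalization, consistent with the report's positioning of Ixnos as a traceability primitive rather than a semantic similarity system; paraphrased or partially edited fragments would not be detected by a scheme like this.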
Files
| Name | Size |
|---|---|
| IRR-002_Ixnos_Protocol.pdf (md5:086c541490462b08c8e8f3d2e9bf859b) | 300.5 kB |
Additional details
Dates
- Submitted
- 2026
Software
- Repository URL
- https://github.com/Andr0meda/ixnos-research
- Programming language
- Python
- Development Status
- Active