Which Leakage Types Matter?

Roth, Simon

doi:10.5281/zenodo.19406148

There is a newer version of the record available.

Published April 3, 2026 | Version 1.0

Preprint Open

Twenty-eight within-subject counterfactual experiments across 2,047 tabular datasets, plus a boundary experiment on 129 temporal datasets, measuring the severity of four data leakage classes in machine learning. Class I (estimation — fitting scalers on full data) is negligible: all nine conditions produce |ΔAUC| ≤ 0.005. Class II (selection — peeking, seed cherry-picking) is substantial: ~90% of the measured effect is noise exploitation that inflates reported scores. Class III (memorization) scales with model capacity: d_z = 0.37 (Naive Bayes) to 1.11 (Decision Tree). Class IV (boundary) is invisible under random CV. The textbook emphasis is inverted: normalization leakage matters least; selection leakage at practical dataset sizes matters most.
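The selection-leakage claim and the d_z effect sizes above can be illustrated with a small simulation. The sketch below is not the author's experimental protocol. It assumes that d_z denotes the usual paired-samples effect size (mean of the paired differences divided by their sample standard deviation), which the abstract does not define, and it models "seed cherry-picking" as reporting the best of several random-guess scores on pure noise. The function names (`paired_dz`, `noise_accuracy`, `simulate`) and all parameter values are hypothetical.

```python
import random
import statistics


def paired_dz(treated, control):
    """Paired effect size: mean of differences / sample SD of differences.

    Assumption: this is the conventional d_z; the source does not define it.
    """
    diffs = [t - c for t, c in zip(treated, control)]
    return statistics.mean(diffs) / statistics.stdev(diffs)


def noise_accuracy(rng, n_test=100):
    """Accuracy of random guessing on n_test binary labels (true skill is 0.5)."""
    return sum(rng.random() < 0.5 for _ in range(n_test)) / n_test


def simulate(n_datasets=200, n_seeds=10, seed=0):
    """Return (honest, cherry_picked) score lists, one pair per simulated dataset."""
    rng = random.Random(seed)
    honest, cherry_picked = [], []
    for _ in range(n_datasets):
        scores = [noise_accuracy(rng) for _ in range(n_seeds)]
        honest.append(scores[0])            # seed fixed before looking at results
        cherry_picked.append(max(scores))   # best seed chosen after peeking
    return honest, cherry_picked


honest, cherry_picked = simulate()
inflation = statistics.mean(cherry_picked) - statistics.mean(honest)
dz = paired_dz(cherry_picked, honest)
print(f"mean inflation = {inflation:.3f}, d_z = {dz:.2f}")
```

Because every simulated model has no real skill, any gap between the cherry-picked and the honest score is noise exploitation by construction. A smaller `n_test` or a larger `n_seeds` widens the gap, in line with the abstract's point that selection leakage matters most at practical dataset sizes.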

Files (446.6 kB)

Name	Size	MD5
roth2026_landscape_leakage_types_v1.pdf	446.6 kB	b32a7ac8aeeff7592f2e19f5f7a5e1b8

Additional details

Subtitle (English): A Quantitative Landscape Across 2,047 Benchmark Datasets

Is supplement to: Preprint: arXiv:2603.10742 (arXiv); Preprint: 10.5281/zenodo.19406355 (DOI)
Is supplemented by: Software: https://github.com/epagogy/ml (URL)

	All versions	This version
Views	618	564
Downloads	600	555
Data volume	598.5 MB	573.4 MB

Resource type: Preprint
Publisher: EPAGOGY
Languages: English

License: Creative Commons Attribution 4.0 International

The Creative Commons Attribution license allows re-distribution and re-use of a licensed work on the condition that the creator is appropriately credited.

Technical metadata

Created: April 3, 2026
Modified: May 29, 2026
