Patch-Token Geometry of Four Released Vision Encoders: A Multi-Dataset Artifact Study
Authors/Creators
Description
We report a fixed-pipeline measurement study of four released visual encoders from the Meta release ecosystem that differ simultaneously in objective family, backbone size, patch grid, embedding dimension, and image-versus-video pretraining: DINOv2 [13], I-JEPA [1], a still-image adapter for V-JEPA 2 [2], and EUPE [17]. Evidence is reported at two scales: a detailed 202-image Caltech-101 seed-17 run with image and parent-bucket block bootstraps, paired sign-flip tests, Benjamini–Hochberg FDR summaries, and sample-size sensitivity; and a six-run publication snapshot spanning Caltech-101, Imagenette, Oxford-IIIT Pets, and DTD. In the detailed run, EUPE is the most compressed and spatially coherent encoder (mean effective rank 39.68 [39.39, 39.96], spatial coherence 0.921 [0.918, 0.923]), and the V-JEPA 2 still-image adapter has the largest local intrinsic dimensionality (11.63 [11.52, 11.73]). The same per-model extrema persist in the unweighted six-run summary, in which Caltech-101 contributes three seeds and the other three datasets contribute one each. Pairwise scores remain protocol diagnostics rather than invariant model-affinity claims: under our shared-prefix token-alignment protocol, the largest mean linear CKA is for DINOv2 vs. V-JEPA 2 (0.448) and the smallest is for I-JEPA vs. EUPE (0.121). The supported claim is narrow: under the recorded runtime backends, per-model geometry extremes persist across the checked-in dataset snapshot, while causal objective-family claims, backend-invariant pairwise claims, and token-contract-invariant claims remain unsupported.
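To make the pairwise score in the description concrete, the following is a minimal, standard-library-only sketch of linear CKA between two patch-token matrices. It is not taken from the linked repository. The function names are hypothetical, and the `shared_prefix` helper encodes only one plausible reading of "shared-prefix token alignment" (truncating both token sequences to their common leading length), which is an assumption and may differ from the protocol actually used in the study.

```python
import math
import random


def center_columns(rows):
    """Subtract the per-feature mean from every row (one row = one token)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def cross_gram_sq_norm(a, b):
    """Squared Frobenius norm of a^T b for row-major a (n x p) and b (n x q)."""
    p, q = len(a[0]), len(b[0])
    total = 0.0
    for i in range(p):
        for j in range(q):
            s = sum(ra[i] * rb[j] for ra, rb in zip(a, b))
            total += s * s
    return total


def linear_cka(x, y):
    """Linear CKA: ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F) on centered features.

    x and y must have the same number of rows (aligned tokens); their
    embedding dimensions may differ.
    """
    if len(x) != len(y):
        raise ValueError("token matrices must have the same number of rows")
    xc, yc = center_columns(x), center_columns(y)
    num = cross_gram_sq_norm(xc, yc)
    den = math.sqrt(cross_gram_sq_norm(xc, xc) * cross_gram_sq_norm(yc, yc))
    # Returns 0.0 when either input has no variance (an arbitrary convention).
    return num / den if den > 0 else 0.0


def shared_prefix(x, y):
    """ASSUMPTION: align two token sequences by keeping their common leading length."""
    n = min(len(x), len(y))
    return x[:n], y[:n]


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic stand-ins for two encoders with different grids and widths.
    tokens_a = [[rng.gauss(0, 1) for _ in range(6)] for _ in range(16)]
    tokens_b = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(12)]
    a, b = shared_prefix(tokens_a, tokens_b)
    print("linear CKA on shared prefix:", linear_cka(a, b))
```

Because linear CKA is invariant only to orthogonal transforms and isotropic scaling of the features, and because the score depends on which tokens are paired, a different alignment rule can change the value. This is consistent with the description's framing of pairwise scores as protocol diagnostics.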
Files
| Name | Size | Checksum |
|---|---|---|
| Patch-Token Geometry of Four Released Vision Encoders - A Multi-Dataset Artifact Study.pdf | 435.3 kB | md5:879a187dec5dce63bf022e75aea4e2fd |
Additional details
Software
- Repository URL: https://github.com/AbdelStark/latent-inspector-py
- Programming language: Python
- Development Status: Active