Published February 12, 2026 | Version v1 | Preprint | Open
Non-Bijunctive Permutation Collapse: AltiVec vec_perm Enables Single-Cycle Attention Path Selection for LLM Inference
Description
We demonstrate that the IBM AltiVec vec_perm instruction performs non-bijective permutations: each output byte independently selects any source byte, so inputs can be duplicated or dropped in a single operation. We argue this capability is architecturally unavailable on modern x86 (AVX-512) and ARM (NEON) SIMD units, which enforce bijective shuffles.
Key results:
- 27–96x operation advantage over x86/ARM for combined prune+amplify attention operations
- 8.81x inference speedup (16.74 to 147.54 t/s) on IBM POWER8 S824 with TinyLlama 1.1B
- 30+ permutation patterns benchmarked: hierarchical collapse, multi-head attention, sparse pruning, fractal transforms
- Hebbian learning connection: hardware-native winner-take-all path selection
Integrated into llama.cpp. POWER8's 128 vector registers and SMT8 (128 threads) make it uniquely suited to non-bijective AI inference.
Files
| Name | Size |
|---|---|
| VecPerm_NonBijunctive_Collapse.md (md5:4de9bd654a7b080f3f30b93bab7e6952) | 17.1 kB |
Additional details
Related works
- Is supplemented by
  - Software: https://github.com/Scottcjn/ram-coffers (URL)
- References
  - Publication: 10.5281/zenodo.18321905 (DOI)
  - Publication: 10.5281/zenodo.18623592 (DOI)