Published April 17, 2026 | Version v1
Preprint
Open
A Discrete Weight Language for Large Language Models: Compressing Gemma 4 31B with a 5-Step Ternary Route
Description
We present Route B, a weight-compression scheme that treats every weight tensor of a pretrained language model as a sequence of short codes drawn from a shared, per-family codebook. In effect, each scalar weight is spelled out by a 5-step ternary path through a learned ladder of amplitudes. Applied to google/gemma-4-31B-it, the method re-encodes the 60 GB of bf16 weights in the model's 410 linear layers at 7.9 bits per weight, reducing the working memory of inference to about 31 GB without changing the model's I/O interface...
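To make the idea of a "5-step ternary path through a ladder of amplitudes" concrete, here is a minimal sketch under stated assumptions. It assumes that a weight is approximated as a signed sum of ladder amplitudes, one trit in {-1, 0, +1} per rung, chosen greedily. The record does not confirm either the additive reading or the greedy search, and the geometric `LADDER` below is a hypothetical stand-in for the learned, per-family amplitudes. The names `encode`, `decode`, `pack`, and `unpack` are illustrative only and are not taken from the repository.

```python
import math

STEPS = 5  # path length stated in the description

# Hypothetical ladder: fixed geometric amplitudes, NOT the learned values.
LADDER = [0.5 ** (k + 1) for k in range(STEPS)]  # 0.5, 0.25, ..., 0.03125


def encode(w, ladder=LADDER):
    """Greedily pick one trit in {-1, 0, +1} per rung to approach w."""
    trits, residual = [], w
    for a in ladder:
        # Choose the trit that leaves the smallest remaining error.
        t = min((-1, 0, 1), key=lambda s: abs(residual - s * a))
        trits.append(t)
        residual -= t * a
    return trits


def decode(trits, ladder=LADDER):
    """Rebuild the approximate weight as a signed sum of amplitudes."""
    return sum(t * a for t, a in zip(trits, ladder))


def pack(trits):
    """Map 5 trits to a base-3 index in [0, 242], which fits in one byte."""
    code = 0
    for t in trits:
        code = code * 3 + (t + 1)
    return code


def unpack(code, steps=STEPS):
    """Invert pack(): recover the trit sequence from its base-3 index."""
    trits = []
    for _ in range(steps):
        code, digit = divmod(code, 3)
        trits.append(digit - 1)
    return trits[::-1]


# Information content of one 5-trit code: log2(3**5) = log2(243).
bits_per_code = math.log2(3 ** STEPS)
```

Five ternary steps give 3^5 = 243 distinct paths, or roughly 7.92 bits of information per code, which would be consistent with the 7.9 bits per weight quoted above if each weight is stored as one such code. That link is an inference from the arithmetic, not a statement from the record.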
Files
A_Discrete_Weight_Language_for_Large_Language_Models_Compressing_Gemma_4_31B_with_a_5_Step_Ternary_Route.pdf
Total size: 562.4 kB
| Checksum | Size |
|---|---|
| md5:fcf987f36643179edb510d232286a01b | 328.9 kB |
| md5:e78aa914293f479ab1e75f1bf494b489 | 233.5 kB |
Additional details
Software
- Repository URL: https://github.com/AubakirovArman/discrete-weight-language-gemma4
- Programming language: Python
- Development Status: Active