Published April 17, 2026 | Version v1
Preprint Open

A Discrete Weight Language for Large Language Models: Compressing Gemma 4 31B with a 5-Step Ternary Route

  • 1. Independent Research

Description

We present Route B, a weight-compression scheme that treats every weight tensor of a pretrained language model as a sequence of short codes drawn from a shared, per-family codebook. Each scalar weight is spelled out by a 5-step ternary path through a learned ladder of amplitudes. Applied to google/gemma-4-31B-it, the method re-encodes the 410 linear layers, whose bf16 weights occupy 60 GB, at 7.9 bits per weight, reducing the working memory of inference to about 31 GB with no change to the model I/O interface...
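As a rough illustration of the idea of a 5-step ternary path, the sketch below decodes a weight as a signed sum over a ladder of five amplitudes. It is not taken from the paper or its repository: the names `LADDER`, `decode`, and `encode`, the ladder values, the additive decoding rule, and the greedy encoder are all assumptions made for the example. One thing the sketch does make concrete is that five ternary choices give 3^5 = 243 distinct codes, or about 7.92 bits, which appears consistent with the 7.9 bits per weight quoted above.

```python
import math

# Hypothetical ladder of five amplitudes. The values are illustrative only;
# in the described method the ladder is learned and shared per tensor family.
LADDER = [0.5, 0.25, 0.125, 0.0625, 0.03125]


def decode(path, ladder=LADDER):
    """Assumed decoding rule: sum the ladder amplitudes, each scaled by a trit in {-1, 0, +1}."""
    if len(path) != len(ladder):
        raise ValueError("path must have one trit per ladder step")
    if any(t not in (-1, 0, 1) for t in path):
        raise ValueError("each step must be -1, 0, or +1")
    return sum(t * a for t, a in zip(path, ladder))


def encode(weight, ladder=LADDER):
    """Greedy stand-in encoder: at each step pick the trit that leaves the smallest residual."""
    path = []
    residual = weight
    for a in ladder:
        t = min((-1, 0, 1), key=lambda s: abs(residual - s * a))
        path.append(t)
        residual -= t * a
    return path


# Five ternary steps address 3**5 distinct codes.
NUM_CODES = 3 ** len(LADDER)
BITS_PER_WEIGHT = math.log2(NUM_CODES)

if __name__ == "__main__":
    path = encode(0.3)
    print(path, decode(path))
    print(NUM_CODES, round(BITS_PER_WEIGHT, 2))
```

A code of this kind fits in a single byte per weight, with 13 of the 256 byte values unused; how Route B actually packs and stores its codes is not stated in this description.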

Files

A_Discrete_Weight_Language_for_Large_Language_Models_Compressing_Gemma_4_31B_with_a_5_Step_Ternary_Route.pdf

Additional details

Software

Repository URL
https://github.com/AubakirovArman/discrete-weight-language-gemma4
Programming language
Python
Development Status
Active