Published December 10, 2025 | Version v1 | Preprint | Open
Balanced Ternary Transformers: Eliminating Multiplication and Enabling Epistemic Uncertainty
Description
I present a quantisation method for transformer-based language models that constrains weights to the balanced ternary values {-1, 0, +1}, eliminating floating-point matrix multiplication entirely. Building on Brusentsov's balanced ternary research at Moscow State University (1958-1965), the approach replaces each multiply-accumulate operation with an addition (weight +1), a subtraction (weight -1), or a skip (weight 0).
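To make the add/subtract/skip idea concrete, here is a minimal sketch of a matrix-vector product over ternary weights. It is an illustration only, not code from the linked repository: the function name `ternary_matvec` and the plain-list representation are assumptions chosen for readability.

```python
def ternary_matvec(weights, x):
    """Compute y = W x for W with entries in {-1, 0, +1} without any multiplication.

    `weights` is a list of rows; `x` is a list of activations.
    Hypothetical helper for illustration, not the repository's API.
    """
    y = []
    for row in weights:
        if len(row) != len(x):
            raise ValueError("row length must match input length")
        acc = 0.0
        for w, xi in zip(row, x):
            if w == 1:
                acc += xi          # weight +1: add the activation
            elif w == -1:
                acc -= xi          # weight -1: subtract the activation
            elif w != 0:
                raise ValueError("weights must be -1, 0, or +1")
            # weight 0: skip, no work is done for this element
        y.append(acc)
    return y


W = [[1, 0, -1],
     [-1, 1, 0]]
x = [0.5, 2.0, -1.5]
y = ternary_matvec(W, x)
print(y)  # [2.0, 1.5]
```

The inner loop contains only comparisons, additions, and subtractions. Zero weights cost nothing beyond the branch, which is where the "skip" saving would come from.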
Key results:
- 93.8% reduction in energy consumption per inference
- 16x memory compression (28 GB → 1.75 GB for 7B parameters, i.e. from 32 bits to 2 bits per weight)
- 48x theoretical throughput improvement
- 87-92% signal preservation
- Architectural epistemic uncertainty enabling 50% abstention on uncertain inputs (hallucination prevention)
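The memory figure above can be checked with simple arithmetic: 28 GB for 7B parameters is 4 bytes (32 bits) per weight, and 1.75 GB is 2 bits per weight. The sketch below reproduces that calculation and shows one possible way to store four ternary weights per byte. The 2-bit code assignment and the names `pack_ternary` and `unpack_ternary` are assumptions for illustration; the record does not state the storage layout actually used.

```python
# Assumed 2-bit codes: 00 -> 0, 01 -> +1, 10 -> -1 (11 is unused).
_CODES = {0: 0b00, 1: 0b01, -1: 0b10}
_VALUES = {code: value for value, code in _CODES.items()}


def pack_ternary(weights):
    """Pack weights from {-1, 0, +1} into bytes, four 2-bit codes per byte."""
    out = bytearray((len(weights) + 3) // 4)
    for i, w in enumerate(weights):
        out[i // 4] |= _CODES[w] << (2 * (i % 4))
    return bytes(out)


def unpack_ternary(packed, n):
    """Recover the first n ternary weights from a packed byte string."""
    return [_VALUES[(packed[i // 4] >> (2 * (i % 4))) & 0b11] for i in range(n)]


# Memory footprint for 7B parameters at 32 bits versus 2 bits per weight.
PARAMS = 7_000_000_000
fp32_gb = PARAMS * 32 / 8 / 1e9      # 28.0
ternary_gb = PARAMS * 2 / 8 / 1e9    # 1.75
ratio = fp32_gb / ternary_gb         # 16.0

weights = [1, -1, 0, 1, -1]
packed = pack_ternary(weights)
assert unpack_ternary(packed, len(weights)) == weights
print(fp32_gb, ternary_gb, ratio, len(packed))  # 28.0 1.75 16.0 2
```

Two bits per weight leaves one of the four codes unused. It is a simple byte-aligned choice rather than the densest possible encoding.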
The method requires no specialised hardware; standard CPUs can execute it efficiently.
Full implementation open-sourced at: https://github.com/Zaneham/Ternary_inference
Files
| Name | Size | MD5 |
|---|---|---|
| Balanced_Ternary_Transformers_ZaneH.pdf | 226.1 kB | 72880b829fb6ec84336b37fe4b065820 |
Additional details
Software
- Repository URL: https://github.com/Zaneham/Ternary_inference
- Development Status: Active