Model Minification: A Taylor-Ridge Framework for Structured Compression
Authors/Creators
Description
As Large Language Models (LLMs) scale, the deployment cost on commodity hardware becomes
prohibitive. While unstructured pruning offers theoretical compression, it often requires specialized
kernels to realize speedups. We propose a robust Structured Minification framework that physically
reduces the intermediate dimensions of Transformer MLPs, ensuring compatibility with standard GEMM
operations. Our methodology combines (1) a global Taylor-First-Order sensitivity analysis to identify
redundant feature dimensions, and (2) a closed-form Ridge Regression reconstruction to optimally
heal the output distribution of the pruned layers.
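
The two steps above can be illustrated on a single toy MLP layer. The following is a minimal NumPy sketch under stated assumptions, not the released implementation: the ReLU activation, the squared-error loss against random targets `T`, the toy sizes, and the regularization strength `lam` are all hypothetical choices made for illustration. It also ranks units within one layer only, whereas the framework describes a global ranking.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d_model, d_ff, keep = 512, 16, 64, 48      # toy sizes (hypothetical)

X = rng.normal(size=(n, d_model))             # calibration inputs
W_up = rng.normal(size=(d_model, d_ff)) / np.sqrt(d_model)
W_down = rng.normal(size=(d_ff, d_model)) / np.sqrt(d_ff)
T = rng.normal(size=(n, d_model))             # stand-in targets for a toy loss

H = np.maximum(X @ W_up, 0.0)                 # hidden features (ReLU assumed)
Y = H @ W_down                                # output of the dense layer

# Step 1: first-order Taylor sensitivity of each hidden unit j,
# |sum_n h_nj * dL/dh_nj|, for the toy loss L = 0.5 * ||Y - T||^2 / n.
G = (Y - T) @ W_down.T / n                    # dL/dH
scores = np.abs((H * G).sum(axis=0))
kept = np.sort(np.argsort(scores)[-keep:])    # indices of retained units

# Physically shrink the layer: drop columns of W_up and rows of W_down.
W_up_small = W_up[:, kept]
W_naive = W_down[kept]
H_s = H[:, kept]

# Step 2: closed-form ridge regression so the pruned layer reproduces Y:
# W = argmin ||H_s W - Y||^2 + lam ||W||^2 = (H_s^T H_s + lam I)^-1 H_s^T Y
lam = 1e-3
A = H_s.T @ H_s + lam * np.eye(keep)
W_healed = np.linalg.solve(A, H_s.T @ Y)

def recon_error(W):
    """Squared error between the pruned layer's output and the dense output."""
    return float(np.sum((H_s @ W - Y) ** 2))

err_naive, err_healed = recon_error(W_naive), recon_error(W_healed)
print(f"naive slice: {err_naive:.3f}  ridge-healed: {err_healed:.3f}")
```

Because `W_up_small` and `W_healed` are ordinary dense matrices with a smaller inner dimension, the pruned layer still runs on standard GEMM kernels.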
We evaluate the approach at two model scales: a parameter-dense 135M model and a 1.7B model.
Minification is effective even for the smaller, dense model at high retention rates: the 135M model
remains coherent at 90% retention (perplexity 4.33 → 4.89). The 1.7B model is more robust, tolerating
30% structural removal (70% retention) with limited degradation (perplexity 3.16 → 4.09). These two
data points suggest a scaling trend: smaller models require conservative minification (80-90%
retention), whereas larger over-parameterized models possess a highly compressible subspace that is
recoverable via linear least squares.
Because our framework reduces model topology without altering weight precision, it remains
orthogonal to quantization, enabling composite compression pipelines that combine structural
minification with subsequent bit-width reduction.
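
As an illustration of such a composite pipeline, the sketch below first shrinks a weight matrix structurally and then quantizes what remains. It is a hedged example only: the symmetric per-tensor int8 scheme, the helper `quantize_int8`, the 70% retention setting, and the placeholder index selection are assumptions for illustration, not the method used in the paper.

```python
import numpy as np

def quantize_int8(W):
    """Symmetric per-tensor int8 quantization (one illustrative scheme)."""
    scale = float(np.abs(W).max()) / 127.0
    q = np.clip(np.round(W / scale), -127, 127).astype(np.int8)
    return q, scale

rng = np.random.default_rng(0)
d_model, d_ff, retention = 16, 64, 0.7        # toy sizes (hypothetical)
W_down = rng.normal(size=(d_ff, d_model)).astype(np.float32)

# Stage 1: structural minification removes whole hidden units (rows here).
keep = int(round(d_ff * retention))
kept = np.arange(keep)                        # placeholder for sensitivity-selected indices
W_small = W_down[kept]

# Stage 2: bit-width reduction is applied to the already smaller matrix.
q, scale = quantize_int8(W_small)
W_dequant = q.astype(np.float32) * scale

dense_bytes, small_bytes = W_down.nbytes, q.nbytes
print(f"fp32 dense: {dense_bytes} B, pruned + int8: {small_bytes} B")
```

The two savings multiply: pruning reduces the number of stored weights, and quantization reduces the bytes per weight.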
The code is available at https://github.com/VladimerKhasia/minisp
Files
SP.pdf (264.2 kB, md5:e0a253137513b87024175cb33f0016b8)