There is a newer version of the record available.

Published January 16, 2026 | Version v1
Publication Open

Model Minification: A Taylor-Ridge Framework for Structured Compression

Authors/Creators

Description

As Large Language Models (LLMs) scale, the cost of deploying them on commodity hardware becomes
prohibitive. Unstructured pruning offers compression in theory, but it often requires specialized
sparse kernels to realize speedups. We propose a Structured Minification framework that physically
reduces the intermediate dimensions of Transformer MLPs, so that the compressed layers stay dense and
compatible with standard GEMM operations. Our methodology combines (1) a global first-order Taylor
sensitivity analysis to identify redundant feature dimensions, and (2) a closed-form Ridge Regression
reconstruction that heals the outputs of the pruned layers, optimally in the regularized least-squares sense.
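
The two steps can be illustrated on a single MLP block. The following is a minimal sketch, not the implementation from the linked repository: it assumes a ReLU MLP, a squared-error loss, and NumPy, and it ranks dimensions within one layer, whereas the abstract describes the ranking as global. The names `taylor_scores`, `ridge_refit`, `minify_mlp`, `retention`, and `lam` are hypothetical and chosen for illustration only.

```python
import numpy as np

def taylor_scores(h, grad_h):
    """First-order Taylor saliency per hidden unit: |sum_n h[n, j] * dL/dh[n, j]|."""
    return np.abs((h * grad_h).sum(axis=0))

def ridge_refit(h_kept, y_target, lam=1e-3):
    """Closed-form ridge solution W = (H^T H + lam * I)^-1 H^T Y."""
    k = h_kept.shape[1]
    gram = h_kept.T @ h_kept + lam * np.eye(k)
    return np.linalg.solve(gram, h_kept.T @ y_target)

def minify_mlp(w1, w2, x, targets, retention=0.75, lam=1e-3):
    """Shrink the hidden width of y = relu(x @ w1) @ w2 (illustrative assumptions only)."""
    h = np.maximum(x @ w1, 0.0)          # hidden activations, shape (n, d_ff)
    y = h @ w2                           # outputs of the unpruned layer
    grad_h = (y - targets) @ w2.T        # dL/dh for L = 0.5 * ||y - targets||^2
    scores = taylor_scores(h, grad_h)
    k = max(1, int(round(retention * w1.shape[1])))
    keep = np.sort(np.argsort(scores)[-k:])   # indices of the k most salient units
    w1_small = w1[:, keep]               # physically narrower up-projection
    # Refit the down-projection so the kept units reproduce the original outputs.
    w2_small = ridge_refit(h[:, keep], y, lam)
    return w1_small, w2_small, keep

rng = np.random.default_rng(0)
x = rng.normal(size=(256, 16))
w1 = rng.normal(size=(16, 32))
w2 = rng.normal(size=(32, 16))
targets = rng.normal(size=(256, 16))
w1_s, w2_s, keep = minify_mlp(w1, w2, x, targets, retention=0.75)
print(w1_s.shape, w2_s.shape)
```

Because the result is a pair of smaller dense matrices, the pruned layer is still evaluated with ordinary matrix multiplications; no sparse kernel or mask is involved.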

We investigate the efficacy of this approach across model scales, applying it to a parameter-dense
135M model and a 1.7B model. Our results show that minification is effective even for the smaller,
dense model at high retention rates: the 135M model remains coherent at 90% retention (perplexity
4.33 → 4.89). We also observe a scaling trend: the 1.7B model is considerably more robust, tolerating
30% structural removal (70% retention) with only minor degradation (perplexity 3.16 → 4.09). This
suggests that while smaller models require conservative minification (80-90% retention), larger
over-parameterized models possess a highly compressible subspace recoverable via linear least-squares.
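
To put the two reported perplexity changes on a common footing, the short calculation below converts each into a relative increase and into a per-token loss increase. It is a sketch that assumes perplexity is the exponential of the mean cross-entropy in nats, which is the usual definition but is not stated in this description; the helper names are hypothetical.

```python
import math

def relative_increase(ppl_before, ppl_after):
    """Fractional increase in perplexity."""
    return (ppl_after - ppl_before) / ppl_before

def loss_increase_nats(ppl_before, ppl_after):
    """Increase in mean cross-entropy, assuming perplexity = exp(loss in nats)."""
    return math.log(ppl_after / ppl_before)

# Perplexity pairs as reported in the description above.
results = {
    "135M @ 90% retention": (4.33, 4.89),
    "1.7B @ 70% retention": (3.16, 4.09),
}
for name, (before, after) in results.items():
    print(f"{name}: +{relative_increase(before, after):.1%} perplexity, "
          f"+{loss_increase_nats(before, after):.3f} nats/token")
```

The relative increases come out at roughly 13% for the 135M model and 29% for the 1.7B model. The two figures are measured at different retention rates, so they are not directly comparable without accounting for the amount removed.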

Because our framework reduces model topology without altering weight precision, it is orthogonal
to quantization, enabling composite compression pipelines that combine structural minification with
subsequent bit-width reduction.
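
A composite pipeline of this kind can be sketched as follows. The quantization scheme here (symmetric per-column int8) is an assumption chosen for illustration, since the description names no particular scheme, and the `keep` indices are hypothetical stand-ins for the output of the minification step.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-column int8 quantization; returns (q, scale)."""
    scale = np.abs(w).max(axis=0, keepdims=True) / 127.0
    scale = np.where(scale == 0.0, 1.0, scale)   # guard against all-zero columns
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Map int8 codes back to floating point."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w2 = rng.normal(size=(32, 16)).astype(np.float32)

keep = np.arange(24)                  # hypothetical indices of retained hidden units
w2_small = w2[keep]                   # structural step: fewer rows, precision unchanged
q, scale = quantize_int8(w2_small)    # precision step: same shape, 8-bit storage

print(w2.shape, w2_small.shape, q.shape, q.dtype)
```

The first step changes only the shape of the matrix and the second changes only its storage precision, which is the sense in which the two compress along independent axes.
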
The code is available at https://github.com/VladimerKhasia/minisp

Files

SP.pdf (264.2 kB)
md5:e0a253137513b87024175cb33f0016b8