IWBS: Influence-Weighted Bagged Splines for Robust Regression in Small-Data Regimes
Description
High-capacity machine learning models such as Gradient Boosting Machines (GBM) and deep Random Forests often sit on the high-variance side of the bias-variance trade-off when training data is scarce (N < 500): while they offer low bias, they frequently overfit noise or fail to approximate smooth functions due to discrete partitioning. Conversely, classical linear models (OLS) offer stability but lack the capacity to model complex dynamics. This paper introduces Influence-Weighted Bagged Splines (IWBS), an ensemble architecture designed for high-complexity, small-data regimes. IWBS combines the flexibility of randomized additive splines with a novel Out-of-Bag (OOB) Stability Weighting mechanism: unlike standard bagging, which averages learners uniformly, IWBS down-weights ensemble members that exhibit high prediction instability on held-out data. We benchmark IWBS against fully tuned tree ensembles (GBM, Random Forest) and specialized small-data solvers (Gaussian Processes, GAMs, Kernel Ridge Regression) across physical, economic, and biological domains. Results demonstrate that IWBS achieves state-of-the-art performance on signal-rich tasks (Concrete, Moneyball), outperforming both tree-based methods and kernel smoothers by capturing high-frequency nonlinearities without overfitting. We also establish the method's boundary conditions, showing that in high-noise regimes (Diabetes), global smoothers such as Kernel Ridge Regression remain superior to structure-discovery approaches.
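For intuition, the sketch below illustrates the core IWBS idea in R (the project's language): fit spline base learners on bootstrap samples and weight each member by the inverse of its out-of-bag residual variance. This is a minimal univariate illustration under stated assumptions, not the released implementation; the function names (iwbs_fit, iwbs_predict), the use of smooth.spline in place of the paper's randomized additive splines, and the inverse-variance weighting rule are all expository assumptions.

```r
## Minimal sketch of the IWBS idea (assumptions noted above), not the
## authors' implementation from https://github.com/1zzuk1/IWBS.
iwbs_fit <- function(x, y, n_learners = 50) {
  n <- length(y)
  learners <- vector("list", n_learners)
  weights  <- numeric(n_learners)
  for (b in seq_len(n_learners)) {
    idx <- sample(n, n, replace = TRUE)       # bootstrap sample
    oob <- setdiff(seq_len(n), idx)           # held-out (OOB) points
    fit <- smooth.spline(x[idx], y[idx])      # spline base learner
    pred_oob <- predict(fit, x[oob])$y
    instability <- var(y[oob] - pred_oob)     # OOB residual variance
    learners[[b]] <- fit
    weights[b] <- 1 / (instability + 1e-8)    # penalize unstable members
  }
  weights <- weights / sum(weights)           # normalize to convex weights
  list(learners = learners, weights = weights)
}

iwbs_predict <- function(model, x_new) {
  preds <- sapply(model$learners, function(f) predict(f, x_new)$y)
  drop(preds %*% model$weights)               # influence-weighted average
}

## Example usage on a noisy sine curve:
## set.seed(1)
## x <- runif(200); y <- sin(8 * x) + rnorm(200, sd = 0.3)
## m <- iwbs_fit(x, y)
## plot(x, y); lines(sort(x), iwbs_predict(m, sort(x)), col = "red")
```

Under these assumptions, a member whose held-out predictions wander receives a small weight, so the ensemble average is dominated by stable fits; this is the behavior the abstract attributes to OOB Stability Weighting.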
Files

| Name | Size | Checksum |
|---|---|---|
| IWBS.pdf | 342.9 kB | md5:828fa5e6c226790a74488d60fa64b4d2 |
Additional details
Dates
- Created: 2026-01-16

Software
- Repository URL: https://github.com/1zzuk1/IWBS
- Programming language: R
- Development Status: Active
References
- Leo Breiman. Bagging predictors. Machine Learning, 24(2):123–140, 1996.
- Leo Breiman. Random forests. Machine Learning, 45(1):5–32, 2001.
- Bradley Efron, Trevor Hastie, Iain Johnstone, and Robert Tibshirani. Least angle regression. The Annals of Statistics, 32(2):407–499, 2004. Original source for the Diabetes dataset.
- Jerome H. Friedman. Multivariate adaptive regression splines. The Annals of Statistics, 19(1):1–67, 1991.
- Jerome H. Friedman. Greedy function approximation: A gradient boosting machine. The Annals of Statistics, 29(5):1189–1232, 2001.
- Arthur E. Hoerl and Robert W. Kennard. Ridge regression: Biased estimation for nonorthogonal problems. Technometrics, 12(1):55–67, 1970.
- Michael Lewis. Moneyball: The Art of Winning an Unfair Game. W. W. Norton & Company, 2004. Context for the MLB Salary dataset.
- Nicolai Meinshausen and Peter Bühlmann. Stability selection. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 72(4):417–473, 2010.
- Mark J. van der Laan, Eric C. Polley, and Alan E. Hubbard. Super Learner. Statistical Applications in Genetics and Molecular Biology, 6(1), 2007.
- I-Cheng Yeh. Modeling of strength of high-performance concrete using artificial neural networks. Cement and Concrete Research, 28(12):1797–1808, 1998.