A Permutation-Null Framework for Conservative Feature Selection in Redundant Learning Problems
Description
Feature selection is widely employed to improve model generalisation and interpretability in high-dimensional learning problems. However, many existing approaches rely on deterministic feature importance rankings that lack statistical interpretation and frequently lead to over-pruning, particularly in engineering and scientific datasets characterised by feature correlation, redundancy, and limited sample sizes.
This paper proposes a permutation-null feature selection framework that evaluates feature relevance by comparing observed model-derived importance values against empirical null distributions obtained under feature–label decoupling, i.e., by permuting the labels. Feature selection is formulated as a hypothesis-testing problem in which a predictor is retained only if its observed importance exceeds noise-level expectations under the null. To prevent excessive dimensionality reduction, the framework incorporates an adaptive threshold selection procedure and a conservative prune-ratio constraint.
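
As a rough illustration of the mechanism described above, the following is a minimal Python sketch, not the paper's implementation: it assumes random-forest importances as the model-derived scores, a fixed significance level `alpha` in place of the paper's adaptive threshold procedure, and a hypothetical `max_prune_ratio` cap as one possible reading of the conservative prune-ratio constraint.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def permutation_null_select(X, y, n_permutations=200, alpha=0.05,
                            max_prune_ratio=0.5, random_state=0):
    """Sketch of permutation-null feature selection.

    Assumptions (not taken from the paper): random-forest importances
    as the score, a fixed alpha instead of an adaptive threshold, and
    a simple cap on the fraction of features that may be pruned.
    """
    rng = np.random.default_rng(random_state)
    n_features = X.shape[1]

    # Observed importances on the true feature-label pairing.
    model = RandomForestRegressor(n_estimators=100, random_state=random_state)
    observed = model.fit(X, y).feature_importances_

    # Empirical null: refit on label-permuted data (feature-label
    # decoupling) and record the importances each feature attains.
    null = np.empty((n_permutations, n_features))
    for i in range(n_permutations):
        y_perm = rng.permutation(y)
        null[i] = RandomForestRegressor(
            n_estimators=100, random_state=random_state
        ).fit(X, y_perm).feature_importances_

    # One-sided permutation p-value per feature, with +1 smoothing
    # so no p-value is exactly zero.
    p_values = (1 + (null >= observed).sum(axis=0)) / (1 + n_permutations)
    keep = p_values <= alpha

    # Conservative prune-ratio constraint: never drop more than
    # max_prune_ratio of the features; if the test is too aggressive,
    # rescue the features with the smallest p-values.
    min_keep = int(np.ceil((1 - max_prune_ratio) * n_features))
    if keep.sum() < min_keep:
        keep = np.zeros(n_features, dtype=bool)
        keep[np.argsort(p_values)[:min_keep]] = True
    return keep, p_values
```

Under these assumptions, usage would look like `mask, p = permutation_null_select(X, y)` followed by `X[:, mask]`; the prune-ratio cap is what keeps the procedure conservative when correlated features dilute each other's importance scores.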
Experimental evaluation across datasets with varying redundancy characteristics demonstrates that the proposed approach preserves predictive performance on well-conditioned datasets while improving stability and generalisation in the presence of distractor features. Results indicate that feature selection does not universally improve accuracy; however, statistically grounded noise-rejection strategies offer measurable benefits in redundancy-dominated regimes.
Files
A Permutation-Null Framework for Conservative Feature Selection in Redundant Learning Problems.pdf (376.5 kB, md5:6227eec0731d76421a7e2e583b62f231)