Published March 12, 2026 | Version v1
Technical note Open

Sieving: Denoising-Robust Fine-Tuning for Semantic Structural Representation in Natural Language Inference

Authors/Creators

Description

Natural Language Inference (NLI) models frequently suffer from "shortcut learning": they over-rely on superficial lexical overlap instead of capturing deep semantic entailment. Drawing inspiration from diffusion-based language modelling and denoising autoencoders, we introduce Sieving, a dynamic token-level corruption strategy applied during fine-tuning. Sieving stochastically injects noise into the input sequence during training, specifically through a calibrated mixture of masking and random token replacement, which forces the model to construct robust, globally aware semantic representations. The method filters (or "sieves") out surface-level heuristics, leading to better generalisation on adversarial benchmarks (e.g., ANLI, PAWS) and on noisy real-world text. Within the Director Class AI architecture, Sieving serves as a critical stabilisation layer for the Coherence Engine.
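
The corruption step described above can be illustrated with a minimal Python sketch. It is not taken from the paper or from the director-ai repository: the function name `sieve_tokens`, its parameters, and the default values (`noise_rate=0.15`, `mask_share=0.8`) are all assumptions chosen for illustration, and the calibrated mixture used by the authors may differ.

```python
import random


def sieve_tokens(token_ids, vocab_size, mask_id, noise_rate=0.15,
                 mask_share=0.8, special_ids=frozenset(), rng=None):
    """Return a corrupted copy of token_ids (hypothetical helper).

    Each non-special token is selected for corruption with probability
    noise_rate. A selected token becomes mask_id with probability
    mask_share, otherwise it is replaced by a uniformly random token id.
    All defaults are illustrative assumptions, not values from the paper.
    """
    rng = rng or random.Random()
    corrupted = []
    for tok in token_ids:
        # Keep special tokens (e.g. [CLS], [SEP]) and unselected tokens.
        if tok in special_ids or rng.random() >= noise_rate:
            corrupted.append(tok)
        elif rng.random() < mask_share:
            # Masking branch of the mixture.
            corrupted.append(mask_id)
        else:
            # Random-replacement branch; the draw may occasionally
            # return the original id or a special id.
            corrupted.append(rng.randrange(vocab_size))
    return corrupted


if __name__ == "__main__":
    # Made-up ids: 101 and 102 stand in for boundary tokens, 103 for the mask.
    premise_and_hypothesis = [101, 2023, 2003, 1037, 3231, 102]
    noisy = sieve_tokens(premise_and_hypothesis, vocab_size=30000,
                         mask_id=103, special_ids={101, 102})
    print(noisy)
```

Calling the function afresh on every training step, rather than corrupting the dataset once, is one way to obtain the "dynamic" behaviour the description mentions, since each epoch then sees a different noisy view of the same premise and hypothesis pair.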

Files

Sieving_ANULUM_Paper_2026.pdf (280.0 kB)
md5:2791948907bce5dbfffcf2efd7aba358

Additional details

Related works

Is described by
Software: 10.5281/zenodo.18822167 (DOI)
Is documented by
Software: 10.5281/zenodo.18928898 (DOI)
Is supplement to
Software: https://github.com/anulum/director-ai/ (URL)

Software

Repository URL
https://github.com/anulum/director-ai/
Development Status
Active