Sieving: Denoising-Robust Fine-Tuning for Semantic Structural Representation in Natural Language Inference

Šotek, Miroslav

doi:10.5281/zenodo.18985467

Published March 12, 2026 | Version v1

Technical note | Open access

Natural Language Inference (NLI) models frequently suffer from "shortcut learning": they over-rely on superficial lexical overlap instead of capturing deep semantic entailment. Drawing inspiration from diffusion-based language modelling and denoising autoencoders, we introduce Sieving, a dynamic token-level corruption strategy applied during fine-tuning. Sieving stochastically injects noise into the input sequence during training through a calibrated mixture of masking and random token replacement. Because individual surface tokens can no longer be trusted, the model is forced to build robust, globally aware semantic representations. The method thereby filters (or "sieves") out surface-level heuristics, which improves generalisation on adversarial benchmarks (e.g., ANLI, PAWS) and on noisy real-world text. Within the Director Class AI architecture, Sieving serves as a critical stabilisation layer for the Coherence Engine.
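
The corruption step described above can be sketched as follows. This is a minimal illustration, not the implementation in the director-ai repository. The function name `sieve_tokens`, the defaults `CORRUPT_RATE` and `MASK_SHARE`, and the per-token selection scheme are hypothetical. The abstract states only that a calibrated mixture of masking and random replacement is used, so the actual rates and schedule may differ.

```python
import random
from typing import FrozenSet, List, Optional, Sequence

# Hypothetical defaults: the abstract gives no values for the
# "calibrated mixture"; these numbers are illustrative only.
CORRUPT_RATE = 0.15  # probability that an eligible token is corrupted
MASK_SHARE = 0.8     # share of corrupted tokens replaced by the mask id


def sieve_tokens(
    token_ids: Sequence[int],
    mask_id: int,
    vocab_size: int,
    special_ids: FrozenSet[int] = frozenset(),
    corrupt_rate: float = CORRUPT_RATE,
    mask_share: float = MASK_SHARE,
    rng: Optional[random.Random] = None,
) -> List[int]:
    """Return a corrupted copy of token_ids; the input is not modified.

    Each non-special token is selected with probability corrupt_rate.
    A selected token becomes mask_id with probability mask_share,
    otherwise a uniformly random id in [0, vocab_size).
    """
    rng = rng or random.Random()
    out = list(token_ids)
    for i, tok in enumerate(out):
        # Keep structural tokens (e.g. [CLS]/[SEP]) intact so the
        # premise/hypothesis boundary survives corruption.
        if tok in special_ids or rng.random() >= corrupt_rate:
            continue
        if rng.random() < mask_share:
            out[i] = mask_id                   # masking branch
        else:
            out[i] = rng.randrange(vocab_size)  # random-replacement branch
    return out


if __name__ == "__main__":
    ids = [101, 2023, 2003, 1037, 3231, 102]  # toy BERT-style ids
    noisy = sieve_tokens(ids, mask_id=103, vocab_size=30522,
                         special_ids=frozenset({101, 102}),
                         rng=random.Random(0))
    print(noisy)
```

In a fine-tuning loop, calling this function on every batch resamples the corruption at each step, which is one plausible reading of "dynamic". The NLI label is left unchanged, so the classifier has to predict entailment from a partially obscured input. For simplicity, the random-replacement branch can occasionally draw a special id or the original token.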

Files

Sieving_ANULUM_Paper_2026.pdf (280.0 kB, md5:2791948907bce5dbfffcf2efd7aba358)

Additional details

Is described by: Software: 10.5281/zenodo.18822167 (DOI)
Is documented by: Software: 10.5281/zenodo.18928898 (DOI)
Is supplement to: Software: https://github.com/anulum/director-ai/ (URL)

Repository URL: https://github.com/anulum/director-ai/
Development Status: Active
