From Superposition to Decision: Semantic Wave Functions for Uncertainty-Aware NLP

Vinokurova, Elena

doi:10.5281/zenodo.19866551

Published April 28, 2026 | Version v1

Journal article Open

Vinokurova, Elena (Researcher)

Modern language models generate a single deterministic representation for each input, without distinguishing between situations where the model is confident in its answer and those where it is guessing. We propose an architecture that explicitly embeds an uncertainty mechanism into the forward pass of the neural network: the model outputs an answer (collapse mode) or reports uncertainty (wave mode), and the boundary between the modes is determined by a trainable threshold.
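The two modes described above can be illustrated with a minimal sketch. It assumes that confidence is measured as the maximum softmax probability and that the threshold is a single scalar; the abstract does not specify either detail, and in the paper the threshold is trained rather than fixed. The function name `decide` and the label order are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def decide(logits, threshold,
           labels=("entailment", "neutral", "contradiction")):
    """Return ('collapse', label) when confident, else ('wave', None).

    Assumption: confidence = max softmax probability. The paper's actual
    criterion and its trainable threshold are not given in the abstract.
    """
    probs = softmax(logits)
    confidence = max(probs)
    if confidence >= threshold:
        return "collapse", labels[probs.index(confidence)]
    return "wave", None

# One peaked and one nearly flat distribution over the three NLI classes.
print(decide([4.0, 0.5, 0.2], threshold=0.7))
print(decide([1.0, 0.9, 0.8], threshold=0.7))
```

The first call commits to a class because one logit dominates; the second abstains because no class clears the threshold.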

The architecture uses quantum mechanics as a design heuristic. Its key component, the cross-attention observer, builds the premise representation in context when solving a Natural Language Inference (NLI) task, so that the same premise receives a different representation for each hypothesis. With a fully frozen BERT-base and only 885,000 trainable parameters (0.8% of the model), the architecture outperforms the frozen baseline by 4.7 percentage points on the MultiNLI corpus (392,000 examples). For comparison, LoRA with $r=12$ (2.0M parameters) achieves 84.5% on the same data, a substantially higher accuracy, but without an uncertainty mechanism. The main advantage of the architecture therefore lies not in competing with PEFT methods on accuracy but in the additional mechanism: the system explicitly distinguishes between "certain" and "uncertain", and the per-class collapse pattern identifies annotation artifacts in the data (confirmed by an adversarial NLI test).
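To make the idea of a hypothesis-dependent premise representation concrete, the sketch below applies plain scaled dot-product cross-attention, with the hypothesis vector as the query and the premise token vectors as keys and values. This is an illustration only: the function name `observe`, the toy two-dimensional vectors, and the absence of learned projections are all assumptions, and the paper's observer may differ in every detail.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def observe(premise_tokens, hypothesis_vec):
    """Pool premise tokens with attention weights set by the hypothesis.

    Assumption: single head, no learned query/key/value projections.
    Returns (premise_representation, attention_weights).
    """
    d = len(hypothesis_vec)
    scores = [
        sum(q * k for q, k in zip(hypothesis_vec, tok)) / math.sqrt(d)
        for tok in premise_tokens
    ]
    weights = softmax(scores)
    rep = [
        sum(w * tok[i] for w, tok in zip(weights, premise_tokens))
        for i in range(d)
    ]
    return rep, weights

# Toy premise with three token vectors and two different hypotheses.
premise = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
rep_a, w_a = observe(premise, [2.0, 0.0])
rep_b, w_b = observe(premise, [0.0, 2.0])
print(rep_a, rep_b)
```

The same `premise` yields two different pooled vectors because each hypothesis weights the premise tokens differently, which is the property the abstract attributes to the observer.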

Systematic verification showed that quantum mathematics (complex amplitudes, the Schrödinger equation, interference) does not provide an advantage over classical analogues in the current configuration, and it is the architectural mechanism that works, not the physical formalism. This result delineates the limits of the applicability of quantum analogies in machine learning and opens up a direction for further research: under what conditions can quantum mathematics provide a computational advantage on classical hardware.

An independent contribution is a unified parametric family of pruning criteria for neural networks, $s_j = |W_j|^p \cdot |g_j|^q$, where $W_j$ denotes the weight, $g_j$ denotes the gradient, and the exponents $p$ and $q$ select a specific criterion. It is shown that widely used methods (Magnitude, SNIP) are special cases of this family, and that when the gradient is averaged over several batches the criteria begin to diverge significantly (by up to 9.6 percentage points).
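The family of criteria can be sketched in a few lines. The mapping used here is an assumption based on the standard definitions rather than on the paper's text: magnitude pruning scores a weight by $|W_j|$, which corresponds to $p=1, q=0$, and the SNIP saliency is $|W_j \cdot g_j|$ up to normalization, which corresponds to $p=1, q=1$. The helper names and the toy numbers are hypothetical.

```python
def mean_grads(batch_grads):
    """Average per-weight gradients over several batches (signed mean)."""
    n = len(batch_grads)
    return [sum(col) / n for col in zip(*batch_grads)]

def prune_scores(weights, grads, p, q):
    """Importance score s_j = |W_j|^p * |g_j|^q for each weight."""
    return [abs(w) ** p * abs(g) ** q for w, g in zip(weights, grads)]

def rank(scores):
    """Weight indices ordered from most to least important."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)

# Toy data: three weights, gradients from two batches.
weights = [0.9, -0.1, 0.5]
grads = mean_grads([[0.01, 2.0, 0.3],
                    [0.03, 1.0, 0.5]])

magnitude = prune_scores(weights, grads, p=1, q=0)  # |W_j|
snip_like = prune_scores(weights, grads, p=1, q=1)  # |W_j * g_j|

print(rank(magnitude))
print(rank(snip_like))
```

On this toy input the two settings order the weights differently: the largest weight has the smallest gradient, so it ranks first under the magnitude setting and last under the SNIP-like setting.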

All key experiments were conducted with 3-5 random initializations and paired statistical tests on full datasets.

Files

wpdmt_paper_en_last_v.pdf (2.3 MB)
md5:75d396188fba5736ead68610e89896e9

Additional details

Subtitle (English): Pilot Study

References

Yoshua Bengio, Nicholas Léonard, and Aaron Courville. Estimating or propagating gradients through stochastic neurons for conditional computation. arXiv preprint arXiv:1308.3432, 2013. URL https://arxiv.org/abs/1308.3432
Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, et al. Language models are few-shot learners. Advances in Neural Information Processing Systems, 33:1877-1901, 2020. URL https://arxiv.org/abs/2005.14165
Eugen Bărbulescu, Asher Trockman, and J. Zico Kolter. Hyperflux: A conceptually-grounded pruning method and the neural pruning law hypothesis. arXiv preprint arXiv:2504.05349, 2025. URL https://arxiv.org/abs/2504.05349
Samuel Yen-Chi Chen, Chao-Han Huck Yang, et al. Quantum-PEFT: Ultra parameter-efficient fine-tuning. In International Conference on Learning Representations (ICLR), 2025. URL https://openreview.net/forum?id=dgR6i4TSng
Ido Dagan, Oren Glickman, and Bernardo Magnini. The PASCAL recognising textual entailment challenge. In Machine Learning Challenges Workshop, pages 177-190. Springer, 2005. URL https://link.springer.com/chapter/10.1007/11736790_9
Rocktim Jyoti Das, Mingjie Sun, Liqun Ma, and Zhiqiang Shen. Beyond size: How gradients shape pruning decisions in large language models. arXiv preprint arXiv:2311.04902, 2023. URL https://arxiv.org/abs/2311.04902
Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of NAACL-HLT, pages 4171-4186, 2019. URL https://arxiv.org/abs/1810.04805
Yarin Gal and Zoubin Ghahramani. Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In Proceedings of ICML, pages 1050-1059, 2016. URL https://arxiv.org/abs/1506.02142
Jiapeng Gao and Lukáš Galambos. Text classification with Born's rule. In Advances in Neural Information Processing Systems, volume 35, 2022. URL https://openreview.net/forum?id=sNcn-E3uPHA
Chuan Guo, Geoff Pleiss, Yu Sun, and Kilian Q Weinberger. On calibration of modern neural networks. In Proceedings of ICML, pages 1321-1330, 2017. URL https://arxiv.org/abs/1706.04599
Aditya Gupta et al. QLens: Towards a quantum perspective of language transformers. arXiv preprint arXiv:2510.11963, 2025. URL https://arxiv.org/abs/2510.11963
Suchin Gururangan, Swabha Swayamdipta, Omer Levy, Roy Schwartz, Samuel Bowman, and Noah A Smith. Annotation artifacts in natural language inference data. In Proceedings of NAACL-HLT, pages 107-112, 2018. URL https://arxiv.org/abs/1803.02324
Babak Hassibi and David G Stork. Optimal brain surgeon and general network pruning. In IEEE International Conference on Neural Networks, pages 293-299, 1993. URL https://proceedings.neurips.cc/paper/1992/hash/303ed4c69846ab36c2904d3ba8573050-Abstract.html
Pengcheng He, Xiaodong Liu, Jianfeng Gao, and Weizhu Chen. DeBERTa: Decoding-enhanced BERT with disentangled attention. arXiv preprint arXiv:2006.03654, 2021. URL https://arxiv.org/abs/2006.03654
Radu Herbei and Marten H. Wegkamp. Classification with reject option. Canadian Journal of Statistics, 34(4):709-721, 2006. URL https://doi.org/10.1002/cjs.5550340410
Neil Houlsby, Andrei Giurgiu, Stanislaw Jastrzebski, Bruna Morrone, Quentin de Laroussilhe, Andrea Gesmundo, Mona Attariyan, and Sylvain Gelly. Parameter-efficient transfer learning for NLP. In Proceedings of ICML, pages 2790-2799, 2019. URL https://arxiv.org/abs/1902.00751
Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. LoRA: Low-rank adaptation of large language models. In Proceedings of ICLR, 2022. URL https://arxiv.org/abs/2106.09685
Eric Jang, Shixiang Gu, and Ben Poole. Categorical reparameterization with Gumbel-Softmax. In ICLR, 2017. URL https://arxiv.org/abs/1611.01144
Walter Kohn and Lu Jeu Sham. Self-consistent equations including exchange and correlation effects. Physical Review, 140(4A):A1133-A1138, 1965. URL https://doi.org/10.1103/PhysRev.140.A1133
Balaji Lakshminarayanan, Alexander Pritzel, and Charles Blundell. Simple and scalable predictive uncertainty estimation using deep ensembles. Advances in Neural Information Processing Systems, 30, 2017. URL https://arxiv.org/abs/1612.01474
Yann LeCun, John Denker, and Sara Solla. Optimal brain damage. In Advances in Neural Information Processing Systems, volume 2, 1990. URL https://proceedings.neurips.cc/paper/1989/hash/6c9882bbac1c7093bd25041881277658-Abstract.html
Namhoon Lee, Thalaiyasingam Ajanthan, and Philip HS Torr. SNIP: Single-shot network pruning based on connection sensitivity. In Proceedings of ICLR, 2019. URL https://arxiv.org/abs/1810.02340
Guangxi Li, Xuanqiang Zhao, and Xin Wang. Quantum self-attention neural networks for text classification. Science China Information Sciences, 67:142501, 2024. URL https://doi.org/10.1007/s11432-023-3879-7
Xin Li and Dan Roth. Learning question classifiers. In Proceedings of COLING, pages 1-7, 2002. URL https://aclanthology.org/C02-1150/
Weijie Liu, Peng Zhou, Zhiruo Wang, Zhe Zhao, Haotang Deng, and Qi Ju. FastBERT: A self-distilling BERT with adaptive inference time. In ACL, 2020. URL https://arxiv.org/abs/2004.02178
Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692, 2019. URL https://arxiv.org/abs/1907.11692
Milan Maksimovic and Ivan S. Maksymov. Quantum-cognitive neural networks: Assessing confidence and uncertainty with human decision-making simulations. Big Data and Cognitive Computing, 9(1):12, 2025. URL https://doi.org/10.3390/bdcc9010012
James Martens and Roger Grosse. Optimizing neural networks with Kronecker-factored approximate curvature. In Proceedings of ICML, pages 2408-2417, 2015. URL https://arxiv.org/abs/1503.05671
Konstantinos Meichanetzidis, Alexis Toumi, Giovanni de Felice, and Bob Coecke. Grammar-aware sentence classification on quantum computers. Quantum Machine Intelligence, 5:10, 2023. URL https://doi.org/10.1007/s42484-023-00097-1
Pavlo Molchanov, Arun Mallya, Stephen Tyree, Iuri Frosio, and Jan Kautz. Importance estimation for neural network pruning. In Proceedings of CVPR, pages 11264-11272, 2019. URL https://arxiv.org/abs/1611.06440
Matteo Pagliardini, Pierre Ablin, and Martin Jaggi. AdEMAMix: Better momentum schedules through exponential moving average mixing. arXiv preprint arXiv:2409.03137, 2024. URL https://arxiv.org/abs/2409.03137
Mohammad Taher Pilehvar and Jose Camacho-Collados. WiC: The word-in-context dataset for evaluating context-sensitive meaning representations. In Proceedings of NAACL-HLT, pages 1267-1273, 2019. URL https://arxiv.org/abs/1808.09121
Lutz Prechelt. Early stopping - but when? In Neural Networks: Tricks of the Trade, pages 55-69. Springer, 1998. URL https://doi.org/10.1007/3-540-49430-8_3
Elvis Saravia, Hsien-Chi Toby Liu, Yen-Hao Huang, Junlin Wu, and Yi-Shin Chen. CARER: Contextualized affect representations for emotion recognition. In Proceedings of EMNLP, pages 3687-3697, 2018. URL https://aclanthology.org/D18-1404/
Murat Sensoy, Lance Kaplan, and Melih Kandemir. Evidential deep learning to quantify classification uncertainty. In NeurIPS, 2018. URL https://arxiv.org/abs/1806.01768
Richard Socher, Alex Perelygin, Jean Wu, Jason Chuang, Christopher D Manning, Andrew Y Ng, and Christopher Potts. Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of EMNLP, pages 1631-1642, 2013. URL https://aclanthology.org/D13-1170/
Hidenori Tanaka, Daniel Kunin, Daniel LK Yamins, and Surya Ganguli. Pruning neural networks without any data by iteratively conserving synaptic flow. In Advances in Neural Information Processing Systems, volume 33, pages 6377-6389, 2020. URL https://arxiv.org/abs/2006.05467
Charles M. Varmantchaonala, Jean Louis K. E. Fendji, Julius Schöning, and Marcellin Atemkeng. Quantum natural language processing: A comprehensive survey. IEEE Access, 12:99578-99598, 2024. URL https://doi.org/10.1109/ACCESS.2024.3420707
John Archibald Wheeler. The "past" and the "delayed-choice" double-slit experiment. In Mathematical Foundations of Quantum Theory, pages 9-48. Academic Press, 1978. URL https://doi.org/10.1016/B978-0-12-473250-6.50006-6
Dominic Widdows, Willie Aboumrad, Dohun Kim, Sayonee Ray, and Jonathan Mei. Quantum natural language processing. Künstliche Intelligenz, 38:293-310, 2024. URL https://arxiv.org/abs/2403.19758
Adina Williams, Nikita Nangia, and Samuel Bowman. A broad-coverage challenge corpus for sentence understanding through inference. In Proceedings of NAACL-HLT, pages 1112-1122, 2018. URL https://arxiv.org/abs/1704.05426
Ji Xin, Raphael Tang, Jaejun Lee, Yaoliang Yu, and Jimmy Lin. DeeBERT: Dynamic early exiting for accelerating BERT inference. In ACL, 2020. URL https://arxiv.org/abs/2004.12993
Zhewei Yao, Amir Gholami, Sheng Shen, Mustafa Mustafa, Kurt Keutzer, and Michael W Mahoney. AdaHessian: An adaptive second order optimizer for machine learning. In Proceedings of AAAI, volume 35, pages 10665-10673, 2021. URL https://arxiv.org/abs/2006.00719
Peng Zhang, Jiabin Niu, Zhan Su, Benyou Wang, Liqun Ma, and Dawei Song. Quantum-inspired interactive networks for conversational sentiment analysis. In Proceedings of NAACL-HLT, pages 2220-2230, 2019. URL https://oro.open.ac.uk/61755/
Wei Zhang et al. A survey of quantum transformers. arXiv preprint arXiv:2504.03192, 2025. URL https://arxiv.org/abs/2504.03192
Wojciech Hubert Zurek. Decoherence, einselection, and the quantum origins of the classical. Reviews of Modern Physics, 75(3):715-775, 2003. URL https://doi.org/10.1103/RevModPhys.75.715
