From Superposition to Decision: Semantic Wave Functions for Uncertainty-Aware NLP
Description
Modern language models produce a single deterministic representation for each input and do not distinguish between cases where the model is confident in its answer and cases where it is guessing. We propose an architecture that builds an explicit uncertainty mechanism into the forward pass of the network: the model either outputs an answer (collapse mode) or reports uncertainty (wave mode), and the boundary between the two modes is set by a trainable threshold.
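The description does not say how the collapse/wave decision is computed. The sketch below shows one plausible form under stated assumptions: confidence is taken to be the maximum softmax probability over the NLI classes, the threshold `tau` is a plain scalar, and a sigmoid with a small `temperature` acts as a differentiable surrogate of the hard test so that `tau` could receive gradients during training. All names are hypothetical, and the paper may use a different confidence measure or relaxation (a straight-through estimator, as in Bengio et al., 2013, is another common choice).

```python
import math


def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def collapse_or_wave(logits, tau, temperature=0.05):
    """Hypothetical collapse/wave decision for one example.

    Assumption: confidence is the maximum softmax probability.
    Returns (mode, predicted_class_or_None, soft_gate), where soft_gate
    is a differentiable surrogate of the hard test confidence >= tau.
    """
    probs = softmax(logits)
    confidence = max(probs)
    soft_gate = sigmoid((confidence - tau) / temperature)
    if confidence >= tau:
        return "collapse", probs.index(confidence), soft_gate
    return "wave", None, soft_gate


# Usage with made-up logits for three NLI classes.
print(collapse_or_wave([4.0, 0.5, 0.2], tau=0.7))  # collapses to class 0
print(collapse_or_wave([1.0, 0.9, 0.8], tau=0.7))  # stays in wave mode
```

In a real model the same logic would run on batched tensors; plain Python is used here only to keep the sketch dependency-free.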
The architecture uses quantum mechanics as a design heuristic. Its key component, the cross-attention observer, builds a context-dependent representation of the premise in a Natural Language Inference (NLI) task, so that the same premise receives a different representation for each hypothesis. With BERT-base fully frozen and only 885,000 trainable parameters (0.8% of the model), the architecture outperforms the frozen baseline by 4.7 percentage points on the MultiNLI corpus (392,000 training examples). For comparison, LoRA with $r=12$ (2.0M trainable parameters) reaches 84.5% on the same data, which is substantially higher accuracy, but LoRA provides no uncertainty mechanism. The main advantage of the architecture therefore lies not in competing with PEFT methods on accuracy but in an additional capability: the system explicitly separates "certain" from "uncertain" predictions, and its class-dependent collapse pattern exposes annotation artifacts in the data (confirmed by an adversarial NLI test).
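The idea of a premise representation that depends on the hypothesis can be illustrated with standard scaled dot-product cross-attention, in which the hypothesis supplies the query and the premise tokens supply keys and values. This is a hypothetical sketch, not the paper's observer: the hypothesis is mean-pooled into a single query, and the learned query/key/value projections are replaced by identity maps to keep the example small.

```python
import math


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def observe_premise(premise_tokens, hypothesis_tokens):
    """Hypothetical cross-attention 'observer' (identity projections).

    The mean-pooled hypothesis is the query; premise tokens are both keys
    and values. Returns (premise_representation, attention_weights).
    """
    d = len(premise_tokens[0])
    n_h = len(hypothesis_tokens)
    query = [sum(tok[i] for tok in hypothesis_tokens) / n_h for i in range(d)]
    scores = [dot(query, key) / math.sqrt(d) for key in premise_tokens]
    weights = softmax(scores)
    rep = [sum(w * tok[i] for w, tok in zip(weights, premise_tokens)) for i in range(d)]
    return rep, weights


# Same premise, two different (toy) hypotheses.
premise = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
rep_a, weights_a = observe_premise(premise, [[2.0, 0.0]])
rep_b, weights_b = observe_premise(premise, [[0.0, 2.0]])
print(rep_a, rep_b)
```

With these toy vectors the two hypotheses yield mirrored premise representations: each hypothesis pulls the pooled premise toward the tokens it attends to most.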
Systematic comparisons showed that, in the current configuration, the quantum mathematics (complex amplitudes, the Schrödinger equation, interference) gives no advantage over classical analogues: the gains come from the architectural mechanism, not from the physical formalism. This result marks out the limits of quantum analogies in machine learning and points to a direction for further research: under what conditions can quantum mathematics provide a computational advantage on classical hardware?
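To make "interference" concrete, the snippet below compares two ways of combining complex amplitudes under the Born rule. Adding amplitudes before squaring gives $|a_1 + a_2|^2 = |a_1|^2 + |a_2|^2 + 2\,\mathrm{Re}(a_1 \overline{a_2})$, whereas adding squared moduli drops the cross term. The additive rule `classical_combined` is an assumed stand-in for a "classical analogue"; the description does not say which classical baselines were used, and all names are illustrative.

```python
import cmath


def born_probability(amplitude: complex) -> float:
    # Born rule: weight is the squared modulus of the amplitude.
    return abs(amplitude) ** 2


def quantum_combined(a1: complex, a2: complex) -> float:
    # Add amplitudes first, then square: includes interference.
    return born_probability(a1 + a2)


def classical_combined(a1: complex, a2: complex) -> float:
    # Add squared moduli directly: no cross term.
    return born_probability(a1) + born_probability(a2)


def interference_term(a1: complex, a2: complex) -> float:
    # 2 * Re(a1 * conj(a2)): the only difference between the two rules.
    return 2 * (a1 * a2.conjugate()).real


a1 = 0.6 + 0j
a2 = 0.6 * cmath.exp(1j * cmath.pi)  # same magnitude, opposite phase
print(quantum_combined(a1, a2))    # close to 0: destructive interference
print(classical_combined(a1, a2))  # close to 0.72: no cancellation
```

On classical hardware the cross term is ordinary real arithmetic on (real, imaginary) pairs, so a real-valued layer can in principle compute it as well.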
An independent contribution is a unified parametric family of criteria for neural-network pruning, $s_j = |W_j|^p \cdot |g_j|^q$, where $W_j$ is the weight of parameter $j$, $g_j$ is its gradient, and the exponents $p$ and $q$ select a specific criterion. Widely used methods (Magnitude, SNIP) are shown to be special cases of this family, and when the gradient is averaged over several batches, the criteria diverge markedly (by up to 9.6 percentage points).
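The criterion family is straightforward to express in code. In this family, magnitude pruning corresponds to $p=1, q=0$, and SNIP's connection sensitivity $|W_j g_j|$ corresponds to $p=q=1$ (SNIP's normalization by the total score does not change the ranking). The sketch below assumes a flat list of weights, that "averaging over several batches" means averaging the signed per-batch gradients before taking absolute values, and that pruning keeps the top-`keep` scores globally; these choices and all function names are illustrative assumptions, not the paper's code.

```python
def pruning_scores(weights, grads, p, q):
    # s_j = |W_j|^p * |g_j|^q for every parameter j.
    # Note: in Python 0.0 ** 0 == 1.0, so q = 0 ignores gradients entirely.
    return [abs(w) ** p * abs(g) ** q for w, g in zip(weights, grads)]


def average_gradients(grad_batches):
    # Element-wise mean of signed per-batch gradients (assumption).
    n = len(grad_batches)
    return [sum(col) / n for col in zip(*grad_batches)]


def prune_mask(scores, keep):
    # Keep the `keep` highest-scoring parameters (1 = keep, 0 = prune).
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    kept = set(order[:keep])
    return [1 if j in kept else 0 for j in range(len(scores))]


# Made-up weights and gradients from two batches, for illustration only.
weights = [0.9, 0.1, -0.5, 0.05]
grad_batches = [[0.01, 0.8, 0.3, -0.9], [-0.01, 0.7, 0.2, 0.95]]
g = average_gradients(grad_batches)

magnitude_mask = prune_mask(pruning_scores(weights, g, p=1, q=0), keep=2)
snip_mask = prune_mask(pruning_scores(weights, g, p=1, q=1), keep=2)
print(magnitude_mask)  # [1, 0, 1, 0]
print(snip_mask)       # [0, 1, 1, 0]
```

In this toy case the first parameter has the largest weight, but its gradients cancel after averaging, so the two criteria keep different parameters from identical inputs.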
All key experiments were run with 3 to 5 random initializations on the full datasets, with paired statistical tests.
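The description does not name the paired test. Assuming a paired t-test over per-seed scores (one common choice; the Wilcoxon signed-rank test is another), the statistic can be computed with the standard library alone, as sketched below. The result is compared against the two-sided critical value for n - 1 degrees of freedom (about 2.776 at α = 0.05 with 5 seeds). If SciPy is available, `scipy.stats.ttest_rel(a, b)` returns the same statistic together with a p-value. The function name is hypothetical.

```python
import math
import statistics


def paired_t_statistic(a, b):
    """Paired t statistic for per-seed scores of two methods.

    Returns (t, degrees_of_freedom).
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length lists with at least 2 seeds")
    diffs = [x - y for x, y in zip(a, b)]
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_d == 0:
        raise ValueError("identical differences: t statistic is undefined")
    t = mean_d / (sd_d / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


# Dummy scores for three seeds, for illustration only (not paper results).
method = [10.0, 12.0, 11.0]
baseline = [6.0, 7.0, 6.5]
t, df = paired_t_statistic(method, baseline)
print(t, df)
```

Pairing by seed removes the seed-to-seed variance shared by both methods, which is why a paired test is preferred over an unpaired one when both methods are run with the same seeds.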
Files
| Name | Size | MD5 |
|---|---|---|
| wpdmt_paper_en_last_v.pdf | 2.3 MB | 75d396188fba5736ead68610e89896e9 |
Additional details
Additional titles
- Subtitle (English): Pilot Study
References
- Yoshua Bengio, Nicholas Léonard, and Aaron Courville. Estimating or propagating gradients through stochastic neurons for conditional computation. arXiv preprint arXiv:1308.3432, 2013. URL https://arxiv.org/abs/1308.3432
- Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, et al. Language models are few-shot learners. Advances in Neural Information Processing Systems, 33:1877-1901, 2020. URL https://arxiv.org/abs/2005.14165
- Eugen Bărbulescu, Asher Trockman, and J. Zico Kolter. Hyperflux: A conceptually-grounded pruning method and the neural pruning law hypothesis. arXiv preprint arXiv:2504.05349, 2025. URL https://arxiv.org/abs/2504.05349
- Samuel Yen-Chi Chen, Chao-Han Huck Yang, et al. Quantum-PEFT: Ultra parameter-efficient fine-tuning. In International Conference on Learning Representations (ICLR), 2025. URL https://openreview.net/forum?id=dgR6i4TSng
- Ido Dagan, Oren Glickman, and Bernardo Magnini. The PASCAL recognising textual entailment challenge. In Machine Learning Challenges Workshop, pages 177-190. Springer, 2005. URL https://link.springer.com/chapter/10.1007/11736790_9
- Rocktim Jyoti Das, Mingjie Sun, Liqun Ma, and Zhiqiang Shen. Beyond size: How gradients shape pruning decisions in large language models. arXiv preprint arXiv:2311.04902, 2023. URL https://arxiv.org/abs/2311.04902
- Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of NAACL-HLT, pages 4171-4186, 2019. URL https://arxiv.org/abs/1810.04805
- Yarin Gal and Zoubin Ghahramani. Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In Proceedings of ICML, pages 1050-1059, 2016. URL https://arxiv.org/abs/1506.02142
- Jiapeng Gao and Lukáš Galambos. Text classification with Born's rule. In Advances in Neural Information Processing Systems, volume 35, 2022. URL https://openreview.net/forum?id=sNcn-E3uPHA
- Chuan Guo, Geoff Pleiss, Yu Sun, and Kilian Q Weinberger. On calibration of modern neural networks. In Proceedings of ICML, pages 1321-1330, 2017. URL https://arxiv.org/abs/1706.04599
- Aditya Gupta et al. QLens: Towards a quantum perspective of language transformers. arXiv preprint arXiv:2510.11963, 2025. URL https://arxiv.org/abs/2510.11963
- Suchin Gururangan, Swabha Swayamdipta, Omer Levy, Roy Schwartz, Samuel Bowman, and Noah A Smith. Annotation artifacts in natural language inference data. In Proceedings of NAACL-HLT, pages 107-112, 2018. URL https://arxiv.org/abs/1803.02324
- Babak Hassibi and David G Stork. Optimal brain surgeon and general network pruning. In IEEE International Conference on Neural Networks, pages 293-299, 1993. URL https://proceedings.neurips.cc/paper/1992/hash/303ed4c69846ab36c2904d3ba8573050-Abstract.html
- Pengcheng He, Xiaodong Liu, Jianfeng Gao, and Weizhu Chen. DeBERTa: Decoding-enhanced BERT with disentangled attention. arXiv preprint arXiv:2006.03654, 2021. URL https://arxiv.org/abs/2006.03654
- Radu Herbei and Marten H. Wegkamp. Classification with reject option. Canadian Journal of Statistics, 34(4):709-721, 2006. URL https://doi.org/10.1002/cjs.5550340410
- Neil Houlsby, Andrei Giurgiu, Stanislaw Jastrzebski, Bruna Morrone, Quentin de Laroussilhe, Andrea Gesmundo, Mona Attariyan, and Sylvain Gelly. Parameter-efficient transfer learning for NLP. In Proceedings of ICML, pages 2790-2799, 2019. URL https://arxiv.org/abs/1902.00751
- Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. LoRA: Low-rank adaptation of large language models. In Proceedings of ICLR, 2022. URL https://arxiv.org/abs/2106.09685
- Eric Jang, Shixiang Gu, and Ben Poole. Categorical reparameterization with Gumbel-Softmax. In ICLR, 2017. URL https://arxiv.org/abs/1611.01144
- Walter Kohn and Lu Jeu Sham. Self-consistent equations including exchange and correlation effects. Physical Review, 140(4A):A1133-A1138, 1965. URL https://doi.org/10.1103/PhysRev.140.A1133
- Balaji Lakshminarayanan, Alexander Pritzel, and Charles Blundell. Simple and scalable predictive uncertainty estimation using deep ensembles. Advances in Neural Information Processing Systems, 30, 2017. URL https://arxiv.org/abs/1612.01474
- Yann LeCun, John Denker, and Sara Solla. Optimal brain damage. In Advances in Neural Information Processing Systems, volume 2, 1990. URL https://proceedings.neurips.cc/paper/1989/hash/6c9882bbac1c7093bd25041881277658-Abstract.html
- Namhoon Lee, Thalaiyasingam Ajanthan, and Philip HS Torr. SNIP: Single-shot network pruning based on connection sensitivity. In Proceedings of ICLR, 2019. URL https://arxiv.org/abs/1810.02340
- Guangxi Li, Xuanqiang Zhao, and Xin Wang. Quantum self-attention neural networks for text classification. Science China Information Sciences, 67:142501, 2024. URL https://doi.org/10.1007/s11432-023-3879-7
- Xin Li and Dan Roth. Learning question classifiers. In Proceedings of COLING, pages 1-7, 2002. URL https://aclanthology.org/C02-1150/
- Weijie Liu, Peng Zhou, Zhiruo Wang, Zhe Zhao, Haotang Deng, and Qi Ju. FastBERT: A self-distilling BERT with adaptive inference time. In ACL, 2020. URL https://arxiv.org/abs/2004.02178
- Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692, 2019. URL https://arxiv.org/abs/1907.11692
- Milan Maksimovic and Ivan S. Maksymov. Quantum-cognitive neural networks: Assessing confidence and uncertainty with human decision-making simulations. Big Data and Cognitive Computing, 9(1):12, 2025. URL https://doi.org/10.3390/bdcc9010012
- James Martens and Roger Grosse. Optimizing neural networks with Kronecker-factored approximate curvature. In Proceedings of ICML, pages 2408-2417, 2015. URL https://arxiv.org/abs/1503.05671
- Konstantinos Meichanetzidis, Alexis Toumi, Giovanni de Felice, and Bob Coecke. Grammar-aware sentence classification on quantum computers. Quantum Machine Intelligence, 5:10, 2023. URL https://doi.org/10.1007/s42484-023-00097-1
- Pavlo Molchanov, Arun Mallya, Stephen Tyree, Iuri Frosio, and Jan Kautz. Importance estimation for neural network pruning. In Proceedings of CVPR, pages 11264-11272, 2019. URL https://arxiv.org/abs/1611.06440
- Matteo Pagliardini, Pierre Ablin, and Martin Jaggi. AdEMAMix: Better momentum schedules through exponential moving average mixing. arXiv preprint arXiv:2409.03137, 2024. URL https://arxiv.org/abs/2409.03137
- Mohammad Taher Pilehvar and Jose Camacho-Collados. WiC: The word-in-context dataset for evaluating context-sensitive meaning representations. In Proceedings of NAACL-HLT, pages 1267-1273, 2019. URL https://arxiv.org/abs/1808.09121
- Lutz Prechelt. Early stopping-but when? In Neural Networks: Tricks of the Trade, pages 55-69. Springer, 1998. URL https://doi.org/10.1007/3-540-49430-8_3
- Elvis Saravia, Hsien-Chi Toby Liu, Yen-Hao Huang, Junlin Wu, and Yi-Shin Chen. CARER: Contextualized affect representations for emotion recognition. In Proceedings of EMNLP, pages 3687-3697, 2018. URL https://aclanthology.org/D18-1404/
- Murat Sensoy, Lance Kaplan, and Melih Kandemir. Evidential deep learning to quantify classification uncertainty. In NeurIPS, 2018. URL https://arxiv.org/abs/1806.01768
- Richard Socher, Alex Perelygin, Jean Wu, Jason Chuang, Christopher D Manning, Andrew Y Ng, and Christopher Potts. Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of EMNLP, pages 1631-1642, 2013. URL https://aclanthology.org/D13-1170/
- Hidenori Tanaka, Daniel Kunin, Daniel LK Yamins, and Surya Ganguli. Pruning neural networks without any data by iteratively conserving synaptic flow. In Advances in Neural Information Processing Systems, volume 33, pages 6377-6389, 2020. URL https://arxiv.org/abs/2006.05467
- Charles M. Varmantchaonala, Jean Louis K. E. Fendji, Julius Schöning, and Marcellin Atemkeng. Quantum natural language processing: A comprehensive survey. IEEE Access, 12:99578-99598, 2024. URL https://doi.org/10.1109/ACCESS.2024.3420707
- John Archibald Wheeler. The "past" and the "delayed-choice" double-slit experiment. In Mathematical Foundations of Quantum Theory, pages 9-48. Academic Press, 1978. URL https://doi.org/10.1016/B978-0-12-473250-6.50006-6
- Dominic Widdows, Willie Aboumrad, Dohun Kim, Sayonee Ray, and Jonathan Mei. Quantum natural language processing. Künstliche Intelligenz, 38:293-310, 2024. URL https://arxiv.org/abs/2403.19758
- Adina Williams, Nikita Nangia, and Samuel Bowman. A broad-coverage challenge corpus for sentence understanding through inference. In Proceedings of NAACL-HLT, pages 1112-1122, 2018. URL https://arxiv.org/abs/1704.05426
- Ji Xin, Raphael Tang, Jaejun Lee, Yaoliang Yu, and Jimmy Lin. DeeBERT: Dynamic early exiting for accelerating BERT inference. In ACL, 2020. URL https://arxiv.org/abs/2004.12993
- Zhewei Yao, Amir Gholami, Sheng Shen, Mustafa Mustafa, Kurt Keutzer, and Michael W Mahoney. AdaHessian: An adaptive second order optimizer for machine learning. In Proceedings of AAAI, volume 35, pages 10665-10673, 2021. URL https://arxiv.org/abs/2006.00719
- Peng Zhang, Jiabin Niu, Zhan Su, Benyou Wang, Liqun Ma, and Dawei Song. Quantum-inspired interactive networks for conversational sentiment analysis. In Proceedings of NAACL-HLT, pages 2220-2230, 2019. URL https://oro.open.ac.uk/61755/
- Wei Zhang et al. A survey of quantum transformers. arXiv preprint arXiv:2504.03192, 2025. URL https://arxiv.org/abs/2504.03192
- Wojciech Hubert Zurek. Decoherence, einselection, and the quantum origins of the classical. Reviews of Modern Physics, 75(3):715-775, 2003. URL https://doi.org/10.1103/RevModPhys.75.715