Uncertainty-infused Representation Learning Using Neutrosophic-based Transformer Network
Description
Representation learning and interactive modeling of visual content are critical for advancing visual interpretation across computer vision tasks. However, the inherent uncertainty and ambiguity of visual data remain a major challenge for representation learning algorithms. In response, this study presents a simple but effective model, the neutrosophic-based transformer network (NTN), which integrates neutrosophic logic with the transformer architecture to better manage these uncertainties. The design of NTN includes three primary building blocks: a neutrosophic encoding module, a multipath network, and a fusion and decision module. Motivated by the success of neutrosophic logic in interpreting the indeterminacy in visual data, we introduce a neutrosophic encoding module that applies a convolving window to map image data into the neutrosophic domain (truth, indeterminacy, and falsehood). This helps the NTN mitigate spatial and intensity uncertainties in image patches, thereby enhancing boundary and uniformity retention while minimizing discontinuities. Multipath networks are then built from visual transformer encoding blocks (each composed of multi-head self-attention, a feed-forward network, and a residual link) to learn rich representations from the generated neutrosophic image. At the end of the NTN, a multiplicative fusion module fuses the knowledge from the different network paths into a representation that supports informed decisions about the input.
A set of proof-of-concept experiments is conducted to evaluate the proposed NTN against cutting-edge approaches on two image recognition datasets (Fashion-MNIST and CIFAR-10) under different uncertainty settings. The findings demonstrate the potential of NTN to maintain high representation power through efficient modeling of uncertainty information in visual recognition tasks.
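
The neutrosophic encoding and multiplicative fusion steps described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes one commonly used neutrosophic image transform: truth is the min-max normalized local mean, indeterminacy is the normalized absolute deviation from that mean, and falsehood is the complement of truth. The function names, the 3x3 default window, and the element-wise product used for fusion are all illustrative assumptions; the description does not specify them.

```python
import numpy as np


def local_mean(image, window=3):
    """Mean over a sliding window x window neighbourhood (edge-padded)."""
    if window % 2 == 0:
        raise ValueError("window must be odd")
    pad = window // 2
    padded = np.pad(image.astype(np.float64), pad, mode="edge")
    # Shape (H, W, window, window); requires NumPy >= 1.20.
    views = np.lib.stride_tricks.sliding_window_view(padded, (window, window))
    return views.mean(axis=(-2, -1))


def min_max(x, eps=1e-12):
    """Scale x to [0, 1]; a constant input maps to all zeros."""
    span = x.max() - x.min()
    if span < eps:
        return np.zeros_like(x)
    return (x - x.min()) / span


def neutrosophic_encode(image, window=3):
    """Map a 2-D grayscale image to stacked (T, I, F) maps.

    Assumed formulation (hypothetical for NTN):
      T = normalized local mean
      I = normalized |pixel - local mean|
      F = 1 - T
    """
    g = image.astype(np.float64)
    g_bar = local_mean(g, window)
    truth = min_max(g_bar)
    indeterminacy = min_max(np.abs(g - g_bar))
    falsehood = 1.0 - truth
    return np.stack([truth, indeterminacy, falsehood], axis=0)


def multiplicative_fusion(features):
    """Fuse same-shaped per-path features by element-wise product (assumed form)."""
    return np.prod(np.stack(features, axis=0), axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.integers(0, 256, size=(28, 28))  # Fashion-MNIST-sized dummy image
    encoded = neutrosophic_encode(image)
    print(encoded.shape)
```

For a 28x28 input, `neutrosophic_encode` returns an array of shape `(3, 28, 28)`. In a multipath design such as the one described, each of the three maps could feed its own transformer path, with `multiplicative_fusion` combining the per-path feature vectors.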