Published January 11, 2026 | Version v1
Journal article (Open Access)

NanoRay V2: Bridging the Gap Between Transformers and Edge AI via Cross-Architecture Distillation

Authors/Creators

  • Noble University

Description

While Vision Transformers (ViTs) achieve state-of-the-art performance in medical image analysis, their massive computational cost makes them unsuitable for edge deployment in resource-constrained environments. This study introduces NanoRay V2, a lightweight 2.5M-parameter MobileNetV3 distilled from an 86M-parameter Vision Transformer (ViT-Base). By leveraging a soft-target distillation objective (α = 0.25, T = 4.0), we transfer global attention behavior from the Transformer into the compact CNN. The distilled model achieves 84.19% accuracy, surpassing both its teacher (83.96%) and a baseline CNN trained from scratch (83.29%) on the RSNA Pneumonia dataset. Grad-CAM analysis confirms that NanoRay V2 inherits structure-aware global attention while maintaining inference speeds suitable for CPU-native mobile hardware. This work is intended strictly for research purposes and is not a clinical diagnostic system.
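The soft-target distillation objective mentioned above (α = 0.25, T = 4.0) can be sketched as follows. This is a minimal NumPy illustration of the standard Hinton-style formulation, not the paper's implementation: the function names, and the convention that α weights the hard-label cross-entropy while (1 − α) weights the temperature-softened KL term, are assumptions, since the record does not spell them out.

```python
import numpy as np

def softmax(z, T=1.0):
    # Temperature-scaled softmax; higher T flattens the distribution.
    z = np.asarray(z, dtype=float) / T
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, label, alpha=0.25, T=4.0):
    """Soft-target distillation loss (standard formulation; weighting
    convention is an assumption, not taken from the paper).

    alpha weights the hard-label cross-entropy; (1 - alpha) weights the
    KL divergence between temperature-softened teacher and student
    distributions, scaled by T**2 to keep gradient magnitudes comparable
    across temperatures.
    """
    # Hard loss: cross-entropy against the ground-truth label at T = 1.
    p_student = softmax(student_logits)
    hard = -np.log(p_student[label] + 1e-12)

    # Soft loss: KL(teacher || student) over temperature-softened outputs.
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    soft = np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12)))

    return alpha * hard + (1.0 - alpha) * (T ** 2) * soft

# Example: a hypothetical 3-class logit pair (binary pneumonia detection
# would use 2 classes; 3 are used here purely for illustration).
student = [2.0, 0.5, -1.0]
teacher = [1.5, 0.8, -0.5]
loss = distillation_loss(student, teacher, label=0)
```

When the student matches the teacher exactly, the KL term vanishes and only the α-weighted hard loss remains, which is a quick sanity check for any implementation of this objective.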

Files

nanoray-v2-bridging-the-gap-between-transformers-and-edge-ai-via-cross-architecture-distillation-IJERTV15IS010017.pdf

Additional details