Published January 11, 2026
| Version v1
Journal article
Open
NanoRay V2: Bridging the Gap Between Transformers and Edge AI via Cross-Architecture Distillation
Description
While Vision Transformers (ViTs) achieve state-of-the-art performance in medical image analysis, their massive computational cost makes them
unsuitable for edge deployment in resource-constrained environments. This study introduces NanoRay V2, a lightweight 2.5M-parameter
MobileNetV3 distilled from an 86M-parameter Vision Transformer (ViT-Base). By leveraging a soft-target distillation objective (α = 0.25, T = 4.0), we
transfer global attention behavior from the Transformer into the compact CNN. The distilled model achieves 84.19% accuracy, surpassing both its
teacher (83.96%) and a baseline CNN trained from scratch (83.29%) on the RSNA Pneumonia dataset. Grad-CAM analysis confirms that NanoRay
V2 inherits structure-aware global attention while maintaining inference speeds suitable for CPU-native mobile hardware. This work is intended
strictly for research purposes and is not a clinical diagnostic system.
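The abstract reports a soft-target distillation objective with α = 0.25 and T = 4.0 but does not spell out the loss formula. A minimal sketch is given below, assuming the standard Hinton-style formulation in which α weights the hard-label cross-entropy and (1 − α) weights the temperature-softened KL term (the exact weighting convention used in the paper is an assumption here):

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kd_loss(student_logits, teacher_logits, labels, alpha=0.25, T=4.0):
    """Soft-target knowledge-distillation loss (assumed Hinton-style form).

    alpha weights the hard-label cross-entropy; (1 - alpha) weights the
    KL divergence between temperature-softened teacher and student
    distributions, scaled by T**2 to keep gradients comparable across
    temperatures.
    """
    # Hard-label cross-entropy against ground-truth labels
    p_student = softmax(student_logits)
    n = len(labels)
    ce = -np.log(p_student[np.arange(n), labels] + 1e-12).mean()

    # KL(teacher_soft || student_soft) at temperature T
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    kl = (p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12))).sum(axis=-1).mean()

    return alpha * ce + (1.0 - alpha) * (T ** 2) * kl
```

In practice the same objective would be expressed with a deep-learning framework's differentiable ops (e.g. `KLDivLoss` plus `CrossEntropyLoss` in PyTorch) so gradients flow into the student; the NumPy version above only illustrates the arithmetic of the combined loss.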
Files
nanoray-v2-bridging-the-gap-between-transformers-and-edge-ai-via-cross-architecture-distillation-IJERTV15IS010017.pdf (267.9 kB)
md5:f4ca7d9d817b8b4bf1b9c0bddb70a9fd
Additional details
Related works
- Is identical to
- Journal article: https://www.ijert.org/nanoray-v2-bridging-the-gap-between-transformers-and-edge-ai-via-cross-architecture-distillation (URL)