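How does the choice of attention mechanism (e.g., sparse vs. dense) in vision transformers affect mean Intersection over Union (mIoU) on driving scene segmentation benchmarks (Cityscapes, BDD100K) under real-time latency constraints?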
Description
Since the introduction of Vision Transformers, the landscape of many computer vision tasks (e.g., semantic segmentation), long dominated by CNNs, has changed significantly. However, their computational cost and memory requirements render these methods unsuitable for mobile devices. In this paper, we introduce a new method, squeeze-enhanced Axial Transformer (SeaFormer), for mobile visual recognition. Specifically, we design a generic attention block characterized by the formulation of squeeze Axial and detail enhancement. It can be further used to create
Research goal: How does the choice of attention mechanism (e.g., sparse vs. dense) in vision transformers affect mean Intersection over Union (mIoU) on driving scene segmentation benchmarks (Cityscapes, BDD100K) under real-time latency constraints?
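For readers unfamiliar with the metric named in the research goal, the following is a minimal sketch of how mIoU is commonly computed for semantic segmentation: per-class IoU is the number of true positives divided by the sum of true positives, false positives, and false negatives, and mIoU averages this over the classes that occur. The function name `mean_iou`, the flat-list input layout, and the ignore label 255 are assumptions made for illustration; they are not taken from the report or from any benchmark toolkit, and official evaluation scripts typically accumulate counts over the whole dataset rather than per image.

```python
def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean Intersection over Union for flat sequences of class ids.

    `ignore_index` marks unlabeled pixels (255 is an assumed convention here).
    """
    intersection = [0] * num_classes  # true positives per class
    union = [0] * num_classes         # TP + FP + FN per class

    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # unlabeled pixel: excluded from the metric
        if p == t:
            intersection[t] += 1
            union[t] += 1
        else:
            union[t] += 1              # false negative for the true class
            if 0 <= p < num_classes:
                union[p] += 1          # false positive for the predicted class

    # Average only over classes that appear in the prediction or the labels.
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else float("nan")


# Hypothetical toy example: six pixels, three classes, one ignored pixel.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 255]
score = mean_iou(pred, target, num_classes=3)
```

In this toy example the per-class IoUs are 1/2, 2/3, and 1, so `score` is their mean, 13/18. Comparing attention mechanisms under a latency constraint would pair a score like this with a measured inference time for each model on the target hardware.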
Autonomous synthesis report generated by SOVEREIGN Research Kernel. Tribunal consensus score: 8.2/10.
Files
| Name | Size |
|---|---|
| paper.pdf (md5:b87a31c4c57095a68833a5c4fbbfc75b) | 93.9 kB |