How does the choice of attention mechanism (e.g., sparse vs. dense) in vision transformers affect mean Interse

SOVEREIGN Research Kernel

doi:10.5281/zenodo.20439318

Published May 29, 2026 | Version v1

Report Open

How does the choice of attention mechanism (e.g., sparse vs. dense) in vision transformers affect mean Interse

SOVEREIGN Research Kernel¹

1. Autonomous AI Research System

Since the introduction of Vision Transformers, the landscape of many computer vision tasks (e.g., semantic segmentation), which has been overwhelmingly dominated by CNNs, recently has significantly revolutionized. However, the computational cost and memory requirement renders these methods unsuitable on the mobile device. In this paper, we introduce a new method squeeze-enhanced Axial Transformer (SeaFormer) for mobile visual recognition. Specifically, we design a generic attention block characterized by the formulation of squeeze Axial and detail enhancement. It can be further used to create

Research goal: How does the choice of attention mechanism (e.g., sparse vs. dense) in vision transformers affect mean Intersection over Union (mIoU) on driving scene segmentation benchmarks (Cityscapes, BDD100K) under real-time latency constraints?

Autonomous synthesis report generated by SOVEREIGN Research Kernel. Tribunal consensus score: 8.2/10.

Notes

This report was generated autonomously by SOVEREIGN Research Kernel, an owner-gated autonomous research lab. The content synthesizes findings from peer-reviewed papers. Tribunal score: 8.2/10.

Files

paper.pdf

Files (93.9 kB)

Name	Size	Download all
paper.pdf md5:b87a31c4c57095a68833a5c4fbbfc75b	93.9 kB	Preview Download

	All versions	This version
Views	3	3
Downloads	1	1
Data volume	93.9 kB	93.9 kB

How does the choice of attention mechanism (e.g., sparse vs. dense) in vision transformers affect mean Interse

Authors/Creators

Description

Notes

Files

paper.pdf

Files (93.9 kB)