Published October 27, 2021 | Version v1
Conference paper | Open Access

Transformer-Based Attention Networks for Continuous Pixel-Wise Prediction

  • 1. Harbin Institute of Technology, China
  • 2. ETH Zurich, Switzerland
  • 3. University of Trento, Italy

Description

While convolutional neural networks have shown a tremendous impact on various computer vision tasks, they generally demonstrate limitations in explicitly modeling long-range dependencies due to the intrinsic locality of the convolution operation. Initially designed for natural language processing tasks, Transformers have emerged as alternative architectures with innate global self-attention mechanisms to capture long-range dependencies. In this paper, we propose TransDepth, an architecture that benefits from both convolutional neural networks and transformers. To avoid the network losing its ability to capture local-level details due to the adoption of transformers, we propose a novel decoder that employs attention mechanisms based on gates. Notably, this is the first paper that applies transformers to pixel-wise prediction problems involving continuous labels (i.e., monocular depth prediction and surface normal estimation). Extensive experiments demonstrate that the proposed TransDepth achieves state-of-the-art performance on three challenging datasets. Our code is available at: https://github.com/ygjwd12345/TransDepth.
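
To make the gate-based attention decoder idea above concrete, below is a minimal PyTorch sketch of one possible decoder block: a coarse transformer feature map gates a high-resolution CNN skip connection via additive attention. The class and parameter names (AttentionGate, theta, phi, psi) and the tensor shapes are illustrative assumptions, not the authors' implementation; the actual TransDepth decoder is available in the linked repository.

```python
import torch
import torch.nn as nn


class AttentionGate(nn.Module):
    """Hypothetical gate-based attention block: rescales a CNN skip
    connection (local detail) using a coarser gating signal (global
    context, e.g. transformer features)."""

    def __init__(self, skip_ch: int, gate_ch: int, inter_ch: int):
        super().__init__()
        self.theta = nn.Conv2d(skip_ch, inter_ch, kernel_size=1)  # project skip
        self.phi = nn.Conv2d(gate_ch, inter_ch, kernel_size=1)    # project gate
        self.psi = nn.Conv2d(inter_ch, 1, kernel_size=1)          # scalar weight

    def forward(self, skip: torch.Tensor, gate: torch.Tensor) -> torch.Tensor:
        # Upsample the coarse gating signal to the skip connection's resolution.
        gate = nn.functional.interpolate(
            gate, size=skip.shape[2:], mode="bilinear", align_corners=False)
        # Additive attention: a sigmoid over the 1x1-projected sum yields a
        # per-pixel weight in (0, 1) that rescales the skip features.
        attn = torch.sigmoid(self.psi(torch.relu(self.theta(skip) + self.phi(gate))))
        return skip * attn


# Usage: gate high-resolution CNN features with low-resolution
# transformer features before fusing them in the decoder.
skip = torch.randn(1, 64, 120, 160)   # CNN features (local detail)
gate = torch.randn(1, 256, 15, 20)    # transformer features (global context)
gated = AttentionGate(skip_ch=64, gate_ch=256, inter_ch=32)(skip, gate)
print(gated.shape)  # torch.Size([1, 64, 120, 160])
```

The design intent mirrored here is the one stated in the abstract: the gate lets global transformer context decide, per pixel, how much local CNN detail passes into the decoder, so adopting transformers does not wash out fine structure.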

Files

Yang_Transformer-Based_Attention_Networks_for_Continuous_Pixel-Wise_Prediction_ICCV_2021_paper.pdf

Additional details

Funding

AI4Media – A European Excellence Centre for Media, Society and Democracy (Grant No. 951911)
European Commission