Efficient Training of Visual Transformers with Small Datasets

Yahui Liu; Enver Sangineto; Wei Bi; Niculae Sebe; Bruno Lepri; Marco de Nadai

doi:10.5281/zenodo.6363240

Published March 16, 2022 | Version v1

Conference paper | Open access

1. University of Trento, Italy
2. Tencent AI Lab
3. FBK

Visual Transformers (VTs) are emerging as an architectural paradigm alternative to
convolutional networks (CNNs). Unlike CNNs, VTs can capture global relations
between image elements, and they potentially have a larger representation
capacity. However, the lack of the typical convolutional inductive bias makes these
models more data hungry than common CNNs. In fact, some local properties of the
visual domain that are embedded in the CNN architectural design must be learned
from samples in VTs. In this paper, we empirically analyse different VTs,
comparing their robustness in a small training set regime, and we show that, despite
having comparable accuracy when trained on ImageNet, their performance on
smaller datasets can differ widely. Moreover, we propose an auxiliary self-supervised
task which can extract additional information from images with only a
negligible computational overhead. This task encourages the VTs to learn spatial
relations within an image and makes VT training much more robust when
training data is scarce. Our task is used jointly with the standard (supervised)
training and does not depend on specific architectural choices, so it can be
easily plugged into existing VTs. Using an extensive evaluation with different
VTs and datasets, we show that our method can improve (sometimes dramatically)
the final accuracy of the VTs. Our code is available at:
https://github.com/yhlleo/VTs-Drloc.
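
The abstract describes the auxiliary task only at a high level. As a rough illustration of what a "spatial relations" objective trained jointly with a supervised loss might look like, the sketch below samples pairs of positions on a k x k grid of token embeddings, asks a predictor for the normalized offset between the two positions, and adds the mean L1 error to the supervised loss with a weight. The function names (`sample_pairs`, `localization_loss`, `total_loss`), the L1 form of the loss, and the weight `lam` are assumptions made for illustration and are not taken from this record; the authors' actual implementation is in the repository linked above.

```python
import random


def sample_pairs(k, m, rng):
    """Sample m random pairs of (row, col) positions on a k x k token grid."""
    return [
        ((rng.randrange(k), rng.randrange(k)),
         (rng.randrange(k), rng.randrange(k)))
        for _ in range(m)
    ]


def target_offset(p, q, k):
    """True offset between two grid positions, normalized by the grid size."""
    return ((p[0] - q[0]) / k, (p[1] - q[1]) / k)


def localization_loss(grid, pairs, predictor):
    """Mean L1 error between predicted and true normalized offsets.

    grid:      k x k nested list of token embeddings (any Python objects)
    predictor: hypothetical callable (emb_a, emb_b) -> (du, dv); in a real
               model this would be a small trainable head on the embeddings
    """
    k = len(grid)
    total = 0.0
    for p, q in pairs:
        du, dv = predictor(grid[p[0]][p[1]], grid[q[0]][q[1]])
        tu, tv = target_offset(p, q, k)
        total += abs(du - tu) + abs(dv - tv)
    return total / len(pairs)


def total_loss(supervised_loss, aux_loss, lam=0.5):
    """Joint objective: supervised loss plus a weighted auxiliary term.

    The default weight is an arbitrary placeholder, not a recommended value.
    """
    return supervised_loss + lam * aux_loss
```

In a toy setting where each "embedding" is simply its own (row, col) coordinate, a predictor that subtracts the coordinates and divides by k drives the auxiliary loss to zero, while a predictor that ignores its inputs does not. This is the intuition the sketch is meant to convey: the auxiliary term is small only when the embeddings retain information about where each token came from.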

Files

Name	Size	Checksum
NeurIPS-2021-efficient-training-of-visual-transformers-with-small-datasets-Paper (3).pdf	4.9 MB	md5:411e5a67490ed2aa41b4a00226d6c641

Additional details

Funding: European Commission
AI4Media - A European Excellence Centre for Media, Society and Democracy (951911)

	All versions	This version
Views	188	188
Downloads	165	165
Data volume	834.6 MB	834.6 MB
