Scaled and Translated Image Recognition (STIR)
Creators
- 1. Friedrich-Alexander-Universität Erlangen-Nürnberg
- 2. Fraunhofer-Institut für Integrierte Schaltungen
Description
Paper: Just a Matter of Scale? Reevaluating Scale Equivariance in Convolutional Neural Networks (arXiv:2211.10288)
Code: taltstidl/scale-equivariant-cnn on GitHub, the official code for "Just a Matter of Scale? Reevaluating Scale Equivariance in Convolutional Neural Networks"
While convolutions are known to be equivariant to (discrete) translations, scaling remains a challenge, and most image recognition networks are not invariant to it. To explore these effects, we have created the Scaled and Translated Image Recognition (STIR) dataset. This dataset contains objects of size \(s \in [17,64]\) pixels, each randomly placed in a \(64 \times 64\) pixel image.
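To make the construction concrete, the sketch below pastes a square object of size s at a random position on a 64×64 canvas. It is an illustration only, not the code used to generate the dataset: the helper `place_object` is hypothetical, and drawing the offset uniformly is an assumption.

```python
import numpy as np

CANVAS = 64  # image side length stated in the description


def place_object(obj, rng):
    """Paste a square object at a random position on a black 64x64 canvas.

    Hypothetical helper: assumes offsets are drawn uniformly so that the
    object always lies fully inside the image.
    """
    s = obj.shape[0]
    if obj.shape[:2] != (s, s) or not 17 <= s <= CANVAS:
        raise ValueError("object must be square with side in [17, 64]")
    # Valid top-left offsets are 0 .. 64 - s (inclusive).
    left = int(rng.integers(0, CANVAS - s + 1))
    top = int(rng.integers(0, CANVAS - s + 1))
    canvas = np.zeros((CANVAS, CANVAS) + obj.shape[2:], dtype=obj.dtype)
    canvas[top:top + s, left:left + s] = obj
    return canvas, (left, top)


rng = np.random.default_rng(0)
obj = np.full((32, 32), 255, dtype=np.uint8)  # stand-in for a rescaled object
image, (left, top) = place_object(obj, rng)
```

An object of size 64 fills the whole image and leaves no room for translation, while an object of size 17 can start at any of 48 offsets per axis.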
Using the dataset
Depending on which data you are planning to use, download one or more of the following files. Data is stored in compressed .npz format and can be loaded with NumPy (numpy.load).
File | Description |
---|---|
emoji.npz | Emoji vector icons rendered as white icon on black background |
mnist.npz | Classic MNIST handwritten digits rescaled to varying sizes |
trafficsign.npz | Traffic signs from street imagery downscaled to varying sizes |
aerial.npz | Objects in aerial imagery downscaled to varying sizes |
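A minimal loading sketch using `numpy.load`, which returns a dictionary-like `NpzFile` for .npz archives. The helper name `describe_npz` is hypothetical, and the path you pass is assumed to be wherever you saved the download.

```python
import numpy as np


def describe_npz(path):
    """Return {key: shape} for every array stored in an .npz archive."""
    # allow_pickle stays False (the default); if an array in the archive was
    # saved as a pickled object array, NumPy raises a ValueError on access.
    with np.load(path) as archive:
        return {key: archive[key].shape for key in archive.files}
```

Calling `describe_npz('emoji.npz')` would list each stored key with its shape. Note that reading a shape this way decompresses the full array, so it can take a moment for the larger files.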
Each file contains multiple arrays that can be accessed in a dictionary-like fashion. The keys are documented below, where n is the number of classes for a given file and m is the number of instances for each class. Both emoji.npz (36 classes, 1 instance) and mnist.npz (10 classes, 50 instances) are in black & white while trafficsign.npz (16 classes, 25 instances) and aerial.npz (9 classes, 25 instances) are in color.
Key | Shape | Description |
---|---|---|
imgs | (3, 48, n, m, 64, 64) black & white, (3, 48, n, m, 64, 64, 3) color | Images grouped into 3 sets (training, validation, testing) and 48 different scales. Values will be in range 0 to 255. |
lbls | (3, 48, n, m) | Indices referencing ground truth labels. See lbldata for descriptive names. Values will be in range 0 to n - 1. |
scls | (3, 48, n, m) | Known scales as given by bounding box size. Values will be in range 17 to 64. |
psts | (3, 48, n, m, 2) | Known position of bounding box. First value is distance to left edge, second value distance to top edge. |
metadata | (6, 2) | Metadata on title, description, author, license, version and date. |
lbldata | (n,) | Descriptive names for each ground truth label. |
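As an illustration of this layout, the sketch below picks one split and one scale from `imgs` and crops an object using its position and scale. It runs on a small synthetic array rather than the real files. Two things are assumptions, not stated in the description: that the scale axis is sorted so that index `s - 17` holds scale `s`, and that the split axis follows the listed order (training, validation, testing). The helper names are hypothetical.

```python
import numpy as np

SPLITS = {"train": 0, "valid": 1, "test": 2}  # assumed axis order
MIN_SCALE = 17  # smallest scale listed for `scls`


def select(array, split, scale):
    """Index the leading (split, scale) axes of any STIR array."""
    # Assumption: the 48 scale slots are sorted, so slot 0 is scale 17.
    return array[SPLITS[split], scale - MIN_SCALE]


def crop(image, position, scale):
    """Cut the bounding box out of one 64x64 image."""
    left, top = int(position[0]), int(position[1])  # (left, top) per `psts`
    return image[top:top + scale, left:left + scale]


# Synthetic stand-in with n=2 classes and m=1 instance per class.
n, m = 2, 1
imgs = np.zeros((3, 48, n, m, 64, 64), dtype=np.uint8)
psts = np.zeros((3, 48, n, m, 2), dtype=np.int64)
psts[0, 32 - MIN_SCALE, 1, 0] = (5, 10)  # left=5, top=10
imgs[0, 32 - MIN_SCALE, 1, 0, 10:42, 5:37] = 255

train_32 = select(imgs, "train", 32)  # shape (n, m, 64, 64)
pos_32 = select(psts, "train", 32)  # shape (n, m, 2)
patch = crop(train_32[1, 0], pos_32[1, 0], 32)
```

With real data, it is worth checking the index assumption against `scls` before relying on it, for example by confirming that every value in the selected slice equals the requested scale.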
For use in Python, a dataset class is provided that implements the basic functionality for loading a certain split and scale selection, as illustrated in the code below. It ensures shuffling is done in a consistent manner such that ground truth scales and positions can be retrieved. Metadata and label descriptions can be retrieved via metadata and labeldata, respectively.
from data.dataset import STIRDataset
dataset = STIRDataset('data/emoji.npz')
# Obtain images and labels for training
images, labels = dataset.to_torch(split='train', scales=[32, 64], shuffle=True)
# Obtain known scales and positions for above
scales, positions = dataset.get_latents(split='train', scales=[32, 64], shuffle=True)
# Get metadata and label descriptions
metadata = dataset.metadata
label_descriptions = dataset.labeldata
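The consistent shuffling mentioned above can be pictured as applying one seeded permutation to every array. This is a sketch of the idea, not the implementation of `STIRDataset`; the helper `shuffle_together` and the fixed seed are assumptions.

```python
import numpy as np


def shuffle_together(arrays, seed=0):
    """Shuffle several arrays along axis 0 with one shared permutation."""
    length = len(arrays[0])
    if any(len(a) != length for a in arrays):
        raise ValueError("all arrays must have the same length")
    # The same seed always yields the same permutation, so separate calls
    # for images and latents stay aligned.
    order = np.random.default_rng(seed).permutation(length)
    return [a[order] for a in arrays]


labels = np.arange(6)
scales = np.array([17, 20, 32, 40, 48, 64])

# Two independent calls with the same seed produce matching orderings.
(shuffled_labels,) = shuffle_together([labels], seed=42)
(shuffled_scales,) = shuffle_together([scales], seed=42)
```

Because the permutation depends only on the seed and the length, labels retrieved in one call still line up with scales retrieved in another.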
License and Attribution
When using this dataset for your own research, please respect the individual licenses of the original data. These are distributed within the data files' metadata. For attribution in papers, we recommend the following citations.
- D. Gandy, J. Otero, E. Emanuel, F. Botsford, J. Lundien, K. Jackson, M. Wilkerson, R. Madole, J. Raphael, T. Chase, G. Taglialatela, B. Talbot, and T. Chase. Font Awesome. https://fontawesome.com/v5/download, Nov. 2022.
- Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proc. IEEE, 86(11):2278–2324, Nov. 1998.
- C. Ertler, J. Mislej, T. Ollmann, L. Porzi, G. Neuhold, and Y. Kuang. The Mapillary Traffic Sign Dataset for Detection and Classification on a Global Scale. In 2020 16th Eur. Conf. Comput. Vision (ECCV), Glasgow, UK, Aug. 2020.
- G.-S. Xia, X. Bai, J. Ding, Z. Zhu, S. Belongie, J. Luo, M. Datcu, M. Pelillo, and L. Zhang. DOTA: A Large-Scale Dataset for Object Detection in Aerial Images. In 2018 IEEE/CVF Conf. Comput. Vision and Pattern Recognition (CVPR), pages 3974–3983, Salt Lake City, UT, USA, June 2018.
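Since the individual licenses are stored in each file's metadata, a small helper can surface them. This sketch assumes each row of the `(6, 2)` `metadata` array is a (field, value) pair, which the description suggests but does not state. `metadata_to_dict` is a hypothetical name, and the sample rows are placeholders, not real metadata.

```python
import numpy as np


def metadata_to_dict(metadata):
    """Turn a (k, 2) array of (field, value) rows into a plain dict."""
    metadata = np.asarray(metadata)
    if metadata.ndim != 2 or metadata.shape[1] != 2:
        raise ValueError("expected an array of shape (k, 2)")
    return {str(field): str(value) for field, value in metadata}


# Placeholder rows for illustration only; real files carry their own values.
example = np.array([["title", "<title>"], ["license", "<license>"]])
info = metadata_to_dict(example)
```

With a loaded file, the same helper could be applied to the array stored under the `metadata` key, after which the license field can be looked up by name if the assumption about the row layout holds.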
Files
(934.7 MB)
Checksum | Size |
---|---|
md5:b80315d8c3a9dfe44d140fbaaf9fb901 | 314.4 MB |
md5:ba9c26a5d506a83c8d339fe1e5bb99c7 | 1.4 MB |
md5:733eb17e09acce7ac9cdfca4d2df36df | 44.7 MB |
md5:9747d941385e41f2edf19d134e2f8136 | 574.2 MB |
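After downloading, a file can be checked against its listed MD5 checksum. The sketch below uses only the standard library; the helper names are hypothetical, and the expected value should be the checksum shown for the file you downloaded.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file against a checksum written as 'md5:<hex>' or '<hex>'."""
    expected_hex = expected.lower().split(":")[-1]
    return md5_of_file(path) == expected_hex
```

Reading in chunks keeps memory use low even for the largest file in the listing.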