Dataset B of Paper "Hidden in Plain Signs: Realistic Sticker Attacks on Production Traffic Sign Recognition Systems"
Authors/Creators
Description
⚠️ Academic Use Only - Non-Commercial Research Dataset
This dataset is a derivative work assembled exclusively for academic research purposes. It inherits Non-Commercial (NC) restrictions from its source datasets. See the License section for full details.
Overview
This dataset was compiled for academic research on traffic sign detection and recognition. It aggregates and preprocesses images from five publicly available benchmark datasets, applying cropping and resizing transformations to meet the input requirements of the deep learning models evaluated in our work.
Key facts:
- Purpose: Academic research only (non-commercial)
- Task: Traffic sign detection / recognition
- Images: 30,226
- Classes: 17
- Splits: Train / Validation / Test
- Image format: JPEG
Dataset Composition
This dataset is a derivative work combining images from the following source datasets. Each subset retains the license of its original source.
| # | Source Dataset | Images Used | Classes Used |
|---|---|---|---|
| 1 | Mapillary Traffic Sign Dataset (MTSD) | 13,662 | Limits 10-120 km/h, Stop, No Vehicles, No Stopping, No Parking, No Entry |
| 2 | DFG Traffic Sign Dataset | 2,360 | Limits 10, 30-70 km/h, Stop, No Vehicles, No Stopping, No Parking, No Entry |
| 3 | TT100k | 6,605 | Limits 10-120 km/h, No Vehicles, No Stopping, No Parking, No Entry |
| 4 | GTSDB (German Traffic Sign Detection Benchmark) | 357 | Limits 20, 30, 50-80, 100, 120 km/h, Stop, No Entry |
| 5 | ItalianSigns | 361 | all (Limits 20-90 km/h) |
Due to the preprocessing phase (zooming/cropping), some images were split into multiple patches. The total number of images in this dataset is therefore greater than the sum of the "Images Used" column above.
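As a quick consistency check, the counts reported above can be compared directly. The sketch below uses only numbers stated in this document. Reading the difference as the net number of additional patches produced by zooming/cropping is an assumption, since the per-source patch counts are not listed here.

```python
# "Images Used" per source, copied from the composition table above.
images_used = {
    "MTSD": 13_662,
    "DFG": 2_360,
    "TT100k": 6_605,
    "GTSDB": 357,
    "ItalianSigns": 361,
}
total_in_dataset = 30_226  # "Images" entry under "Key facts"

source_total = sum(images_used.values())
# Assumption: the surplus comes from images split into several patches.
net_extra = total_in_dataset - source_total

print(f"source images used: {source_total}")  # 23345
print(f"net extra images:   {net_extra}")     # 6881
```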
Dataset Structure
dataset/
├── LICENSE.txt
├── scripts/
│   ├── adapt_sources/      # Pipelines to extract data from the original sources, one script per dataset
│   │   ├── Mapillary.py
│   │   ├── DFG.py
│   │   ├── TT100k.py
│   │   ├── GTSDB.py
│   │   └── ItalianSigns.py
│   └── zoom_and_crop.py    # Run after downloading the sources, to zoom in on images with very small objects
├── images/
│   ├── train/
│   ├── test/
│   └── val/
└── labels/                 # One .txt file per image, with the annotations in YOLO format
    ├── train/
    ├── test/
    └── val/
Label format: YOLO .txt
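The following is a minimal sketch of how a label file could be read, assuming the common YOLO detection convention of one object per line as `class_id x_center y_center width height`, with coordinates normalized to the image size. The helper names (`YoloBox`, `parse_yolo_line`, `to_pixel_corners`, `label_path_for`) are hypothetical and not part of the released scripts. The mapping from class IDs to the 17 sign classes is not given here, so it is left out.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class YoloBox:
    class_id: int
    x_center: float  # normalized by image width (assumed convention)
    y_center: float  # normalized by image height
    width: float
    height: float


def parse_yolo_line(line: str) -> YoloBox:
    """Parse one 'class x_center y_center width height' annotation line."""
    parts = line.split()
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {line!r}")
    cls, xc, yc, w, h = parts
    return YoloBox(int(cls), float(xc), float(yc), float(w), float(h))


def to_pixel_corners(box: YoloBox, img_w: int, img_h: int):
    """Convert a normalized box to (x_min, y_min, x_max, y_max) in pixels."""
    x_min = (box.x_center - box.width / 2) * img_w
    y_min = (box.y_center - box.height / 2) * img_h
    x_max = (box.x_center + box.width / 2) * img_w
    y_max = (box.y_center + box.height / 2) * img_h
    return x_min, y_min, x_max, y_max


def label_path_for(image_path: Path) -> Path:
    """Map images/<split>/<name>.jpg to labels/<split>/<name>.txt."""
    parts = list(image_path.parts)
    # Replace the last "images" path component with "labels".
    idx = len(parts) - 1 - parts[::-1].index("images")
    parts[idx] = "labels"
    return Path(*parts).with_suffix(".txt")


box = parse_yolo_line("3 0.5 0.5 0.25 0.5")
print(to_pixel_corners(box, 640, 480))  # (240.0, 120.0, 400.0, 360.0)
print(label_path_for(Path("dataset/images/train/DFG (0).jpg")))
```

The five-field check is deliberate: it rejects polygon-style YOLO variants, so a file in an unexpected format fails loudly rather than being misread.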
Preprocessing
After the official sources were downloaded, the images were processed with the scripts in the /scripts folder of this repository, to extract the images relevant to our purposes and to meet model input requirements.
In particular:
1. Image extraction: The scripts in `/scripts/adapt_sources` were run, one per source dataset. They select only the relevant images and convert their annotations to our label set and the YOLO format. The images are renamed with the name of the source dataset followed by an incremental numeric suffix (e.g., `DFG (0).jpg`, `DFG (1).jpg`).
2. Cropping and zooming: The script `/scripts/zoom_and_crop.py` was run on the images obtained from step 1. It zooms in on images with very small traffic signs, to better isolate the sign region. When an image is split into multiple patches during this process, an incremental numeric suffix is appended to the resulting sub-images (e.g., `DFG (0)_0.jpg`, `DFG (0)_1.jpg`).
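The naming scheme described above can be decoded to recover the source dataset, the image index, and the optional patch index of a file. The sketch below assumes every file follows the `<source> (<index>)[_<patch>]` pattern seen in the DFG examples. The prefixes used for the other four sources are not stated here, so the pattern accepts any prefix. `ImageName` and `parse_image_name` are hypothetical names.

```python
import re
from pathlib import Path
from typing import NamedTuple, Optional


class ImageName(NamedTuple):
    source: str           # source dataset prefix, e.g. "DFG"
    index: int            # incremental index assigned during extraction
    patch: Optional[int]  # patch index, or None if the image was not split


# Matches stems such as "DFG (0)" and "DFG (0)_1".
_NAME_RE = re.compile(r"(?P<source>.+) \((?P<index>\d+)\)(?:_(?P<patch>\d+))?")


def parse_image_name(filename: str) -> ImageName:
    """Decode an image or label file name into its naming components."""
    stem = Path(filename).stem  # drop ".jpg" / ".txt"
    match = _NAME_RE.fullmatch(stem)
    if match is None:
        raise ValueError(f"unexpected file name: {filename!r}")
    patch = match.group("patch")
    return ImageName(
        source=match.group("source"),
        index=int(match.group("index")),
        patch=None if patch is None else int(patch),
    )


print(parse_image_name("DFG (0).jpg"))    # ImageName(source='DFG', index=0, patch=None)
print(parse_image_name("DFG (0)_1.jpg"))  # ImageName(source='DFG', index=0, patch=1)
```

Grouping files by `(source, index)` then recovers all patches cut from the same original image.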
Finally, the dataset is split into training (70%), testing (15%) and validation (15%) images.
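For illustration, a 70/15/15 partition of a list of file names can be sketched as follows. This is not the procedure used to produce the released splits: the shuffling strategy, the seed, and whether patches of the same source image were kept in the same split are not specified here, so the function name `split_70_15_15` and the fixed seed are assumptions.

```python
import random


def split_70_15_15(items, seed=0):
    """Shuffle items deterministically and cut them into 70/15/15 parts."""
    items = sorted(items)              # fixed starting order for reproducibility
    random.Random(seed).shuffle(items)
    n = len(items)
    n_train = n * 70 // 100            # integer arithmetic avoids float rounding
    n_test = n * 15 // 100
    return {
        "train": items[:n_train],
        "test": items[n_train:n_train + n_test],
        "val": items[n_train + n_test:],  # remainder goes to validation
    }


names = [f"DFG ({i}).jpg" for i in range(100)]  # hypothetical file names
splits = split_70_15_15(names)
print({k: len(v) for k, v in splits.items()})  # {'train': 70, 'test': 15, 'val': 15}
```

When reproducing experiments, use the released `train/`, `test/`, and `val/` folders as provided instead of re-splitting.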
License
This dataset is a derivative work and is released under CC BY-NC-SA 4.0 (Creative Commons Attribution – NonCommercial – ShareAlike 4.0 International), which is the most restrictive license among those of the contributing source datasets that require ShareAlike terms.
In summary, you are free to:
- Share — copy and redistribute this dataset in any medium or format
- Adapt — preprocess, crop, or otherwise transform the material
Under the following terms:
- Attribution (BY) — You must give appropriate credit to all original source datasets (see Citations).
- NonCommercial (NC) — You may not use this dataset for commercial purposes.
- ShareAlike (SA) — If you build upon this dataset, you must distribute your derivative work under the same CC BY-NC-SA 4.0 license.
Per-source license summary
| Source | License | License Link |
|---|---|---|
| Mapillary TSD | CC BY-NC-SA 4.0 | https://creativecommons.org/licenses/by-nc-sa/4.0/ |
| DFG Traffic Sign Dataset | CC BY-NC-SA 4.0 | https://creativecommons.org/licenses/by-nc-sa/4.0/ |
| TT100k | CC BY-NC 4.0 | https://creativecommons.org/licenses/by-nc/4.0/ |
| GTSDB | No explicit license stated — research use only per original authors | https://benchmark.ini.rub.de/gtsdb_dataset.html |
| ItalianSigns | GNU LGPLv3 | https://www.gnu.org/licenses/lgpl-3.0.html |
Disclaimer: The authors of this compiled dataset are not lawyers and this summary does not constitute legal advice. Users are responsible for verifying compliance with each source dataset's license before use.
Note on GTSDB licensing: The GTSDB was published by the Institut für Neuroinformatik (INI) at Ruhr-Universität Bochum. Please refer to the original dataset page for the authoritative license terms before reusing this subset.
Citations
If you use this dataset in your research, please cite all original source datasets listed below.
Source dataset citations
1. Mapillary Traffic Sign Dataset (MTSD)
@inproceedings{ertler2020mapillary,
title={The mapillary traffic sign dataset for detection and classification on a global scale},
author={Ertler, Christian and Mislej, Jerneja and Ollmann, Tobias and Porzi, Lorenzo and Neuhold, Gerhard and Kuang, Yubin},
booktitle={European conference on computer vision},
pages={68--84},
year={2020},
organization={Springer}
}
2. DFG Traffic Sign Dataset
@article{Tabernik2019ITS,
author = {Tabernik, Domen and Sko{\v{c}}aj, Danijel},
journal = {IEEE Transactions on Intelligent Transportation Systems},
title = {{Deep Learning for Large-Scale Traffic-Sign Detection and Recognition}},
year = {2019},
doi={10.1109/TITS.2019.2913588},
ISSN={1524-9050}
}
3. TT100k
@InProceedings{Zhe_2016_CVPR,
title = {Traffic-Sign Detection and Classification in the Wild},
author = {Zhu, Zhe and Liang, Dun and Zhang, Songhai and Huang, Xiaolei
and Li, Baoli and Hu, Shimin},
booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2016}
}
4. GTSDB (German Traffic Sign Detection Benchmark)
@inproceedings{houben2013gtsdb,
title = {Detection of Traffic Signs in Real-World Images: The {German Traffic
Sign Detection Benchmark}},
author = {Houben, Sebastian and Stallkamp, Johannes and Salmen, Jan and
Schlipsing, Marc and Igel, Christian},
booktitle = {International Joint Conference on Neural Networks (IJCNN)},
year = {2013},
doi = {10.1109/IJCNN.2013.6706807}
}
5. ItalianSigns
@misc{ItalianSigns,
author={Daniel Rossi and Riccardo Salami},
title={ItalianSigns},
year={2022},
howpublished={\url{https://www.kaggle.com/datasets/officialprojecto/italiansigns}}
}
Acknowledgements
We thank the authors and institutions that made their datasets publicly available for research purposes:
- The Mapillary team for the MTSD dataset.
- Domen Tabernik and Danijel Skočaj for the DFG Traffic Sign Dataset.
- Zhe Zhu et al. for the TT100k dataset.
- Sebastian Houben et al. for the GTSDB.
- Daniel Rossi and Riccardo Salami for the ItalianSigns traffic sign dataset.
Contact
This dataset was submitted anonymously for peer review. After the review process, author information and the associated paper reference will be added here.
For questions regarding licensing and reuse, please open an issue on this repository or contact the corresponding author after de-anonymization.
Last updated: 2026
Dataset version: 1.0
Files
images.zip
Additional details
References
- Christian Ertler, Jerneja Mislej, Tobias Ollmann, Lorenzo Porzi, Gerhard Neuhold, and Yubin Kuang. The Mapillary traffic sign dataset for detection and classification on a global scale. In European Conference on Computer Vision, pages 68–84. Springer, 2020.
- Domen Tabernik and Danijel Skočaj. Deep Learning for Large-Scale Traffic-Sign Detection and Recognition. IEEE Transactions on Intelligent Transportation Systems, 2019.
- Zhe Zhu, Dun Liang, Songhai Zhang, Xiaolei Huang, Baoli Li, and Shimin Hu. Traffic-sign detection and classification in the wild. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
- Sebastian Houben, Johannes Stallkamp, Jan Salmen, Marc Schlipsing, and Christian Igel. Detection of Traffic Signs in Real-World Images: The German Traffic Sign Detection Benchmark. In International Joint Conference on Neural Networks (IJCNN 2013), pp. 715-722. IEEE Press.
- Daniel Rossi and Riccardo Salami. ItalianSigns. https://www.kaggle.com/datasets/officialprojecto/italiansigns, 2022.