Logos in the Wild dataset

Monteux, Angelo

doi:10.5281/zenodo.5101018

Published July 14, 2021 | Version v1

Dataset Open

Logos in the Wild dataset - unofficial version

I am uploading the following dataset even though I am not the original author (see https://arxiv.org/abs/1710.10891). I built an unofficial implementation of the algorithm, named Logohunter, using this dataset and have received emails from researchers looking for the dataset; unfortunately, the original Fraunhofer IOSB link referenced in the paper is dead. In addition, as the original dataset only provides URLs to the images (some of which have disappeared over time), I am uploading the images themselves as well. While copyrighted, this is clearly fair use.

The LogosInTheWild-v2.zip file contains the original dataset (URLs only), while litw_cleaned.tar.gz contains the full dataset with images (as downloaded in February 2019).

Below is the content of the README from the original dataset authors:

# General remarks
This dataset consists of web images which were crawled via Google
image search and the corresponding logo annotations. It was collected at
Fraunhofer IOSB in Karlsruhe, Germany.
For dataset related matters please contact Christian Herrmann:
christian.herrmann@iosb.fraunhofer.de.

# Structure
Each folder contains the raw Pascal VOC style xml annotation files and a
urls.txt file containing a list of URLs where the images can be
downloaded. Each row in the list contains the image ID and the URL
of the image file.
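
As a minimal sketch of reading such a list, assuming each row holds the image ID and the URL separated by whitespace (the exact delimiter is not stated here, so check a real urls.txt first). The helper name `parse_url_rows` and the sample rows are hypothetical:

```python
def parse_url_rows(lines):
    """Yield (image_id, url) pairs from rows of a urls.txt-style list.

    Assumption: ID and URL are separated by whitespace (space or tab).
    Blank or malformed rows are skipped rather than raising.
    """
    for line in lines:
        parts = line.strip().split(None, 1)
        if len(parts) == 2:
            image_id, url = parts
            yield image_id, url.strip()


# Made-up rows for illustration; real rows come from a folder's urls.txt.
rows = ["0001 http://example.com/a.jpg", "", "0002\thttp://example.com/b.png"]
print(list(parse_url_rows(rows)))
```

With a real file, `parse_url_rows(open("urls.txt", encoding="utf-8"))` would iterate over its rows in the same way.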

Each folder includes all images resulting from the Google image search for
one brand. Because images can show a large variety of logos beyond the
keyword search, there are many logos of different brands within each
folder or sometimes even within a single image.
The bounding box name denotes the actual brand for each logo. When
necessary, separation is made between graphical and textual logos via
additional specifiers of the brand name (e.g. 'porsche-logo',
'porsche-text'). Visually different logos of one brand are separated by
enumeration if distinction by graphical/textual is impossible (e.g.
'adidas1', 'adidas2'). There are some misspellings and inconsistencies
in the labels and specifiers in the raw annotation files. We opt not to
alter the raw files provided by the annotation crew but instead fix the
issues with the create_clean_dataset.py script (see Scripts section below).
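
To illustrate how the bounding box names can be read from a raw annotation file, here is a minimal sketch using only the standard library. It assumes the usual Pascal VOC layout (`object` elements with a `name` and a `bndbox` holding `xmin`, `ymin`, `xmax`, `ymax`). The helper name `read_voc_boxes` and the sample XML are made up for illustration:

```python
import xml.etree.ElementTree as ET


def read_voc_boxes(xml_text):
    """Return a list of (label, (xmin, ymin, xmax, ymax)) for each object."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        label = obj.findtext("name", default="").strip()
        bb = obj.find("bndbox")
        if bb is None:
            continue  # object without a box: nothing to report
        coords = tuple(
            int(float(bb.findtext(key)))
            for key in ("xmin", "ymin", "xmax", "ymax")
        )
        boxes.append((label, coords))
    return boxes


# Hypothetical annotation; real files are the xml files in each brand folder.
sample = """<annotation>
  <filename>img000001.jpg</filename>
  <object><name>porsche-text</name>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>110</xmax><ymax>60</ymax></bndbox>
  </object>
  <object><name>adidas1</name>
    <bndbox><xmin>5</xmin><ymin>5</ymin><xmax>50</xmax><ymax>40</ymax></bndbox>
  </object>
</annotation>"""
print(read_voc_boxes(sample))
```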

List of cleaned specifiers:
- 'text': pure textual logo
- 'symbol': graphical logo

Additional specifiers in raw annotations:
- 'partial','teilsichtbar': logo is significantly occluded and thus only
partially visible; this information is only included in the raw annotations
- 'schrift','schriftzug': same as 'text'
- 'logo': same as 'symbol'
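
The specifier lists above amount to a small lookup table. The following is a rough sketch of that mapping, not the logic of create_clean_dataset.py: it assumes specifiers are appended to the brand name with hyphens (as in 'porsche-text'), and `normalize_label` is a hypothetical helper name.

```python
# Raw specifier -> cleaned specifier, taken from the lists above.
SPECIFIER_MAP = {
    "text": "text", "schrift": "text", "schriftzug": "text",
    "symbol": "symbol", "logo": "symbol",
}
# Occlusion markers exist only in the raw annotations.
OCCLUSION_FLAGS = {"partial", "teilsichtbar"}


def normalize_label(raw):
    """Return (cleaned_label, is_partial) for a raw bounding box name.

    Assumption: specifiers are trailing hyphen-separated tokens, so brand
    names that themselves contain hyphens are left intact.
    """
    tokens = raw.strip().lower().split("-")
    kind, partial = None, False
    while len(tokens) > 1:
        last = tokens[-1]
        if last in OCCLUSION_FLAGS:
            partial = True
        elif last in SPECIFIER_MAP and kind is None:
            kind = SPECIFIER_MAP[last]
        else:
            break  # remaining tokens belong to the brand name
        tokens.pop()
    brand = "-".join(tokens)
    return (f"{brand}-{kind}" if kind else brand), partial


print(normalize_label("porsche-logo"))
print(normalize_label("coca-cola-schriftzug-partial"))
print(normalize_label("adidas1"))
```

Returning the occlusion flag separately keeps that information available even though the cleaned labels do not carry it.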

# Scripts
To ease processing, the scripts folder contains a Python script to
preprocess the dataset.
create_clean_dataset.py corrects labeling mistakes and can create
different versions of the dataset:

1.) Clean Pascal VOC dataset structure which is readily readable
by many object detection frameworks. This is created in all cases:
python create_clean_dataset.py --in ./data --out ./cleaned-data

2.) Cropped logos sorted into separate brand folders. This addresses
classification or verification tasks. Parameter: --roi.

3.) Logo classes from FlickrLogos32 can be excluded from 1) and 2) via
--wofl32. This allows training on Logos in the Wild and testing on
FlickrLogos32 if brand overlap is undesired, such as for open-set
evaluation.
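
The idea behind option 3 can be sketched as a simple filter over annotations. This is only an illustration under stated assumptions, not how --wofl32 is implemented: the set of excluded brands is supplied by the caller, brands are compared case-insensitively, and a trailing '-text' / '-symbol' specifier or enumeration digit is stripped before matching. `exclude_brands` is a hypothetical helper name.

```python
def exclude_brands(annotations, excluded):
    """Drop (label, box) pairs whose brand is in `excluded`."""
    excluded = {brand.lower() for brand in excluded}

    def brand_of(label):
        label = label.lower()
        for suffix in ("-text", "-symbol"):
            if label.endswith(suffix):
                label = label[: -len(suffix)]
                break
        # Assumption: trailing digits are enumerations ('adidas1'), which
        # would be wrong for a brand whose name really ends in a digit.
        return label.rstrip("0123456789")

    return [(lab, box) for lab, box in annotations if brand_of(lab) not in excluded]


annotations = [
    ("adidas1", (5, 5, 50, 40)),
    ("porsche-text", (10, 20, 110, 60)),
    ("bmw-symbol", (0, 0, 10, 10)),
]
# Two illustrative brand names, not the full FlickrLogos32 class list.
print(exclude_brands(annotations, {"adidas", "bmw"}))
```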

# How to get started
1.) Download the images from the provided URLs.
2.) Execute the create_clean_dataset.py script.
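
For step 1, a download loop might look like the sketch below. It uses only the standard library and records failures instead of stopping, since some URLs may no longer resolve. The output naming (`<image_id>.jpg`) is an assumption; check what create_clean_dataset.py expects before relying on it. `fetch_bytes` and `download_images` are hypothetical helper names.

```python
import http.client
import urllib.request
from pathlib import Path


def fetch_bytes(url, timeout=10):
    """Fetch the raw bytes behind a URL."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def download_images(pairs, out_dir, fetch=fetch_bytes):
    """Save each (image_id, url) pair to out_dir and return the failed IDs.

    `fetch` is injectable so the loop can be tried without network access.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    failed = []
    for image_id, url in pairs:
        try:
            data = fetch(url)
        except (OSError, ValueError, http.client.HTTPException):
            # Dead links, timeouts and malformed URLs end up here.
            failed.append(image_id)
            continue
        # Assumption: images are stored as <image_id>.jpg.
        (out / f"{image_id}.jpg").write_bytes(data)
    return failed


# Example call (performs real network requests):
# failed = download_images([("0001", "http://example.com/a.jpg")], "./data/brand")
```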

# Dataset usage
If you use this dataset in your work please cite:

```
@INPROCEEDINGS{,
author = {T{\"u}zk{\"o}, Andras and Herrmann, Christian and Manger, Daniel
and Beyerer, J{\"u}rgen},
title = {{O}pen {S}et {L}ogo {D}etection and {R}etrieval},
booktitle = {Proceedings of the 13th International Joint Conference on
Computer Vision, Imaging and Computer Graphics Theory and Applications:
VISAPP},
year = {2018}}
```

Files

Files (2.1 GB)

Name	MD5	Size
litw_cleaned.tar.gz	107cbdf044d7bce03bd6458e53bfbe06	2.1 GB
LogosInTheWild-v2.zip	f4c5041591dc8bd2dab969ed31b1a12c	7.0 MB
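
The MD5 checksums listed for the two files can be used to check a finished download. A minimal sketch, assuming the archives were saved under their listed names; `md5_of_file` and `verify` are hypothetical helper names:

```python
import hashlib

# Checksums as listed for this record.
EXPECTED_MD5 = {
    "litw_cleaned.tar.gz": "107cbdf044d7bce03bd6458e53bfbe06",
    "LogosInTheWild-v2.zip": "f4c5041591dc8bd2dab969ed31b1a12c",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, name):
    """Return True if the file at `path` matches the checksum listed for `name`."""
    return md5_of_file(path) == EXPECTED_MD5[name]


# Example call (requires the downloaded archive):
# print(verify("litw_cleaned.tar.gz", "litw_cleaned.tar.gz"))
```

Reading in chunks keeps memory use flat for the 2.1 GB archive.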
