Published September 22, 2022 | Version v1
Dataset | Open access

WhichDog: A crowdsourced dataset including candidate set-based labelling

  • 1. Basque Center for Applied Mathematics (BCAM)
  • 2. Universitat de Barcelona (UB)

Description

A dataset of crowdsourced labels for label aggregation and supervised classification.
It contains 400 images of dogs from the Stanford Dogs dataset (http://vision.stanford.edu/aditya86/ImageNetDogs/), covering 32 different breeds (classes). Annotators were asked to provide two types of labelling: full labelling (the labeler provides a single label for each image) and candidate labelling (the labeler provides a set of candidate labels for each image). The dataset includes a total of 61227 annotations (30628 full and 30599 candidate) obtained from 1028 different labelers.
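The two labelling types call for different aggregation rules. As an illustration only, the sketch below shows two common baselines, which are not necessarily what the dataset authors used: plain majority voting over full labels, and a vote-splitting rule for candidate sets in which each set contributes one vote shared evenly (1/|S|) among its labels. The function names `majority_vote` and `candidate_vote` are hypothetical, and labels are assumed to be the integer breed ids 0 to 31.

```python
from collections import Counter
from fractions import Fraction

def majority_vote(full_labels):
    """Most frequent label among the full (single-label) annotations of one image.
    Ties are broken in favour of the smallest label id (an arbitrary choice).
    Raises ValueError if no labels are given."""
    counts = Counter(full_labels)
    best = max(counts.values())
    return min(label for label, c in counts.items() if c == best)

def candidate_vote(candidate_sets):
    """Give each candidate set one vote, split evenly (1/|S|) among its labels,
    and return the label with the highest total. Empty sets are ignored;
    exact Fractions avoid floating-point ties. Raises ValueError if every set is empty."""
    scores = Counter()
    for labels in candidate_sets:
        labels = set(labels)  # guard against repeated labels within one answer
        if labels:
            share = Fraction(1, len(labels))
            for label in labels:
                scores[label] += share
    best = max(scores.values())
    return min(label for label, s in scores.items() if s == best)
```

For example, `majority_vote([3, 3, 7, 3, 12])` returns 3, and `candidate_vote([{3, 7}, {3}, {3, 7, 12}])` also returns 3 (its score is 1/2 + 1 + 1/3 = 11/6).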

The labels were collected through the online crowdsourcing platform Amazon Mechanical Turk (MTurk), with funding from the Basque Government through the Elkartek program (KK-2018/00071). Each assignment given to an annotator consisted of a sequence of 64 images. Each image in the sequence was shown together with a specific subset of the possible labels (between 4 and 32 options) and an instruction to perform a specific type of labelling (full or candidate). Each labeler performed at least one assignment, although not all labelers completed all 64 annotations in their assignments.
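Because some assignments were left unfinished, it can help to identify them before analysis. A minimal sketch, assuming that `assignment_id` groups all annotations belonging to one assignment and that rows are dicts such as those produced by `csv.DictReader`; the function name `incomplete_assignments` is hypothetical:

```python
from collections import Counter

SEQUENCE_LENGTH = 64  # images per assignment, as stated in the description

def incomplete_assignments(rows, sequence_length=SEQUENCE_LENGTH):
    """Map assignment_id -> number of annotations, for every assignment
    that has fewer annotations than the full sequence length."""
    counts = Counter(row["assignment_id"] for row in rows)
    return {a: n for a, n in counts.items() if n < sequence_length}
```

An assignment with 64 rows is omitted from the result, while one with only 10 rows appears as `{"A2": 10}` (using illustrative ids).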

The file 'whichdog.zip' contains a folder ('images') with the 400 images of dogs, a text file ('breed_names.txt') that lists the names of the breeds and their assigned labels (integers from 0 to 31), and a CSV file ('whichdog_all_annots.csv') with the annotations. Each row of the CSV file represents a single annotation, with the following columns:
- image_id: ID number of the image.
- is_candidate: indicates whether the requested labelling is full (0) or candidate (1).
- labeler_id: ID number of the labeler.
- time: time taken by the labeler to perform the annotation.
- answer: label or set of labels provided by the labeler as annotation.
- options: subset of possible labels shown to the labeler.
- assignment_id: ID number of the assignment.
- sequence_point: position, within the assignment's sequence of images, at which the annotation was provided.
- class: ground truth label of the image.
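The columns above can be read with Python's standard `csv` module. This is a sketch under stated assumptions: the file is assumed to have a header row with exactly these column names; the ID columns are kept as strings because their exact format is not documented; and the serialization of `answer` and `options` is also undocumented, so every run of digits is extracted as a label, which covers forms such as `5`, `[3, 7]` or `3;7`. The helper names are hypothetical.

```python
import csv
import io
import re

ID_COLUMNS = ("image_id", "labeler_id", "assignment_id")  # kept as strings (format assumed unknown)
INT_COLUMNS = ("is_candidate", "sequence_point", "class")

def parse_label_list(text):
    """Extract all integer labels from a serialized label field.
    ASSUMPTION: labels are non-negative integers written as digit runs."""
    return [int(tok) for tok in re.findall(r"\d+", text)]

def load_annotations(fileobj):
    """Read whichdog_all_annots.csv from an open text file into typed dicts."""
    records = []
    for row in csv.DictReader(fileobj):
        rec = {col: row[col] for col in ID_COLUMNS}
        rec.update({col: int(row[col]) for col in INT_COLUMNS})
        rec["time"] = float(row["time"])  # units are not specified in the description
        rec["answer"] = parse_label_list(row["answer"])
        rec["options"] = parse_label_list(row["options"])
        records.append(rec)
    return records
```

Usage: `with open("whichdog_all_annots.csv", newline="") as f: annots = load_annotations(f)`. Rows with `is_candidate == 0` would then be expected to carry a single label in `answer`.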

Files

whichdog.zip (16.2 MB)
md5:ec129ef90a31eb158537983c5ab0c5df
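After downloading, the archive can be checked against the published MD5 checksum. A minimal sketch using Python's standard `hashlib`; the function names are hypothetical and the local path `whichdog.zip` is assumed:

```python
import hashlib

EXPECTED_MD5 = "ec129ef90a31eb158537983c5ab0c5df"  # checksum published for whichdog.zip

def md5_of_file(path, chunk_size=1 << 20):
    """Compute the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file at `path` matches the expected checksum."""
    return md5_of_file(path) == expected.lower()
```

Usage: `verify_download("whichdog.zip")` should return `True` for an intact copy of the archive.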