There is a newer version of the record available.

Published June 5, 2024 | Version v1
Dataset Open

dopanim: A Dataset of Doppelganger Animals with Noisy Annotations from Multiple Humans

Description

Profile

  • The dopanim dataset comprises about 15,750 animal images from 15 classes, organized into four groups of doppelganger animals and collected together with ground truth labels from iNaturalist. For approximately 10,500 of these images, 20 human annotators provided over 52,000 annotations with an accuracy of approximately 67%.
  • Key attributes include the challenging task of classifying doppelganger animals, human-estimated likelihoods per image-annotator pair, and annotator metadata.
  • The dataset's broad research scope covers noisy label learning, multi-annotator learning, active learning, and learning beyond hard labels.
  • Further information is given in the associated article and, for using the data, in our GitHub repository.
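
As a rough illustration of how per-annotation likelihoods relate to an annotation accuracy such as the one reported above, the following sketch converts each likelihood vector into a hard label via argmax and compares it with the ground truth. The record layout, the key names (`task_id`, `likelihoods`), the class names, and the toy values are assumptions made for illustration only; the actual record entries are described in the supplementary material of the associated article.

```python
def hard_label(likelihoods):
    """Return the class with the highest estimated likelihood."""
    return max(likelihoods, key=likelihoods.get)


def annotation_accuracy(annotations, ground_truth):
    """Fraction of annotations whose argmax class matches the ground truth."""
    if not annotations:
        raise ValueError("no annotations given")
    correct = sum(
        hard_label(a["likelihoods"]) == ground_truth[a["task_id"]]
        for a in annotations
    )
    return correct / len(annotations)


# Toy data (hypothetical, not taken from the dataset).
toy_truth = {"obs_1": "class_a", "obs_2": "class_b"}
toy_annotations = [
    {"task_id": "obs_1", "likelihoods": {"class_a": 0.7, "class_b": 0.3}},
    {"task_id": "obs_1", "likelihoods": {"class_a": 0.4, "class_b": 0.6}},
    {"task_id": "obs_2", "likelihoods": {"class_a": 0.1, "class_b": 0.9}},
]
toy_accuracy = annotation_accuracy(toy_annotations, toy_truth)  # 2 of 3 correct
```

Taking the argmax discards the uncertainty that annotators expressed; methods for learning beyond hard labels would use the full likelihood vectors instead.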

File Descriptions

  • task_data.json contains data, e.g., the ground truth class labels, for each image classification task. Each task record is indexed by the iNaturalist observation index. A description of each record's entries is given in the supplementary material of the associated article.
  • annotation_data.json contains data, e.g., likelihoods per animal class, for each obtained image annotation. Each annotation record has a unique identifier. A description of each record's entries is given in the supplementary material of the associated article.
  • annotator_metadata.json contains metadata, e.g., self-assessed levels of knowledge and interest regarding animals, for each annotator. Each metadata record is indexed by the anonymous identifier of an annotator. A description of each record's entries is given in the supplementary material of the associated article.
  • train.zip, valid.zip, and test.zip contain the training, validation, and test images organized into directories of the 15 animal classes.
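
The files described above can be inspected with a few lines of Python. The sketch below is a minimal example under stated assumptions: the JSON files are assumed to parse with the standard `json` module, an archive is assumed to have been extracted so that `split_dir` directly contains one sub-directory per animal class, and the set of image file extensions is a guess. The helper names are hypothetical and not part of the authors' repository.

```python
import json
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}  # assumed image file extensions


def load_json(path):
    """Parse one of the JSON files and return the resulting Python object."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def count_images_per_class(split_dir):
    """Map each class directory name to the number of image files inside it."""
    counts = {}
    for class_dir in sorted(Path(split_dir).iterdir()):
        if class_dir.is_dir():
            counts[class_dir.name] = sum(
                1
                for p in class_dir.rglob("*")
                if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
            )
    return counts
```

For example, `load_json("task_data.json")` and `count_images_per_class("train")` could be called after downloading and extracting the files; both paths are placeholders for wherever the data is stored locally.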

Licenses

  • Images and their associated metadata are collected as observations from iNaturalist. We restricted the collection to images and metadata with CC0, CC-BY, or CC-BY-NC licenses. These licenses are recorded in the fields license_code and photo_license_code of each record in task_data.json. Links to each image and observation are included for further reference.
  • We collected the data in the files annotation_data.json and annotator_metadata.json in an annotation campaign via LabelStudio and distribute them under the CC-BY-NC 4.0 license.
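
Because license terms differ per record, it can be useful to tally them before using the images. The sketch below counts the combinations of `license_code` and `photo_license_code` and lists the records carrying a given license. It assumes that the task records are available as a dictionary mapping an observation index to a record dictionary; that layout, the helper names, and the license code strings in the toy data are assumptions and should be checked against the actual file.

```python
from collections import Counter


def license_counts(task_records):
    """Count (license_code, photo_license_code) pairs over all task records.

    `task_records` is assumed to map an observation index to a record dict.
    """
    return Counter(
        (rec.get("license_code"), rec.get("photo_license_code"))
        for rec in task_records.values()
    )


def records_with_license(task_records, code):
    """Return the indices of records whose metadata or photo license equals `code`."""
    return sorted(
        idx
        for idx, rec in task_records.items()
        if code in (rec.get("license_code"), rec.get("photo_license_code"))
    )


# Toy records (hypothetical indices and license code spellings).
toy_tasks = {
    "101": {"license_code": "cc-by", "photo_license_code": "cc-by"},
    "102": {"license_code": "cc0", "photo_license_code": "cc-by-nc"},
    "103": {"license_code": "cc-by", "photo_license_code": "cc-by"},
}
toy_counts = license_counts(toy_tasks)
toy_non_commercial = records_with_license(toy_tasks, "cc-by-nc")
```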

Contact

  • If you have questions or issues relevant to other dataset users, please open an issue in our GitHub repository.
  • In all other cases, you can contact the dataset collectors by e-mail at marek.herde@uni-kassel.de.

Acknowledgements

This work was funded by the ALDeep and CIL projects at the University of Kassel. Moreover, we thank Franz Götz-Hahn for his insightful comments on improving our annotation campaign. Finally, we thank the iNaturalist community for their many observations that help explore nature's biodiversity, and our annotators for their dedicated efforts in making the annotation campaign via LabelStudio possible.

Disclaimer

  • We carefully selected and composed this dataset's content. If you believe that any of this content violates licensing agreements or infringes on intellectual property rights, please contact us immediately (see Contact). In such a case, we will promptly investigate the issue and, if necessary, remove the affected data records from our dataset.
  • Users are responsible for ensuring that their use of the dataset complies with all licenses, applicable laws, regulations, and ethical guidelines. We make no representations or warranties of any kind and accept no responsibility in the case of violations.

 

Files

Files (2.2 GB)

MD5 checksum                            Size
md5:c0218a3a1a33b6ec29fdd78158ba6270    40.9 MB
md5:b9ce61379ac554798ab0f437cf2312b4    37.9 kB
md5:85793328665eb6dfc7cdb14e0615d219    17.8 MB
md5:046c7e2f60c319deed1328f6d4cac23a    615.2 MB
md5:95e628ed85927c1ac82d7680072d317b    1.4 GB
md5:a7d42f56d187d07551ea7633615d010f    103.0 MB
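
The MD5 checksums listed above can be used to verify a download. The following sketch computes a file's MD5 digest with Python's standard `hashlib` module and compares it with a listed value. The function names are hypothetical, and the pairing of a local file with its checksum has to be taken from the record page.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, listed):
    """Compare a file's MD5 digest with a listed value such as 'md5:<hex>'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

Reading in chunks keeps memory use small even for the larger archives.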

Additional details

Funding

University of Kassel

Software

Repository URL: https://github.com/ies-research/multi-annotator-machine-learning
Programming language: Python
Development status: Active