Published June 5, 2024 | Version v1
Dataset
Open
dopanim: A Dataset of Doppelganger Animals with Noisy Annotations from Multiple Humans
Creators
Description
Profile
- The dopanim dataset features about 15,750 animal images of 15 classes, organized into four groups of doppelganger animals and collected together with ground truth labels from iNaturalist. For approximately 10,500 of these images, 20 humans provided over 52,000 annotations with an accuracy of circa 67%.
- Key attributes include the challenging task of classifying doppelganger animals, human-estimated likelihoods per image-annotator pair, and annotator metadata.
- The dataset's broad research scope covers noisy label learning, multi-annotator learning, active learning, and learning beyond hard labels.
- Further information is given in the associated article and our GitHub repository for using the data.
File Descriptions
- task_data.json contains data, e.g., the ground truth class labels, for each image classification task. Each task record is indexed by the iNaturalist observation index.
- annotation_data.json contains data, e.g., likelihoods per animal class, for each obtained image annotation. Each annotation record has a unique identifier.
- annotator_metadata.json contains metadata, e.g., self-assessed levels of knowledge and interest regarding animals, for each annotator. Each metadata record is indexed by the anonymous identifier of an annotator.
- train.zip, valid.zip, and test.zip contain the training, validation, and test images organized into directories of the 15 animal classes.
- The entries of each record in the three JSON files are described in the supplementary material of the associated article.
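As a rough illustration of working with these files, the following Python sketch loads one of the JSON files and counts the images per class directory of an extracted split. It is not the loading code from the GitHub repository. It assumes that each JSON file parses into a single top-level object and that the extracted archive contains one subdirectory per animal class; the function names `load_records` and `count_images_per_class` are hypothetical.

```python
import json
from collections import Counter
from pathlib import Path


def load_records(json_path):
    """Parse one of the dataset's JSON files and return its content.

    Assumption: the file holds one top-level JSON value, e.g., an object
    mapping record identifiers to records.
    """
    with open(json_path, "r", encoding="utf-8") as f:
        return json.load(f)


def count_images_per_class(split_dir):
    """Count the files in each class subdirectory of an extracted split.

    Assumption: `split_dir` directly contains one directory per class.
    """
    counts = Counter()
    for class_dir in sorted(Path(split_dir).iterdir()):
        if class_dir.is_dir():
            counts[class_dir.name] = sum(
                1 for p in class_dir.iterdir() if p.is_file()
            )
    return counts
```

For example, `len(load_records("task_data.json"))` would give the number of task records under the assumption above, and `count_images_per_class("train")` would give a per-class image count once train.zip has been extracted to a directory named `train` (a placeholder path).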
Licenses
- Images and their associated metadata are collected as observations from iNaturalist. We restricted the collection to images and metadata with CC0, CC-BY, or CC-BY-NC licenses. These licenses are recorded in the fields license_code and photo_license_code in each record of task_data.json. The links to each image and observation are given for further reference.
- We collected the data in the files annotation_data.json and annotator_metadata.json in an annotation campaign via LabelStudio and distribute them under the CC-BY-NC 4.0 license.
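To check which licenses apply before reusing the images, one could tally the two license fields over all task records. The sketch below is a minimal example, assuming task_data.json parses into a dictionary that maps each observation index to a record containing the fields license_code and photo_license_code; the function name `license_counts` is hypothetical, and the concrete license strings stored in the file are not assumed here.

```python
from collections import Counter


def license_counts(task_records):
    """Tally the values of both license fields over all task records.

    Assumption: `task_records` is a dict mapping observation index -> record,
    where each record is a dict. Missing fields are counted under None.
    """
    counts = {"license_code": Counter(), "photo_license_code": Counter()}
    for record in task_records.values():
        for field, counter in counts.items():
            counter[record.get(field)] += 1
    return counts
```

The returned counters can then be inspected, e.g., with `counts["photo_license_code"].most_common()`, to see how many images fall under each license.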
Contact
- If you have questions or issues relevant to other dataset users, we ask you to create a corresponding issue at our GitHub repository.
- In all other cases, you can contact the dataset collectors by e-mail at marek.herde@uni-kassel.de.
Acknowledgements
This work was funded by the ALDeep and CIL projects at the University of Kassel. Moreover, we thank Franz Götz-Hahn for his insightful comments on improving our annotation campaign. Finally, we thank the iNaturalist community for their many observations that help explore our nature's biodiversity and our annotators for their dedicated efforts in making the annotation campaign via LabelStudio possible.
Disclaimer
- We carefully selected and composed this dataset's content. If you believe that any of this content violates licensing agreements or infringes on intellectual property rights, please contact us immediately (cf. contact information). In such a case, we will promptly investigate the issue and remove the implicated data records from our dataset if necessary.
- Users are responsible for ensuring that their use of the dataset complies with all licenses, applicable laws, regulations, and ethical guidelines. We make no representations or warranties of any kind and accept no responsibility in the case of violations.
Files
Total size: 2.2 GB

| MD5 checksum | Size |
|---|---|
| c0218a3a1a33b6ec29fdd78158ba6270 | 40.9 MB |
| b9ce61379ac554798ab0f437cf2312b4 | 37.9 kB |
| 85793328665eb6dfc7cdb14e0615d219 | 17.8 MB |
| 046c7e2f60c319deed1328f6d4cac23a | 615.2 MB |
| 95e628ed85927c1ac82d7680072d317b | 1.4 GB |
| a7d42f56d187d07551ea7633615d010f | 103.0 MB |
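The MD5 checksums listed above can be used to verify a download, which is worthwhile for the larger archives. The following sketch uses only Python's standard `hashlib` module and reads the file in chunks so that large archives need not fit into memory. The function names `md5_of_file` and `matches_checksum` are hypothetical, and the file path passed in is up to the user.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected_md5):
    """Compare a file's MD5 digest with an expected value.

    An optional "md5:" prefix on the expected value is ignored.
    """
    expected = expected_md5.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

A call such as `matches_checksum("path/to/downloaded_file", "md5:c0218a3a1a33b6ec29fdd78158ba6270")` returns `True` only if the downloaded file is the one belonging to that checksum and was transferred without corruption; the path is a placeholder.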
Additional details
Funding
- University of Kassel
Software
- Repository URL
- https://github.com/ies-research/multi-annotator-machine-learning
- Programming language
- Python
- Development Status
- Active