Published May 2, 2023 | Version 1.0.0-beta
Dataset Open

Data Models for Dataset Drift Controls in Machine Learning With Optical Images - Datasets

  • 1. Fraunhofer HHI and Dotphoton AG
  • 2. Dotphoton AG
  • 3. Fraunhofer HHI
  • 4. HEPIA/HES-SO
  • 5. Klinikum rechts der Isar
  • 6. Helmholtz Zentrum Munich
  • 7. University of Glasgow

Description

This dataset accompanies the paper

Data Models for Dataset Drift Controls in Machine Learning With Optical Images

published in Transactions on Machine Learning Research (TMLR), 2023:

https://openreview.net/forum?id=I4IkGmgFJz

@article{oala2023data,
title={Data Models for Dataset Drift Controls in Machine Learning With Optical Images},
author={Luis Oala and Marco Aversa and Gabriel Nobis and Kurt Willis and Yoan Neuenschwander and Mich{\`e}le Buck and Christian Matek and Jerome Extermann and Enrico Pomarico and Wojciech Samek and Roderick Murray-Smith and Christoph Clausen and Bruno Sanguinetti},
journal={Transactions on Machine Learning Research},
issn={2835-8856},
year={2023},
url={https://openreview.net/forum?id=I4IkGmgFJz},
note={}
}

We release two datasets.

Raw-Microscopy:

  • 940 raw bright-field microscopy images of human blood smear slides for leukocyte classification (microscopy/images/raw_scale100) with corresponding labels (microscopy/labels).
  • 5,640 variations of these images (6 × 940) measured at six additional intensities (microscopy/images/raw_scale001-raw_scale0075).
  • 11,280 images (12 × 940) of the raw sensor data processed through twelve different pipelines (microscopy/images/processed_views).
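The counts above follow directly from the base set: six intensity variants and twelve processed views per raw image. A minimal sketch for checking an extracted copy against them follows. It assumes, without confirmation from this page, that the NNN in each `raw_scaleNNN` folder name is the intensity as a percentage of the full-intensity `raw_scale100` capture, and that every image is stored as one file; `scale_from_dirname`, `expected_counts` and `count_files` are hypothetical helpers, not part of the released code.

```python
import re
from pathlib import Path
from typing import Optional

# Assumption (not stated on the dataset page): "raw_scaleNNN" encodes the
# intensity as a percentage of the full-intensity capture, so raw_scale100 -> 1.0.
_SCALE_RE = re.compile(r"^raw_scale(\d+)$")


def scale_from_dirname(name: str) -> Optional[float]:
    """Return the assumed intensity fraction for a raw_scaleNNN folder, else None."""
    match = _SCALE_RE.match(name)
    return int(match.group(1)) / 100 if match else None


def expected_counts(n_base: int, n_scales: int = 6, n_pipelines: int = 12) -> dict:
    """Image counts implied by the description: base set, intensity variants, processed views."""
    return {
        "raw_scale100": n_base,
        "intensity_variants": n_base * n_scales,
        "processed_views": n_base * n_pipelines,
    }


def count_files(folder: Path) -> int:
    """Count regular files below `folder`, recursing into any subfolders."""
    return sum(1 for p in folder.rglob("*") if p.is_file())


if __name__ == "__main__":
    # Hypothetical location of the extracted microscopy images.
    images = Path("microscopy/images")
    if images.is_dir():
        for sub in sorted(images.iterdir()):
            if sub.is_dir():
                print(sub.name, scale_from_dirname(sub.name), count_files(sub))
    print(expected_counts(940))
```

Calling `expected_counts(548)` gives the corresponding figures for the Raw-Drone tiles. Counting recursively keeps the check usable whether or not `processed_views` is split into one subfolder per pipeline.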

Raw-Drone:

  • 548 raw drone camera image tiles for car segmentation (drone/images_tiles_256/raw_scale100) with corresponding binary segmentation masks (drone/masks_tiles_256). The tiles and masks are cropped from 12 full-size raw drone camera images (drone/images_full/raw_scale100) and 12 masks (drone/masks_full) of size 3648 × 5472 pixels.
  • 3,288 variations of these tiles (6 × 548) measured at six additional intensities (drone/images_tiles_256/raw_scale001-raw_scale075).
  • 6,576 images (12 × 548) of the raw sensor data processed through twelve different pipelines (drone/images_tiles_256/processed_views).
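For segmentation training, each drone tile has to be matched with its mask. The sketch below is illustrative only: it assumes, which this page does not state, that a tile and its mask share a filename stem (for example `tile_0001.tif` and `tile_0001.png`). `pair_by_stem` is a hypothetical helper, and the directory paths in the usage section are taken from the listing above.

```python
from pathlib import Path
from typing import Iterable, List, Tuple


def pair_by_stem(
    images: Iterable[Path], masks: Iterable[Path]
) -> Tuple[List[Tuple[Path, Path]], List[Path]]:
    """Match each image tile to the mask with the same file stem.

    Assumption (not confirmed by the dataset page): a tile and its mask share
    a filename stem. Returns the matched pairs and the tiles with no mask.
    """
    mask_by_stem = {m.stem: m for m in masks}
    pairs, unmatched = [], []
    for img in sorted(images):
        mask = mask_by_stem.get(img.stem)
        if mask is None:
            unmatched.append(img)
        else:
            pairs.append((img, mask))
    return pairs, unmatched


if __name__ == "__main__":
    # Directories as listed for the Raw-Drone dataset, relative to the extracted archive.
    img_dir = Path("drone/images_tiles_256/raw_scale100")
    mask_dir = Path("drone/masks_tiles_256")
    if img_dir.is_dir() and mask_dir.is_dir():
        pairs, missing = pair_by_stem(
            (p for p in img_dir.iterdir() if p.is_file()),
            (p for p in mask_dir.iterdir() if p.is_file()),
        )
        print(f"{len(pairs)} tile/mask pairs, {len(missing)} tiles without a mask")
```

Reporting unmatched tiles, rather than silently dropping them, makes it easy to see whether the stem assumption holds for a given download; with a full copy one would expect 548 pairs.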

Detailed datasheets for the two datasets can be found in the appendices of the TMLR paper.

The code repository for this project can be found at https://github.com/aiaudit-org/raw2logit

 

Files

Total size 12.1 GB, two archives (one of them is drone.zip):

  • 5.6 GB, md5:5721e41c46f1d2f156ba3c60899ef6c3
  • 6.5 GB, md5:6e4eadfffd629dbf66e4b4619bd227bd
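Because the archives are several gigabytes each, it is worth verifying a download against the listed MD5 checksums before extracting it. A minimal sketch using Python's standard `hashlib` follows. The record does not say which checksum belongs to which archive, so the sketch only checks that a file matches one of the two listed values; the helper names and the command-line usage are hypothetical.

```python
import hashlib
import sys
from pathlib import Path
from typing import Iterable

# Checksums as listed on this record (archive names are not shown next to them).
LISTED_MD5 = {
    "5721e41c46f1d2f156ba3c60899ef6c3",  # 5.6 GB archive
    "6e4eadfffd629dbf66e4b4619bd227bd",  # 6.5 GB archive
}


def md5_of_chunks(chunks: Iterable[bytes]) -> str:
    """Hex MD5 digest of a sequence of byte chunks."""
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """MD5 of a file, read in 1 MiB chunks so a multi-GB archive is never fully in memory."""
    with open(path, "rb") as fh:
        return md5_of_chunks(iter(lambda: fh.read(chunk_size), b""))


def normalize_checksum(text: str) -> str:
    """Lowercase and strip an optional 'md5:' prefix, e.g. 'md5:ABC' -> 'abc'."""
    text = text.strip().lower()
    return text[len("md5:"):] if text.startswith("md5:") else text


if __name__ == "__main__":
    for name in sys.argv[1:]:
        ok = file_md5(Path(name)) in LISTED_MD5
        print(f"{name}: {'matches a listed checksum' if ok else 'NO MATCH'}")
```

For example, running the script as `python check_md5.py drone.zip` (script name hypothetical) reports whether the downloaded archive matches one of the two published checksums.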

Additional details

Related works

Is published in
Journal article: 2835-8856 (ISSN)
Journal article: https://openreview.net/forum?id=I4IkGmgFJz (URL)