There is a newer version of the record available.

Published August 26, 2022 | Version v1
Dataset Open

PAP-QMNIST dataset

  • 1. Uppsala University

Description

PAP-QMNIST is a synthetic dataset that mimics several properties of an existing oral cancer (OC) dataset (described in [1]), such as cell image size, color distribution, arbitrary rotation of cells, amount of blur and noise, number of patients, and number of images per patient.
A main advantage of PAP-QMNIST is that it offers reliable ground-truth annotation at the instance (cell) level while remaining visually interpretable for non-experts. We base PAP-QMNIST on the QMNIST dataset [2], in which the object (digit) is located in the central part of the image, similarly to the (detected and cut-out) nuclei in our OC data. We rescale the original QMNIST images to the size of the OC images using bilinear interpolation, add color, and augment the dataset with images transformed by the transformations expected in OC data, so as to replicate the number of patients and the number of images per patient in the OC dataset. The details are in [1], and the code used to create the PAP-QMNIST data is in Create_PAP_QMNISTbags_datasets.ipynb. The uploaded PAP-QMNIST datasets were generated and analyzed during the study [1]: PAP5perc_key_inst.zip, PAP10perc_key_inst.zip, PAP20perc_key_inst.zip, and PAP30perc_key_inst.zip contain 5%, 10%, 20%, and 30% key instances, respectively. File names of key-instance images (images of the digit '4') start with '4'.
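The file-naming convention above can be turned into instance-level labels with a few lines of Python. The following is a minimal sketch, not code from the notebook: the helper names is_key_instance and key_instance_fraction are hypothetical, and the example paths are invented for illustration, since the folder layout inside the archives is not described here. The only rule taken from the description is that key-instance file names start with '4'.

```python
from pathlib import PurePosixPath


def is_key_instance(image_path: str) -> bool:
    """Return True if the file name (not any parent folder) starts with '4'."""
    return PurePosixPath(image_path).name.startswith("4")


def key_instance_fraction(image_paths) -> float:
    """Fraction of images in one bag that are marked as key instances."""
    paths = list(image_paths)
    if not paths:
        return 0.0
    return sum(is_key_instance(p) for p in paths) / len(paths)


# Hypothetical bag: real folder and file names in the archives may differ.
example_bag = [
    "bag_0/4_0001.png",
    "bag_0/7_0002.png",
    "bag_0/1_0003.png",
    "bag_0/4_0004.png",
]

# 1 for key instances (digit '4'), 0 for all other digits.
labels = [int(is_key_instance(p)) for p in example_bag]
fraction = key_instance_fraction(example_bag)
```

Only the file name is inspected, so a folder whose name happens to start with '4' does not affect the label. Comparing the resulting fraction per bag with the nominal 5, 10, 20, or 30% can serve as a quick sanity check after unpacking an archive.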

[1] Koriakina, N., Sladoje, N., Bašić, V., & Lindblad, J. (2022). Oral cancer detection and interpretation: Deep multiple instance learning versus conventional deep single instance learning. arXiv preprint arXiv:2202.01783.

[2] Yadav, C., & Bottou, L. (2019). Cold case: The lost MNIST digits. Advances in Neural Information Processing Systems, 32.

Files

Create_PAP_QMNISTbags_datasets.ipynb

Files (2.4 GB)

Checksum Size
md5:8de5d07d97dac7b9e325c81c4c6d7d0b 20.8 kB
md5:455c64049a7f37c4c4b4c3ec8be376e5 593.6 MB
md5:34a93e747b2b9d8422e729f4722f6820 593.5 MB
md5:0dfbafe67d48586056ea391384143c50 593.4 MB
md5:37d7da19248a2c5c44cd686b9925e8d7 593.2 MB
md5:d8955e448094e5dad11bce649539cde6 2.8 kB
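The MD5 checksums listed above can be used to check that a download is complete and uncorrupted. The following is a minimal sketch using only the Python standard library; the function names md5_of_file and matches_listed_checksum are hypothetical. Because the listing above does not show which checksum belongs to which file, that pairing has to be taken from the record page itself.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path, listed):
    """Compare a file against a checksum written as 'md5:<hex>' or '<hex>'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

Reading in chunks keeps memory use low for the archives of roughly 600 MB each. A call such as matches_listed_checksum(local_path, "md5:...") returns True only if the local file is byte-identical to the uploaded one.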