PAP-QMNIST dataset
Description
PAP-QMNIST is a synthetic dataset that mimics several properties of an existing oral cancer (OC) dataset (described in the study [1]), such as cell image size, color distribution, arbitrary rotation of cells, amount of blur and noise, number of patients, and number of images per patient.
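The properties listed above can be imitated on a single QMNIST digit roughly as follows. This is a minimal NumPy-only sketch, not the authors' notebook: only the bilinear rescaling comes from the dataset description. The output size of 80 pixels, the nearest-neighbour rotation, the Gaussian blur and noise parameters, the two colors, and the helper name `make_pap_like` are all hypothetical choices for illustration.

```python
import numpy as np


def bilinear_resize(img: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Rescale a 2-D float image with bilinear interpolation."""
    in_h, in_w = img.shape
    # Sample positions spread evenly over the input grid (corners aligned).
    ys = np.linspace(0, in_h - 1, out_h)
    xs = np.linspace(0, in_w - 1, out_w)
    y0, x0 = np.floor(ys).astype(int), np.floor(xs).astype(int)
    y1, x1 = np.minimum(y0 + 1, in_h - 1), np.minimum(x0 + 1, in_w - 1)
    wy, wx = (ys - y0)[:, None], (xs - x0)[None, :]
    # Interpolate along x on the two neighbouring rows, then along y.
    top = img[y0][:, x0] * (1 - wx) + img[y0][:, x1] * wx
    bottom = img[y1][:, x0] * (1 - wx) + img[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy


def rotate(img: np.ndarray, angle_deg: float) -> np.ndarray:
    """Rotate about the image centre (nearest neighbour, zero fill)."""
    h, w = img.shape
    cy, cx = (h - 1) / 2, (w - 1) / 2
    a = np.deg2rad(angle_deg)
    yy, xx = np.mgrid[0:h, 0:w]
    # Inverse mapping: find the source pixel for every output pixel.
    ys = np.cos(a) * (yy - cy) - np.sin(a) * (xx - cx) + cy
    xs = np.sin(a) * (yy - cy) + np.cos(a) * (xx - cx) + cx
    yi, xi = np.rint(ys).astype(int), np.rint(xs).astype(int)
    inside = (yi >= 0) & (yi < h) & (xi >= 0) & (xi < w)
    out = np.zeros_like(img)
    out[inside] = img[yi[inside], xi[inside]]
    return out


def gaussian_blur(img: np.ndarray, sigma: float) -> np.ndarray:
    """Separable Gaussian blur with zero padding at the borders."""
    if sigma <= 0:
        return img.copy()
    radius = max(1, int(round(3 * sigma)))
    t = np.arange(-radius, radius + 1)
    kernel = np.exp(-(t ** 2) / (2 * sigma ** 2))
    kernel /= kernel.sum()

    def blur_1d(v: np.ndarray) -> np.ndarray:
        return np.convolve(v, kernel, mode="same")

    out = np.apply_along_axis(blur_1d, 1, img)  # along rows
    return np.apply_along_axis(blur_1d, 0, out)  # along columns


def colorize(gray: np.ndarray, fg_rgb, bg_rgb) -> np.ndarray:
    """Blend from a background color (intensity 0) to a foreground color (1)."""
    g = np.clip(gray, 0.0, 1.0)[..., None]
    return g * np.asarray(fg_rgb) + (1 - g) * np.asarray(bg_rgb)


def make_pap_like(digit: np.ndarray, out_size: int = 80, rng=None) -> np.ndarray:
    """Hypothetical helper: 28x28 uint8 digit -> RGB 'cell-like' image in [0, 1]."""
    rng = np.random.default_rng(rng)
    img = digit.astype(np.float64) / 255.0
    img = bilinear_resize(img, out_size, out_size)
    img = rotate(img, rng.uniform(0.0, 360.0))      # arbitrary rotation
    img = gaussian_blur(img, rng.uniform(0.5, 1.5))  # random amount of blur
    # Placeholder colors, not the color distribution used in [1].
    rgb = colorize(img, fg_rgb=(0.35, 0.20, 0.55), bg_rgb=(0.92, 0.88, 0.92))
    rgb += rng.normal(0.0, 0.02, rgb.shape)          # additive Gaussian noise
    return np.clip(rgb, 0.0, 1.0)


digit = np.zeros((28, 28), dtype=np.uint8)
digit[6:22, 12:16] = 255  # a crude vertical stroke standing in for a digit
cell = make_pap_like(digit, rng=0)
print(cell.shape)  # (80, 80, 3)
```

Rotation is applied to the grayscale digit before colorization so that the zero-filled corners end up with the background color rather than black.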
A main advantage of PAP-QMNIST is that it offers reliable ground-truth annotations at the instance (cell) level while being visually interpretable for non-experts. We base PAP-QMNIST on the QMNIST dataset [2], in which the object (a digit) is located in the central part of the image, similar to the detected and cut-out nuclei in our OC data. We rescale the original QMNIST images to the size of the OC images using bilinear interpolation, add color, and augment the dataset with images altered by the transformations expected in OC data, so as to replicate the number of patients and the number of images per patient in the OC dataset. Details are given in [1], and the code used to create the PAP-QMNIST data is provided in Create_PAP_QMNISTbags_datasets.ipynb.

The uploaded datasets PAP5perc_key_inst.zip, PAP10perc_key_inst.zip, PAP20perc_key_inst.zip and PAP30perc_key_inst.zip are versions of PAP-QMNIST with 5, 10, 20 and 30% key instances, respectively; they were generated and analyzed in the study [1]. File names of key-instance images (images of the digit '4') start with '4'.
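Since key instances are identified by their file names, instance labels and bag (patient) labels can be recovered directly from the file listing. The sketch below assumes, hypothetically, that each bag is stored as one directory and that only key-instance file names start with '4'; it applies the standard multiple instance learning rule (a bag is positive if it contains at least one key instance), which may differ from the exact labeling used in [1]. The paths in the example are made up.

```python
from collections import defaultdict
from pathlib import PurePath


def is_key_instance(path: str) -> bool:
    """Key instances (digit '4') have file names starting with '4'."""
    return PurePath(path).name.startswith("4")


def bag_summary(paths):
    """Group image paths into bags and derive bag labels.

    Assumes (hypothetically) one directory per bag/patient and uses the
    standard MIL rule: a bag is positive iff it holds >= 1 key instance.
    """
    bags = defaultdict(list)
    for p in paths:
        bags[PurePath(p).parent.name].append(is_key_instance(p))
    return {
        bag: {"label": int(any(flags)), "key_fraction": sum(flags) / len(flags)}
        for bag, flags in bags.items()
    }


paths = [
    "patient_01/4_000017.png",  # key instance
    "patient_01/7_000018.png",
    "patient_02/1_000019.png",
    "patient_02/9_000020.png",
]
print(bag_summary(paths))
# {'patient_01': {'label': 1, 'key_fraction': 0.5},
#  'patient_02': {'label': 0, 'key_fraction': 0.0}}
```

Only the file name (not the directory) is inspected, so a directory whose name happens to start with '4' does not mark its images as key instances.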
[1] Koriakina, N., Sladoje, N., Bašić, V., & Lindblad, J. (2022). Oral cancer detection and interpretation: Deep multiple instance learning versus conventional deep single instance learning. arXiv preprint arXiv:2202.01783.
[2] Yadav, C., & Bottou, L. (2019). Cold case: The lost MNIST digits. Advances in Neural Information Processing Systems, 32.
Files
Total size: 2.4 GB

Size | MD5 checksum |
---|---|
20.8 kB | 8de5d07d97dac7b9e325c81c4c6d7d0b |
593.6 MB | 455c64049a7f37c4c4b4c3ec8be376e5 |
593.5 MB | 34a93e747b2b9d8422e729f4722f6820 |
593.4 MB | 0dfbafe67d48586056ea391384143c50 |
593.2 MB | 37d7da19248a2c5c44cd686b9925e8d7 |
2.8 kB | d8955e448094e5dad11bce649539cde6 |
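Because the listing above does not tie each checksum to a file name, a downloaded file can still be checked by computing its MD5 digest and testing whether it matches one of the published values. A minimal sketch using Python's standard `hashlib`; the helper names `md5sum` and `is_published_checksum` are hypothetical, and the file is read in chunks so the roughly 590 MB archives need not fit in memory.

```python
import hashlib

# MD5 checksums as listed above (file-to-checksum mapping not given there).
PUBLISHED_MD5 = {
    "8de5d07d97dac7b9e325c81c4c6d7d0b",
    "455c64049a7f37c4c4b4c3ec8be376e5",
    "34a93e747b2b9d8422e729f4722f6820",
    "0dfbafe67d48586056ea391384143c50",
    "37d7da19248a2c5c44cd686b9925e8d7",
    "d8955e448094e5dad11bce649539cde6",
}


def md5sum(path, chunk_size: int = 1 << 20) -> str:
    """Hex MD5 digest of a file, read in chunks of `chunk_size` bytes."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def is_published_checksum(path) -> bool:
    """True if the file's MD5 digest equals one of the published checksums."""
    return md5sum(path) in PUBLISHED_MD5
```

For example, `is_published_checksum("PAP5perc_key_inst.zip")` should return `True` for an intact download.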