MedMNIST Classification Decathlon: A Lightweight AutoML Benchmark for Medical Image Analysis

Jiancheng Yang; Rui Shi; Bingbing Ni

doi:10.5281/zenodo.4269852

Published November 12, 2020 | Version v1.0

Resource type: Dataset | Access: Open

1. Shanghai Jiao Tong University

This data repository hosts MedMNIST v1, which is out of date. Please use the latest version, MedMNIST v2.

Abstract

We present MedMNIST, a collection of 10 pre-processed open medical image datasets. MedMNIST is standardized to perform classification tasks on lightweight 28×28 images and requires no background knowledge. Covering the primary data modalities in medical image analysis, it is diverse in data scale (from 100 to 100,000 images) and in tasks (binary/multi-class classification, ordinal regression and multi-label classification). MedMNIST could be used for educational purposes, rapid prototyping, multi-modal machine learning or AutoML in medical image analysis. Moreover, the MedMNIST Classification Decathlon is designed to benchmark AutoML algorithms on all 10 datasets; we have compared several baseline methods, including open-source and commercial AutoML tools. The datasets, evaluation code and baseline methods for MedMNIST are publicly available at https://medmnist.github.io/.
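
The task types listed above differ mainly in how labels are encoded and scored. The following Python sketch illustrates this under stated assumptions and is not the official MedMNIST evaluation code: single-label tasks are assumed to store integer class indices of shape (N, 1), and multi-label tasks a 0/1 matrix of shape (N, K). The helper `accuracy`, the 0.5 threshold, and treating ordinal regression as plain classification are illustrative choices, not taken from the benchmark.

```python
import numpy as np


def accuracy(task: str, y_true: np.ndarray, y_score: np.ndarray, threshold: float = 0.5) -> float:
    """Illustrative accuracy for the MedMNIST task types (hypothetical helper, not the official evaluator).

    task:    "multi-label", or any single-label task ("binary", "multi-class", "ordinal-regression").
    y_true:  (N, 1) integer class indices, or an (N, K) 0/1 matrix for multi-label (assumed layout).
    y_score: (N, C) per-class scores, or (N, K) per-label probabilities for multi-label.
    """
    if task == "multi-label":
        # Each of the K labels is an independent yes/no decision: threshold the scores,
        # then average correctness over all N*K label entries.
        y_pred = (y_score > threshold).astype(int)
        return float((y_pred == y_true).mean())
    # Single-label tasks (ordinal regression simplified to plain classification here):
    # predict the highest-scoring class and compare with the flattened ground truth.
    y_pred = y_score.argmax(axis=1)
    return float((y_pred == y_true.reshape(-1)).mean())


# Toy inputs, made up for illustration (not MedMNIST data).
mc_true = np.array([[0], [2], [1]])
mc_score = np.array([[0.8, 0.1, 0.1], [0.2, 0.3, 0.5], [0.6, 0.3, 0.1]])
ml_true = np.array([[1, 0], [0, 1]])
ml_score = np.array([[0.9, 0.2], [0.4, 0.3]])
print(accuracy("multi-class", mc_true, mc_score))  # 2 of 3 samples correct
print(accuracy("multi-label", ml_true, ml_score))  # 3 of 4 label entries correct
```

The multi-label branch scores every label entry separately, which is why a single missed label in the second toy sample lowers the result without zeroing that sample.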

Please note that this dataset is NOT intended for clinical use.

We recommend using our official code to download, parse and use the MedMNIST dataset:

pip install medmnist
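
The .npz files listed under Files can also be read directly with NumPy. The sketch below is a minimal example under assumptions: it assumes each archive stores arrays under the keys train_images, train_labels, val_images, val_labels, test_images and test_labels (check the actual names with `np.load(path).files`), and the helper `load_medmnist_npz` is hypothetical. So that it runs offline, it builds a tiny dummy archive with random pixels in that assumed layout instead of using real data.

```python
import os
import tempfile

import numpy as np

SPLITS = ("train", "val", "test")  # assumed split names inside each .npz archive


def load_medmnist_npz(path: str) -> dict:
    """Return {split: (images, labels)} from a MedMNIST-style .npz file (assumed key layout)."""
    # np.load on an .npz returns a lazy NpzFile; indexing it reads each array into memory,
    # so the returned arrays stay valid after the file is closed.
    with np.load(path) as data:
        return {s: (data[f"{s}_images"], data[f"{s}_labels"]) for s in SPLITS}


# Demo with a tiny dummy archive (random pixels, NOT real MedMNIST data).
rng = np.random.default_rng(0)
demo_path = os.path.join(tempfile.mkdtemp(), "dummy_medmnist.npz")
np.savez(
    demo_path,
    **{f"{s}_images": rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8) for s in SPLITS},
    **{f"{s}_labels": rng.integers(0, 2, size=(4, 1)) for s in SPLITS},
)
splits = load_medmnist_npz(demo_path)
images, labels = splits["train"]
print(images.shape, labels.shape)  # (4, 28, 28) (4, 1)
```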

Citation and Licenses

If you find this project useful, please cite our ISBI'21 paper as:
     Jiancheng Yang, Rui Shi, Bingbing Ni. "MedMNIST Classification Decathlon: A Lightweight AutoML Benchmark for Medical Image Analysis," arXiv preprint arXiv:2010.14925, 2020.

or using bibtex:
     @article{medmnist,
         title={MedMNIST Classification Decathlon: A Lightweight AutoML Benchmark for Medical Image Analysis},
         author={Yang, Jiancheng and Shi, Rui and Ni, Bingbing},
         journal={arXiv preprint arXiv:2010.14925},
         year={2020}
     }

In addition, please cite the corresponding source paper(s) if you use any subset of MedMNIST. Each subset is released under the same license as its source dataset.

PathMNIST

Jakob Nikolas Kather, Johannes Krisam, et al., "Predicting survival from colorectal cancer histology slides using deep learning: A retrospective multicenter study," PLOS Medicine, vol. 16, no. 1, pp. 1–22, Jan. 2019.

License: CC BY 4.0

ChestMNIST

Xiaosong Wang, Yifan Peng, et al., "ChestX-ray8: Hospital-scale chest X-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases," in CVPR, 2017, pp. 3462–3471.

License: CC0 1.0

DermaMNIST

Philipp Tschandl, Cliff Rosendahl, and Harald Kittler, "The HAM10000 dataset, a large collection of multisource dermatoscopic images of common pigmented skin lesions," Scientific Data, vol. 5, pp. 180161, 2018.

Noel Codella, Veronica Rotemberg, Philipp Tschandl, M. Emre Celebi, Stephen Dusza, David Gutman, Brian Helba, Aadi Kalloo, Konstantinos Liopyris, Michael Marchetti, Harald Kittler, and Allan Halpern, "Skin Lesion Analysis Toward Melanoma Detection 2018: A Challenge Hosted by the International Skin Imaging Collaboration (ISIC)," arXiv preprint arXiv:1902.03368, 2018.

License: CC BY-NC 4.0

OCTMNIST/PneumoniaMNIST

Daniel S. Kermany, Michael Goldbaum, et al., "Identifying medical diagnoses and treatable diseases by image-based deep learning," Cell, vol. 172, no. 5, pp. 1122–1131.e9, 2018.

License: CC BY 4.0

RetinaMNIST

DeepDR Diabetic Retinopathy Image Dataset (DeepDRiD), "The 2nd diabetic retinopathy – grading and image quality estimation challenge," https://isbi.deepdr.org/data.html, 2020.

License: CC BY 4.0

BreastMNIST

Walid Al-Dhabyani, Mohammed Gomaa, Hussien Khaled, and Aly Fahmy, "Dataset of breast ultrasound images," Data in Brief, vol. 28, pp. 104863, 2020.

License: CC BY 4.0

OrganMNIST_{Axial,Coronal,Sagittal}

Patrick Bilic, Patrick Ferdinand Christ, et al., "The liver tumor segmentation benchmark (LiTS)," arXiv preprint arXiv:1901.04056, 2019.

Xuanang Xu, Fugen Zhou, et al., "Efficient multiple organ localization in CT image using 3D region proposal network," IEEE Transactions on Medical Imaging, vol. 38, no. 8, pp. 1885–1898, 2019.

License: CC BY 4.0

Files (441.4 MB in total)

Name	MD5	Size
breastmnist.npz	750601b1f35ba3300ea97c75c52ff8f6	559.6 kB
chestmnist.npz	02c8a6516a18b556561a56cbdd36c4a8	82.8 MB
dermamnist.npz	0744692d530f8e62ec473284d019b0c7	19.7 MB
octmnist.npz	c68d92d5b585d8d81f7112f81e2d0842	54.9 MB
organmnist_axial.npz	866b832ed4eeba67bfb9edee1d5544e6	38.2 MB
organmnist_coronal.npz	0afa5834fb105f7705a7d93372119a21	15.5 MB
organmnist_sagittal.npz	e5c39f1af030238290b9557d9503af9d	16.5 MB
pathmnist.npz	a8b06965200029087d5bd730944a56c1	205.6 MB
pneumoniamnist.npz	28209eda62fecd6e6a2d98b1501bb15f	4.2 MB
retinamnist.npz	bd4c0672f1bba3e3a89f0e4e876791e4	3.3 MB
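
Because each file is published with an MD5 checksum, a download can be checked before use. The following Python sketch uses only the standard library (`hashlib`, `pathlib`); the helper names `md5_of` and `verify` are hypothetical, and the checksums are copied from the file table above.

```python
import hashlib
from pathlib import Path

# Published MD5 checksums for the MedMNIST v1 files (copied from the file table).
EXPECTED_MD5 = {
    "breastmnist.npz": "750601b1f35ba3300ea97c75c52ff8f6",
    "chestmnist.npz": "02c8a6516a18b556561a56cbdd36c4a8",
    "dermamnist.npz": "0744692d530f8e62ec473284d019b0c7",
    "octmnist.npz": "c68d92d5b585d8d81f7112f81e2d0842",
    "organmnist_axial.npz": "866b832ed4eeba67bfb9edee1d5544e6",
    "organmnist_coronal.npz": "0afa5834fb105f7705a7d93372119a21",
    "organmnist_sagittal.npz": "e5c39f1af030238290b9557d9503af9d",
    "pathmnist.npz": "a8b06965200029087d5bd730944a56c1",
    "pneumoniamnist.npz": "28209eda62fecd6e6a2d98b1501bb15f",
    "retinamnist.npz": "bd4c0672f1bba3e3a89f0e4e876791e4",
}


def md5_of(path, chunk_size: int = 1 << 20) -> str:
    """Compute a file's MD5 hex digest, reading 1 MiB chunks so large files need little memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path) -> bool:
    """Return True if the file's MD5 matches the published checksum for its file name."""
    name = Path(path).name
    if name not in EXPECTED_MD5:
        raise KeyError(f"no published checksum for {name!r}")
    return md5_of(path) == EXPECTED_MD5[name]
```

For example, after downloading pathmnist.npz into the working directory, `verify("pathmnist.npz")` should return True; a False result suggests an incomplete or corrupted download.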
