Dataset Open Access

Histopathology data of bone marrow biopsies (HistBMP or HistMNIST)

Jakub Tomczak

Data information

We prepared a dataset based on histopathological images freely available on-line ( We selected 16 patients (patient IDs: 272, 274, 283, 289, 290, 291, 292, 295, 297, 298, 299). Each histopathological image represents a bone marrow biopsy. Diagnoses of the chosen cases were associated with different kinds of cancer (e.g., lymphoma, leukemia) or anemia. All original images were taken using HE staining at 40× magnification, and each image was of size 336 × 448.

Data preparation

The original RGB representation was transformed to grayscale. We then divided each image into small patches of size 28 × 28. Finally, we picked 10 patients for training, 3 patients for validation and 3 patients for testing, which resulted in 6,800 training images, 2,000 validation images and 2,000 test images. Patients were assigned so that each of the three sets contained representative images with different diagnoses and amounts of fat.
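The preparation steps above can be sketched as follows. This is a minimal illustration and not the authors' code. The function names, the luma weights used for the grayscale conversion, and the reading of 336 × 448 as height × width are all assumptions, and a random array stands in for a real biopsy image. Under those assumptions, one image tiles into 12 × 16 = 192 non-overlapping patches.

```python
import numpy as np

PATCH = 28  # patch side length stated in the dataset description


def rgb_to_gray(image):
    """Convert an (H, W, 3) RGB array to an (H, W) grayscale array.

    The luma weights below are an assumption; the description does not
    say which RGB-to-grayscale conversion was actually applied.
    """
    weights = np.array([0.299, 0.587, 0.114])
    return image[..., :3].astype(np.float64) @ weights


def split_into_patches(gray, patch=PATCH):
    """Tile an (H, W) array into non-overlapping (patch, patch) blocks."""
    h, w = gray.shape
    if h % patch or w % patch:
        raise ValueError("image size must be a multiple of the patch size")
    rows, cols = h // patch, w // patch
    # (rows, patch, cols, patch) -> (rows, cols, patch, patch) -> flat list
    blocks = gray.reshape(rows, patch, cols, patch).swapaxes(1, 2)
    return blocks.reshape(rows * cols, patch, patch)


# Synthetic stand-in for one 336 x 448 RGB image (not real data).
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(336, 448, 3), dtype=np.uint8)

patches = split_into_patches(rgb_to_gray(image))
print(patches.shape)  # (192, 28, 28)
```

Patches are returned in row-major order, so `patches[1]` is the block immediately to the right of `patches[0]` in the source image.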

Since the small patches resemble MNIST, a widely used benchmark in the machine learning/AI community, the dataset is referred to as HistMNIST.

First usage

The dataset was used to train deep generative models (VAEs):

  • Tomczak, J. M., & Welling, M. (2016). Improving variational auto-encoders using householder flow. arXiv preprint arXiv:1611.09630.

The dataset was originally used in the following paper: J.M. Tomczak & M. Welling, "Improving Variational Auto-Encoders using Householder Flow", NIPS Workshop on Bayesian Deep Learning 2016, arXiv:1611.09630
Files (59.1 MB)

                   All versions   This version
Views              737            738
Downloads          142            142
Data volume        8.4 GB         8.4 GB
Unique views       674            675
Unique downloads   126            126

