There is a newer version of the record available.

Published May 23, 2025 | Version v2
Dataset Open

BoneMarrowWSI-PediatricLeukemia: A Comprehensive Dataset of Bone Marrow Aspirate Smear Whole Slide Images with Expert Annotations and Clinical Data in Pediatric Leukemia

  • 1. Fraunhofer Institute for Digital Medicine MEVIS, Bremen, Germany
  • 2. Universitätsklinikum Erlangen
  • 3. Friedrich-Alexander-Universität Erlangen-Nürnberg
  • 4. Medical Informatics, Friedrich-Alexander University of Erlangen-Nürnberg, Erlangen, Germany
  • 5. PixelMed Publishing
  • 6. Brigham and Women's Hospital Department of Radiology
  • 7. Fraunhofer Institute for Digital Medicine
  • 8. Department of Pediatrics and Adolescent Medicine, University Hospital Erlangen, Erlangen, Germany

Description

The dataset comprises bone marrow aspirate smear whole slide images (WSIs) for 257 pediatric cases of leukemia, including acute lymphoid leukemia (ALL), acute myeloid leukemia (AML), and chronic myeloid leukemia (CML). The smears were prepared for the initial diagnosis (i.e., without prior treatment), stained in accordance with the Pappenheim method, and scanned at 40x magnification.

The images have been annotated with rectangular regions of interest (ROIs) within the evaluable monolayer area, and a total of 47,176 cell bounding box annotations have been placed within these regions. Cells were annotated by multiple experts in a consensus labeling approach with 49 distinct cell type classes. In this approach, each cell was annotated sequentially by multiple individuals until it had been labeled by at least two individuals and the majority class accounted for at least half of all annotations for that image. The labels from all annotation sessions, as well as the final consensus class for each cell, are made available.
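The stopping rule of the consensus approach can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `consensus_label`, the per-cell reading of the rule, the treatment of ties as "no consensus yet", and the class names in the usage note are all assumptions made for illustration.

```python
from collections import Counter


def consensus_label(labels, min_annotators=2, min_share=0.5):
    """Return the consensus class for one cell, or None if more labels are needed.

    labels: cell type labels collected so far, one per annotation session.
    """
    if len(labels) < min_annotators:
        return None  # fewer than two individuals have labeled this cell

    ranked = Counter(labels).most_common(2)
    top_class, top_votes = ranked[0]

    # Assumption: a tie between the two most frequent classes is not a majority.
    if len(ranked) > 1 and ranked[1][1] == top_votes:
        return None

    # The majority class must account for at least half of all annotations.
    if top_votes / len(labels) >= min_share:
        return top_class
    return None
```

With hypothetical class names, `consensus_label(["blast", "blast"])` yields `"blast"`, while `consensus_label(["blast", "lymphocyte"])` yields `None`, meaning the cell would go to another annotation session.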

Additionally, clinical information (age group, sex, diagnosis) and laboratory data (blasts, white blood cell count, thrombocytes, LDH, uric acid, hemoglobin) are available for each case.

Files included

Pending peer review of an accompanying manuscript, this record currently contains only a first sample of the dataset described above: 2 bone marrow aspirate smear whole slide images (WSIs) with their cell annotations.

The entire dataset will be available in the National Cancer Institute Imaging Data Commons (IDC, https://imaging.datacommons.cancer.gov). If you have any questions about the dataset, please contact IDC support at support@canceridc.dev.

Both images and annotations are in DICOM format. All DICOM objects relating to the same smear are contained in the same folder. Clinical data are contained in the DICOM metadata. 

In addition, lab_values_sample.csv contains the collected lab values for those two smears.

The attached files are named according to the following convention, which is built from the corresponding DICOM attributes: %PatientID-%Modality-%SeriesDescription-%SOPInstanceUID.dcm.

For example, the files for patient A6BBC91AE73DD21C0533F735470A9CD0 include the following 6 DICOM Slide Microscopy (SM) modality files, each representing one level of the WSI pyramid:

  1. A6BBC91AE73DD21C0533F735470A9CD0-SM-Bone marrow aspirate smear, May-Gruenwald-Giemsa stain-1.2.826.0.1.3680043.8.498.26060080718466278522952527845683544045.dcm
  2. A6BBC91AE73DD21C0533F735470A9CD0-SM-Bone marrow aspirate smear, May-Gruenwald-Giemsa stain-1.2.826.0.1.3680043.8.498.62030007770863397357636084490828160953.dcm
  3. A6BBC91AE73DD21C0533F735470A9CD0-SM-Bone marrow aspirate smear, May-Gruenwald-Giemsa stain-1.2.826.0.1.3680043.8.498.7859053050060184362011899525686475413.dcm
  4. A6BBC91AE73DD21C0533F735470A9CD0-SM-Bone marrow aspirate smear, May-Gruenwald-Giemsa stain-1.2.826.0.1.3680043.8.498.85089919641169806925347867181900526802.dcm
  5. A6BBC91AE73DD21C0533F735470A9CD0-SM-Bone marrow aspirate smear, May-Gruenwald-Giemsa stain-1.2.826.0.1.3680043.8.498.88236263312726593722497600529137414206.dcm
  6. A6BBC91AE73DD21C0533F735470A9CD0-SM-Bone marrow aspirate smear, May-Gruenwald-Giemsa stain-1.2.826.0.1.3680043.8.498.95738688685525699076567938918194597802.dcm
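Because the series description itself contains hyphens ("May-Gruenwald-Giemsa"), splitting a file name on every "-" does not recover the four fields of the naming convention. The following sketch splits from both ends instead. The names `DicomFileName` and `parse_file_name` are hypothetical, and the code assumes that the PatientID and the modality contain no hyphen and that the SOPInstanceUID consists only of digits and dots, as in the listing above.

```python
from typing import NamedTuple


class DicomFileName(NamedTuple):
    patient_id: str
    modality: str
    series_description: str
    sop_instance_uid: str


def parse_file_name(name: str) -> DicomFileName:
    """Split '%PatientID-%Modality-%SeriesDescription-%SOPInstanceUID.dcm'."""
    stem = name[:-4] if name.lower().endswith(".dcm") else name
    # The first two hyphens delimit PatientID and Modality.
    patient_id, modality, rest = stem.split("-", 2)
    # The last hyphen delimits the UID; everything in between is the description.
    series_description, sop_instance_uid = rest.rsplit("-", 1)
    return DicomFileName(patient_id, modality, series_description, sop_instance_uid)
```

Applied to the first file in the list, this gives the modality `SM` and the series description `Bone marrow aspirate smear, May-Gruenwald-Giemsa stain`.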

Cell annotations with labels from each annotation session in the labeling process are stored in DICOM Bulk Annotations (ANN modality) objects: 

  1. A6BBC91AE73DD21C0533F735470A9CD0-ANN-Cell bounding boxes with cell type labels; annotation session: 0-1.2.826.0.1.3680043.10.511.3.12557519480564734942303269163896694.dcm
  2. A6BBC91AE73DD21C0533F735470A9CD0-ANN-Cell bounding boxes with cell type labels; annotation session: 1-1.2.826.0.1.3680043.10.511.3.6987603211883801558525207593845155.dcm
  3. A6BBC91AE73DD21C0533F735470A9CD0-ANN-Cell bounding boxes with cell type labels; annotation session: 2-1.2.826.0.1.3680043.10.511.3.1120336965786739278582883135803528.dcm
  4. A6BBC91AE73DD21C0533F735470A9CD0-ANN-Cell bounding boxes with cell type labels; annotation session: 3-1.2.826.0.1.3680043.10.511.3.6408695988615311222439105226576101.dcm
  5. A6BBC91AE73DD21C0533F735470A9CD0-ANN-Cell bounding boxes with cell type labels; annotation session: 4-1.2.826.0.1.3680043.10.511.3.52627683252818668316930590707706798.dcm
  6. A6BBC91AE73DD21C0533F735470A9CD0-ANN-Cell bounding boxes with consensus cell type labels-1.2.826.0.1.3680043.10.511.3.1666264618985716614248499039136585.dcm
  7. A6BBC91AE73DD21C0533F735470A9CD0-ANN-Monolayer regions of interest for cell classification-1.2.826.0.1.3680043.10.511.3.6350792333250462425535421489809492.dcm
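To pick out, for instance, only the consensus labels or a particular annotation session, the files can be sorted by the role their name indicates. The sketch below relies solely on the series description strings visible in the two lists above. The function name `file_role` and the role names it returns are hypothetical, and matching on file names rather than on DICOM metadata is a simplification.

```python
import re

# Matches the session number in e.g. "...; annotation session: 3-1.2.826..."
SESSION_RE = re.compile(r"annotation session: (\d+)-")


def file_role(file_name: str):
    """Return (role, session_number) for a file name from this record."""
    if "-SM-" in file_name:
        return ("wsi_level", None)  # one level of the WSI pyramid
    if "-ANN-" not in file_name:
        return ("other", None)
    match = SESSION_RE.search(file_name)
    if match:
        return ("session_labels", int(match.group(1)))
    if "consensus cell type labels" in file_name:
        return ("consensus_labels", None)
    if "Monolayer regions of interest" in file_name:
        return ("monolayer_roi", None)
    return ("other", None)
```

For the patient shown above, this would yield six `wsi_level` files, five `session_labels` files (sessions 0 to 4), one `consensus_labels` file, and one `monolayer_roi` file.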

Acknowledgments

The authors thank Stefanie Barnickel, Nathalie Dollmann, Tatjana Flamann, Meinolf Suttorp, and Perdita Weller for the labelling of the cells.

The authors thank the following institutions for supplying BMA smears: University Hospital Augsburg (Univ.-Prof. Dr. Dr. med. Michael Frühwald), Charité Berlin - ALL-REZ BFM Study Group (PD Dr. med. Arend von Stackelberg), University Hospital at the TU Dresden (Prof. Dr. med. Meinolf Suttorp), University Hospital Essen - AML-BFM Study Group (Prof. Dr. Dirk Reinhardt), Technical University of Munich (Prof. Dr. med. Irene Teichert-von Lüttichau), University Hospital Würzburg (Prof. Dr. med. Matthias Eyrich).

This study was supported by a grant from the German Federal Ministry of Education and Research (FKZ: 031L0262A; BMDeep).

Preparation of the dataset for publication was partly supported by Federal funds from the National Cancer Institute, National Institutes of Health (Task Order No. HHSN26110071 under Contract HHSN261201500003l).

Files

Files (13.3 GB)

MD5 checksum                          Size
md5:c8ec67289ca84bdb7f64eff6a654b930  44.9 kB
md5:3fd7e7feda6649287660d6fb8601669b  44.2 kB
md5:24839b95634043990786f4ea885575eb  20.6 kB
md5:f944e08b18048559147f9dcb15719e69  15.6 kB
md5:ac577457a191bd56bb6afcd8ff27dd6e  11.9 kB
md5:e985304acee8ca1940a29d71172f6d91  44.9 kB
md5:c58d855c9892a311fb37f9bc6c5862d2  5.8 kB
md5:c02d73ef74e3ce798b0db431bc5dfcd9  169.8 kB
md5:39bab31eae69435945506c87bb99dd3b  5.4 GB
md5:3f002f98201a3472f01cc82be39eb736  1.8 MB
md5:59b893d27212522ad335c486007a68f5  321.1 MB
md5:bf1a34e4a1e5a04f3f0569d647a3f139  29.7 MB
md5:a2b9e77b4efd576043ae7592030caf4d  84.6 kB
md5:e02c72888c202aeec09db5bf69f11729  174.1 kB
md5:ce33b380d65d24142d173e6650fa18db  35.8 kB
md5:6b042475ce193034c0784615488f2680  38.1 kB
md5:a3f3ed991d78442057e519916a271e98  18.1 kB
md5:99a5ca8153e35aa7b303b0e53c3a57f5  15.6 kB
md5:037d2fb0cc299fb9af9c9b8ea9c1c8fc  14.8 kB
md5:778cf0fee312880f4239041532b7841e  36.6 kB
md5:f185d896ec0161c1dea8baf73cabe0b2  5.8 kB
md5:be8e7583c302c0047190cc699db03a60  112.8 kB
md5:2d1378eaa618b5bb99cb48eac301f69d  260.9 kB
md5:56508f02f2772145f0c13d961b8c597b  276.2 kB
md5:fdccb52878e6b3a5698d215816c49381  6.8 GB
md5:4cb7280d25d29626229e09355ae567d3  4.1 MB
md5:90d0b071797b009e68d31239a7305694  621.5 MB
md5:30a9e9632b4bcadfd4886e2cba143a67  60.6 MB
md5:d78afa2fe9b7914e2dbf9d03deae1752  351 Bytes
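
Since two of the files are several gigabytes in size, it may be worth checking a download against the MD5 values listed above. The following is a minimal sketch using only the Python standard library (Python 3.9 or later for `str.removeprefix`). The function names `md5_of_file` and `matches_listing` are hypothetical, and which checksum belongs to which file has to be taken from the record page itself.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Chunked reads keep memory use flat even for multi-gigabyte files.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed):
    """Compare a local file with a listed value such as 'md5:c8ec...'."""
    expected = listed.removeprefix("md5:").strip().lower()
    return md5_of_file(path) == expected
```

A call such as `matches_listing(local_path, "md5:...")` returns `True` only if the downloaded file is byte-identical to the one described by that checksum.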