Published February 4, 2022 | Version v2
Dataset Open

Digital Pathology Dataset for Prostate Cancer Diagnosis

  • 1. Bioinformatics Institute, Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
  • 2. Department of Pathology, Tan Tock Seng Hospital, Singapore, Singapore
  • 3. School of Computing, National University of Singapore, Singapore, Singapore

Description

Links to code and Patterns paper:

1. Multi-lens Neural Machine (MLNM) Code

2. An AI-assisted Tool For Efficient Prostate Cancer Diagnosis in Low-grade and Low-volume Cases

Digitized hematoxylin and eosin (H&E)-stained whole-slide images (WSIs) of 40 prostatectomy and 59 core needle biopsy specimens were collected from 99 prostate cancer patients at Tan Tock Seng Hospital, Singapore. Each specimen has one WSI, giving 99 WSIs in total. The H&E-stained slides were scanned at 40× magnification (specimen-level pixel size 0.25 μm × 0.25 μm) using an Aperio AT2 Slide Scanner (Leica Biosystems). Institutional review board approval was obtained from the hospital for this study, and all data were de-identified.

Prostate glandular structures in core needle biopsy slides were manually annotated and classified using the ASAP annotation tool. A senior pathologist reviewed 10% of the annotations in each slide, ensuring that some reference annotations were provided to the researcher at different regions of the core. Partial glands appearing at the edges of the biopsy cores were not annotated.

 

Whole Slide Image Dataset

The Whole Slide Image dataset, containing 99 images in SVS format with corresponding annotations in XML format, is provided in WSI.zip. The available patient grading for the WSIs is provided in 'gleason_score_mapped.txt'. The XML annotations can be parsed using the code in the official repository.
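For readers who want a quick look at the annotations before setting up the official repository, the following is a minimal sketch of reading polygon outlines from an ASAP-style XML file. It assumes the usual ASAP layout (`Annotation` elements carrying `Name` and `PartOfGroup` attributes, with ordered `Coordinate` children); the sample XML, the group name `example_group`, and the function name `parse_asap_annotations` are illustrative assumptions, not taken from the dataset. The code in the official repository remains the reference for the actual label scheme.

```python
import xml.etree.ElementTree as ET


def parse_asap_annotations(xml_text):
    """Return one dict per annotation: name, group label, and polygon vertices."""
    root = ET.fromstring(xml_text)
    glands = []
    for ann in root.iter("Annotation"):
        # Vertices carry an explicit Order attribute; sort on it to be safe.
        coords = sorted(ann.iter("Coordinate"), key=lambda c: int(c.get("Order")))
        points = [(float(c.get("X")), float(c.get("Y"))) for c in coords]
        glands.append(
            {
                "name": ann.get("Name"),
                "group": ann.get("PartOfGroup"),
                "points": points,
            }
        )
    return glands


# Hypothetical sample in the assumed ASAP layout (not copied from WSI.zip).
SAMPLE_XML = """
<ASAP_Annotations>
  <Annotations>
    <Annotation Name="Annotation 0" Type="Polygon" PartOfGroup="example_group" Color="#F4FA58">
      <Coordinates>
        <Coordinate Order="0" X="100.0" Y="200.0" />
        <Coordinate Order="1" X="150.0" Y="200.0" />
        <Coordinate Order="2" X="150.0" Y="260.0" />
      </Coordinates>
    </Annotation>
  </Annotations>
  <AnnotationGroups />
</ASAP_Annotations>
"""

for gland in parse_asap_annotations(SAMPLE_XML):
    print(gland["name"], gland["group"], len(gland["points"]), "vertices")
```

For a real file, the same function could be called on the text read from one of the XML files in WSI.zip, provided the file follows the assumed layout.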

Cropped Image Dataset

Patches of size 512 × 512 pixels were cropped from the WSIs (Whole Slide Image Dataset) at magnifications of 5×, 10×, 20×, and 40×, with an annotated gland centered in each patch. This dataset contains these cropped images.
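To make the patch geometry concrete, here is a small sketch of the arithmetic behind a gland-centered 512 × 512 crop. It assumes that the lower magnifications correspond to downsampling the 40× scan (pixel size 0.25 μm) by 40 divided by the target magnification; the description does not state how the crops were actually produced, and the function name `patch_geometry` and the example gland center are hypothetical.

```python
BASE_MAGNIFICATION = 40  # scan magnification stated in the description
BASE_PIXEL_UM = 0.25     # pixel size at 40x, in micrometres
PATCH_SIZE = 512         # patch edge length in pixels


def patch_geometry(center_x, center_y, magnification):
    """Locate a gland-centered patch; the center is given in 40x pixel coordinates."""
    # Assumption: one pixel at magnification m covers (40 / m) pixels of the 40x scan.
    downsample = BASE_MAGNIFICATION / magnification
    span_40x = PATCH_SIZE * downsample  # patch edge measured in 40x pixels
    left = int(round(center_x - span_40x / 2))
    top = int(round(center_y - span_40x / 2))
    return {
        "downsample": downsample,
        "top_left": (left, top),
        "field_of_view_um": span_40x * BASE_PIXEL_UM,
    }


# Hypothetical gland center at (10000, 20000) in 40x pixel coordinates.
for mag in (5, 10, 20, 40):
    geom = patch_geometry(10000, 20000, mag)
    print(f"{mag}x: top-left {geom['top_left']}, "
          f"field of view {geom['field_of_view_um']} um per side")
```

Under this assumption, a 512-pixel patch spans 128 μm per side at 40× and 1024 μm per side at 5×, so the lower magnifications show the same gland with progressively more surrounding tissue.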

This dataset is used to train the two AI models for Gland Segmentation (99 patients) and Gland Classification (46 patients). Tables 1 and 2 summarize the gland segmentation and gland classification datasets. The two corresponding sub-datasets are provided as two zip files:

  1. gland_segmentation_dataset.zip
  2. gland_classification_dataset.zip

Table 1: The number of slides and patches in the training, validation, and test sets for the gland segmentation task. There is one H&E-stained WSI for each prostatectomy or core needle biopsy specimen.

#Slides          Train   Valid    Test   Total
Prostatectomy       17       8      15      40
Biopsy              26      13      20      59
Total               43      21      35      99

#Patches         Train   Valid    Test   Total
Prostatectomy     7795    3753    7224   18772
Biopsy            5559    4028    5981   15568
Total            13354    7781   13205   34340
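
The counts in Table 1 can be cross-checked with a few lines of arithmetic. The sketch below re-enters the per-specimen counts from the table and recomputes the column totals and the resulting train/validation/test proportions; the variable and function names are illustrative only.

```python
# Counts copied from Table 1, ordered as (train, valid, test).
slides = {"Prostatectomy": (17, 8, 15), "Biopsy": (26, 13, 20)}
patches = {"Prostatectomy": (7795, 3753, 7224), "Biopsy": (5559, 4028, 5981)}


def column_totals(table):
    """Sum each split (train, valid, test) over the specimen types."""
    return tuple(sum(column) for column in zip(*table.values()))


def split_fractions(table):
    """Share of the grand total that falls in each split."""
    totals = column_totals(table)
    grand_total = sum(totals)
    return tuple(total / grand_total for total in totals)


slide_totals = column_totals(slides)
patch_totals = column_totals(patches)
print("slides :", slide_totals, "sum", sum(slide_totals))
print("patches:", patch_totals, "sum", sum(patch_totals))
print("patch split fractions:", [round(f, 3) for f in split_fractions(patches)])
```

The recomputed totals agree with the Total row and column of the table (99 slides and 34340 patches).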

Table 2: The number of slides and patches in the training, validation, and test sets for the gland classification task. There is one H&E-stained WSI for each prostatectomy or core needle biopsy specimen. The gland classification datasets are subsets of the gland segmentation datasets. GS: Gleason Score. B: Benign. M: Malignant.

#Slides (GS 3+3:3+4:4+3)    Train       Valid       Test        Total
Biopsy                      10:9:1      3:7:0       6:10:0      19:26:1

#Patches (B:M)              Train       Valid       Test        Total
Biopsy                      1557:2277   1216:1341   1543:2718   4316:6336
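
As a quick check on Table 2, the sketch below re-enters the counts, confirms that the slide counts add up to the 46 patients mentioned for gland classification, and computes the malignant share of patches in each split. The names used are illustrative; the numbers come from the table.

```python
# Slides per split as (GS 3+3, GS 3+4, GS 4+3), copied from Table 2.
gs_slides = {"Train": (10, 9, 1), "Valid": (3, 7, 0), "Test": (6, 10, 0)}
# Patches per split as (benign, malignant), copied from Table 2.
patches_bm = {"Train": (1557, 2277), "Valid": (1216, 1341), "Test": (1543, 2718)}


def malignant_fraction(benign, malignant):
    """Share of patches labeled malignant."""
    return malignant / (benign + malignant)


total_slides = sum(sum(counts) for counts in gs_slides.values())
totals_bm = tuple(sum(column) for column in zip(*patches_bm.values()))

print("slides in classification subset:", total_slides)
print("benign and malignant patch totals:", totals_bm)
for split, (benign, malignant) in patches_bm.items():
    print(f"{split}: malignant share {malignant_fraction(benign, malignant):.3f}")
```

The malignant share differs somewhat between splits, which may be worth keeping in mind when comparing validation and test metrics.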

NB: The gland classification folder (gland_classification_dataset.zip) may contain extra patches whose labels could not be identified from the H&E slides. These patches were not used in the machine learning study.

Notes

This study was funded by the Biomedical Research Council of the Agency for Science, Technology and Research, Singapore.

Files

Files (89.3 GB)

Checksum                                 Size
md5:19f5031c6e814dc26e6d0f9be1f89247     27.3 GB
md5:6006209dd1df1e4e323ace69ec4fd9e5     19.7 GB
md5:d21b1c22e56e6351718c605648f26e5d     1.8 kB
md5:b28a50e5f46b5e23e4ca41a59db0d56f     42.3 GB