Published July 5, 2023 | Version 1.0.0
Dataset Open

PathoNet

Description

PathoNet is a general-purpose dataset for digital pathology. It consists of 4,462,156 JPEG images divided into 12 classes (tissue types).

These images were extracted from the TCGA (The Cancer Genome Atlas) data portal. No annotations were made; the only label is the tissue type, taken from the slide metadata.

For each tissue, 400,000 images of 256x256 pixels were randomly selected and downloaded from 400 whole-slide images (WSIs). An automated cleaning process then removed blurred images and images with excessive white content.
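The cleaning criteria are not specified further, so the following is only a minimal sketch of how such a filter might look. It assumes a tile is given as a 2D list of grayscale intensities (0 to 255), measures white content as the fraction of pixels above a brightness level, and uses the variance of a 4-neighbour Laplacian as a common blur proxy. The function names and all thresholds (`white_level`, `max_white`, `min_sharpness`) are illustrative assumptions, not the values used to build PathoNet.

```python
import statistics


def white_fraction(gray, white_level=220):
    """Fraction of pixels at or above white_level (assumed threshold)."""
    total = sum(len(row) for row in gray)
    white = sum(1 for row in gray for value in row if value >= white_level)
    return white / total


def laplacian_variance(gray):
    """Variance of the 4-neighbour Laplacian over interior pixels.

    Low values indicate few sharp edges, i.e. a likely blurred tile.
    Requires a tile of at least 3x3 pixels.
    """
    height, width = len(gray), len(gray[0])
    responses = []
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            responses.append(
                gray[y - 1][x] + gray[y + 1][x]
                + gray[y][x - 1] + gray[y][x + 1]
                - 4 * gray[y][x]
            )
    return statistics.pvariance(responses)


def keep_tile(gray, max_white=0.5, min_sharpness=50.0, white_level=220):
    """Keep a tile only if it is neither mostly white nor too blurred."""
    if white_fraction(gray, white_level) > max_white:
        return False
    return laplacian_variance(gray) >= min_sharpness
```

In practice the same two measures would normally be computed with an array library on the decoded JPEG; the pure-Python version is used here only to keep the sketch dependency-free.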

The dataset is already divided into Train, Validation and Test partitions. The split was made at the case level to avoid mixing images from the same case in different partitions, i.e. all images corresponding to a particular case are in the same partition.
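A case-level split of this kind can be sketched as follows. This is not the procedure used to build the published partitions; the function name `split_by_case`, the 80/10/10 fractions, the seed and the case identifiers in the usage note are assumptions chosen for illustration. The idea is to assign whole cases to partitions first and only then place each image in its case's partition.

```python
import random


def split_by_case(tile_to_case, fractions=(0.8, 0.1, 0.1), seed=0):
    """Split tiles into Train/Validation/Test without splitting any case.

    tile_to_case maps a tile name to its case identifier.
    fractions are assumed proportions of *cases*, not of tiles.
    """
    # Shuffle the unique cases reproducibly.
    cases = sorted(set(tile_to_case.values()))
    random.Random(seed).shuffle(cases)

    n_train = round(fractions[0] * len(cases))
    n_val = round(fractions[1] * len(cases))

    # Assign every case to exactly one partition.
    partition_of_case = {}
    for index, case in enumerate(cases):
        if index < n_train:
            partition_of_case[case] = "Train"
        elif index < n_train + n_val:
            partition_of_case[case] = "Validation"
        else:
            partition_of_case[case] = "Test"

    # Each tile follows its case.
    splits = {"Train": [], "Validation": [], "Test": []}
    for tile, case in tile_to_case.items():
        splits[partition_of_case[case]].append(tile)
    return splits
```

For example, with ten hypothetical cases of five tiles each, `split_by_case` places eight cases in Train and one each in Validation and Test, and no case contributes tiles to more than one partition.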

The final numbers of images for each class and partition are:

Tissue                              Train     Validation  Test
Bladder                             308,677   38,927      39,166
Brain                               313,890   39,665      39,613
Breast                              303,949   37,499      38,602
Bronchus and lung                   308,848   37,730      39,160
Colon                               243,330   30,220      32,135
Corpus uteri                        312,743   39,549      39,184
Kidney                              311,005   37,950      39,184
Liver and intrahepatic bile ducts   314,707   38,689      39,799
Prostate gland                      296,181   36,568      36,376
Skin                                307,308   37,411      38,487
Stomach                             295,002   37,559      36,112
Thyroid gland                       258,415   33,667      33,849

For convenience, the training data has been uploaded as separate archives, one per class.

Files

Files (131.6 GB)

Checksum                               Size
md5:5d95905b68cc96a8d608473800e8eae3   13.3 GB
md5:6bbf3c009d6bde76a1d2b6cf87d9912e   9.1 GB
md5:03dc3eedd818f79c63c8b3c8e8d06294   9.8 GB
md5:425f509339a381b0e0b361c4935dc73a   8.6 GB
md5:087683ceb6b727da526bc06c13e6f7dd   9.3 GB
md5:8bb0a35af134bc0b32994e3d7cca1817   6.5 GB
md5:395fc5af81f9552d328e166330cb0faa   9.2 GB
md5:1062938e35c154587c7affa74e6e65bf   9.4 GB
md5:0774db60a63da2436cb97ad6472a90f7   9.7 GB
md5:3bb9af2a3ca25cad7e91cbff24468721   8.5 GB
md5:ae229e5b8e7fec0153893fdc378795a3   9.2 GB
md5:96968c2191a94673cd01d0b10aef48df   8.8 GB
md5:c333cbd45f47375d0c91401e7a148e22   6.9 GB
md5:ba6f6c4cedb397db3df7c8fdfe48dab6   13.2 GB
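
Because the archives are several gigabytes each, it may be worth checking a download against its listed MD5 checksum before extracting it. The sketch below uses Python's standard `hashlib` module and reads the file in chunks so that it does not need to fit in memory. The helper names `md5_of_file` and `verify_download`, the chunk size and the file path in the usage note are illustrative assumptions.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected):
    """Compare a file's MD5 with a listed value such as 'md5:5d95...'."""
    # Accept the checksum with or without the 'md5:' prefix.
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected_hex
```

A call such as `verify_download("path/to/archive.zip", "md5:...")`, with the path of the downloaded archive and the checksum listed for it, returns `True` when the file is intact.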