PathoNet
Description
PathoNet is a general-purpose dataset for digital pathology. It consists of 4,462,156 JPEG images divided into 12 classes, one per tissue type.
These images were extracted from the TCGA (The Cancer Genome Atlas) data portal. No annotations were made; the only label is the tissue type, taken from the slide metadata.
For each tissue, 400,000 images of 256x256 pixels were randomly selected and downloaded from 400 whole-slide images (WSIs). An automated cleaning process then removed blurred images and images with excessive white content.
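The cleaning criteria are not described in detail here. As a rough illustration only, the sketch below rejects a grayscale tile when too many pixels exceed a brightness cutoff (mostly white) or when the variance of a 4-neighbour Laplacian is low (a common proxy for blur). The function names and every threshold (`white_cutoff`, `max_white_fraction`, `min_sharpness`) are assumptions made for this example, not the values used to build PathoNet. Plain Python lists stand in for image arrays to keep the example free of dependencies.

```python
from statistics import pvariance


def white_fraction(gray, white_cutoff=220):
    """Fraction of pixels at or above white_cutoff (0-255 grayscale)."""
    pixels = [p for row in gray for p in row]
    return sum(p >= white_cutoff for p in pixels) / len(pixels)


def laplacian_variance(gray):
    """Variance of the 4-neighbour Laplacian over interior pixels.

    Low values mean few sharp edges, which suggests a blurred tile.
    """
    h, w = len(gray), len(gray[0])
    responses = [
        gray[y - 1][x] + gray[y + 1][x] + gray[y][x - 1] + gray[y][x + 1]
        - 4 * gray[y][x]
        for y in range(1, h - 1)
        for x in range(1, w - 1)
    ]
    return pvariance(responses)


def keep_tile(gray, max_white_fraction=0.5, min_sharpness=50.0):
    """Return True if the tile passes both (hypothetical) quality checks."""
    return (white_fraction(gray) <= max_white_fraction
            and laplacian_variance(gray) >= min_sharpness)


# Toy 8x8 tiles: blank background, a flat grey tile with no edges,
# and a checkerboard with strong edges.
blank = [[255] * 8 for _ in range(8)]
flat = [[120] * 8 for _ in range(8)]
checker = [[200 if (x + y) % 2 else 40 for x in range(8)] for y in range(8)]

for name, tile in [("blank", blank), ("flat", flat), ("checker", checker)]:
    print(name, keep_tile(tile))
```

Only the checkerboard tile passes: the blank tile fails the white-content check and the flat tile fails the sharpness check.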
The dataset is already split into Train, Validation and Test partitions. The split was made at the case level: all images from a given case are in the same partition, so images from one case never appear in different partitions.
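A case-level split of this kind can be sketched as follows. The code groups images by case, shuffles the cases, and assigns whole cases to partitions. The 80/10/10 ratios, the `(case_id, image_name)` record layout, and the function name `split_by_case` are assumptions for illustration; how the actual PathoNet split was computed is not stated here.

```python
import random
from collections import defaultdict


def split_by_case(records, ratios=(0.8, 0.1, 0.1), seed=0):
    """Assign whole cases to Train/Validation/Test.

    records: iterable of (case_id, image_name) pairs (hypothetical layout).
    Returns a dict mapping partition name to a list of image names.
    Test receives whatever cases remain after Train and Validation.
    """
    by_case = defaultdict(list)
    for case_id, image_name in records:
        by_case[case_id].append(image_name)

    cases = sorted(by_case)  # sort first so the shuffle is reproducible
    random.Random(seed).shuffle(cases)

    n_train = int(len(cases) * ratios[0])
    n_val = int(len(cases) * ratios[1])
    case_groups = {
        "Train": cases[:n_train],
        "Validation": cases[n_train:n_train + n_val],
        "Test": cases[n_train + n_val:],
    }
    return {
        part: [img for case in group for img in by_case[case]]
        for part, group in case_groups.items()
    }


# Toy data: 10 cases with 3 images each.
records = [(f"case{c:02d}", f"case{c:02d}_tile{t}.jpg")
           for c in range(10) for t in range(3)]
partitions = split_by_case(records)
for part, images in partitions.items():
    print(part, len(images))
```

Splitting the list of cases rather than the list of images is what keeps every image of a case in a single partition.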
The final number of images in each class and partition is:
| Tissue | Partition | # of images |
|---|---|---|
| Bladder | Train | 308,677 |
| Bladder | Validation | 38,927 |
| Bladder | Test | 39,166 |
| Brain | Train | 313,890 |
| Brain | Validation | 39,665 |
| Brain | Test | 39,613 |
| Breast | Train | 303,949 |
| Breast | Validation | 37,499 |
| Breast | Test | 38,602 |
| Bronchus and lung | Train | 308,848 |
| Bronchus and lung | Validation | 37,730 |
| Bronchus and lung | Test | 39,160 |
| Colon | Train | 243,330 |
| Colon | Validation | 30,220 |
| Colon | Test | 32,135 |
| Corpus uteri | Train | 312,743 |
| Corpus uteri | Validation | 39,549 |
| Corpus uteri | Test | 39,184 |
| Kidney | Train | 311,005 |
| Kidney | Validation | 37,950 |
| Kidney | Test | 39,184 |
| Liver and intrahepatic bile ducts | Train | 314,707 |
| Liver and intrahepatic bile ducts | Validation | 38,689 |
| Liver and intrahepatic bile ducts | Test | 39,799 |
| Prostate gland | Train | 296,181 |
| Prostate gland | Validation | 36,568 |
| Prostate gland | Test | 36,376 |
| Skin | Train | 307,308 |
| Skin | Validation | 37,411 |
| Skin | Test | 38,487 |
| Stomach | Train | 295,002 |
| Stomach | Validation | 37,559 |
| Stomach | Test | 36,112 |
| Thyroid gland | Train | 258,415 |
| Thyroid gland | Validation | 33,667 |
| Thyroid gland | Test | 33,849 |
For convenience, the training data has been uploaded separately for each class.
Files (131.6 GB)
| Checksum | Size |
|---|---|
| md5:5d95905b68cc96a8d608473800e8eae3 | 13.3 GB |
| md5:6bbf3c009d6bde76a1d2b6cf87d9912e | 9.1 GB |
| md5:03dc3eedd818f79c63c8b3c8e8d06294 | 9.8 GB |
| md5:425f509339a381b0e0b361c4935dc73a | 8.6 GB |
| md5:087683ceb6b727da526bc06c13e6f7dd | 9.3 GB |
| md5:8bb0a35af134bc0b32994e3d7cca1817 | 6.5 GB |
| md5:395fc5af81f9552d328e166330cb0faa | 9.2 GB |
| md5:1062938e35c154587c7affa74e6e65bf | 9.4 GB |
| md5:0774db60a63da2436cb97ad6472a90f7 | 9.7 GB |
| md5:3bb9af2a3ca25cad7e91cbff24468721 | 8.5 GB |
| md5:ae229e5b8e7fec0153893fdc378795a3 | 9.2 GB |
| md5:96968c2191a94673cd01d0b10aef48df | 8.8 GB |
| md5:c333cbd45f47375d0c91401e7a148e22 | 6.9 GB |
| md5:ba6f6c4cedb397db3df7c8fdfe48dab6 | 13.2 GB |
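Given the size of the archives, it may be worth checking each download against its listed MD5 checksum before extracting it. The following is a minimal sketch using Python's standard `hashlib` module; the helper names `md5_of_file` and `verify_download` are hypothetical, and the path passed in is wherever the archive was saved locally.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_md5):
    """Return True if the file's MD5 matches the listed checksum.

    Accepts the checksum with or without the leading "md5:" prefix.
    """
    expected = expected_md5.lower().split(":")[-1]
    return md5_of_file(path) == expected
```

For example, `verify_download(path, "md5:5d95905b68cc96a8d608473800e8eae3")` checks a local file against the first checksum in the list. Reading in chunks keeps memory use constant even for multi-gigabyte archives.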