Histology images from uniform tumor regions in TCGA Whole Slide Images
Description
LICENSE
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC-BY-NC-SA 4.0) license.
For non-commercial use, please use the dataset under CC-BY-NC-SA.
If you would like to use the dataset for commercial purposes, please contact us (ishum-prm@m.u-tokyo.ac.jp).
Dataset Description
This is a set of 1,608,060 image patches of hematoxylin & eosin stained histological samples of various human cancers.
Whole slide images (WSIs) from 32 solid cancer types in the TCGA dataset were downloaded from the GDC legacy database between December 1, 2016 and June 19, 2017. A total of 9,662 diagnostic slides in SVS format (file names containing 'DXn', where n is the slide number) from 7,951 patients were then processed for annotation.
For each slide, at least three representative tumor regions were selected as polygons by two trained pathologists using Web browser-based software developed for this purpose. The pathologists selected uniform tumor regions and avoided regions with noncancerous structures as much as possible. A total of 926 slides were removed because of poor staining, low resolution, loss of focus across the slide, absence of cancerous regions, or incorrect cancer types. In the end, 8,736 diagnostic slides from 7,175 patients remained.
Next, 10 patches at each of 6 magnification levels, covering 128 x 128 to 256 x 256 μm, were randomly cropped at random angles from each annotated region using keras-OpenSlideGenerator (https://github.com/quolc/keras-OpenSlideGenerator). Each patch was selected so that it did not extend outside the annotated region, and each cropped region was resized to 256 x 256 pixels. Consequently, the number of patches subjected to the analysis ranged from 264,110 to 271,700.
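The counts quoted in this description can be cross-checked with a few lines of arithmetic. The sketch below assumes (this is an interpretation, not something stated explicitly above) that the 264,110 to 271,700 range refers to the number of patches at each of the 6 magnification levels; under that reading, the mean per-level count derived from the 1,608,060 total should fall inside the range.

```python
# Figures quoted in the dataset description.
TOTAL_PATCHES = 1_608_060
NUM_LEVELS = 6
RANGE_LOW, RANGE_HIGH = 264_110, 271_700
SLIDES_DOWNLOADED = 9_662
SLIDES_REMOVED = 926

# Assumption: the quoted range is per magnification level.
mean_per_level = TOTAL_PATCHES / NUM_LEVELS
print(mean_per_level)  # 268010.0
assert RANGE_LOW <= mean_per_level <= RANGE_HIGH

# Slides left after quality filtering.
slides_remaining = SLIDES_DOWNLOADED - SLIDES_REMOVED
print(slides_remaining)  # 8736
```

Both results agree with the figures in the text: the mean per-level count lies within the quoted range, and 8,736 slides remain after filtering.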
filename: [cancer_type]/[resolution]/[TCGA Barcode]/[region]-[number]-[pixel resolution in original WSI image].jpg
[resolution]
- 0 -> 0.5 μm/pixel
- 1 -> 0.6 μm/pixel
- 2 -> 0.7 μm/pixel
- 3 -> 0.8 μm/pixel
- 4 -> 0.9 μm/pixel
- 5 -> 1.0 μm/pixel
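Because every patch is 256 x 256 pixels, the resolution index determines the physical side length of the tissue area a patch covers. A minimal sketch of that relationship follows; the names `MICRONS_PER_PIXEL` and `field_of_view_um` are illustrative and not part of the dataset or of any library.

```python
# Resolution index -> micrometers per pixel, as listed above.
MICRONS_PER_PIXEL = {0: 0.5, 1: 0.6, 2: 0.7, 3: 0.8, 4: 0.9, 5: 1.0}

# All patches are resized to 256 x 256 pixels.
PATCH_SIZE_PX = 256


def field_of_view_um(level: int) -> float:
    """Return the side length in micrometers covered by a patch at `level`."""
    return PATCH_SIZE_PX * MICRONS_PER_PIXEL[level]


if __name__ == "__main__":
    for level in sorted(MICRONS_PER_PIXEL):
        print(f"level {level}: {field_of_view_um(level):.1f} um per side")
```

Index 0 gives 128 μm per side and index 5 gives 256 μm per side, which matches the 128 x 128 to 256 x 256 μm range stated in the description.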
[TCGA Barcode]
TCGA-XX-XXXX represents the patient ID.
Please see https://docs.gdc.cancer.gov/Encyclopedia/pages/TCGA_Barcode/ for details.
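The file name layout above can be split into its components programmatically. The following is a minimal sketch, not an official loader: `PatchInfo` and `parse_patch_path` are hypothetical names, the example path is made up for illustration, and the region, number, and pixel resolution fields are kept as strings because their exact formats are not specified here.

```python
from pathlib import PurePosixPath
from typing import NamedTuple


class PatchInfo(NamedTuple):
    cancer_type: str
    resolution_level: int      # index 0-5 from the [resolution] list
    tcga_barcode: str          # e.g. TCGA-XX-XXXX (patient ID)
    region: str
    number: str
    wsi_pixel_resolution: str  # pixel resolution in the original WSI


def parse_patch_path(path: str) -> PatchInfo:
    """Split [cancer_type]/[resolution]/[TCGA Barcode]/[region]-[number]-[res].jpg."""
    parts = PurePosixPath(path).parts
    if len(parts) < 4:
        raise ValueError(f"expected at least 4 path components: {path!r}")
    cancer_type, level, barcode, filename = parts[-4:]
    # .stem drops only the final ".jpg", so a decimal point in the
    # resolution field is preserved.
    fields = PurePosixPath(filename).stem.split("-", 2)
    if len(fields) != 3:
        raise ValueError(f"unexpected file name: {filename!r}")
    region, number, wsi_res = fields
    return PatchInfo(cancer_type, int(level), barcode, region, number, wsi_res)


if __name__ == "__main__":
    # Hypothetical path, for illustration only.
    print(parse_patch_path("Adrenocortical_carcinoma/0/TCGA-XX-XXXX/0-1-0.25.jpg"))
```

Taking the last four path components means the function works whether or not the path is prefixed with an extraction directory.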
Citation
If you use this dataset for your research, please cite our paper.
Komura, D., Kawabe, A., Fukuta, K., Sano, K., Umezaki, T., Koda, H., Suzuki, R., Tominaga, K., Ochi, M., Konishi, H., Masakado, F., Saito, N., Sato, Y., Onoyama, T., Nishida, S., Furuya, G., Katoh, H., Yamashita, H., Kakimi, K., Seto, Y., Ushiku, T., Fukayama, M., Ishikawa, S., 2022. Universal encoding of pan-cancer histology by deep texture representations. Cell Reports 38, 110424. https://doi.org/10.1016/j.celrep.2022.110424
Files
Total size: 35.3 GB (33 files, including Adrenocortical_carcinoma.zip).
MD5 | Size |
---|---|
md5:180f5e9b1b5ee138367705b118acfefd | 672.7 MB |
md5:c31a517074687dc1220c9eef163ce4b2 | 1.4 GB |
md5:de9ebbb3e60a56655aa490f88ed949ed | 3.2 GB |
md5:94d5694d01fef7444a79041c5b1e51b8 | 3.1 GB |
md5:5918bfb858483b88eae4d50e1bb65d21 | 792.6 MB |
md5:3ec989c8ecd446c7a82ec2f31bbca19f | 121.2 MB |
md5:c9a0ef460c1caf65f8fd45d16a908d56 | 1.0 GB |
md5:1a2fd729047c5e7ff78290b666abd500 | 426.7 MB |
md5:63f3e13d6b80e5d3ff3e4958845b6844 | 2.8 GB |
md5:f0efc1aa8195eae5d939a091a62622cf | 1.4 GB |
md5:800643e1e700290585df6328d6c6b003 | 319.1 MB |
md5:49bbf1a0c087e00a734241bebab9fb96 | 1.7 GB |
md5:9fdba7d2764e5a640947a69eb70ab4e0 | 890.5 MB |
md5:704cb8559a5b5ac27cd19011dc39662f | 314 Bytes |
md5:8ee35b9d8c22f3cdb9ddc4f54aef680d | 1.2 GB |
md5:66f33d49f267bbf4feaee93f63cb5b8e | 2.1 GB |
md5:7949f641e1ac03c7554402ff088c48fc | 2.1 GB |
md5:a6af17438563f33d7b2c6bad21e0618b | 121.7 MB |
md5:a56c5731fb220064414f488d5fee52cc | 254.0 MB |
md5:b0294f47f15a8f4359c632fe3f003f59 | 333.9 MB |
md5:9055fd63f0a0cd4fb245ce1b683b77d0 | 487.9 MB |
md5:54d8df72515b76c0b326b19aefb4c719 | 189.6 MB |
md5:48bdeab653e8152b22a919e67e35afd2 | 1.3 GB |
md5:2e685aa760e2e8d48ba66c993c30014b | 223.8 MB |
md5:df3396c024862257f403d957bbc90f69 | 1.8 GB |
md5:f7d1ea80ebd90ad0796e491389b25d51 | 1.3 GB |
md5:7d0476e3e45ccccb933438d6bd65fcea | 1.3 GB |
md5:87434d0edc53659be056c41c049629bd | 808.6 MB |
md5:655a24e785b48ffc2cb79ab287df0192 | 524.9 MB |
md5:fe078a0bbd2d4e3ac4cf1e45bc003e79 | 1.4 GB |
md5:c1d4dffc1f2d50f3d5b3f2a0d2f55460 | 288.8 MB |
md5:684c1d013e500b254f3ba35510c34452 | 1.6 GB |
md5:d1dd6d3ebe5d4eccc6a0a293efe6482b | 209.7 MB |
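The MD5 checksums listed for the files can be used to confirm that a multi-gigabyte download completed without corruption. The sketch below uses only the Python standard library; `md5_of_file` and `verify_download` are illustrative helper names, and which checksum belongs to which archive has to be taken from the record's file listing.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large archives do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str, expected_md5: str) -> bool:
    """Compare a file's MD5 with the listed value (an 'md5:' prefix is allowed)."""
    expected = expected_md5.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

For example, `verify_download("some_archive.zip", "md5:<checksum from the listing>")` returns `True` only when the local file matches the listed checksum; the archive name here is a placeholder.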