Published January 19, 2021 | Version 1.0
Dataset Open

Histology images from uniform tumor regions in TCGA Whole Slide Images

  • 1. The University of Tokyo

Description

LICENSE

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC-BY-NC-SA 4.0) license.

For non-commercial use, please use the dataset under CC-BY-NC-SA.
If you would like to use the dataset for commercial purposes, please contact us (ishum-prm@m.u-tokyo.ac.jp).

Dataset Description

This is a set of 1,608,060 image patches of hematoxylin and eosin (H&E) stained histological samples of various human cancers.

Whole slide images (WSIs) from 32 solid cancer types in the TCGA dataset were downloaded from the GDC legacy database between December 1, 2016 and June 19, 2017. A total of 9,662 diagnostic slides in SVS format (those whose filenames contain ’DXn’, where n is the slide number) from 7,951 patients were then processed for annotation.

For each slide, at least three representative tumor regions were selected as polygons by two trained pathologists using Web browser-based software developed for this purpose. The pathologists selected uniform tumor regions and avoided regions with noncancerous structures as much as possible. 926 slides were removed because of poor staining, low resolution, being out of focus across the slide, containing no cancerous regions, or incorrect cancer types. In the end, 8,736 diagnostic slides from 7,175 patients remained.

Next, 10 patches at each of 6 magnification levels, covering 128 x 128 μm to 256 x 256 μm, were randomly cropped at random angles from each annotated region using keras-OpenSlideGenerator (https://github.com/quolc/keras-OpenSlideGenerator). Each patch was selected so that it did not extend outside the annotated region. The selected region was then resized to 256 x 256 pixels. Consequently, the number of patches subjected to the analysis ranged from 264,110 to 271,700 per magnification level.

 

filename: [cancer_type]/[resolution]/[TCGA Barcode]/[region]-[number]-[pixel resolution in original WSI image].jpg

 

[resolution]

- 0 -> 0.5 μm/pixel
- 1 -> 0.6 μm/pixel
- 2 -> 0.7 μm/pixel
- 3 -> 0.8 μm/pixel
- 4 -> 0.9 μm/pixel
- 5 -> 1.0 μm/pixel
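
The resolution indices relate directly to the physical patch sizes quoted in the description: a 256 x 256 pixel patch at 0.5 μm/pixel covers 128 x 128 μm, and at 1.0 μm/pixel it covers 256 x 256 μm. The short sketch below reproduces that arithmetic for all six levels. The names MICRONS_PER_PIXEL, PATCH_PIXELS and field_of_view_um are illustrative only and are not part of the dataset or of keras-OpenSlideGenerator.

```python
# Resolution index -> microns per pixel, copied from the [resolution] list.
MICRONS_PER_PIXEL = {0: 0.5, 1: 0.6, 2: 0.7, 3: 0.8, 4: 0.9, 5: 1.0}

# Every patch is stored at 256 x 256 pixels, per the dataset description.
PATCH_PIXELS = 256


def field_of_view_um(resolution_index):
    """Return the side length, in microns, of the tissue area a patch covers."""
    return PATCH_PIXELS * MICRONS_PER_PIXEL[resolution_index]


if __name__ == "__main__":
    for index in sorted(MICRONS_PER_PIXEL):
        side = field_of_view_um(index)
        print(f"resolution {index}: {side:.1f} x {side:.1f} um")
```

The intermediate sizes (for indices 1 to 4) are derived from the listed μm/pixel values rather than stated in the description.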

 

[TCGA Barcode]

The TCGA-XX-XXXX portion of the barcode represents the patient ID.

Please see https://docs.gdc.cancer.gov/Encyclopedia/pages/TCGA_Barcode/ for details.
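
As an illustration of the naming scheme, the following sketch splits a patch path into its documented fields and derives the patient ID from the first three hyphen-separated parts of the barcode. It is a minimal example under stated assumptions: the function and class names are hypothetical, the sample path in the usage line is made up to match the pattern, and the region, number and pixel-resolution fields are kept as strings because their exact format is not specified here.

```python
from pathlib import PurePosixPath
from typing import NamedTuple

# Resolution index -> microns per pixel, copied from the [resolution] list.
MICRONS_PER_PIXEL = {0: 0.5, 1: 0.6, 2: 0.7, 3: 0.8, 4: 0.9, 5: 1.0}


class PatchName(NamedTuple):
    cancer_type: str
    resolution_index: int
    microns_per_pixel: float
    barcode: str
    patient_id: str
    region: str
    number: str
    original_pixel_resolution: str


def parse_patch_path(path):
    """Parse [cancer_type]/[resolution]/[TCGA Barcode]/[region]-[number]-[res].jpg."""
    parts = PurePosixPath(path).parts
    if len(parts) < 4:
        raise ValueError(f"expected at least 4 path components: {path!r}")
    cancer_type, resolution, barcode, filename = parts[-4:]
    if not filename.endswith(".jpg"):
        raise ValueError(f"expected a .jpg file: {path!r}")
    # Split only twice so any further hyphens stay in the last field.
    fields = filename[: -len(".jpg")].split("-", 2)
    if len(fields) != 3:
        raise ValueError(f"expected region-number-resolution: {filename!r}")
    region, number, original_resolution = fields
    index = int(resolution)
    # TCGA-XX-XXXX: the first three hyphen-separated parts of the barcode.
    patient_id = "-".join(barcode.split("-")[:3])
    return PatchName(cancer_type, index, MICRONS_PER_PIXEL[index], barcode,
                     patient_id, region, number, original_resolution)


if __name__ == "__main__":
    # Made-up path that follows the pattern; not a real file in the archive.
    print(parse_patch_path(
        "Adrenocortical_carcinoma/0/TCGA-XX-XXXX-01Z-00-DX1/0-3-0.25.jpg"))
```

Grouping patches by patient_id rather than by file is one way to keep all patches from a patient on the same side of a train/test split.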

 

Citation

If you use this dataset for your research, please cite our paper.

Komura, D., Kawabe, A., Fukuta, K., Sano, K., Umezaki, T., Koda, H., Suzuki, R., Tominaga, K., Ochi, M., Konishi, H., Masakado, F., Saito, N., Sato, Y., Onoyama, T., Nishida, S., Furuya, G., Katoh, H., Yamashita, H., Kakimi, K., Seto, Y., Ushiku, T., Fukayama, M., Ishikawa, S., 2022. Universal encoding of pan-cancer histology by deep texture representations. Cell Reports 38, 110424. https://doi.org/10.1016/j.celrep.2022.110424

 

Files

Adrenocortical_carcinoma.zip

Files (35.3 GB)

MD5 checksum                            Size
md5:180f5e9b1b5ee138367705b118acfefd    672.7 MB
md5:c31a517074687dc1220c9eef163ce4b2    1.4 GB
md5:de9ebbb3e60a56655aa490f88ed949ed    3.2 GB
md5:94d5694d01fef7444a79041c5b1e51b8    3.1 GB
md5:5918bfb858483b88eae4d50e1bb65d21    792.6 MB
md5:3ec989c8ecd446c7a82ec2f31bbca19f    121.2 MB
md5:c9a0ef460c1caf65f8fd45d16a908d56    1.0 GB
md5:1a2fd729047c5e7ff78290b666abd500    426.7 MB
md5:63f3e13d6b80e5d3ff3e4958845b6844    2.8 GB
md5:f0efc1aa8195eae5d939a091a62622cf    1.4 GB
md5:800643e1e700290585df6328d6c6b003    319.1 MB
md5:49bbf1a0c087e00a734241bebab9fb96    1.7 GB
md5:9fdba7d2764e5a640947a69eb70ab4e0    890.5 MB
md5:704cb8559a5b5ac27cd19011dc39662f    314 Bytes
md5:8ee35b9d8c22f3cdb9ddc4f54aef680d    1.2 GB
md5:66f33d49f267bbf4feaee93f63cb5b8e    2.1 GB
md5:7949f641e1ac03c7554402ff088c48fc    2.1 GB
md5:a6af17438563f33d7b2c6bad21e0618b    121.7 MB
md5:a56c5731fb220064414f488d5fee52cc    254.0 MB
md5:b0294f47f15a8f4359c632fe3f003f59    333.9 MB
md5:9055fd63f0a0cd4fb245ce1b683b77d0    487.9 MB
md5:54d8df72515b76c0b326b19aefb4c719    189.6 MB
md5:48bdeab653e8152b22a919e67e35afd2    1.3 GB
md5:2e685aa760e2e8d48ba66c993c30014b    223.8 MB
md5:df3396c024862257f403d957bbc90f69    1.8 GB
md5:f7d1ea80ebd90ad0796e491389b25d51    1.3 GB
md5:7d0476e3e45ccccb933438d6bd65fcea    1.3 GB
md5:87434d0edc53659be056c41c049629bd    808.6 MB
md5:655a24e785b48ffc2cb79ab287df0192    524.9 MB
md5:fe078a0bbd2d4e3ac4cf1e45bc003e79    1.4 GB
md5:c1d4dffc1f2d50f3d5b3f2a0d2f55460    288.8 MB
md5:684c1d013e500b254f3ba35510c34452    1.6 GB
md5:d1dd6d3ebe5d4eccc6a0a293efe6482b    209.7 MB
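
Because several archives are larger than 1 GB, it may be worth checking each download against its MD5 checksum before unpacking. The sketch below is one way to do that with the Python standard library; the function names are illustrative, and the archive must be paired with the checksum shown next to it on the record page.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path, listed):
    """Compare a file against a checksum written as 'md5:<hex>' or plain hex."""
    expected = listed.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

For example, matches_listed_checksum("Adrenocortical_carcinoma.zip", "md5:<checksum for that file>") returns True only if the download is intact. Reading in chunks keeps memory use flat regardless of archive size.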