Dataset Open Access

100,000 histological images of human colorectal cancer and healthy tissue

Kather, Jakob Nikolas; Halama, Niels; Marx, Alexander


MARC21 XML Export

<?xml version='1.0' encoding='UTF-8'?>
<record xmlns="http://www.loc.gov/MARC21/slim">
  <leader>00000nmm##2200000uu#4500</leader>
  <datafield tag="653" ind1=" " ind2=" ">
    <subfield code="a">colorectal cancer</subfield>
  </datafield>
  <datafield tag="653" ind1=" " ind2=" ">
    <subfield code="a">histopathology</subfield>
  </datafield>
  <datafield tag="653" ind1=" " ind2=" ">
    <subfield code="a">histology</subfield>
  </datafield>
  <datafield tag="653" ind1=" " ind2=" ">
    <subfield code="a">digital pathology</subfield>
  </datafield>
  <datafield tag="653" ind1=" " ind2=" ">
    <subfield code="a">image classification</subfield>
  </datafield>
  <controlfield tag="005">20191101071240.0</controlfield>
  <controlfield tag="001">1214456</controlfield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">National Center for Tumor Diseases, Heidelberg</subfield>
    <subfield code="a">Halama, Niels</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">Institute of Pathology, University Medical Center Mannheim, Mannheim, Germany</subfield>
    <subfield code="a">Marx, Alexander</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="s">800276929</subfield>
    <subfield code="z">md5:2fd1651b4f94ebd818ebf90ad2b6ce06</subfield>
    <subfield code="u">https://zenodo.org/record/1214456/files/CRC-VAL-HE-7K.zip</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="s">11726652899</subfield>
    <subfield code="z">md5:035777cf327776a71a05c95da6d6325f</subfield>
    <subfield code="u">https://zenodo.org/record/1214456/files/NCT-CRC-HE-100K-NONORM.zip</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="s">11690284003</subfield>
    <subfield code="z">md5:6fd702d11df6292bc054397ae038a464</subfield>
    <subfield code="u">https://zenodo.org/record/1214456/files/NCT-CRC-HE-100K.zip</subfield>
  </datafield>
  <datafield tag="542" ind1=" " ind2=" ">
    <subfield code="l">open</subfield>
  </datafield>
  <datafield tag="260" ind1=" " ind2=" ">
    <subfield code="c">2018-04-07</subfield>
  </datafield>
  <datafield tag="909" ind1="C" ind2="O">
    <subfield code="o">oai:zenodo.org:1214456</subfield>
  </datafield>
  <datafield tag="100" ind1=" " ind2=" ">
    <subfield code="u">National Center for Tumor Diseases, Heidelberg</subfield>
    <subfield code="0">(orcid)0000-0002-3730-5348</subfield>
    <subfield code="a">Kather, Jakob Nikolas</subfield>
  </datafield>
  <datafield tag="245" ind1=" " ind2=" ">
    <subfield code="a">100,000 histological images of human colorectal cancer and healthy tissue</subfield>
  </datafield>
  <datafield tag="540" ind1=" " ind2=" ">
    <subfield code="u">http://creativecommons.org/licenses/by/4.0/legalcode</subfield>
    <subfield code="a">Creative Commons Attribution 4.0 International</subfield>
  </datafield>
  <datafield tag="650" ind1="1" ind2="7">
    <subfield code="a">cc-by</subfield>
    <subfield code="2">opendefinition.org</subfield>
  </datafield>
  <datafield tag="520" ind1=" " ind2=" ">
    <subfield code="a">&lt;p&gt;&lt;strong&gt;Data Description &amp;quot;NCT-CRC-HE-100K&amp;quot;&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
	&lt;li&gt;This is a set of 100,000 non-overlapping image patches from hematoxylin &amp;amp; eosin (H&amp;amp;E) stained histological images of human colorectal cancer (CRC) and normal tissue.&lt;/li&gt;
	&lt;li&gt;All images are 224x224 pixels (px) at 0.5 microns per pixel (MPP). All images are color-normalized using Macenko&amp;#39;s method (http://ieeexplore.ieee.org/abstract/document/5193250/, DOI &lt;a href="https://doi.org/10.1109/ISBI.2009.5193250"&gt;10.1109/ISBI.2009.5193250&lt;/a&gt;).&lt;/li&gt;
	&lt;li&gt;Tissue classes are: Adipose (ADI), background (BACK), debris (DEB), lymphocytes (LYM), mucus (MUC), smooth muscle (MUS), normal colon mucosa (NORM), cancer-associated stroma (STR), colorectal adenocarcinoma epithelium (TUM).&lt;/li&gt;
	&lt;li&gt;These images were manually extracted from N=86 H&amp;amp;E stained human cancer tissue slides from formalin-fixed paraffin-embedded (FFPE) samples from the NCT Biobank (National Center for Tumor Diseases, Heidelberg, Germany) and the UMM pathology archive (University Medical Center Mannheim, Mannheim, Germany). Tissue samples contained CRC primary tumor slides and tumor tissue from CRC liver metastases; normal tissue classes were augmented with non-tumorous regions from gastrectomy specimens to increase variability.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Ethics statement &amp;quot;NCT-CRC-HE-100K&amp;quot;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;All experiments were conducted in accordance with the Declaration of Helsinki, the International Ethical Guidelines for Biomedical Research Involving Human Subjects (CIOMS), the Belmont Report and the U.S. Common Rule. Anonymized archival tissue samples were retrieved from the tissue bank of the National Center for Tumor Diseases (NCT, Heidelberg, Germany) in accordance with the regulations of the tissue bank and the approval of the ethics committee of Heidelberg University (tissue bank decision numbers 2152 and 2154, granted to Niels Halama and Jakob Nikolas Kather; informed consent was obtained from all patients as part of the NCT tissue bank protocol, ethics board approval S-207/2005, renewed on 20 Dec 2017). Another set of tissue samples was provided by the pathology archive at UMM (University Medical Center Mannheim, Heidelberg University, Mannheim, Germany) after approval by the institutional ethics board (Ethics Board II at University Medical Center Mannheim, decision number 2017-806R-MA, granted to Alexander Marx and waiving the need for informed consent for this retrospective and fully anonymized analysis of archival samples).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Data set &amp;quot;CRC-VAL-HE-7K&amp;quot;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is a set of 7180 image patches from N=50 patients with colorectal adenocarcinoma (no overlap with patients in NCT-CRC-HE-100K). It can be used as a validation set for models trained on the larger data set. As in the larger data set, images are 224x224 px at 0.5 MPP. All tissue samples were provided by the NCT tissue bank; see above for further details and the ethics statement.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Data set &amp;quot;NCT-CRC-HE-100K-NONORM&amp;quot;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is a slightly different version of the &amp;quot;NCT-CRC-HE-100K&amp;quot; image set: it contains 100,000 images in 9 tissue classes at 0.5 MPP and was created from the same raw data as &amp;quot;NCT-CRC-HE-100K&amp;quot;. However, no color normalization was applied to these images. Consequently, staining intensity and color vary slightly between the images. Please note that although this image set was created from the same data as &amp;quot;NCT-CRC-HE-100K&amp;quot;, the image regions are not completely identical because the selection of non-overlapping tiles from raw images was a stochastic process.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;General comments&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Please note that the classes are only roughly balanced. Classifiers should never be evaluated based on accuracy in the full set alone. Also, if a high risk of training bias is expected, balancing the number of cases per class is recommended.&lt;/p&gt;</subfield>
  </datafield>
  <datafield tag="773" ind1=" " ind2=" ">
    <subfield code="n">doi</subfield>
    <subfield code="i">isVersionOf</subfield>
    <subfield code="a">10.5281/zenodo.1214455</subfield>
  </datafield>
  <datafield tag="024" ind1=" " ind2=" ">
    <subfield code="a">10.5281/zenodo.1214456</subfield>
    <subfield code="2">doi</subfield>
  </datafield>
  <datafield tag="980" ind1=" " ind2=" ">
    <subfield code="a">dataset</subfield>
  </datafield>
</record>
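
Reading the file entries and checking a download (illustrative sketch)

The record above lists each downloadable file in a 856 datafield, with the size in bytes in subfield "s", an MD5 checksum in subfield "z" and the URL in subfield "u". The following is a minimal sketch of how those entries could be read and used to check a downloaded archive. It assumes Python 3 with only the standard library. The names list_files, md5_of and matches are hypothetical helpers introduced here for illustration, and SAMPLE_RECORD is a shortened excerpt of the export above, not the full record.

```python
import hashlib
import os
import xml.etree.ElementTree as ET

# Namespace used by the MARC21 "slim" XML export shown above.
MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Shortened excerpt of the record above (one file entry only), for illustration.
SAMPLE_RECORD = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="s">800276929</subfield>
    <subfield code="z">md5:2fd1651b4f94ebd818ebf90ad2b6ce06</subfield>
    <subfield code="u">https://zenodo.org/record/1214456/files/CRC-VAL-HE-7K.zip</subfield>
  </datafield>
</record>"""


def list_files(marc_xml):
    """Return one dict (url, size, md5) per 856 datafield in the record."""
    root = ET.fromstring(marc_xml)
    files = []
    for field in root.findall(".//marc:datafield[@tag='856']", MARC_NS):
        # Map subfield code -> text for this datafield.
        sub = {s.get("code"): (s.text or "")
               for s in field.findall("marc:subfield", MARC_NS)}
        files.append({
            "url": sub["u"],
            "size": int(sub["s"]),
            # The export prefixes the digest with "md5:"; keep only the hex part.
            "md5": sub["z"].split(":", 1)[-1],
        })
    return files


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, entry):
    """True if the local file has the size and MD5 listed in the record entry."""
    return os.path.getsize(path) == entry["size"] and md5_of(path) == entry["md5"]


if __name__ == "__main__":
    for entry in list_files(SAMPLE_RECORD):
        print(entry["url"], entry["size"], entry["md5"])
```

After downloading an archive, a call such as matches("CRC-VAL-HE-7K.zip", entry) would compare the local file against the listed size and checksum. The local path is an assumption about where the file was saved. Reading in chunks keeps memory use low, which matters for the two archives of roughly 11.7 GB each.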