Melanoma Histopathology Dataset with Tissue and Nuclei Annotations
Description
Description:
This dataset is designed for development of deep learning models for segmentation of nuclei and tissue in melanoma H&E stained histopathology. Existing nuclei segmentation models that are trained on non-melanoma specific datasets have low performance due to the ability of melanocytes to mimic other cell types, whereas existing melanoma specific models utilize older, sub-optimal techniques. Moreover, these models do not provide tissue annotations necessary for determining the localization of tumor-infiltrating lymphocytes, which may hold value for predictive and prognostic tasks. To address this, we created a melanoma specific dataset with nuclei and tissue annotations.
Methodology:
Sample Collection:
Regions of interest (ROIs) were sampled from H&E stained slides of 103 primary melanoma specimens and 103 metastatic melanoma specimens, scanned using a Hamamatsu scanner at 40× magnification (0.23 μm per pixel). All slides were obtained from regular diagnostic procedures.
From each specimen, a 40× magnified ROI of 1024×1024 pixels was selected for annotation. Additionally, a context ROI of 5120×5120 pixels was sampled to provide information about the broader context for the annotation process. Selection was performed by a trained medical expert (M.S.) and subsequently verified by a dermatopathologist (W.B.). Manual ROI selection ensured the inclusion of diverse tissue and nuclei types.
Annotation Process:
- Nuclei segmentation
Nuclei segmentations were generated using Hover-Net pretrained on the PanNuke dataset. Manual annotation adjustments were performed by author M.S. using QuPath, with the following nuclei categories: tumor, stroma, vascular endothelium, histiocyte, melanophage, lymphocyte, plasma cell, neutrophil, apoptotic cell, and epithelium. All annotations were reviewed and corrected, where needed, by a dermatopathologist (W.B.). - Tissue segmentation
Tissue segmentations were created manually using QuPath by M.S., with the following categories: tumor, stroma, epidermis, necrosis, blood vessel, and background. Annotations were reviewed and corrected, where needed, by a dermatopathologist (W.B.).
Quality Control:
To assess the reliability of the annotations, intra- and interobserver agreement (by pathologist G.B.) were determined on 12 randomly selected ROIs.
- Nuclei segmentation
The intraobserver overall precision was 84.89%, with a recall of 86.45%, and an F1 score of 85.66%. Interobserver overall precision was 80.34%, with a recall of 80.62%, and an F1 score of 80.20%. These results are based on the sum of all true positive, false positive, and false negative counts for the 12 ROIs. - Tissue segmentation
The DICE score was determined on the same 12 randomly selected ROIs. The average intraobserver DICE score was 0.90, and the interobserver DICE score was also 0.90.
Version 2:
Updated the file format to .tiff
with magnification metadata included in the files.
Added 5 additional ROIs containing necrosis and epidermis tissue categories.