Published June 21, 2023 | Version v1
Dataset Open

LyNSeC: Lymphoma Nuclear Segmentation and Classification

  • 1. University of Cologne
  • 2. University Hospital Cologne

Description

In recent years, automated segmentation and classification methods for histological whole slide images (WSIs) stained with hematoxylin and eosin (H&E) have advanced considerably. Current state-of-the-art techniques are based on diverse datasets of H&E-stained WSIs covering different types of predominantly solid cancer. Publicly available annotated lymphoma datasets, however, are lacking, so we generated a labeled diffuse large B-cell lymphoma dataset and named it LyNSeC (lymphoma nuclear segmentation and classification). LyNSeC comprises three subsets. LyNSeC 1 consists of 379 immunohistochemistry (IHC) images of size 512 x 512 pixels at 40x magnification. In these images, we annotated the contour of each cell nucleus and its class: marker-positive or marker-negative.

In total, LyNSeC 1 contains 87,316 annotated cell nuclei from four different cases, of which 48,171 are assigned the class negative and 39,145 the class positive. We included three markers with visually different staining patterns: cluster of differentiation 3 (CD3), Ki67 as a marker of proliferation, and erythroblast transformation-specific (ETS)-related gene (ERG).

LyNSeC 2 and 3 contain H&E-stained images from 70 different patients. LyNSeC 2 consists of 280 images and LyNSeC 3 of 40 images, each of size 512 x 512 pixels at 40x magnification. In total, 65,479 nuclei were annotated in LyNSeC 2 and 8,452 in LyNSeC 3. In LyNSeC 3, each nucleus was also assigned a class label (tumor or non-tumor): 3,747 nuclei were labeled as tumor and 4,705 as non-tumor.

The nuclear contours in the H&E images (LyNSeC 2 and LyNSeC 3) were annotated by two pathologists and by two students trained by the pathologists. The cell classes in LyNSeC 3 were annotated by the pathologists only. LyNSeC 1 was annotated by the two students, who received additional training in annotating the contours and in distinguishing marker-positive from marker-negative cells. The pathologists inspected and, where necessary, adjusted the LyNSeC 3 annotations.

The files are provided in '.npy' format. The files of LyNSeC 1 (x_l1.npy) and LyNSeC 3 (x_l3.npy) each contain five channels: the first three are the RGB channels of the images, channel 4 contains the instance maps, and channel 5 contains the class type maps. For LyNSeC 1, a pixel value of 1 corresponds to the class negative and 2 to the class positive; for LyNSeC 3, 1 corresponds to the class non-tumor and 2 to the class tumor. The files of LyNSeC 2 (x_l2.npy) have four channels (no class type map).
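The channel layout described above could be unpacked along the following lines. This is a minimal sketch, not the authors' loading code. It assumes that the channels sit on the last axis of the array (for example a shape of (N, 512, 512, 5)), that instance ID 0 marks background, and that instance IDs are unique within an image. None of these is stated in the description, so check `data.shape` and the value ranges after loading. The helper names `split_channels` and `count_nuclei_per_class` are hypothetical.

```python
import numpy as np


def split_channels(data):
    """Split a LyNSeC array into RGB image, instance map and class map.

    Assumption: channels are stored on the last axis, e.g. (N, H, W, C).
    Returns None for the class map when only four channels are present
    (as described for LyNSeC 2).
    """
    n_channels = data.shape[-1]
    if n_channels not in (4, 5):
        raise ValueError(f"expected 4 or 5 channels, got {n_channels}")
    rgb = data[..., :3]                        # channels 1-3: RGB image
    instances = data[..., 3].astype(np.int64)  # channel 4: instance map
    classes = data[..., 4].astype(np.int64) if n_channels == 5 else None
    return rgb, instances, classes


def count_nuclei_per_class(instances, classes):
    """Count nuclei per class value in a single image.

    Assumption: instance ID 0 is background. Each nucleus is given the
    most frequent class value among its pixels.
    """
    counts = {}
    for inst_id in np.unique(instances):
        if inst_id == 0:
            continue
        labels = classes[instances == inst_id]
        label = int(np.bincount(labels).argmax())
        counts[label] = counts.get(label, 0) + 1
    return counts


# Example usage (requires the downloaded file):
# data = np.load("x_l3.npy")
# rgb, instances, classes = split_channels(data)
# print(count_nuclei_per_class(instances[0], classes[0]))
```

For LyNSeC 3, the keys of the returned dictionary would be 1 (non-tumor) and 2 (tumor); for LyNSeC 1, 1 (negative) and 2 (positive). If the assumptions hold, summing these counts over all images gives a quick consistency check against the totals stated in the description.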

We also make our HoVer-Net-based pre-trained nuclei segmentation and classification models available (he.tar for H&E images and ihc.tar for IHC images).

Files

lynsec.zip

Files (1.9 GB)

Checksum Size
md5:ab487b656117e7eb23f005691ace5276 452.5 MB
md5:26e6c929ab1ad623ea979721c33665f9 452.5 MB
md5:f9b1bae015ccf2af29617930c051b304 970.7 MB
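A downloaded file can be checked against the md5 checksums listed above. The following is a minimal sketch using only the Python standard library. The listing does not state which checksum belongs to which file, so the pairing in the usage comment is a placeholder, and the helper names `md5_of_file` and `verify_download` are hypothetical.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex md5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_md5):
    """Compare a file's md5 with an expected value, with or without 'md5:' prefix."""
    expected = expected_md5.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected


# Example usage (the file/checksum pairing here is an assumption):
# ok = verify_download("lynsec.zip", "md5:f9b1bae015ccf2af29617930c051b304")
# print("checksum matches" if ok else "checksum mismatch")
```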