There is a newer version of the record available.

Published July 11, 2024 | Version v2
Dataset Restricted

Arctique - ARtificial Colon Tissue Images for Qualitative Uncertainty Evaluation

Description

This dataset was introduced and published in the NeurIPS 2024 paper, "Arctique: An Artificial Histopathological Dataset Unifying Realism and Controllability for Uncertainty Quantification". It is released in several versions. The full version contains 50,000 training images and 1,000 test images, each paired with corresponding instance and semantic masks, as well as 400 additional variations across 50 selected test images to support research on uncertainty quantification.

Versions:

  • Version v3: The version used for the experimental results presented in the associated research paper. It features improved realism and includes the same core images and labels as v2, supplemented with noise-augmented variations used to evaluate algorithmic performance under challenging conditions.
  • Version v2: The full dataset, consisting of 50,000 training images and 1,000 test images, along with their associated instance and semantic masks. Additionally, this version includes 400 augmented variations for 50 selected test images.
  • Version v1: A small example subset of the dataset provided for review purposes, containing a limited number of images and annotations to allow preliminary exploration.

Each version of the Arctique dataset is divided into training, test, and variation sets. Each set contains the following directories:

The images directory contains all synthetically generated images stored as PNG files. Each image has a resolution of 512x512 pixels with RGB channels and is named "img_<ID>", where <ID> is a unique integer identifier for each image.

The masks directory includes subdirectories containing various masks related to the images:

  • cytoplasm: Contains 2D semantic masks for the cell cytoplasm. Each mask is stored as "<ID>.tif", where <ID> is the identifier of the corresponding image.
  • instance_3d: Contains one directory per image, named "<ID>". Each directory holds a NumPy file that stores the instance IDs as a 3D volumetric array. It also holds a sequence of 2D instance segmentation masks named "slice_<ID>_<slice_count>.png", which are equidistant slices through the 3D volume along the depth axis.
  • instance: Contains 2D instance masks for the cell nuclei. Each mask is stored as "<ID>.tif", where <ID> is the identifier of the corresponding image.
  • semantic: Contains 2D semantic masks for the cell nuclei. As with the instance masks, each mask is stored as "<ID>.tif", where <ID> is the identifier of the corresponding image.
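
The layout described above can be turned into a small path helper. The following is a minimal sketch, not part of the dataset tooling. It assumes that the directories are named exactly as listed ("images", "masks/cytoplasm", and so on), that image files carry a ".png" extension, and that IDs are written without zero padding. The root path "arctique/train" and the function name sample_paths are hypothetical.

```python
from pathlib import Path


def sample_paths(split_dir, image_id):
    """Return the expected file locations for one image ID inside a split directory.

    Directory and file names follow the dataset description; the ".png"
    extension and the unpadded ID format are assumptions.
    """
    split_dir = Path(split_dir)
    masks = split_dir / "masks"
    return {
        "image": split_dir / "images" / f"img_{image_id}.png",
        "cytoplasm": masks / "cytoplasm" / f"{image_id}.tif",
        "instance": masks / "instance" / f"{image_id}.tif",
        "semantic": masks / "semantic" / f"{image_id}.tif",
        "instance_3d_dir": masks / "instance_3d" / str(image_id),
    }


# "arctique/train" is a placeholder for wherever the split was extracted.
paths = sample_paths("arctique/train", 42)

# Listing the entries that do not exist on disk is a quick way to check
# whether the assumed naming matches the downloaded files.
missing = [name for name, path in paths.items() if not path.exists()]
```

If missing is non-empty for an ID that should exist, the assumed extension or ID format probably differs from the files on disk and the helper needs adjusting.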

Note that all semantic masks appear as black images when viewed with a standard image viewer. This is because the cell type IDs, ranging from 1 to 5, are used as greyscale values, which appear dark in the images. The modelled cell types are:

Cell Types

1 Epithelial Cells 'EPI'
2 Plasma Cells 'PLA'
3 Lymphocytes 'LYM'
4 Eosinophils 'EOS'
5 Fibroblasts 'FIB'
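
Because the label values 1 to 5 are nearly black, it can help to rescale a semantic mask before viewing it. The sketch below illustrates the idea on a toy mask written as nested lists, so it runs without the dataset or any image library. It assumes that the value 0 denotes background, which the description does not state explicitly. The function names are hypothetical.

```python
from collections import Counter

# Cell type IDs and abbreviations as listed in the dataset description.
CELL_TYPES = {1: "EPI", 2: "PLA", 3: "LYM", 4: "EOS", 5: "FIB"}


def stretch_for_display(mask, max_id=5):
    """Scale label values so that max_id maps to 255 (for viewing only)."""
    scale = 255 // max_id  # 51 for five cell types
    return [[value * scale for value in row] for row in mask]


def count_pixels_per_type(mask):
    """Count pixels per cell type; values outside 1..5 (assumed background) are ignored."""
    counts = Counter(value for row in mask for value in row)
    return {name: counts.get(type_id, 0) for type_id, name in CELL_TYPES.items()}


# Toy 2x3 mask standing in for a real 512x512 semantic mask.
toy_mask = [
    [0, 1, 1],
    [0, 3, 5],
]

visible = stretch_for_display(toy_mask)
pixel_counts = count_pixels_per_type(toy_mask)
```

The stretched values are meant for display only. Any evaluation should use the original label values, since the scaling changes the IDs.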

 

The metadata directory contains JSON metadata files named "metadata_<ID>" for each image. Each JSON file contains a list of dictionaries, one for each cell object visible in the image. See Appendix F of the associated paper for a detailed explanation of each dictionary.

The parameters directory contains JSON files named "parameters_<ID>", which detail the parameters used to generate each image. Each JSON file holds a single dictionary with all the parameter values necessary to reproduce the scene. See Appendix F of the associated paper for a detailed explanation of each parameter.
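
Both kinds of JSON file can be read with the Python standard library. The sketch below loads a metadata file and a parameters file and checks that each has the top-level shape described above (a list of dictionaries and a single dictionary, respectively). The toy files, the ".json" extension, and the keys "placeholder_key" and "placeholder_parameter" are stand-ins for illustration. The real keys are documented in the paper's appendix and are not reproduced here.

```python
import json
import tempfile
from pathlib import Path


def load_json(path):
    """Read one JSON file and return the decoded Python object."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def load_cell_metadata(path):
    """Load a metadata file and check that it is a list of per-cell dictionaries."""
    cells = load_json(path)
    if not isinstance(cells, list) or not all(isinstance(c, dict) for c in cells):
        raise ValueError(f"{path} is not a list of dictionaries")
    return cells


def load_parameters(path):
    """Load a parameters file and check that it is a single dictionary."""
    params = load_json(path)
    if not isinstance(params, dict):
        raise ValueError(f"{path} is not a dictionary")
    return params


# Toy files standing in for real dataset files; all keys are placeholders.
with tempfile.TemporaryDirectory() as tmp:
    meta_path = Path(tmp) / "metadata_0.json"
    param_path = Path(tmp) / "parameters_0.json"
    meta_path.write_text(json.dumps([{"placeholder_key": 1}, {"placeholder_key": 2}]))
    param_path.write_text(json.dumps({"placeholder_parameter": 0.5}))

    cells = load_cell_metadata(meta_path)
    params = load_parameters(param_path)
    n_cells = len(cells)  # one dictionary per visible cell object
```

Checking the top-level type on load gives an early, readable error if a file from a different directory is passed by mistake.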

Files

Restricted

The record is publicly accessible, but files are restricted to users with access.