CPAISD: Core-Penumbra Acute Ischemic Stroke Dataset

Umerenkov, Dmitriy; Kudin, Stepan; Peksheva, Marina; Pavlov, Denis

doi:10.5281/zenodo.10892316

Published March 28, 2024 | Version v1

Resource type: Dataset | Access: Open

1. Sber AI Lab
2. City Hospital No. 40, Resort District, Saint Petersburg

The dataset contains 112 non-contrast cranial CT scans of patients with hyperacute ischemic stroke, with the stroke core and penumbra delineated on every slice where they are present. The data were anonymized with the Kitware DicomAnonymizer using its standard settings, except that the values of the following fields were preserved:

(0x0010, 0x0040) – Patient's Sex
(0x0010, 0x1010) – Patient's Age
(0x0008, 0x0070) – Manufacturer
(0x0008, 0x1090) – Manufacturer’s Model Name
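A minimal sketch of reading the four preserved fields from a slice's raw.dcm, assuming the pydicom package is installed. The helper names (`read_preserved_fields`, `preserved_fields`, `age_in_years`) are hypothetical. Patient's Age is stored in DICOM as an Age String (value representation AS) of the form `nnnD`, `nnnW`, `nnnM` or `nnnY`, such as `065Y`, so a small parser converts it to years.

```python
# Preserved DICOM attributes: keyword -> (group, element) tag
PRESERVED_TAGS = {
    "PatientSex": (0x0010, 0x0040),
    "PatientAge": (0x0010, 0x1010),
    "Manufacturer": (0x0008, 0x0070),
    "ManufacturerModelName": (0x0008, 0x1090),
}

# Conversion factors from DICOM Age String units to years
_AGE_UNITS = {"D": 1 / 365.25, "W": 7 / 365.25, "M": 1 / 12, "Y": 1.0}


def age_in_years(age_string):
    """Convert a DICOM Age String such as '065Y' to years; None if malformed."""
    if not age_string or len(age_string) != 4:
        return None
    number, unit = age_string[:3], age_string[3].upper()
    if not number.isdigit() or unit not in _AGE_UNITS:
        return None
    return int(number) * _AGE_UNITS[unit]


def preserved_fields(ds):
    """Collect the preserved fields from a pydicom Dataset (or any object
    exposing the same attribute names); missing fields become 'unknown'."""
    return {kw: str(getattr(ds, kw, "unknown")) for kw in PRESERVED_TAGS}


def read_preserved_fields(path):
    """Read only the DICOM header (no pixel data) and return the preserved fields."""
    import pydicom  # imported here so the helpers above work without pydicom

    ds = pydicom.dcmread(path, stop_before_pixels=True)
    return preserved_fields(ds)
```

For example, `read_preserved_fields("train/<study>/<slice>/raw.dcm")` returns a dict keyed by DICOM keyword; the path placeholders are illustrative. Passing `stop_before_pixels=True` avoids decoding image data when only the header is needed.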

Sex and age are retained to allow demographic analysis of the cohort, and the scanner manufacturer and model are retained for dataset statistics and possible domain-shift analysis.

The dataset is split into three folds:

Training fold (92 studies, 8,376 slices).
Validation fold (10 studies, 980 slices).
Testing fold (10 studies, 809 slices).

The dataset has the following structure:

metadata.json – dataset metadata
summary.csv – metadata of each study in a CSV format table
Part of the dataset (train, val, test) – one directory per part
- Study – one directory per study
  - Slice – one directory per slice
    - raw.dcm – original slice file
    - image.npz – slice in Numpy array format
    - mask.npz – segmentation mask in Numpy array format
    - metadata.json – slice metadata in JSON format
  - metadata.json – study metadata in JSON format
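The layout above can be traversed with a short generator. This is a sketch assuming NumPy, part directories named `train`, `val` and `test`, and exactly one array per .npz file. The array key names and the label encoding inside mask.npz are not documented here, so the code takes the first stored array and does not interpret mask values. `iter_slices` and `load_npz_array` are hypothetical names.

```python
from pathlib import Path

import numpy as np


def load_npz_array(path):
    """Load the single array stored in an .npz file.

    Assumption: each image.npz / mask.npz holds exactly one array; its key
    name is not documented, so the first stored key is used.
    """
    with np.load(path) as data:
        return data[data.files[0]]


def iter_slices(dataset_root, part):
    """Yield (study_name, slice_name, image, mask) for one part of the dataset.

    Only subdirectories are visited, so the study-level metadata.json is
    skipped. Studies and slices are visited in sorted directory-name order.
    """
    part_dir = Path(dataset_root) / part
    for study_dir in sorted(p for p in part_dir.iterdir() if p.is_dir()):
        for slice_dir in sorted(p for p in study_dir.iterdir() if p.is_dir()):
            image = load_npz_array(slice_dir / "image.npz")
            mask = load_npz_array(slice_dir / "mask.npz")
            yield study_dir.name, slice_dir.name, image, mask
```

Sorting by directory name matches anatomical slice order only if the slice names sort that way, so it is worth checking against the data before relying on it.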

The metadata.json at the root of the dataset has the following format:

generation_params – dataset generation parameters:
- test_size – proportion of the dataset assigned to the test part
- val_size – proportion of the dataset assigned to the validation part
stats – statistical data:
- common – general statistical data:
  - train_size_in_studies – number of studies in the training part of the dataset.
  - train_size_in_images – number of slices in the training part of the dataset.
  - val_size_in_studies – number of studies in the validation part of the dataset.
  - val_size_in_images – number of slices in the validation part of the dataset.
  - test_size_in_studies – number of studies in the test part of the dataset.
  - test_size_in_images – number of slices in the test part of the dataset.
- train – statistical data for the training part of the dataset:
  - min – minimum pixel value.
  - max – maximum pixel value.
  - mean – average pixel value.
  - std – standard deviation of the pixel values.
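The statistics under stats.train are typically used to normalize model inputs. A sketch, assuming NumPy and JSON keys exactly as listed above; z-scoring is one common choice, not a step the dataset prescribes, and the function names are hypothetical.

```python
import json
from pathlib import Path

import numpy as np


def load_dataset_metadata(dataset_root):
    """Read the root metadata.json of the dataset."""
    with open(Path(dataset_root) / "metadata.json", encoding="utf-8") as f:
        return json.load(f)


def normalize(image, metadata):
    """Z-score an image with the training-part pixel statistics.

    Min/max scaling with stats['train']['min'] and ['max'] is an alternative.
    """
    train = metadata["stats"]["train"]
    return (np.asarray(image, dtype=np.float32) - train["mean"]) / train["std"]


def fold_sizes(metadata):
    """Return {part: (n_studies, n_slices)} from stats['common']."""
    common = metadata["stats"]["common"]
    return {
        part: (common[f"{part}_size_in_studies"], common[f"{part}_size_in_images"])
        for part in ("train", "val", "test")
    }
```

For this release, `fold_sizes` would be expected to agree with the fold sizes listed earlier, for example (92, 8376) for the training part.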

The metadata.json at the root of each study has the following format; if a field value is unknown, it is given as 'unknown':

manufacturer – manufacturer of the tomograph.
model – model of the tomograph.
device – full name of the tomograph (manufacturer + model).
age – patient's age in years.
sex – patient's sex. M – male, F – female.
dsa – whether cerebral digital subtraction angiography (DSA) was performed: true if yes, false if no.
nihss – National Institutes of Health Stroke Scale (NIHSS) score.
time – time from stroke onset to the scan, in hours. Can be either a number or a range.
lethality – whether the patient died as a result of this stroke: true if yes, false if no.
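Because `time` may be a number or a range and any field may be 'unknown', a small parser helps before analysis. The range notation is not specified in this description; the sketch assumes a string containing the bounds as numbers (for example '3-6') and returns (low, high) in hours. `parse_time_hours` and `load_study_metadata` are hypothetical names.

```python
import json
import re
from pathlib import Path

UNKNOWN = "unknown"
_NUMBER = re.compile(r"\d+(?:\.\d+)?")


def parse_time_hours(value):
    """Return (low, high) hours from stroke onset to the scan, or None.

    Assumption: a range is a string whose numbers are the bounds, e.g. '3-6'.
    A single number yields a degenerate range (x, x).
    """
    if value is None or value == UNKNOWN:
        return None
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return float(value), float(value)
    numbers = [float(n) for n in _NUMBER.findall(str(value))]
    if not numbers:
        return None
    return min(numbers), max(numbers)


def load_study_metadata(study_dir):
    """Read a study's metadata.json, mapping 'unknown' values to None."""
    with open(Path(study_dir) / "metadata.json", encoding="utf-8") as f:
        meta = json.load(f)
    return {key: (None if value == UNKNOWN else value) for key, value in meta.items()}
```

Mapping 'unknown' to None lets downstream code treat missing values uniformly, for example when filtering studies by NIHSS score.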

summary.csv contains the same fields as the study-level metadata.json, plus two additional fields:

name – name of the study.
part – the part of the dataset (train, val, or test) that contains the study.
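summary.csv allows split-level statistics without walking the directory tree. A sketch with the standard csv module, assuming a comma-separated file with a header row and `part` values `train`, `val` and `test`; all values are read as strings, so booleans and 'unknown' still need converting. The function names are hypothetical.

```python
import csv
from collections import Counter


def studies_per_part(summary_path):
    """Count studies per dataset part using the `part` column of summary.csv."""
    with open(summary_path, newline="", encoding="utf-8") as f:
        return Counter(row["part"] for row in csv.DictReader(f))


def studies_in_part(summary_path, part):
    """Return the rows (dicts of strings) for one part, e.g. 'test'."""
    with open(summary_path, newline="", encoding="utf-8") as f:
        return [row for row in csv.DictReader(f) if row["part"] == part]
```

For this version, `studies_per_part("summary.csv")` would be expected to report 92, 10 and 10 studies for the training, validation and test parts.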

Files

dataset.zip (5.6 GB, md5: a034be2cc6e93b5fb696231c59df900c)
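The published MD5 checksum can be used to verify the download before extracting it. A minimal sketch using hashlib from the Python standard library; the file is read in chunks so the 5.6 GB archive does not have to fit in memory. The function names are hypothetical.

```python
import hashlib

EXPECTED_MD5 = "a034be2cc6e93b5fb696231c59df900c"  # checksum published for dataset.zip


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 matches the expected checksum."""
    return md5_of_file(path) == expected.lower()
```

Usage: `verify_download("dataset.zip")` returns True only if the archive matches the published checksum.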

Additional details

Repository URL: https://github.com/sb-ai-lab/early_hyperacute_stroke_dataset
Programming language: Python
Development Status: Active
