CytoDArk0
Description
Overview
CytoDArk0 is the first annotated dataset of Nissl-stained histological images of the mammalian brain. Annotations delineate individual cells in each image (38,755 cells in total, including both neurons and glial cells), so that deep learning models can be trained for cell instance segmentation, i.e., for the detection of cells and the accurate delineation of their boundaries. By disseminating this first dataset version, we aim to facilitate progress in digital neuropathology and studies of brain cytoarchitecture [1,2]. The reference paper for CytoDArk0, currently available as a preprint (arXiv:2409.04175), includes a description of the dataset, which is summarized below.
- The dataset comprises a set of 69 non-overlapping 1024x1024 image patches at 20x magnification, extracted from Nissl-stained histology images of four distinct brain regions (cerebellum, hippocampus, auditory cortex, visual cortex) across five different species (bottlenose dolphin (Tursiops truncatus), mouse, chimpanzee, macaque, bovine). This set of images is referred to as CytoDArk0_20x_1024.
- Among the images in CytoDArk0_20x_1024, a subset of 58 images is also available at 40x magnification. We share this subset as a separate set of images with size 2048x2048 called CytoDArk0_40x_2048.
- We also share patched versions of CytoDArk0_20x_1024 and CytoDArk0_40x_2048, which we call CytoDArk0_20x_512 (276 patches at 20x magnification and size 512x512), CytoDArk0_20x_256 (1,104 patches at 20x magnification and size 256x256), CytoDArk0_40x_1024 (232 patches at 40x magnification and size 1024x1024), CytoDArk0_40x_512 (928 patches at 40x magnification and size 512x512), and CytoDArk0_40x_256 (3,712 patches at 40x magnification and size 256x256). For example, CytoDArk0_20x_512 is obtained by cutting each image patch in CytoDArk0_20x_1024 into four smaller 512x512 patches.
- The Nissl staining technique was used to stain the entire population of cells in nervous tissue consistently. This technique specifically targets the rough endoplasmic reticulum and ribosomes within cell bodies, producing blue or purple cell bodies against a light or white background. It is widely recognized as the most effective technique for assessing cell body morphology, density, and distribution in the brain, as well as for highlighting the regional or laminar organization of cytoarchitecture across different brain areas [3].
- All images were annotated at the highest available resolution using QuPath [4] to delineate the contours of neurons and glia (38,755 cells in total). Initial annotations made on a few images were used in an active learning fashion to train MR-NOM [5] and NCIS [6], which then produced preliminary annotations on additional images. These were subsequently reviewed and corrected by our team of experts and used to fine-tune the models. Annotations were exported from QuPath and processed with MATLAB and Python to generate the instance segmentation mask along with supplementary maps, such as contour masks and distance masks.
- The reference paper for CytoDArk0 provides a thorough comparison of state-of-the-art cell instance segmentation and classification models. Additionally, we introduce a new deep learning framework, CISCA, designed for automatic cell instance segmentation and classification in histological slices. This framework aims to facilitate detailed morphological and structural analysis and straightforward cell counting in digital pathology workflows and studies of brain cytoarchitecture. The code is available at https://github.com/Vadori/CytoArk.
- Most of the images in CytoDArk0 are from the auditory cortex of the bottlenose dolphin, because initial research efforts were concentrated on this species and brain region [7]. Although fewer images are available for the other brain regions, sound detection and segmentation performance is still achievable in them. This outcome is expected: the data from the auditory cortex cover a broad range of brain cell sizes and shapes, and the images from the cerebellum and hippocampus have a higher cell density, providing a larger number of cell examples for learning.
- Future versions of CytoDArk0 may include an increased number of annotated images and the annotation of cell types, distinguishing between neurons and glia.
- Please feel free to reach out to us at vadoriv@lsbu.ac.uk for any inquiries.
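The patch counts listed above are consistent with cutting each source image into a regular grid of non-overlapping patches. The Python sketch below illustrates that tiling and checks the arithmetic. The helper names `tile_patches` and `expected_patch_count` are hypothetical and not part of the CISCA code base, and the patching procedure actually used to build the dataset may differ in detail (for example, in patch ordering).

```python
import numpy as np


def tile_patches(image: np.ndarray, size: int) -> list:
    """Cut an image into non-overlapping size x size patches, row by row."""
    height, width = image.shape[:2]
    if height % size or width % size:
        raise ValueError("image dimensions must be multiples of the patch size")
    return [
        image[top:top + size, left:left + size]
        for top in range(0, height, size)
        for left in range(0, width, size)
    ]


def expected_patch_count(n_images: int, image_size: int, patch_size: int) -> int:
    """Number of patches obtained by tiling n_images square images."""
    per_side = image_size // patch_size
    return n_images * per_side * per_side


# 69 images at 20x (1024x1024) and 58 images at 40x (2048x2048), as stated above.
counts = {
    "CytoDArk0_20x_512": expected_patch_count(69, 1024, 512),
    "CytoDArk0_20x_256": expected_patch_count(69, 1024, 256),
    "CytoDArk0_40x_1024": expected_patch_count(58, 2048, 1024),
    "CytoDArk0_40x_512": expected_patch_count(58, 2048, 512),
    "CytoDArk0_40x_256": expected_patch_count(58, 2048, 256),
}

# A placeholder 1024x1024 RGB image stands in for a real patch here.
example_patches = tile_patches(np.zeros((1024, 1024, 3), dtype=np.uint8), 512)
```

With these inputs, `counts` reproduces the figures given in the list (276, 1,104, 232, 928 and 3,712 patches), and `example_patches` contains four 512x512 patches.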
Folder Structure and Content
The content of cytoDArk0 is organized as shown below. Each image patch in the image folders is assigned a unique ID. These same IDs are used for naming the files in the other folders, ensuring that all related masks and maps for a specific patch share the same name.
cytoDArk0/
│
│── 20x/
│ ├── 256x256/ --> CytoDArk0_20x_256
│ ├── 512x512/ --> CytoDArk0_20x_512
│ └── 1024x1024/ --> CytoDArk0_20x_1024
│ ├── bwmask/: binary mask for foreground/background pixel classification (.png)
│ ├── distmap/: horizontal, vertical, top left diagonal and bottom left diagonal distance maps used for training CISCA (.tiff)
│ ├── graymask4/: mask for four-class pixel classification* (.png)
│ ├── image/: image patch (.png)
│ ├── label/: ground truth instance segmentation label map, where one unique integer number is assigned to each cell instance (.tiff)
│ ├── rgbmask/: mask for alternative three-class pixel classification (cell bodies, boundary between touching cells, background) (.png)
│ └── folds.csv: Lookup table for splitting images into training, validation and test folds according to the split used by CISCA in CytoDArk0 reference paper
│
│── 40x/
│ ├── 256x256/ --> CytoDArk0_40x_256
│ ├── 512x512/ --> CytoDArk0_40x_512
│ ├── 1024x1024/ --> CytoDArk0_40x_1024
│ └── 2048x2048/ --> CytoDArk0_40x_2048
│
└── cytoDArk0_metadata.xlsx: File with metadata for each image in CytoDArk0_20x_1024 and CytoDArk0_40x_2048.
*Processed by CISCA for three-class pixel classification (cell bodies, boundary between touching and closely positioned cells, background)
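As a rough illustration of how this layout can be navigated, the Python sketch below builds the file paths that belong to one patch ID and derives two simple quantities from an instance label map. It rests on several assumptions that should be checked against the downloaded archive: every size folder is assumed to contain the same subfolders as the one shown for 1024x1024, files are assumed to be named `<ID>` plus the extension listed above, and the background is assumed to be labelled 0 in the label maps. The helper names and the ID `example_id` are hypothetical.

```python
from pathlib import Path

import numpy as np

# Subfolder names and extensions as listed in the folder tree above.
# The exact extension spelling on disk (e.g. ".tif" vs ".tiff") is an assumption.
SUBFOLDERS = {
    "image": ".png",
    "label": ".tiff",
    "bwmask": ".png",
    "distmap": ".tiff",
    "graymask4": ".png",
    "rgbmask": ".png",
}


def patch_paths(root, magnification: str, size: int, patch_id: str) -> dict:
    """Return the expected path of every file associated with one patch ID."""
    base = Path(root) / magnification / f"{size}x{size}"
    return {
        name: base / name / f"{patch_id}{ext}"
        for name, ext in SUBFOLDERS.items()
    }


def count_cells(label_map: np.ndarray, background: int = 0) -> int:
    """Count distinct cell instances, i.e. unique non-background integers."""
    instance_ids = np.unique(label_map)
    return int(np.sum(instance_ids != background))


def foreground_mask(label_map: np.ndarray, background: int = 0) -> np.ndarray:
    """Boolean foreground/background mask derived from a label map."""
    return label_map != background


# Hypothetical patch ID, used only to show the resulting paths.
paths = patch_paths("cytoDArk0", "20x", 1024, "example_id")

# Tiny synthetic label map with three instances (IDs 1, 2 and 5).
toy_label = np.array([[0, 0, 1],
                      [2, 2, 1],
                      [0, 5, 5]])
n_cells = count_cells(toy_label)
```

Real label maps can be read with any TIFF-capable image library before being passed to `count_cells` or `foreground_mask`. The column layout of folds.csv is not described here, so it is best inspected directly before use.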
Ethics Statement
CytoDArk0 was created by processing brain samples from the University of Padova, Italy, collected in compliance with legal and ethical standards. In particular:
- Tursiops truncatus: Brain tissues were sampled from twenty specimens from subjects of different ages (newborn, adult, old) stored in the Mediterranean Marine Mammals Tissue Bank at the University of Padova, a CITES-recognized (IT020) research center and tissue bank. These specimens originated from stranded cetaceans with a decomposition and conservation code (DCC) of 1 or 2, which aligns with the guidelines for post-mortem investigation of cetaceans.
- Bovine: Brain tissues were sampled from one specimen sourced from nearby commercial abattoirs during the slaughter of cows. The animals were treated in accordance with the European Communities Council directive (86/609/EEC) on animal welfare during the commercial slaughtering process and were continuously monitored under mandatory official veterinary care.
- Chimpanzee and Macaque: Brain tissues were sampled from five specimens. Ethical review and approval were not necessary for the animal study, as the animals were brought to the Department of Comparative Biomedicine and Food Science at the University of Padova for post-mortem examination.
- Mouse: Brain tissues were sampled from four specimens housed in the animal facility of the Department of Surgery, Oncology and Gastroenterology of the University of Padova. All animal studies were approved by the Institutional Animal Care and Use Committee of the Albert Einstein College of Medicine.
Citation
If you find our dataset or CISCA framework useful, we'd love a shoutout! Here’s a citation format you can use:
DATASET (cytoDArk0)
Vadori, V., Graïc, J.-M., Peruffo, A., Vadori, G., Finos, L., & Grisan, E. (2024). CytoDArk0 [Data set]. Zenodo. https://doi.org/10.5281/zenodo.13694738
@dataset{vadori2024cytodark0,
author = {Vadori, Valentina and
Graïc, Jean-Marie and
Peruffo, Antonella and
Vadori, Giulia and
Finos, Livio and
Grisan, Enrico},
title = {CytoDArk0},
month = sep,
year = {2024},
publisher = {Zenodo},
doi = {10.5281/zenodo.13694738},
url = {https://doi.org/10.5281/zenodo.13694738}
}
DATASET (cytoDArk0) and/or CELL SEGMENTATION & CLASSIFICATION METHOD (CISCA)
Vadori, V., Graïc, J.-M., Peruffo, A., Vadori, G., Finos, L., & Grisan, E. (2024). CISCA and CytoDArk0: a Cell Instance Segmentation and Classification method for histo(patho)logical image Analyses and a new, open, Nissl-stained dataset for brain cytoarchitecture studies. arXiv preprint arXiv:2409.04175. https://doi.org/10.48550/arXiv.2409.04175
@article{vadori2024cisca,
author = {Vadori, Valentina and
Graïc, Jean-Marie and
Peruffo, Antonella and
Vadori, Giulia and
Finos, Livio and
Grisan, Enrico},
title = {CISCA and CytoDArk0: a Cell Instance Segmentation and Classification method for histo(patho)logical image Analyses and a new, open, Nissl-stained dataset for brain cytoarchitecture studies},
year = {2024},
journal={arXiv e-prints},
pages={arXiv--2409},
}
References
[1] Corain, L., Grisan, E., Graïc, J.M., Carvajal-Schiaffino, R., Cozzi, B. and Peruffo, A., 2020. Multi-aspect testing and ranking inference to quantify dimorphism in the cytoarchitecture of cerebellum of male, female and intersex individuals: a model applied to bovine brains. Brain Structure and Function, 225(9), pp.2669-2688.
[2] Graïc, J.M., Finos, L., Vadori, V., Cozzi, B., Luisetto, R., Gerussi, T., Gatto, M., Doria, A., Grisan, E., Corain, L. and Peruffo, A., 2023. Cytoarchitectureal changes in hippocampal subregions of the NZB/W F1 mouse model of lupus. Brain, Behavior, & Immunity-Health, 32, p.100662.
[3] García-Cabezas, M.Á., John, Y.J., Barbas, H. and Zikopoulos, B., 2016. Distinction of neurons, glia and endothelial cells in the cerebral cortex: an algorithm based on cytological features. Frontiers in neuroanatomy, 10, p.107.
[4] Bankhead, P., Loughrey, M.B., Fernández, J.A., Dombrowski, Y., McArt, D.G., Dunne, P.D., McQuaid, S., Gray, R.T., Murray, L.J., Coleman, H.G. and James, J.A., 2017. QuPath: Open source software for digital pathology image analysis. Scientific reports, 7(1), pp.1-7.
[5] Vadori, V., Graïc, J.M., Finos, L., Corain, L., Peruffo, A. and Grisan, E., 2023, April. MR-NOM: multi-scale resolution of neuronal cells in Nissl-stained histological slices via deliberate over-segmentation and merging. In 2023 IEEE 20th International Symposium on Biomedical Imaging (ISBI) (pp. 1-5). IEEE.
[6] Vadori, V., Peruffo, A., Graïc, J.M., Finos, L., Corain, L. and Grisan, E., 2023, October. NCIS: Deep Color Gradient Maps Regression and Three-Class Pixel Classification for Enhanced Neuronal Cell Instance Segmentation in Nissl-Stained Histological Images. In International Workshop on Machine Learning in Medical Imaging (pp. 457-466). Cham: Springer Nature Switzerland.
[7] Graïc, J.M., Corain, L., Finos, L., Vadori, V., Grisan, E., Gerussi, T., Orekhova, K., Centelleghe, C., Cozzi, B. and Peruffo, A., 2024. Age-related changes in the primary auditory cortex of newborn, adults and aging bottlenose dolphins (Tursiops truncatus) are located in the upper cortical layers. Frontiers in Neuroanatomy, 17, p.1330384.
Files
| Name | Size | MD5 |
|---|---|---|
| cytoDArk0.zip | 2.7 GB | 35ca962f504c9aa2b455958ad94e0b11 |
Additional details
Related works
- Is described by: Preprint arXiv:2409.04175 (arXiv)
Software
- Repository URL: https://github.com/Vadori/CytoArk
- Programming language: Python
- Development status: WIP