Published March 26, 2024 | Version v1
Dataset Open

CR-AI4SkIN dataset

  • 1. Human-Centred Technology University Research Institute (I3B)
  • 2. Universidad de Granada
  • 3. Hospital Clínico Universitario de Valencia
  • 4. Hospital San Cecilio de Granada

Description

CR-AI4SkIN is a public dataset of hematoxylin and eosin (H&E) patches extracted from Whole Slide Images (WSIs) of cutaneous spindle cell neoplasms. The task is to classify each WSI as benign or malignant. The main feature of this dataset is that it has been labeled by ten non-expert annotators.

This repository contains the associated files to replicate the study titled "Annotation Protocol and Crowdsourcing Multiple Instance Learning Classification of Skin Histological Images: The CR-AI4SkIN Dataset", published in the Artificial Intelligence in Medicine journal. For further details on the study and the dataset, please see the published article.

  • The zipped file 'annotations.zip' contains three .csv files with crowdsourcing annotations. These files provide the train/val/test split as well as label information. The 'GT' column stands for the expert label, 'MV' for the majority vote among non-experts, and 'Marker_X' is the label given by the X-th annotator. 
  • The zipped file 'img.zip' contains the dataset images divided into two sub-directories, one per source hospital. Each image has its own folder, identified by an anonymized ID. Each folder contains the extracted patches that represent the predicted regions of interest. Note: one image from the training set and one from the test set are not included because their regions of interest were not found or could not be processed.
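The folder structure described above can be indexed with a short script. The following is a minimal sketch, not part of the published code. It assumes that, once unzipped, the layout is `<img_root>/<hospital>/<image_id>/<patch files>`; the function name `index_patches` and the exact nesting are assumptions, so adjust them to the actual archive contents.

```python
from pathlib import Path


def index_patches(img_root):
    """Map (hospital, image_id) to a sorted list of patch file paths.

    Assumed layout (not confirmed by the record):
        img_root/<hospital>/<image_id>/<patch files>
    """
    index = {}
    root = Path(img_root)
    # First level: one sub-directory per source hospital (e.g. HCUV, HUSC).
    for hospital_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        # Second level: one folder per image, named by its anonymized ID.
        for image_dir in sorted(p for p in hospital_dir.iterdir() if p.is_dir()):
            patches = sorted(p for p in image_dir.iterdir() if p.is_file())
            index[(hospital_dir.name, image_dir.name)] = patches
    return index
```

Each key of the returned dictionary identifies one WSI, and its value is the bag of patches for that image, which is the grouping a multiple instance learning pipeline would typically consume.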

Label Dictionary:

0: Benign
1: Malignant
-1: Missing value (e.g., if an annotator did not label that image).
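Because -1 marks a missing label, it has to be excluded before votes are aggregated or annotators are compared with the expert label. The sketch below illustrates one way to do this on rows read from an annotation file (for example with `csv.DictReader`). It assumes the annotator columns are named with the prefix `Marker_` followed by the annotator number, and that a tie is reported as -1; both are assumptions made for this example, and the tie rule is not necessarily the one used to build the 'MV' column.

```python
from collections import Counter

MISSING = -1  # label dictionary: -1 means the annotator gave no label


def majority_vote(labels):
    """Return the most frequent label, ignoring missing values.

    Hypothetical tie rule for this sketch: return MISSING when the two
    most frequent labels have the same count, or when no label is given.
    """
    votes = Counter(int(l) for l in labels if int(l) != MISSING)
    if not votes:
        return MISSING
    ranked = votes.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return MISSING
    return ranked[0][0]


def annotator_accuracy(rows, gt_col="GT", prefix="Marker_"):
    """Fraction of each annotator's non-missing labels that match the expert label.

    `rows` is a list of dicts, one per image, keyed by column name.
    The `Marker_` prefix is an assumption about the column naming.
    """
    hits, totals = Counter(), Counter()
    for row in rows:
        expert = int(row[gt_col])
        for col, label in row.items():
            if not col.startswith(prefix):
                continue
            label = int(label)
            if label == MISSING:
                continue  # the annotator skipped this image
            totals[col] += 1
            hits[col] += int(label == expert)
    return {col: hits[col] / totals[col] for col in totals}
```

Annotators who labeled no image at all do not appear in the result of `annotator_accuracy`, which avoids a division by zero.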

Hospital Dictionary:

HCUV: Hospital Clínico Universitario de Valencia (Valencia Hospital).
HUSC: Hospital Universitario San Cecilio (Granada Hospital). 

Citation:

If you use this dataset, please cite the following article:

@article{del2023annotation,
  title={Annotation protocol and crowdsourcing multiple instance learning classification of skin histological images: The CR-AI4SkIN dataset},
  author={Del Amor, Rocío and Pérez-Cano, Jose and López-Pérez, Miguel and Terradez, Liria and Aneiros-Fernandez, Jose and Morales, Sandra and Mateos, Javier and Molina, Rafael and Naranjo, Valery},
  journal={Artificial Intelligence in Medicine},
  volume={145},
  pages={102686},
  year={2023},
  publisher={Elsevier}
}

Funding:

This work has received funding from the Spanish Ministry of Economy and Competitiveness through project PID2019-105142RB (AI4SKIN) and Spanish Ministry of Science and Innovation through project PID2022-140189OB, from Horizon 2020, the European Union’s Framework Programme for Research and Innovation, under the grant agreement No. 860627 (CLARIFY), grant B-TIC-324-UGR20 funded by Consejería de Universidad, Investigación e Innovación (Junta de Andalucía) and by “ERDF A way of making Europe”, and GVA through the project INNEST/2021/321 (SAMUEL). The work of Rocío del Amor has been supported by the Spanish Ministry of Universities (FPU20/05263). The work of Miguel López Pérez has been supported by the University of Granada postdoctoral program “Contrato Puente”. The work of Sandra Morales has been co-funded by the Universitat Politècnica de València through the program PAID-10-20.

Files (16.2 GB)

Name              Size      Checksum
annotations.zip   2.2 kB    md5:e18c167b23e21ae7f4f6db603f3e6487
img.zip           16.2 GB   md5:a2ddda8ea9ca10358d28ad2fde2aab8b
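The md5 values listed for the files can be used to check a download for corruption, which is worthwhile for an archive of this size. The following is a minimal sketch using Python's standard `hashlib`; the helper names `md5_of_file` and `md5_matches` are made up for this example. The file is read in chunks so that the 16.2 GB archive does not need to fit in memory.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def md5_matches(path, expected):
    """Compare a file's MD5 with an expected value such as 'md5:e18c...'."""
    expected = expected.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

For example, `md5_matches("annotations.zip", "md5:e18c167b23e21ae7f4f6db603f3e6487")` should return `True` for an intact download, assuming that checksum (listed next to the 2.2 kB entry) belongs to annotations.zip.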

Additional details

Software