Labeled Images for Ulcerative Colitis (LIMUC) Dataset

Gorkem Polat; Haluk Tarik Kani; Ilkay Ergenc; Yesim Ozen Alahdab; Alptekin Temizel; Ozlen Atug

doi:10.5281/zenodo.5827695

Published March 14, 2022 | Version 1

Dataset Open


1. Middle East Technical University
2. Marmara University

Dataset Details

The LIMUC dataset comprises 11276 images from 1043 colonoscopy procedures performed on 564 patients who underwent colonoscopy for ulcerative colitis between December 2011 and July 2019 at the Department of Gastroenterology, Marmara University School of Medicine. Two experienced gastroenterologists blindly and independently reviewed all images and classified each one according to the Mayo endoscopic score (MES). Images on which the two reviewers disagreed were independently labeled by a third experienced reviewer who did not see the previous labels. The final MES for these images was determined by majority voting.
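The labeling rule above can be sketched as follows. This is a minimal illustration that assumes each reviewer's label is an integer MES from 0 to 3. The function name final_mes is hypothetical and is not part of the LIMUC scripts. The description does not say how an image is handled when all three reviewers give different scores, so the sketch returns None in that case instead of guessing.

```python
from collections import Counter
from typing import Optional


def final_mes(first: int, second: int, third: Optional[int] = None) -> Optional[int]:
    """Hypothetical adjudication of one image's Mayo endoscopic score (0-3)."""
    # The two primary reviewers agree: their shared label is final.
    if first == second:
        return first
    # Disagreement: a third, independent label is required.
    if third is None:
        raise ValueError("reviewers disagree; a third reviewer's label is required")
    # Majority vote over the three labels.
    label, count = Counter([first, second, third]).most_common(1)[0]
    # All three labels differ: no majority exists, and the source does not say how this is resolved.
    return label if count >= 2 else None
```

For example, final_mes(1, 2, 2) returns 2, while final_mes(0, 2, 3) returns None.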

Mayo 0: 6105 (54.14%)
Mayo 1: 3052 (27.07%)
Mayo 2: 1254 (11.12%)
Mayo 3: 865 (7.67%)
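The class shares can be recomputed directly from the counts as a quick consistency check. The four counts sum to the 11276 images reported above. A short sketch follows. The names MAYO_COUNTS and class_percentages are illustrative only.

```python
# Per-class image counts as listed on this page.
MAYO_COUNTS = {0: 6105, 1: 3052, 2: 1254, 3: 865}


def class_percentages(counts):
    """Return each class's share of all images, in percent, rounded to 2 decimals."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 2) for label, n in counts.items()}


if __name__ == "__main__":
    print("Total images:", sum(MAYO_COUNTS.values()))
    for label, pct in class_percentages(MAYO_COUNTS).items():
        print(f"Mayo {label}: {MAYO_COUNTS[label]} ({pct:.2f}%)")
```

For these counts the script prints 54.14%, 27.07%, 11.12%, and 7.67% for Mayo 0 to Mayo 3, which add up to 100%.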

patient_based_classified_images: Images are grouped by patient and, within each patient, separated by Mayo class. Use this folder to create a custom train-validation-test split with your own ratios.

train_and_validation_sets: The training and validation sets used in the research paper. The scripts in the dataset's GitHub repository can regenerate the same 10 folds to replicate the reported results.

test_set: The test set used for performance measurement in the research paper. For fair comparisons, report performance on this set.
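When building a custom split from patient_based_classified_images, all images of a patient should go into the same subset. This prevents images from one patient from appearing in both the training and the evaluation data. The sketch below assumes the images have already been collected into hypothetical (patient_id, mayo_score, image_path) records. The exact folder naming inside the archive is not described here, so the record-building step is left out. The function patient_level_split is illustrative and is not one of the repository's scripts. It also does not stratify by Mayo class.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Hypothetical record layout: (patient_id, mayo_score, image_path).
Record = Tuple[str, int, str]


def patient_level_split(
    records: Sequence[Record],
    ratios: Tuple[float, float, float] = (0.7, 0.15, 0.15),
    seed: int = 0,
) -> Dict[str, List[Record]]:
    """Split records into train/val/test so that no patient spans two subsets."""
    if len(ratios) != 3 or abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must be three fractions that sum to 1")

    # Group every image under its patient ID.
    by_patient: Dict[str, List[Record]] = defaultdict(list)
    for record in records:
        by_patient[record[0]].append(record)

    # Sort before the seeded shuffle so the split is reproducible regardless of input order.
    patients = sorted(by_patient)
    random.Random(seed).shuffle(patients)

    # Ratios are applied to the number of patients, not the number of images.
    n = len(patients)
    n_train = round(ratios[0] * n)
    n_val = round(ratios[1] * n)
    groups = {
        "train": patients[:n_train],
        "val": patients[n_train:n_train + n_val],
        "test": patients[n_train + n_val:],  # all remaining patients
    }
    return {name: [r for p in ids for r in by_patient[p]] for name, ids in groups.items()}
```

The ratios are applied at the patient level, and patients contribute different numbers of images. As a result, the image-level proportions and the class balance of each subset can differ from the requested ratios.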

Suggested Metrics

Because the classes (Mayo 0, Mayo 1, Mayo 2, Mayo 3) are imbalanced and ordinal, quadratic weighted kappa (QWK) is suggested as the main performance metric. QWK is a commonly used statistic for assessing agreement on an ordinal scale. It corrects for chance agreement and penalizes each disagreement by the squared distance between the two classes, which makes it well suited as a single summary metric for this imbalanced problem. Mean absolute error (MAE), macro F1 score, or macro accuracy can be used as alternative performance metrics.
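Below is a dependency-free sketch of QWK, MAE, and macro F1. It assumes integer labels 0 to 3 for Mayo 0 to Mayo 3. QWK here uses the disagreement weights (i - j)^2 / (k - 1)^2, with the expected matrix built from the outer product of the true and predicted class histograms. With labels=[0, 1, 2, 3], scikit-learn's cohen_kappa_score(y_true, y_pred, labels=[0, 1, 2, 3], weights="quadratic") should give the same value. Macro accuracy is left out because its exact definition is not given here. The function names are illustrative.

```python
import math
from typing import List, Sequence


def _confusion(y_true: Sequence[int], y_pred: Sequence[int], k: int) -> List[List[int]]:
    """Count (true, predicted) label pairs into a k x k matrix."""
    matrix = [[0] * k for _ in range(k)]
    for t, p in zip(y_true, y_pred):
        if not (0 <= t < k and 0 <= p < k):
            raise ValueError(f"labels must be integers in [0, {k - 1}]")
        matrix[t][p] += 1
    return matrix


def quadratic_weighted_kappa(y_true: Sequence[int], y_pred: Sequence[int], k: int = 4) -> float:
    """Cohen's kappa with quadratic disagreement weights (i - j)^2 / (k - 1)^2."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    observed = _confusion(y_true, y_pred, k)
    n = len(y_true)
    true_hist = [sum(row) for row in observed]
    pred_hist = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    weighted_obs = weighted_exp = 0.0
    for i in range(k):
        for j in range(k):
            w = (i - j) ** 2 / (k - 1) ** 2
            weighted_obs += w * observed[i][j]
            # Expected count under chance agreement: product of the marginals divided by n.
            weighted_exp += w * true_hist[i] * pred_hist[j] / n
    # Undefined when both sequences contain one and the same single class.
    return 1.0 - weighted_obs / weighted_exp if weighted_exp > 0 else math.nan


def mean_absolute_error(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Average absolute distance between true and predicted Mayo scores."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted mean of per-class F1 over classes seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        # F1 = 2TP / (2TP + FP + FN); the denominator is positive for any listed class.
        scores.append(2 * tp / (2 * tp + fp + fn))
    return sum(scores) / len(scores)
```

Perfect predictions give a QWK of 1.0. Fully reversed predictions on [0, 1, 2, 3] give -1.0, because confusing Mayo 0 with Mayo 3 costs nine times as much as confusing adjacent classes.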

LIMUC Code Repository

The dataset's GitHub repository provides scripts for preprocessing, splitting, training, and validation.

Terms and Conditions

The LIMUC dataset is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license. This license permits unrestricted use, distribution, and reproduction in any medium, provided that proper attribution is given to the original creators. This ensures that the dataset can be used for both research and commercial applications while maintaining transparency and acknowledgment of the contributors.

For more details about the license, please refer to: Creative Commons Attribution 4.0 International License.

For questions, please contact polatgorkem@gmail.com.

Files


Files (3.8 GB)

Name	MD5	Size
patient_based_classified_images.zip	6d99c6b4f9b9bf3e210964d16439de42	1.9 GB
test_set.zip	bdd42cff6307ce5d2ebcc3faf31c2d2a	286.8 MB
train_and_validation_sets.zip	b5b044affaccfdc9857aa1cc3072f7bc	1.6 GB
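The MD5 checksums listed above can confirm that each archive downloaded completely. Below is a minimal sketch using Python's standard hashlib module. It assumes the three archives are in one directory under their original names. The helper names md5_of_file and verify_downloads are hypothetical.

```python
import hashlib
from pathlib import Path

# Checksums as listed in the file table above.
EXPECTED_MD5 = {
    "patient_based_classified_images.zip": "6d99c6b4f9b9bf3e210964d16439de42",
    "test_set.zip": "bdd42cff6307ce5d2ebcc3faf31c2d2a",
    "train_and_validation_sets.zip": "b5b044affaccfdc9857aa1cc3072f7bc",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so multi-GB archives need not fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory="."):
    """Map each expected archive name to True if it exists and its MD5 matches."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = path.is_file() and md5_of_file(path) == expected
    return results


if __name__ == "__main__":
    for name, ok in verify_downloads().items():
        print(f"{name}: {'OK' if ok else 'missing or checksum mismatch'}")
```

Run it from the download directory. Any archive reported as a mismatch should be downloaded again before extraction.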

