UNIVALI Leukocyte Dataset: 14-Class Peripheral Blood Cell Images for Deep Learning
Description
Dataset Overview
This dataset contains images of peripheral blood smear cells annotated into 14 hematological categories. These include mature leukocytes, immature and precursor cells (blasts, promyelocytes, myelocytes, metamyelocytes), and artifacts. Annotation and dataset management were performed on the Roboflow platform.
It is intended for training deep learning models and is provided in two formats: YOLO (images with .txt annotation files) and a classification folder structure for Vision Transformers (ViT).
Content & Pre-processing
The original dataset (4,471 images) was expanded to 11,650 images using standard data augmentation techniques to balance classes and improve generalization.
Augmentation Parameters Applied:
- Flip: Horizontal and Vertical.
- Rotate: 90° (Clockwise and Counter-clockwise).
- Crop: Random zoom (0% to 20%).
- Shear: ±10° (Horizontal and Vertical).
- Saturation: ±25%.
- Brightness: ±16%.
- Exposure: ±10%.
- Blur: Up to 2.5 pixels.
- Noise: Up to 0.1% of pixels.
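Flips and 90° rotations move the cells within the image, so the YOLO bounding boxes have to be transformed along with the pixels. The sketch below shows that geometry for a box in YOLO's normalized `(x_center, y_center, width, height)` form. It is an illustration only, not Roboflow's implementation. The function names are hypothetical. Shear and crop are not covered, and the photometric augmentations (saturation, brightness, exposure, blur, noise) leave box coordinates unchanged.

```python
from typing import Tuple

# Illustrative sketch with hypothetical helper names, not Roboflow's code.
# A YOLO box: (x_center, y_center, width, height), all normalized to [0, 1].
Box = Tuple[float, float, float, float]


def flip_horizontal(box: Box) -> Box:
    """Mirror left-right: only the x-center moves."""
    x, y, w, h = box
    return (1.0 - x, y, w, h)


def flip_vertical(box: Box) -> Box:
    """Mirror top-bottom: only the y-center moves."""
    x, y, w, h = box
    return (x, 1.0 - y, w, h)


def rotate_90_clockwise(box: Box) -> Box:
    """Rotate the image 90 degrees clockwise; box width and height swap."""
    x, y, w, h = box
    return (1.0 - y, x, h, w)


def rotate_90_counterclockwise(box: Box) -> Box:
    """Rotate the image 90 degrees counter-clockwise; box width and height swap."""
    x, y, w, h = box
    return (y, 1.0 - x, h, w)


box = (0.25, 0.40, 0.10, 0.20)
print(flip_horizontal(box))
print(rotate_90_clockwise(box))
```

Applying a clockwise and then a counter-clockwise rotation returns the original box (up to floating-point rounding), which is a quick way to sanity-check the formulas.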
Classes (14 Labels):
1. Neutrophil Segmented
2. Lymphocyte
3. Blast
4. Cell Debris (Restos Celulares)
5. Band Neutrophil (Bastonete)
6. Atypical Lymphocyte
7. Monocyte
8. Eosinophil
9. Metamyelocyte
10. Myelocyte
11. Promyelocyte
12. Artifact
13. Erythroblast
14. Basophil
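Because augmentation was used to balance the classes, one sanity check is to count annotated instances per class in the YOLO labels. This is a sketch that assumes the zero-based YOLO class IDs follow the order of the numbered list above (ID 0 = Neutrophil Segmented, ..., ID 13 = Basophil). That mapping is an assumption and should be checked against the class names shipped with the export. `CLASS_NAMES` and `count_instances` are hypothetical names.

```python
from collections import Counter
from typing import Dict, Iterable, List

# ASSUMPTION: zero-based YOLO class IDs follow the numbered list above.
# Verify against the class-name file included in the export before use.
CLASS_NAMES: List[str] = [
    "Neutrophil Segmented",
    "Lymphocyte",
    "Blast",
    "Cell Debris",
    "Band Neutrophil",
    "Atypical Lymphocyte",
    "Monocyte",
    "Eosinophil",
    "Metamyelocyte",
    "Myelocyte",
    "Promyelocyte",
    "Artifact",
    "Erythroblast",
    "Basophil",
]


def count_instances(label_lines: Iterable[str]) -> Dict[str, int]:
    """Tally annotated instances per class from YOLO label lines."""
    counts = Counter()
    for line in label_lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        class_id = int(parts[0])  # first field is the class ID
        counts[CLASS_NAMES[class_id]] += 1
    return dict(counts)


example_lines = ["0 0.5 0.5 0.2 0.2", "13 0.3 0.7 0.1 0.1", "0 0.8 0.2 0.2 0.2"]
print(count_instances(example_lines))
```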
Formats Included:
- YOLO: Images + .txt labels (bounding box coordinates).
- ViT: Images organized for classification tasks.
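In the YOLO format, each `.txt` file holds one line per annotated object: a class ID followed by the box center, width, and height, normalized to the image size. A minimal sketch for converting one such line to pixel corner coordinates follows. It assumes standard five-field box lines. The 640×640 image size in the usage line is a placeholder, since the image resolution is not stated in this record, and `yolo_to_pixels` is a hypothetical name.

```python
from typing import NamedTuple


class PixelBox(NamedTuple):
    class_id: int
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def yolo_to_pixels(line: str, img_width: int, img_height: int) -> PixelBox:
    """Convert one YOLO label line into pixel corner coordinates.

    Expected line layout: class_id x_center y_center width height,
    with the four coordinates normalized by the image size.
    """
    class_id, xc, yc, w, h = line.split()
    xc_px = float(xc) * img_width
    yc_px = float(yc) * img_height
    w_px = float(w) * img_width
    h_px = float(h) * img_height
    return PixelBox(
        class_id=int(class_id),
        x_min=xc_px - w_px / 2,
        y_min=yc_px - h_px / 2,
        x_max=xc_px + w_px / 2,
        y_max=yc_px + h_px / 2,
    )


# Placeholder image size; read the real size from each image file.
print(yolo_to_pixels("6 0.5 0.5 0.25 0.5", 640, 640))
```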
Dataset Split Strategy
The dataset is pre-split into training, validation, and test subsets in an 80/10/10 ratio, providing a fixed benchmark for consistent evaluation:
- Training: 9,320 images (80%)
- Validation: 1,165 images (10%)
- Test: 1,166 images (10%)
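The proportions can be recomputed from the counts above. The three subsets sum to 11,651 images, one more than the 11,650 total given earlier, and each share still rounds to the stated 80/10/10 ratio. A short sketch of the arithmetic (`split_fractions` is a hypothetical helper):

```python
from typing import Dict

# Subset sizes as stated in this description.
SPLIT_SIZES: Dict[str, int] = {"training": 9_320, "validation": 1_165, "test": 1_166}


def split_fractions(sizes: Dict[str, int]) -> Dict[str, float]:
    """Return each subset's share of the combined image count."""
    total = sum(sizes.values())
    return {name: count / total for name, count in sizes.items()}


total_images = sum(SPLIT_SIZES.values())
for name, share in split_fractions(SPLIT_SIZES).items():
    print(f"{name}: {share:.2%}")
print("total:", total_images)
```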
Note: The augmentation was applied prior to the split to ensure class balance, but strict separation was maintained to prevent data leakage between subsets.
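The description does not say how separation was enforced when augmenting before the split. A common approach is a group-aware split, which assigns every augmented copy of an original image to the same subset as that original, so no source image contributes to more than one subset. The sketch below assumes a mapping from each image file to its source image ID. That mapping is hypothetical, since this record does not document one, and this is not presented as the authors' procedure. Because the ratios are applied to source images, image-level proportions can deviate slightly from 80/10/10.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple


def group_split(
    source_of: Dict[str, str],
    ratios: Tuple[float, float, float] = (0.8, 0.1, 0.1),
    seed: int = 42,
) -> Dict[str, List[str]]:
    """Split images so all augmented copies of one source share a subset.

    source_of maps each image filename to the ID of the original
    (pre-augmentation) image it was derived from. Hypothetical input.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for image, source in source_of.items():
        groups[source].append(image)

    sources = sorted(groups)  # deterministic order before shuffling
    random.Random(seed).shuffle(sources)

    n = len(sources)
    n_train = round(ratios[0] * n)
    n_valid = round(ratios[1] * n)
    subsets = {
        "training": sources[:n_train],
        "validation": sources[n_train:n_train + n_valid],
        "test": sources[n_train + n_valid:],
    }
    # Expand each subset from source IDs back to image filenames.
    return {
        name: sorted(img for src in srcs for img in groups[src])
        for name, srcs in subsets.items()
    }


# Toy example: 10 source images, 3 augmented copies each.
demo = {f"src{i}_aug{j}.jpg": f"src{i}" for i in range(10) for j in range(3)}
splits = group_split(demo)
for name, images in splits.items():
    print(name, len(images))
```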
Citation:
If you use this dataset in your research, please cite it as follows:
Dataset:
Kasprowicz, J., Bacca, H. G., Benevides, Y. D. P., & Silva, A. G. (2025).
14-Class Leukocyte Classification Dataset for Peripheral Blood Cell Classification (YOLO & ViT) [Data set].
Zenodo. https://doi.org/10.5281/zenodo.17743609
Files
Total size: 577.2 MB, in two archives (one of them is dataset-vit.zip).
| Size | MD5 checksum |
|---|---|
| 183.2 MB | 35fc76e1244963360e108a39f3797a2a |
| 394.0 MB | 7c73e661d92dfdce2bc4a395aeba6360 |
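After downloading, the archives can be checked against the MD5 checksums listed for this record. The listing does not pair each checksum with a file name, so this sketch only tests whether a file's digest matches one of the two listed values. The function names are hypothetical.

```python
import hashlib
from pathlib import Path

# MD5 checksums listed in the Files section of this record.
EXPECTED_MD5 = {
    "35fc76e1244963360e108a39f3797a2a",  # 183.2 MB archive
    "7c73e661d92dfdce2bc4a395aeba6360",  # 394.0 MB archive
}


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute a file's MD5 hex digest, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_listed_archive(path: Path) -> bool:
    """Return True if the file's MD5 matches one of the listed checksums."""
    return md5_of(path) in EXPECTED_MD5
```

For example, `is_listed_archive(Path("dataset-vit.zip"))` should return `True` for an intact download. Reading in chunks keeps memory use flat even for the 394 MB archive.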