In vivo Human Skin Optical Coherence Tomography (OCT) Dataset for Classification Benchmarking and Digital Phantom Generation
Authors/Creators
- 1. A.V. Gaponov-Grekhov Institute of Applied Physics, RAS, Nizhny Novgorod, Russia
- 2. N.I. Lobachevsky Research State University, Nizhny Novgorod, Russia
- 3. N.A. Semashko Clinic of Nizhny Novgorod Region, Nizhny Novgorod, Russia
Description
Overview
This dataset contains 120 Optical Coherence Tomography (OCT) B-scans of human skin acquired in vivo from volunteers. The data is structured and labeled by biological sex, age group, and anatomical localization.
This collection represents a curated subset of a larger 3D OCT dataset presented in the paper:
A. A. Sovetsky, K. S. Petrova, M. A. Brueva, M. G. Ryabkov, A. L. Matveyev, L. A. Matveev, V. Y. Zaitsev. "Automated segmentation and skin-layer thickness estimation by extracting the optical scattering coefficient and speckle contrast parameter from optical coherence tomography scans". Skin Pharmacology and Physiology, 2026 (DOI: 10.1159/000550613).
Objectives
We curated this dataset to address two primary research goals:
1) Benchmarking Foundation Models: The dataset is designed to serve as an open benchmark for foundation models in binary classification tasks. It enables researchers to test the accuracy of classifying OCT scans across paired classes (e.g., Male vs. Female, Young vs. Old, Cheek vs. Eye Corner) using various inference modes, including zero-shot learning, prompt learning (few-shot), and fine-tuning.
2) Generative AI and Digital Phantoms: The dataset serves as a reference for creating digital skin phantoms. It enables the extraction of parameters and distributions of optical scatterers that replicate biological structures, which is essential for generating synthetic OCT scans via virtual scanning simulation (methodology described in https://doi.org/10.1007/978-3-032-05573-6_5).
Data Structure and Format
The dataset is organized into a folder hierarchy:
Sex (Male, Female) / Age Group (1950-1960, 1990-2000) / Localization (Cheek, Eye_corner).
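As a minimal sketch of how this hierarchy could be turned into labels, the snippet below reads the three label folders from a file path. It assumes that each scan file sits directly inside its Localization folder and that the folder names match the spelling given above; the root folder name and the file names are hypothetical, and the bundled reader may work differently.

```python
from pathlib import Path

# Folder names taken from the hierarchy described above.
SEXES = {"Male", "Female"}
AGE_GROUPS = {"1950-1960", "1990-2000"}
LOCALIZATIONS = {"Cheek", "Eye_corner"}


def labels_from_path(path):
    """Return (sex, age_group, localization) from the three folders above the file.

    Assumption: the file lies directly in Sex/Age Group/Localization.
    """
    parts = Path(path).parts
    if len(parts) < 4:
        raise ValueError(f"Path too short to hold three label folders: {path}")
    sex, age_group, localization = parts[-4:-1]
    if (sex not in SEXES or age_group not in AGE_GROUPS
            or localization not in LOCALIZATIONS):
        raise ValueError(f"Unexpected folder names in: {path}")
    return sex, age_group, localization


def index_dataset(root):
    """Collect (file path, labels) pairs for every .npy scan under root."""
    return [(p, labels_from_path(p)) for p in sorted(Path(root).rglob("*.npy"))]
```

For example, `labels_from_path("DATASET/Male/1950-1960/Cheek/scan_001.npy")` returns `("Male", "1950-1960", "Cheek")`. Each element of the tuple can then serve as the target for one of the paired binary tasks.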
Each scan is provided in two formats:
- .png: visual representation of the B-scan for quick inspection.
- .npy (NumPy): raw data containing the logarithmic signal amplitude, with the parameters listed below.
Dimensions: 256 (depth) x 512 (lateral) pixels
Type: uint8 (0-255)
Scale: Logarithmic (~51 dB dynamic range, where 1 unit ≈ 0.2 dB).
Pixel Size: 6 µm (axial) x 6 µm (lateral)
FWHM Beam Diameter: ~20 µm
Center Wavelength: 1.3 µm
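The parameters above imply a simple conversion from stored values to a relative decibel scale and a physical field of view. The following is a sketch under stated assumptions, not the bundled reader: it assumes the array axes are ordered (depth, lateral) as in the Dimensions entry, and it uses a synthetic array in place of a real file. All function and constant names are illustrative.

```python
import numpy as np

DB_PER_UNIT = 0.2            # stated above: 1 unit is about 0.2 dB
PIXEL_SIZE_UM = 6.0          # stated above: 6 um axial and lateral
EXPECTED_SHAPE = (256, 512)  # assumed axis order: (depth, lateral)


def to_decibels(scan):
    """Map uint8 values (0-255) onto a relative dB scale (0 to about 51 dB)."""
    scan = np.asarray(scan)
    if scan.shape != EXPECTED_SHAPE or scan.dtype != np.uint8:
        raise ValueError(f"Unexpected scan: shape={scan.shape}, dtype={scan.dtype}")
    return scan.astype(np.float32) * DB_PER_UNIT


def physical_extent_mm(shape=EXPECTED_SHAPE):
    """Return (depth_mm, lateral_mm) covered by one B-scan."""
    return tuple(n * PIXEL_SIZE_UM / 1000.0 for n in shape)


# Synthetic stand-in for np.load("some_scan.npy"); the file name is hypothetical.
demo = np.full(EXPECTED_SHAPE, 255, dtype=np.uint8)
demo_db = to_decibels(demo)   # every value is close to 51 dB
print(physical_extent_mm())   # (1.536, 3.072)
```

With 6 µm pixels, a 256 x 512 scan therefore spans about 1.54 mm in depth and 3.07 mm laterally, and the full uint8 range corresponds to 255 x 0.2 dB = 51 dB.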
A Python reader for the .npy files is included in this repository. For detailed instructions, updates, and additional tools, please visit the GitHub repository: https://github.com/SynthOCTChallenge/OCT_scans_NPY_Reader
Citation & Usage Policy
This dataset is made freely available for any reuse, including but not limited to research, benchmarking, and challenges.
Use of this dataset requires appropriate citation of both this dataset (https://doi.org/10.5281/zenodo.18095266) and the associated paper:
A. A. Sovetsky, K. S. Petrova, M. A. Brueva, M. G. Ryabkov, A. L. Matveyev, L. A. Matveev, V. Y. Zaitsev. "Automated segmentation and skin-layer thickness estimation by extracting the optical scattering coefficient and speckle contrast parameter from optical coherence tomography scans". Skin Pharmacology and Physiology, 2026 (DOI: 10.1159/000550613).
This dataset has also been submitted for use in the SynthOCT Challenge (www.synthOCT.com). The challenge aims to develop approaches for generating realistic optical scatterer distributions to form OCT-ready digital phantoms for virtual scanning and synthetic dataset generation (https://doi.org/10.1007/978-3-032-05573-6_5 ; https://github.com/OCTDigitalPhantoms/MICCAI_SASHIMI_2025).
Funding
This dataset was curated and published with the support of the Russian Science Foundation (RSF):
RSF Grant № 25-12-20032: "New Approaches to the Development of Algorithms for Analyzing OCT Scans: Modification and Optimization of Large Models Based on Physical Principles and Conditions of OCT Signal Formation" (https://rscf.ru/en/project/25-12-20032/) - specifically for data curation for model training, benchmarking tools, and challenges;
RSF Grant № 22-12-00295-П: "Optical coherence elastography and related modalities: development of physical principles and demonstration of new applications" (https://rscf.ru/en/project/22-12-00295/) - specifically for dataset formation and preprocessing.
This dataset comprises 2D OCT scans released under the CC-BY 4.0 license for free reuse and redistribution with attribution. To request access to the 3D OCT scans, please contact the corresponding author of the paper cited above or Dr. Lev Matveev at drlevmatveev@gmail.com.
Files
DATASET.zip (38.2 MB)
md5:6c461f35a74b768b2960e0391687c10f
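The MD5 checksum listed for DATASET.zip can be used to confirm that a download is complete. The sketch below computes the digest with Python's standard `hashlib` module; the helper names and the local file path are illustrative assumptions.

```python
import hashlib

# Checksum listed for DATASET.zip on this page.
EXPECTED_MD5 = "6c461f35a74b768b2960e0391687c10f"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected checksum."""
    return md5_of_file(path) == expected.lower()
```

Calling `is_intact("DATASET.zip")` on the downloaded archive (the local path is hypothetical) should return `True` if the file matches the listed checksum.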
Additional details
Related works
- Is published in
- Journal article: 10.1159/000550613 (DOI)
Funding
- Institute of Applied Physics
- "New Approaches to the Development of Algorithms for Analyzing OCT Scans: Modification and Optimization of Large Models Based on Physical Principles and Conditions of OCT Signal Formation" Russian Science Foundation (RSF) Grant № 25-12-20032
- Institute of Applied Physics
- "Optical coherence elastography and related modalities: development of physical principles and demonstration of new applications" Russian Science Foundation (RSF) Grant № 22-12-00295-П
Software
- Repository URL
- https://github.com/SynthOCTChallenge/OCT_scans_NPY_Reader
- Programming language
- Python
- Development Status
- Active
References
- A. A. Sovetsky, K. S. Petrova, M. A. Brueva, M. G. Ryabkov, A. L. Matveyev, L. A. Matveev, V. Y. Zaitsev. "Automated segmentation and skin-layer thickness estimation by extracting the optical scattering coefficient and speckle contrast parameter from optical coherence tomography scans". Skin Pharmacology and Physiology, 2026 (DOI: 10.1159/000550613)