Published September 17, 2024 | Version v2
Dataset Restricted

GIRAFE: Glottal Imaging Dataset for Advanced Segmentation, Analysis, and Facilitative Playbacks Evaluation (1.0)

  • 1. Université de Bretagne Occidentale
  • 2. Laboratoire de Traitement de l'Information Médicale
  • 3. Universidad Politécnica de Madrid

Description

GIRAFE is a data repository designed to facilitate the development of advanced techniques for the semantic segmentation, analysis, and fast evaluation of high-speed videoendoscopic (HSV) sequences of the vocal folds. The HSV recordings were carried out by a specialist of the Otorhinolaryngology Service of Gregorio Marañón Hospital in Madrid between 2013 and 2015. The database comprises 65 recordings from 50 patients: 30 females (60%) and 20 males (40%), with a mean age of 55.65 ± 19.35 years (range: 25 to 101 years). The dataset totals 15 recordings of healthy subjects and 26 cases with identified disorders and/or noticeably affected vocal fold oscillations. Health status information was unavailable for 24 subjects.

The HSV sequences were acquired using the WOLF® HRES ENDOCAM 5562 camera system and a rigid endoscope with a 70-degree angle of view. The light source was the WOLF AUTO LP 5132. The recordings exhibit varying levels of illumination, contrast, partial occlusion of the glottis, and lateral displacements of the camera, providing a comprehensive resource for robust analysis under diverse conditions. Some key features of the recordings are highlighted as follows:

  • Each sequence contains 502 frames, resulting in 32,630 images available for analysis. This frame count supports temporal analysis of vocal fold dynamics and the evaluation of different segmentation models.
  • The videos capture a sustained vowel phonation, including, in some cases, the vocal onset, providing valuable data on phonatory behavior. This aspect is particularly important for studying the biomechanics of voice production and the initiation of phonation, which are critical for diagnosing voice disorders.
  • The sampling rate was 4,000 fps, and the spatial resolution was 256 × 256 pixels. Such a high frame rate is necessary to accurately capture the fast dynamics of vocal fold motion, while the spatial resolution provides sufficient detail for identifying anatomical landmarks and pathologies.
  • The distance between the camera head in the oropharynx and the vocal folds varies, reflecting real-world clinical conditions. This variability enhances the dataset's robustness for developing and testing semantic segmentation algorithms.
  • All sequences were recorded in color, enhancing the visibility of anatomical features and pathologies. Color recordings facilitate better differentiation of tissue types and more accurate identification of pathological changes.
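The figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below is only illustrative: it derives the total image count, the duration of one sequence, and a rough uncompressed size from the numbers in the description. The 3 bytes per pixel (8-bit RGB) is an assumption, since the description says only that the sequences are in color, and the actual file format and size of the distributed data are not stated here.

```python
# Figures quoted in the dataset description.
N_RECORDINGS = 65
FRAMES_PER_SEQUENCE = 502
FRAME_RATE_FPS = 4000
WIDTH = HEIGHT = 256

# Reported health-status counts: healthy, disordered, unknown.
HEALTH_COUNTS = {"healthy": 15, "disordered": 26, "unknown": 24}

# ASSUMPTION (not stated in the description): 8-bit RGB, 3 bytes per pixel.
BYTES_PER_PIXEL = 3

# Total number of images across all recordings.
total_frames = N_RECORDINGS * FRAMES_PER_SEQUENCE

# Duration of one sequence and the time between consecutive frames.
sequence_duration_ms = FRAMES_PER_SEQUENCE * 1000 / FRAME_RATE_FPS
frame_interval_ms = 1000 / FRAME_RATE_FPS

# Rough uncompressed size under the RGB assumption above.
raw_bytes_per_sequence = FRAMES_PER_SEQUENCE * WIDTH * HEIGHT * BYTES_PER_PIXEL
raw_total_gib = N_RECORDINGS * raw_bytes_per_sequence / 2**30

print(f"total frames:        {total_frames}")             # 32630
print(f"sequence duration:   {sequence_duration_ms} ms")  # 125.5 ms
print(f"frame interval:      {frame_interval_ms} ms")     # 0.25 ms
print(f"raw size (assumed):  {raw_total_gib:.2f} GiB")
print(f"health counts sum:   {sum(HEALTH_COUNTS.values())}")
```

Each sequence therefore covers about an eighth of a second of phonation, and the three reported health-status counts add up to the number of recordings.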

 

GIRAFE: Glottal Imaging Dataset for Advanced Segmentation, Analysis, and Facilitative Playbacks Evaluation

G Andrade-Miranda, K Chatzipapas, JD Arias-Londoño, JI Godino-Llorente 
Data in Brief 59 (111376)
 

Files

Restricted

The record is publicly accessible, but files are restricted (record: https://zenodo.org/records/13773163).

Additional details

Funding

Ministry of Economy, Industry and Competitiveness
DEPIA PID2021-128469OB-I00
Ministry of Economy, Industry and Competitiveness
PASEO DPI2017-83405-R1
Ministry of Economy, Industry and Competitiveness
RADAR-PD TED2021-131688B-I00
Ministry of Economy, Industry and Competitiveness
NEUROVOZ TEC2012-38630-C04-01