A multi-speaker dataset of real-time two-dimensional speech magnetic resonance images with articulator ground-truth segmentations
- 1. Clinical Physics, Barts Health NHS Trust, West Smithfield, London EC1A 7BE, United Kingdom. School of Biomedical Engineering & Imaging Sciences, King's College London, King's Health Partners, St Thomas' Hospital, London SE1 7EH, United Kingdom.
- 2. Clinical Physics, Barts Health NHS Trust, West Smithfield, London EC1A 7BE, United Kingdom.
- 3. Clinical Physics, Barts Health NHS Trust, West Smithfield, London EC1A 7BE, United Kingdom. Centre for Advanced Cardiovascular Imaging, NIHR Barts Biomedical Research Centre, William Harvey Institute, Queen Mary University of London, London EC1M 6BQ, United Kingdom. Digital Environment Research Institute (DERI), Empire House, 67-75 New Road, Queen Mary University of London, London E1 1HH, United Kingdom.
Description
Summary
This dataset consists of real-time magnetic resonance images of speech, together with corresponding ground-truth (GT) segmentations and velopharyngeal closure labels.
Images
The images are of five healthy adult volunteers (two females, three males; age range 24-28 years), each counting once from 1 to 10 in English. Each volunteer was imaged in the supine position using a 3.0 T TX Achieva magnetic resonance imaging (MRI) scanner and a 16-channel neurovascular coil (both Philips Healthcare, Best, Netherlands). Images of a 10 mm thick midsagittal slice of the head were acquired using a steady-state free precession (SSFP) pulse sequence based on the sequence identified in [1] as optimal for vocal tract image quality. The acquired matrix size and in-plane pixel size were 120×93 and 2.50×2.45 mm² respectively. However, the scanner zero-padded the k-space data to a matrix size of 256×256 before reconstruction, giving a reconstructed in-plane pixel size of 1.17×1.17 mm². Images were acquired at a temporal resolution of 0.1 s (10 images per second), and one image series was acquired per volunteer. The volunteers were instructed to perform the speech task at a rate they considered normal. Some performed the task faster than others, so the series differ in length: they contain 105, 71, 71, 78 and 67 images respectively (392 images in total).
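As a quick consistency check of the acquisition numbers above, the sketch below derives each series' duration from the 0.1 s temporal resolution and the reconstructed pixel size from the 256×256 zero-padded matrix. It assumes the 120-sample acquired axis spans the whole reconstructed field of view (120 × 2.50 mm = 300 mm); that field-of-view value is inferred here, not stated in the description, and the helper names are hypothetical.

```python
# Frame counts per volunteer series, as listed in the description.
FRAMES_PER_SERIES = [105, 71, 71, 78, 67]
FRAME_RATE_HZ = 10  # temporal resolution of 0.1 s per image


def series_durations_s(frame_counts, frame_rate_hz=FRAME_RATE_HZ):
    """Return the duration of each series in seconds."""
    return [n / frame_rate_hz for n in frame_counts]


def reconstructed_pixel_mm(acquired_samples, acquired_pixel_mm, recon_matrix=256):
    """Pixel size after zero-padding k-space to recon_matrix samples.

    Zero-padding keeps the field of view but spreads it over more pixels; it
    interpolates and adds no spatial information beyond the acquired resolution.
    Assumption: FOV = acquired_samples * acquired_pixel_mm along this axis.
    """
    fov_mm = acquired_samples * acquired_pixel_mm
    return fov_mm / recon_matrix


if __name__ == "__main__":
    print("total images:", sum(FRAMES_PER_SERIES))
    print("durations (s):", series_durations_s(FRAMES_PER_SERIES))
    print("pixel size (mm):", round(reconstructed_pixel_mm(120, 2.50), 2))
```

Under these assumptions the series last between 6.7 s and 10.5 s, and 300 mm / 256 = 1.171875 mm rounds to the stated 1.17 mm.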
Velopharyngeal closure labels
Each image was visually inspected and labelled according to whether or not it showed contact between the soft palate and the posterior pharyngeal wall. A label of 1 indicates contact and a label of 0 indicates no contact. To reduce the subjectivity of the labels, each image was labelled independently by four MRI physicists with four, ten, two and one years of speech MRI experience respectively, and the majority label was taken as the GT label.
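The majority-vote step can be sketched as follows. With four raters a 2-2 split is possible in principle, and the description does not say whether any occurred or how one would be resolved; this sketch simply returns None in that case, which is a hypothetical choice rather than the procedure used for the dataset.

```python
from collections import Counter
from typing import Optional, Sequence


def majority_label(votes: Sequence[int]) -> Optional[int]:
    """Return the strict-majority label (1 = contact, 0 = no contact).

    Returns None when no label has more than half of the votes, e.g. a 2-2
    split among four raters (tie handling here is an assumption).
    """
    if not votes:
        raise ValueError("at least one vote is required")
    if any(v not in (0, 1) for v in votes):
        raise ValueError("labels must be 0 or 1")
    label, count = Counter(votes).most_common(1)[0]
    return label if count > len(votes) / 2 else None
```

For example, `majority_label([1, 1, 0, 1])` returns 1, while `majority_label([1, 0, 1, 0])` returns None.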
Ground-truth segmentations
GT segmentations were created by manually labelling the pixels in each image. The segmentations consisted of six classes, each made up of one or more anatomical features. There was no overlap between classes: a pixel could not belong to more than one class. For conciseness, the classes were named head, soft palate, jaw, tongue, vocal tract and tooth space; however, the names of the head, jaw and tongue classes are simplifications. The head class consisted of all anatomical features superior or posterior to the vocal tract, and therefore included the upper lip, hard palate, brain, skull, posterior pharyngeal wall and neck. The jaw class consisted of the lower lip, the soft tissue anterior and inferior to the mandible, and the soft tissue inferior to the tongue. The tongue class also included the epiglottis and the hyoid bone. Pixels not assigned to any of these classes were considered to belong to the background. The GT segmentations were created by the MRI physicist with four years of speech MRI experience.
Dataset structure
Images are contained in the MRI_SSFP_10fps folder. Within this folder, each subfolder contains the images of one volunteer. Each image is saved as a separate DICOM file named image_N.dcm, where N is the image number.
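Because the image number N is not zero-padded in the file name, a plain lexicographic sort would place image_10.dcm before image_2.dcm. A minimal sketch that orders one volunteer's files by the parsed integer N (the function name is hypothetical):

```python
import re
from pathlib import Path

# Matches file names of the form image_N.dcm and captures N.
_IMAGE_NAME = re.compile(r"image_(\d+)\.dcm")


def sorted_image_paths(volunteer_dir):
    """Return a volunteer's DICOM files ordered by image number N.

    Files whose names do not match image_N.dcm are ignored.
    """
    numbered = []
    for path in Path(volunteer_dir).iterdir():
        match = _IMAGE_NAME.fullmatch(path.name)
        if match:
            numbered.append((int(match.group(1)), path))
    return [path for _, path in sorted(numbered)]
```

To read the pixel data, one option, assuming the third-party pydicom package (with NumPy) is installed, is `[pydicom.dcmread(p).pixel_array for p in sorted_image_paths(folder)]`, which yields the frames in image-number order.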
Velopharyngeal closure labels are saved in velopharyngeal_closure.xlsx, with the labels of each volunteer in a separate sheet. The spreadsheet row number corresponds to the image number (i.e. the label in row 1 is the label for image 1).
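Since row N holds the label for image N, a zero-based list of one sheet's values is offset by one from the image numbers. The sketch below pairs them and checks the row count against the number of images. It assumes the labels sit in the first column with no header row; how the sheet is read is left open. One option, assuming pandas and an Excel engine such as openpyxl are installed, is `pandas.read_excel(path, sheet_name=None, header=None)`, which returns a dict of DataFrames keyed by sheet name; `df.iloc[:, 0].tolist()` then gives the first column. The helper names are hypothetical.

```python
def labels_by_image(column_values, n_images):
    """Map image number N (1-based) to the closure label in spreadsheet row N.

    column_values: one sheet's labels, top row first (assumed first column,
    no header row). Raises if the row count differs from n_images or a
    value is not 0/1.
    """
    labels = [int(v) for v in column_values]  # int() also accepts 1.0 from pandas
    if len(labels) != n_images:
        raise ValueError(f"expected {n_images} labels, found {len(labels)}")
    if any(v not in (0, 1) for v in labels):
        raise ValueError("labels must be 0 (no contact) or 1 (contact)")
    # Row N (1-based) holds the label for image N.
    return {n: label for n, label in enumerate(labels, start=1)}


def closure_fraction(labels):
    """Fraction of images in a series labelled as showing contact."""
    return sum(labels.values()) / len(labels)
```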
Ground-truth segmentations are contained in the GT_Segmentations folder. Within this folder, each subfolder contains the GT segmentations of one volunteer. Each GT segmentation is saved as a separate MAT file named mask_N.mat. In each MAT file, pixel values correspond to classes as follows:
- 0 = background
- 1 = head
- 2 = soft palate
- 3 = jaw
- 4 = tongue
- 5 = vocal tract
- 6 = tooth space
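A minimal sketch for summarising one segmentation, assuming the mask has already been loaded as a 2-D grid of integers (for example with `scipy.io.loadmat`, which returns a dict keyed by MATLAB variable name; the variable name inside each mask_N.mat is not stated here, so inspect the keys). It counts pixels per class, converts counts to areas using the nominal 1.17 × 1.17 mm² reconstructed pixel size (the exact spacing is stored in the DICOM Pixel Spacing attribute (0028,0030) of the matching image), and splits the label map into one binary channel per class. Because each pixel holds a single value, the channels cannot overlap, mirroring the no-overlap rule in the class definitions. Function names are hypothetical.

```python
# Pixel value -> class name, as listed above.
CLASS_NAMES = {
    0: "background",
    1: "head",
    2: "soft palate",
    3: "jaw",
    4: "tongue",
    5: "vocal tract",
    6: "tooth space",
}


def class_pixel_counts(mask):
    """Count pixels of each class in a 2-D integer mask (iterable of rows)."""
    counts = {name: 0 for name in CLASS_NAMES.values()}
    for row in mask:
        for value in row:
            if value not in CLASS_NAMES:
                raise ValueError(f"unexpected mask value {value}")
            counts[CLASS_NAMES[value]] += 1
    return counts


def class_areas_mm2(mask, pixel_mm=(1.17, 1.17)):
    """Convert per-class pixel counts to areas in mm^2 (nominal pixel size by default)."""
    pixel_area = pixel_mm[0] * pixel_mm[1]
    return {name: n * pixel_area for name, n in class_pixel_counts(mask).items()}


def one_hot(mask):
    """Return one binary mask per class, background included.

    For masks containing only values 0-6, every pixel is 1 in exactly one channel.
    """
    return {
        name: [[1 if value == label else 0 for value in row] for row in mask]
        for label, name in CLASS_NAMES.items()
    }
```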
References
[1] A.D. Scott, R. Boubertakh, M.J. Birch, M.E. Miquel, Towards clinical assessment of velopharyngeal closure using MRI: evaluation of real-time MRI sequences at 1.5 and 3 T, Br. J. Radiol. 85 (2012) e1083–e1092. https://doi.org/10.1259/bjr/32938996.
Files
- data.zip (21.6 MB; md5:5b401468771adecb51060fb51442d5a7)
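The MD5 checksum listed for data.zip can be used to check that a download is complete and uncorrupted. A minimal sketch using only the Python standard library (function names are hypothetical):

```python
import hashlib

EXPECTED_MD5 = "5b401468771adecb51060fb51442d5a7"  # checksum listed for data.zip


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected checksum."""
    return md5_of_file(path) == expected.lower()
```

For example, `verify_download("data.zip")` should return True for an intact copy of the archive.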