FaciaVox: A Multimodal Biometric Dataset
Description
The FaciaVox dataset is an extensive multimodal biometric resource designed to support in-depth research on face images and voice recordings in both masked and unmasked scenarios.
Features of the Dataset:
1. Multimodal Data: A total of 1,800 face images (JPG) and 6,000 audio recordings (WAV) were collected, enabling cross-domain analysis of visual and auditory biometrics.
2. Participants were categorized into four age groups for structured labeling:
Label 1: Under 16 years
Label 2: 16 to less than 31 years
Label 3: 31 to less than 46 years
Label 4: 46 years and above
3. Sibling Data: Some participants are siblings, adding a challenging layer for speaker identification and facial recognition tasks due to genetic similarities in vocal and facial features. Sibling relationships are documented in the accompanying "FaciaVox List" data file.
4. Standardized Filenames: The dataset uses a consistent, intuitive naming convention for both facial images and voice recordings. Each filename includes:
Type (F: Face Image, V: Voice Recording)
Participant ID (e.g., sub001)
Mask Type (e.g., a: unmasked, b: disposable mask, etc.)
Zoom Level or Sentence ID (e.g., 1x, 3x, 5x for images or specific sentence identifier {01, 02, 03, ..., 10} for recordings)
5. Diverse Demographics: Participants come from 19 different countries.
6. Challenging Imaging Conditions: Reflective mask shields and severe lighting conditions pose a difficult face recognition problem.
7. Multilingual Speech: Each participant uttered 7 English statements and 3 Arabic statements, regardless of their native language, adding a challenge for speaker identification.
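Because every filename encodes the modality, participant ID, mask type, and zoom level or sentence ID, the fields can be recovered programmatically. The sketch below assumes a hypothetical underscore-separated layout such as `F_sub001_a_1x.jpg`; the actual FaciaVox separator may differ, so the split logic should be adjusted to match the real files.

```python
# Minimal sketch of parsing FaciaVox-style filenames.
# ASSUMPTION: fields are underscore-separated, e.g. "F_sub001_a_1x.jpg" or
# "V_sub001_b_03.wav". The real convention may use a different separator.
from dataclasses import dataclass


@dataclass
class FaciaVoxName:
    modality: str     # "F" (face image) or "V" (voice recording)
    participant: str  # participant ID, e.g. "sub001"
    mask: str         # mask type, e.g. "a" (unmasked), "b" (disposable mask)
    variant: str      # zoom level ("1x", "3x", "5x") or sentence ID ("01".."10")


def parse_filename(name: str) -> FaciaVoxName:
    stem = name.rsplit(".", 1)[0]  # drop the extension (.jpg / .wav)
    modality, participant, mask, variant = stem.split("_")
    return FaciaVoxName(modality, participant, mask, variant)
```

With this layout, `parse_filename("V_sub001_b_03.wav")` would yield a voice recording of participant `sub001`, wearing a disposable mask, uttering sentence 03.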
Research Applications
FaciaVox is a versatile dataset supporting a wide range of research domains, including but not limited to:
• Speaker Identification (SI) and Face Recognition (FR): Evaluating biometric systems under varying conditions.
• Impact of Masks on Biometrics: Investigating how different facial coverings affect recognition performance.
• Language Impact on SI: Exploring the effects of native and non-native speech on speaker identification.
• Age and Gender Estimation: Inferring demographic information from voice and facial features.
• Race and Ethnicity Matching: Studying biometrics across diverse populations.
• Synthetic Voice and Deepfake Detection: Detecting cloned or generated speech.
• Cross-Domain Biometric Fusion: Combining facial and vocal data for robust authentication.
• Speech Intelligibility: Assessing how masks influence speech clarity.
• Image Inpainting: Reconstructing occluded facial regions for improved recognition.
Researchers can use the facial images and voice recordings independently or in combination to explore multimodal biometric systems. The standardized filenames and accompanying metadata make it easy to align visual and auditory data for cross-domain analyses. Sibling relationships and demographic labels add depth for tasks such as familial voice recognition, demographic profiling, and model bias evaluation.
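The alignment of visual and auditory data described above can be sketched as a simple grouping by participant ID. This example reuses the same hypothetical underscore-separated filename layout (e.g. `F_sub001_a_1x.jpg`), which is an assumption rather than the documented convention.

```python
# Sketch: group face images and voice recordings by participant ID so the two
# modalities can be paired for multimodal experiments.
# ASSUMPTION: underscore-separated filenames like "F_sub001_a_1x.jpg" and
# "V_sub001_b_03.wav"; adapt the split if the real convention differs.
from collections import defaultdict


def pair_by_participant(filenames):
    """Return {participant_id: {"faces": [...], "voices": [...]}}."""
    pairs = defaultdict(lambda: {"faces": [], "voices": []})
    for name in filenames:
        stem = name.rsplit(".", 1)[0]          # strip extension
        modality, participant, *_ = stem.split("_")
        key = "faces" if modality == "F" else "voices"
        pairs[participant][key].append(name)
    return dict(pairs)
```

A fusion pipeline could then iterate over each participant's `faces` and `voices` lists together, for example to train a joint face-voice authentication model.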