FaciaVox: A Multimodal Biometric Dataset
Description
The FaciaVox dataset is an extensive multimodal biometric resource designed to support in-depth research on face images and voice recordings in both masked and unmasked scenarios.
Features of the Dataset:
1. Multimodal Data: A total of 1,800 face images (JPG) and 6,000 audio recordings (WAV) were collected, enabling cross-domain analysis of visual and auditory biometrics.
2. Participants were categorized into four age groups for structured labeling:
Label 1: Under 16 years
Label 2: 16 to less than 31 years
Label 3: 31 to less than 46 years
Label 4: 46 years and above
3. Sibling Data: Some participants are siblings, adding a challenging layer for speaker identification and facial recognition tasks due to genetic similarities in vocal and facial features. Sibling relationships are documented in the accompanying "FaciaVox List" data file.
4. Standardized Filenames: The dataset uses a consistent, intuitive naming convention for both facial images and voice recordings. Each filename includes:
Type (F: Face Image, V: Voice Recording)
Participant ID (e.g., sub001)
Mask Type (e.g., a: unmasked, b: disposable mask, etc.)
Zoom Level or Sentence ID (e.g., 1x, 3x, 5x for images or specific sentence identifier {01, 02, 03, ..., 10} for recordings)
5. Diverse Demographics: Participants come from 19 different countries.
6. Challenging Imaging Conditions: Reflective mask shields and severe lighting conditions pose a difficult face recognition problem.
7. Multilingual Speech: Each participant uttered 7 English statements and 3 Arabic statements, regardless of their native language, adding a challenge for speaker identification.
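Because every filename encodes the modality, participant ID, mask type, and zoom level or sentence ID, the fields can be recovered programmatically. The sketch below assumes a hypothetical underscore-separated layout such as `F_sub001_a_1x.jpg`; the actual FaciaVox separator may differ, so the split logic should be adjusted to match the real files.

```python
# Minimal sketch of parsing FaciaVox-style filenames.
# ASSUMPTION: fields are underscore-separated, e.g. "F_sub001_a_1x.jpg" or
# "V_sub001_b_03.wav". The real convention may use a different separator.
from dataclasses import dataclass


@dataclass
class FaciaVoxName:
    modality: str     # "F" (face image) or "V" (voice recording)
    participant: str  # participant ID, e.g. "sub001"
    mask: str         # mask type, e.g. "a" (unmasked), "b" (disposable mask)
    variant: str      # zoom level ("1x", "3x", "5x") or sentence ID ("01".."10")


def parse_filename(name: str) -> FaciaVoxName:
    stem = name.rsplit(".", 1)[0]  # drop the extension (.jpg / .wav)
    modality, participant, mask, variant = stem.split("_")
    return FaciaVoxName(modality, participant, mask, variant)
```

With this layout, `parse_filename("V_sub001_b_03.wav")` would yield a voice recording of participant `sub001`, wearing a disposable mask, uttering sentence 03.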
Research Applications
FaciaVox is a versatile dataset supporting a wide range of research domains, including but not limited to:
• Speaker Identification (SI) and Face Recognition (FR): Evaluating biometric systems under varying conditions.
• Impact of Masks on Biometrics: Investigating how different facial coverings affect recognition performance.
• Language Impact on SI: Exploring the effects of native and non-native speech on speaker identification.
• Age and Gender Estimation: Inferring demographic information from voice and facial features.
• Race and Ethnicity Matching: Studying biometrics across diverse populations.
• Synthetic Voice and Deepfake Detection: Detecting cloned or generated speech.
• Cross-Domain Biometric Fusion: Combining facial and vocal data for robust authentication.
• Speech Intelligibility: Assessing how masks influence speech clarity.
• Image Inpainting: Reconstructing occluded facial regions for improved recognition.
Researchers can use the facial images and voice recordings independently or in combination to explore multimodal biometric systems. The standardized filenames and accompanying metadata make it easy to align visual and auditory data for cross-domain analyses. Sibling relationships and demographic labels add depth for tasks such as familial voice recognition, demographic profiling, and model bias evaluation.
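The alignment of visual and auditory data described above can be sketched as a simple grouping by participant ID. This example reuses the same hypothetical underscore-separated filename layout (e.g. `F_sub001_a_1x.jpg`), which is an assumption rather than the documented convention.

```python
# Sketch: group face images and voice recordings by participant ID so the two
# modalities can be paired for multimodal experiments.
# ASSUMPTION: underscore-separated filenames like "F_sub001_a_1x.jpg" and
# "V_sub001_b_03.wav"; adapt the split if the real convention differs.
from collections import defaultdict


def pair_by_participant(filenames):
    """Return {participant_id: {"faces": [...], "voices": [...]}}."""
    pairs = defaultdict(lambda: {"faces": [], "voices": []})
    for name in filenames:
        stem = name.rsplit(".", 1)[0]          # strip extension
        modality, participant, *_ = stem.split("_")
        key = "faces" if modality == "F" else "voices"
        pairs[participant][key].append(name)
    return dict(pairs)
```

A fusion pipeline could then iterate over each participant's `faces` and `voices` lists together, for example to train a joint face-voice authentication model.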