Published March 24, 2025 | Version 1.0
Dataset Open

MediaPipe-based Preprocessed VGGFace2 Dataset

  • 1. PolitoBIOMed Lab, Department of Mechanical and Aerospace Engineering, Politecnico di Torino, Turin, Italy
  • 2. Department of Research and Development (R&D), GPI SpA, Trento, Italy
  • 3. Department of Computer Science, Ca' Foscari University of Venice, Italy
  • 4. Tianjin Polytechnic University, School of Artificial Intelligence, Binshui West Road No. 399, Tianjin 300387, PR China
  • 5. Department of Computer Science, COMSATS University Islamabad (CUI), Wah Campus, Wah 47000, Pakistan
  • 6. Department of Computer Science, University of Punjab, Lahore 54000, Pakistan
  • 7. PolitoBIOMed Lab, Department of Mechanical and Aerospace Engineering, Politecnico di Torino, 10129 Turin, Italy

Description

VGGFace2 Dataset and Face Mesh Preprocessing
Introduction
The VGGFace2 dataset is a large-scale face recognition dataset containing 3.31 million images of 9,131 identities, an average of about 362 images per identity. The dataset is designed to include extensive variations in pose, age, illumination, ethnicity, and profession, making it one of the most diverse and challenging face recognition datasets available. For more details, please refer to the original publication:
VGGFace2: A dataset for recognizing faces across pose and age - DOI: 10.48550/arXiv.1710.08092
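To check the per-identity statistics on a local copy, a minimal sketch can count the images in each identity directory. It assumes the original VGGFace2 folder layout (`<root>/<identity_id>/<image>.jpg`, with identity folders such as `n000002`). The layout of the preprocessed archive is not documented here, so the directory structure, the accepted file extensions, and the helper names `images_per_identity` and `summarize` are all hypothetical.

```python
from pathlib import Path

# Assumption: images are .jpg/.jpeg/.png files inside one sub-directory per identity.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}

def images_per_identity(root: str) -> dict[str, int]:
    """Hypothetical helper: count image files in each identity sub-directory of `root`."""
    counts = {}
    for identity_dir in sorted(Path(root).iterdir()):
        if identity_dir.is_dir():
            counts[identity_dir.name] = sum(
                1 for f in identity_dir.iterdir()
                if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
            )
    return counts

def summarize(counts: dict[str, int]) -> tuple[int, int, float]:
    """Return (number of identities, total images, mean images per identity)."""
    total = sum(counts.values())
    return len(counts), total, (total / len(counts) if counts else 0.0)
```

For the full dataset the stated figures agree with each other: 3.31 million / 9,131 ≈ 362.5 images per identity.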

Preprocessing Using MediaPipe 3D Face Mesh
On this dataset, we applied the MediaPipe-based 3D face mesh algorithm to detect faces and remove all background elements, including hair. Preprocessing retained only the facial region delimited by the detected facial landmarks, so that only the essential facial features were preserved. In our experiments, training exclusively on this landmark-based facial data improved both the accuracy and the generalization of our model.
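The source does not state exactly how the background was removed, so the following is only a minimal sketch under one assumption: the face region is taken to be the convex hull of the Face Mesh landmarks (MediaPipe Face Mesh returns 468 landmarks whose x and y are normalized to the image width and height), and every pixel outside that hull is set to zero. The convex hull is a swapped-in choice; a face-oval contour would follow the jawline more tightly. `mediapipe_landmarks`, `convex_hull`, `hull_mask`, and `mask_face` are hypothetical helper names, and only `mediapipe_landmarks` needs MediaPipe (its legacy `mp.solutions.face_mesh` API).

```python
import numpy as np

def mediapipe_landmarks(rgb_image: np.ndarray):
    """Hypothetical helper: normalized (x, y) landmarks of the first face, or None.

    Uses MediaPipe's legacy Face Mesh solution. The import is lazy so the
    masking helpers below work without MediaPipe installed.
    """
    import mediapipe as mp
    with mp.solutions.face_mesh.FaceMesh(static_image_mode=True, max_num_faces=1) as face_mesh:
        result = face_mesh.process(rgb_image)
    if not result.multi_face_landmarks:
        return None
    return [(lm.x, lm.y) for lm in result.multi_face_landmarks[0].landmark]

def convex_hull(points: np.ndarray) -> np.ndarray:
    """Andrew's monotone chain: hull vertices with positive signed area, collinear points dropped."""
    pts = sorted(map(tuple, np.asarray(points, dtype=float)))
    if len(pts) <= 2:
        return np.array(pts)

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return np.array(lower[:-1] + upper[:-1])

def hull_mask(hull: np.ndarray, height: int, width: int) -> np.ndarray:
    """Boolean (height, width) mask of pixels whose centres lie inside the convex hull."""
    ys, xs = np.mgrid[0:height, 0:width]
    px, py = xs + 0.5, ys + 0.5  # pixel centres
    inside = np.ones((height, width), dtype=bool)
    for (x0, y0), (x1, y1) in zip(hull, np.roll(hull, -1, axis=0)):
        # For a positively oriented hull, interior points give a non-negative cross product on every edge.
        inside &= (x1 - x0) * (py - y0) - (y1 - y0) * (px - x0) >= 0
    return inside

def mask_face(image: np.ndarray, landmarks_xy) -> np.ndarray:
    """Zero every pixel of an (H, W, C) image that lies outside the landmarks' convex hull."""
    h, w = image.shape[:2]
    pts = np.asarray(landmarks_xy, dtype=float) * [w, h]  # normalized -> pixel coordinates
    mask = hull_mask(convex_hull(pts), h, w)
    return np.where(mask[..., None], image, 0).astype(image.dtype)
```

Typical use: `landmarks = mediapipe_landmarks(rgb)`, then `mask_face(rgb, landmarks)` when a face was found.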

Training and Performance
We used the preprocessed data to train an Xception model, which produced highly accurate results that we attribute to the strictly landmark-based facial representation. The model performed robustly, and explainable-AI (XAI) analysis indicated that removing irrelevant background elements contributed positively to its efficiency and reliability.
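The training configuration (input size, initialization, optimizer, loss) is not described here. As a hedged sketch only, assuming Keras' `tf.keras.applications.Xception` trained from scratch with one output class per identity, input preparation and model construction might look like the following. `scale_for_xception` and `build_xception_classifier` are hypothetical names; the [-1, 1] scaling mirrors what `keras.applications.xception.preprocess_input` does.

```python
import numpy as np

XCEPTION_INPUT_SIZE = (299, 299)  # default input resolution of Keras' Xception

def scale_for_xception(images: np.ndarray) -> np.ndarray:
    """Scale uint8 pixels from [0, 255] to [-1, 1], as Xception's preprocessing expects."""
    return images.astype(np.float32) / 127.5 - 1.0

def build_xception_classifier(num_identities: int):
    """Hypothetical helper: Xception trained from scratch, one softmax output per identity.

    Assumptions: no pretrained weights, Adam optimizer, integer identity labels.
    """
    import tensorflow as tf  # lazy import; only needed when the model is actually built
    model = tf.keras.applications.Xception(
        include_top=True,
        weights=None,
        input_shape=XCEPTION_INPUT_SIZE + (3,),
        classes=num_identities,
    )
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

Masked face crops would first be resized to 299 × 299. Whether ImageNet weights, a different resolution, or another loss were used is not stated in this record.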

Citation
If you use this dataset or the preprocessed version in your work, please cite both of the following:

VGGFace2 Dataset:

@article{Cao2018VGGFace2,
    title={VGGFace2: A dataset for recognizing faces across pose and age},
    author={Cao, Qiong and Shen, Li and Xie, Weidi and Parkhi, Omkar M and Zisserman, Andrew},
    journal={arXiv preprint arXiv:1710.08092},
    year={2018}
}  


DOI: [10.48550/arXiv.1710.08092](https://doi.org/10.48550/arXiv.1710.08092)  
Preprocessed Dataset using MediaPipe:

@dataset{Shah2025_MediaPipe_FaceMesh,
    title={MediaPipe-based 3D Face Mesh Preprocessed VGGFace2 Dataset},
    author={Shah, Syed Taimoor Hussain and Shah, Syed Adil Hussain and Zamir, Ammara and Qayyum, Kainat and Shah, Syed Baqir Hussain and Fatima, Syeda Maryam and Deriu, Marco Agostino},
    year={2025},
    doi={10.5281/zenodo.15078557}
}  
DOI: [10.5281/zenodo.15078557](https://doi.org/10.5281/zenodo.15078557)  


Contact
For any questions or further details, please feel free to contact us.
Syed Taimoor Hussain Shah
PolitoBIOMed Lab, Department of Mechanical and Aerospace Engineering, Politecnico di Torino, Turin, Italy
Email: taimoor.shah@polito.it
ORCID: 0000-0002-6010-6777

Files

Total size: 11.5 GB

  • VGGFace2.zip: 11.5 GB, md5:6b18eec410a88aa5d0b142b820453b96
  • 329.9 kB, md5:1151b5f9de7ee9cc45d94a63aa8c76e0
  • 301.5 kB, md5:9c6ed31d493a6e69073a5a0fac60fbf1

Additional details

Funding

European Commission
PARENT (PremAtuRe nEwborn motor and cogNitive impairmenTs: Early diagnosis), grant 956394