Irish-Accented English Audio-Visual Deepfake Datasets with Deep Packet Inspection-Inspired Media Integrity Validation
Authors/Creators
- 1. Atlantic Technological University
- 2. University of Galway
- 3. University College Dublin
- 4. CISCO
Description
This dataset contains Irish-accented English audio-only and synchronised audio–video samples curated for research on deepfake detection, multimodal learning, media integrity validation, cybersecurity, accent robustness, and bias-aware evaluation.
The deposit includes authentic and synthetic samples, file-level metadata, labels, source-provenance documentation, and validation materials. Authentic media were retrieved from publicly accessible Archive.org item pages and manually reviewed for Irish-accented English speech. Synthetic samples were generated using generic text-to-speech and prompt-based media generation workflows. No synthetic sample was generated to clone, impersonate, face-swap, lip-sync, or reproduce the voice, face, likeness, or identity of any known individual.
The source-provenance table lists each unique Archive.org item used for the authentic media subset, including the item URL, title, number of derived clips, and observed licence/rights status. The authors do not claim ownership over third-party authentic Archive.org media and do not relicense those third-party media. The authors’ licence applies to the metadata, labels, documentation, validation scripts, processing code, and author-generated synthetic media. Third-party authentic media remain subject to their original rights and applicable Archive.org item-level terms.
Files
| Name | Size | Checksum |
|---|---|---|
| AudioVideoDatasetDF.zip | 1.4 GB | md5:3e6fc35d2acc470641d632037eb1e2dd |
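
Because the archive is large, it may be worth confirming that a download completed intact before extracting it. The following is a minimal sketch, not part of the deposit, that compares a local file's MD5 digest against the checksum shown in the file listing. The local path `AudioVideoDatasetDF.zip` in the current directory and the helper names `md5_of_file` and `verify_archive` are assumptions made for illustration.

```python
import hashlib
from pathlib import Path

# Checksum copied from the file listing of this record.
EXPECTED_MD5 = "3e6fc35d2acc470641d632037eb1e2dd"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Chunked reads avoid loading the whole 1.4 GB archive into memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    # Assumed location of the downloaded archive (hypothetical path).
    archive = Path("AudioVideoDatasetDF.zip")
    if archive.exists():
        print("checksum OK" if verify_archive(archive) else "checksum MISMATCH")
    else:
        print(f"{archive} not found; download it first")
```

A mismatch usually indicates a truncated or corrupted download, in which case fetching the archive again is the simplest remedy. MD5 is used here only because it is the digest published with the record; it detects accidental corruption but is not a defence against deliberate tampering.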