There is a newer version of the record available.

Published June 1, 2026 | Version v1
Dataset Open

Irish-Accented English Audio-Visual Deepfake Datasets with Deep Packet Inspection-Inspired Media Integrity Validation

  • 1. Atlantic Technological University
  • 2. University of Galway
  • 3. University College Dublin
  • 4. CISCO

Description

This dataset contains Irish-accented English audio-only and synchronised audio–video samples curated for research on deepfake detection, multimodal learning, media integrity validation, cybersecurity, accent robustness, and bias-aware evaluation.

The deposit includes authentic and synthetic samples, file-level metadata, labels, source-provenance documentation, and validation materials. Authentic media were retrieved from publicly accessible Archive.org item pages and manually reviewed for Irish-accented English speech. Synthetic samples were generated using generic text-to-speech and prompt-based media generation workflows. No synthetic sample was generated to clone, impersonate, face-swap, lip-sync, or reproduce the voice, face, likeness, or identity of any known individual.

The source-provenance table lists each unique Archive.org item used for the authentic media subset, including the item URL, title, number of derived clips, and observed licence/rights status. The authors do not claim ownership over third-party authentic Archive.org media and do not relicense those third-party media. The authors’ licence applies to the metadata, labels, documentation, validation scripts, processing code, and author-generated synthetic media. Third-party authentic media remain subject to their original rights and applicable Archive.org item-level terms.

Files

Files (1.4 GB)

Name: AudioVideoDatasetDF.zip
Size: 1.4 GB
md5: 3e6fc35d2acc470641d632037eb1e2dd
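
Because the record publishes an MD5 checksum for the archive, a download can be checked for corruption before it is unpacked. The following is a minimal sketch in Python using only the standard library. The local path `AudioVideoDatasetDF.zip` in the working directory and the helper names `md5_of_file` and `verify_download` are assumptions made for illustration; they are not part of the deposit.

```python
import hashlib
from pathlib import Path

# Checksum as listed on the record for AudioVideoDatasetDF.zip.
EXPECTED_MD5 = "3e6fc35d2acc470641d632037eb1e2dd"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks
    so that a 1.4 GB archive does not have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected value.
    The comparison ignores letter case in the expected string."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    # Assumed location of the downloaded archive (hypothetical path).
    archive = Path("AudioVideoDatasetDF.zip")
    if not archive.exists():
        print(f"{archive} not found; download it first")
    elif verify_download(archive):
        print("checksum OK")
    else:
        print("checksum MISMATCH; the download may be incomplete or corrupted")
```

A matching MD5 digest only indicates that the file arrived intact; it says nothing about the authenticity or licensing of the media inside the archive.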