Published November 16, 2025
| Version v1
Preprint
Open
Explainable Multimodal Deepfake Detection using Image and Audio Transformers with Attention Mechanisms
Authors/Creators
Description
This work surveys current deepfake detection methods and introduces a multimodal framework that combines vision transformers, audio spectral transformers, and attention-driven explainability tools. We review recent approaches to both single-modal and multimodal detection, discuss the principal datasets and evaluation metrics used in the field, and highlight the open challenges that remain. Finally, we propose an explainable multimodal model that uses cross-modal attention to make deepfake detection both more robust and more transparent.
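The cross-modal attention mentioned above can be illustrated with a minimal sketch: image-patch embeddings act as queries attending over audio spectral-frame embeddings, producing a fused representation whose attention weights are directly inspectable for explainability. This is a generic scaled dot-product formulation in NumPy, not the paper's actual implementation; the function name, token shapes, and single-head simplification are all assumptions for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(img_tokens, aud_tokens):
    """Single-head cross-modal attention (hypothetical sketch).

    img_tokens: (N_img, d) image-patch embeddings, used as queries.
    aud_tokens: (N_aud, d) audio spectral-frame embeddings, used as
                keys and values.
    Returns the fused image-side representation and the attention
    weights, which can be visualized for explainability.
    """
    d = img_tokens.shape[-1]
    # Scaled dot-product scores: how much each image patch attends
    # to each audio frame.
    scores = img_tokens @ aud_tokens.T / np.sqrt(d)   # (N_img, N_aud)
    weights = softmax(scores, axis=-1)                # rows sum to 1
    fused = weights @ aud_tokens                      # (N_img, d)
    return fused, weights
```

In a full model, the attention-weight matrix is the interpretable artifact: a row shows which audio frames most influenced a given image patch, which is what makes the fusion step explainable rather than a black box.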
Files

| Name | Size |
|---|---|
| Explainable Multimodal Deepfake Detection.pdf (md5:282f563360d750ab39355ebd605b61c8) | 189.2 kB |