Published November 16, 2025 | Version v1
Preprint · Open Access

Explainable Multimodal Deepfake Detection using Image and Audio Transformers with Attention Mechanisms

  • 1. BMS College of Engineering
  • 2. B.M.S. College of Engineering

Description

This work surveys current deepfake detection methods and introduces a new multimodal framework that combines vision transformers, audio-based spectral transformers, and attention-driven explainability tools. We review state-of-the-art approaches to both single-modal and multimodal detection, discuss the principal datasets and evaluation metrics used in the field, and highlight the major open challenges. We then propose an explainable multimodal model that uses cross-modal attention to make deepfake detection both more robust and more transparent.
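The full architecture is described in the accompanying PDF; as an illustration only, the cross-modal attention mechanism mentioned above — image tokens forming queries that attend over audio tokens as keys and values — can be sketched as follows. All function and variable names here are hypothetical and not taken from the paper; a minimal single-head sketch in numpy, assuming learned projection matrices `Wq`, `Wk`, `Wv`:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(img_tokens, aud_tokens, Wq, Wk, Wv):
    """Single-head cross-modal attention (illustrative sketch).

    Image tokens produce queries; audio tokens produce keys and values,
    so each image patch aggregates audio evidence. The returned attention
    map is the kind of artifact an explainability tool can visualize.
    """
    Q = img_tokens @ Wq          # (n_img, d)
    K = aud_tokens @ Wk          # (n_aud, d)
    V = aud_tokens @ Wv          # (n_aud, d)
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    attn = softmax(scores, axis=-1)   # (n_img, n_aud), rows sum to 1
    return attn @ V, attn

# Toy usage: 4 image patch tokens, 6 audio frame tokens, dim 16.
rng = np.random.default_rng(0)
img = rng.standard_normal((4, 16))
aud = rng.standard_normal((6, 16))
Wq, Wk, Wv = (rng.standard_normal((16, 16)) for _ in range(3))
fused, attn = cross_modal_attention(img, aud, Wq, Wk, Wv)
```

Inspecting `attn` shows which audio frames each image patch relied on, which is one way an attention-based detector can be made transparent.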


Files

Explainable Multimodal Deepfake Detection.pdf (189.2 kB)
md5:282f563360d750ab39355ebd605b61c8