Published January 15, 2026 | Version v1
Journal article Open

Lightweight Multi-Modal AI Framework for Face-Swap Deepfake Detection

Description

Face-swap deepfakes have become both more realistic and easier to produce, posing growing threats to personal privacy, identity integrity, and public trust in digital media. Modern generative models produce manipulated content that escapes casual human observation and can deceive conventional automated detectors. This realism demands detection systems that are robust, transparent, and computationally efficient. We propose a lightweight, multi-modal AI framework that fuses spatial (CNN), temporal (LSTM/GRU), frequency (DCT/FFT), and audio modalities through an attention-based fusion mechanism to identify face-swap deepfake videos. The framework is designed for real-world deployability as well as detection accuracy: it relies on key-frame extraction and compact neural backbones so that it can run on constrained hardware. Explainability is prioritized through visualization tools such as Grad-CAM and integrated gradients, together with modality-level confidence reporting, to improve forensic interpretability. Our work aims to bridge the gap between high-performance academic models and practical field applications by focusing on modular design, reproducible experimentation, and cross-dataset generalization. The resulting system is intended to support real-time media verification pipelines, assist investigators in forensic reporting, and promote public resilience against synthetic media threats. Overall, the framework lays a foundation for transparent, efficient, and responsible deepfake detection in the evolving landscape of generative AI.
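The attention-based fusion step named in the description can be illustrated with a minimal sketch. The description does not specify the mechanism, so this is an assumption, not the authors' implementation: it uses scaled dot-product attention over one embedding per modality, and the function names, the 4-dimensional embeddings, and the fixed query vector are all hypothetical. In a trained model the embeddings would come from the CNN, LSTM/GRU, DCT/FFT, and audio branches, and the query would be learned.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(embeddings, query):
    """Fuse per-modality embeddings with scaled dot-product attention.

    embeddings: dict mapping modality name -> list of floats (equal lengths).
    query: list of floats of the same length (hypothetical; learned in practice).
    Returns (fused_vector, weights), where weights maps each modality to its
    attention weight. The weights sum to 1.
    """
    names = list(embeddings)
    dim = len(query)
    # One score per modality: dot(query, embedding) / sqrt(dim).
    scores = [
        sum(q * x for q, x in zip(query, embeddings[name])) / math.sqrt(dim)
        for name in names
    ]
    weights = softmax(scores)
    # Fused vector is the attention-weighted sum of the modality embeddings.
    fused = [
        sum(w * embeddings[name][i] for w, name in zip(weights, names))
        for i in range(dim)
    ]
    return fused, dict(zip(names, weights))


# Hypothetical embeddings, one per modality listed in the description.
embeddings = {
    "spatial": [0.9, 0.1, 0.0, 0.2],
    "temporal": [0.2, 0.8, 0.1, 0.0],
    "frequency": [0.1, 0.0, 0.7, 0.3],
    "audio": [0.0, 0.2, 0.1, 0.6],
}
query = [1.0, 0.5, 0.5, 0.0]

fused, weights = attention_fuse(embeddings, query)
for name, weight in weights.items():
    print(f"{name}: {weight:.3f}")
```

Under this assumption, the per-modality attention weights give one simple way to report how much each modality contributed to a decision, which is one possible reading of the modality-level confidence reporting mentioned in the description.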

Files

Lightweight Multi-Modal AI Framework.pdf (841.6 kB)
md5:d51bcef5398cc1042f676ad25c240008
