Published February 6, 2026 | Version v1.0
Software | Open Access

joesound212985/Deepfake-Detection-using-CLIP-Based-SigLIP-2-Vision-Transformers: v1.0

Authors/Creators

Description

Deepfakes pose a growing threat to the integrity of visual media, necessitating robust detection methods. However, existing detection approaches still struggle to reliably identify forged images and videos, particularly as modern deepfakes become increasingly realistic and indistinguishable to human viewers. This paper proposes a deepfake detection approach based on CLIP-derived vision transformers (SigLIP-2), combined with a multi-task design for classification and manipulated-region localization. The models are evaluated on three public benchmarks of increasing complexity: HiDF, SIDA, and CIFake. Our detector achieves state-of-the-art results on all three. On HiDF, it achieves an AUC of 0.931 for deepfake video detection, improving by ~0.20 over the best prior method (EB4), and a similarly high AUC of 0.968 on images. On SIDA, the model reaches 99.1% accuracy, substantially outperforming the previous 93.5% baseline while correctly localizing most tampered pixels. It also exceeds 95% accuracy on CIFake, with an AUC of 0.986. The proposed model substantially advances detection performance on challenging realistic forgeries, providing both high precision and interpretable localization to support practical deepfake mitigation.
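The multi-task design described above, a shared vision-transformer backbone feeding a real/fake classification head and a per-region manipulation-localization head, can be sketched in PyTorch. This is a minimal illustrative sketch only: the class name, layer sizes, and the stand-in transformer encoder are assumptions for demonstration, not the released model (which would load a pretrained SigLIP-2 checkpoint).

```python
import torch
import torch.nn as nn

class MultiTaskDeepfakeDetector(nn.Module):
    """Illustrative sketch of a two-head detector on a ViT-style backbone.

    Assumptions (not from the release): the backbone is a tiny stand-in
    transformer encoder; the real system would use pretrained SigLIP-2
    patch embeddings. Dimensions are toy-sized for clarity.
    """

    def __init__(self, embed_dim=64, patch_grid=4):
        super().__init__()
        # Stand-in for the pretrained SigLIP-2 vision encoder.
        self.backbone = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=embed_dim, nhead=4,
                                       batch_first=True),
            num_layers=1,
        )
        self.cls_head = nn.Linear(embed_dim, 2)   # real vs. fake logits
        self.loc_head = nn.Linear(embed_dim, 1)   # per-patch tamper score
        self.patch_grid = patch_grid

    def forward(self, patch_tokens):
        # patch_tokens: (batch, num_patches, embed_dim)
        feats = self.backbone(patch_tokens)
        # Pooled features drive image-level classification...
        logits = self.cls_head(feats.mean(dim=1))
        # ...while per-patch scores form a coarse localization map.
        mask = self.loc_head(feats).squeeze(-1)
        mask = mask.view(-1, self.patch_grid, self.patch_grid)
        return logits, mask

# Toy forward pass: 2 images, a 4x4 grid of 16 patch tokens each.
model = MultiTaskDeepfakeDetector()
x = torch.randn(2, 16, 64)
logits, mask = model(x)
print(logits.shape, mask.shape)  # torch.Size([2, 2]) torch.Size([2, 4, 4])
```

Training such a model would combine a classification loss on `logits` with a pixel- or patch-level loss on `mask`, which is what lets a single backbone provide both a verdict and an interpretable tamper map.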

Files

joesound212985/Deepfake-Detection-using-CLIP-Based-SigLIP-2-Vision-Transformers-v1.0.zip

Additional details