Published March 14, 2026 | Version v1
Journal article Open

SecureVision: Real-Time Multimodal Cyber Deepfake Identification System

  • 1. Rajeev Gandhi Memorial College of Engineering and Technology

Description

Deepfake technology has rapidly evolved into a serious cybersecurity concern: it can produce highly convincing fake audio and video content that is difficult to distinguish from real media. These manipulations can lead to misinformation, identity theft, and financial fraud. To address this challenge, this project introduces SecureVision, a multimodal deepfake detection framework. SecureVision combines deep learning, self-supervised learning, Vision Transformers (ViT), and big data analytics to defend against digital manipulation. Rather than analyzing a single type of media, the system examines both audio and images, improving overall detection accuracy and reliability. For audio deepfake detection, the model uses the SpecRNet architecture, while image classification is performed with a Vision Transformer-based approach. The system is trained on large-scale datasets, including ASVspoof 2021, multilingual audio datasets, and diverse web-scraped facial image collections. In experiments, SecureVision achieves 92.34% accuracy for audio detection and 89.35% for image detection. The system is designed to run efficiently with moderate GPU requirements. Overall, the framework offers a scalable and practical solution for combating the increasing threat of deepfake attacks.

Files

securevision-real-time-multimodal-cyber-deepfake-identification-system-IJERTV15IS030284.pdf

Additional details