Deepfake Audio Detection through Spectrogram Analysis and Deep Learning (Real Talk)
Description
The project report introduces "Real Talk," an AI-driven system designed to combat the rising threat of malicious voice cloning and deepfakes used in identity theft and financial scams. Because traditional verification methods struggle to distinguish synthetic audio from authentic human speech, the proposed system combines Convolutional Neural Networks (CNNs) with Mel-Frequency Cepstral Coefficients (MFCC) to perform deep frequency analysis. By transforming audio files into visual spectrograms using the Librosa library, the model identifies the subtle digital artifacts that generative AI models leave behind. "Real Talk" exposes this analysis to consumers through a user-friendly web interface that reports a percentage-based confidence score, with an expected theoretical detection accuracy of 96.8%–97.2% and results returned in under 10 seconds.
This work was conducted at Arab International University (AIU), Syria. The official website of the university is: https://www.aiu.edu.sy
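The report's front end converts raw audio into spectrogram features before CNN classification, using Librosa. As an illustration of what such a log-mel front end computes, the following is a minimal NumPy-only sketch of the same pipeline (windowed STFT, triangular mel filterbank, log compression). The parameter values (`sr=16000`, `n_fft=512`, `hop=256`, `n_mels=40`) are illustrative assumptions, not settings taken from the report.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    # Triangular filters with centres spaced evenly on the mel scale.
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        left, centre, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, centre):
            if centre > left:
                fb[i - 1, k] = (k - left) / (centre - left)
        for k in range(centre, right):
            if right > centre:
                fb[i - 1, k] = (right - k) / (right - centre)
    return fb

def log_mel_spectrogram(y, sr=16000, n_fft=512, hop=256, n_mels=40):
    # Frame the signal, apply a Hann window, take the magnitude STFT.
    window = np.hanning(n_fft)
    n_frames = 1 + (len(y) - n_fft) // hop
    frames = np.stack([y[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2
    # Project the power spectrum onto the mel filterbank, then log-compress.
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T
    return np.log(mel + 1e-10)

# Example input: one second of a 440 Hz sine tone at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
y = np.sin(2 * np.pi * 440.0 * t)
S = log_mel_spectrogram(y, sr=sr)
print(S.shape)  # (time frames, mel bands)
```

In the system described by the report, an image like `S` would then be fed to the CNN, whose output probability drives the percentage-based confidence meter; taking a DCT of these log-mel bands would yield the MFCCs the report names.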
Files

| Name | Size | MD5 |
|---|---|---|
| MohammadKanj_2026_DeepfakeAudioDetection.pdf | 769.7 kB | bbc657f0cf44f48e9cfab78de0e538c2 |