Deepfake Audio Detection through Spectrogram Analysis and Deep Learning (Real Talk)
Description
The project report introduces "Real Talk," an AI-driven system designed to combat the rising threat of malicious voice cloning and deepfakes used in identity theft and financial scams. Because traditional verification methods struggle to distinguish synthetic audio from authentic human speech, the proposed system combines Convolutional Neural Networks (CNNs) with Mel-Frequency Cepstral Coefficients (MFCC) to perform deep frequency analysis. By transforming audio files into visual spectrograms using the Librosa library, the model identifies the subtle digital artifacts that generative AI models leave behind. "Real Talk" exposes this analysis to consumers through a user-friendly web interface that reports a percentage-based confidence score, with an expected theoretical detection accuracy of 96.8%–97.2% and results returned in under 10 seconds.
This work was conducted at Arab International University (AIU), Syria. The official website of the university is: https://www.aiu.edu.sy
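The report's front end converts raw audio into spectrogram features before CNN classification, using Librosa. As an illustration of what such a log-mel front end computes, the following is a minimal NumPy-only sketch of the same pipeline (windowed STFT, triangular mel filterbank, log compression). The parameter values (`sr=16000`, `n_fft=512`, `hop=256`, `n_mels=40`) are illustrative assumptions, not settings taken from the report.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    # Triangular filters with centres spaced evenly on the mel scale.
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        left, centre, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, centre):
            if centre > left:
                fb[i - 1, k] = (k - left) / (centre - left)
        for k in range(centre, right):
            if right > centre:
                fb[i - 1, k] = (right - k) / (right - centre)
    return fb

def log_mel_spectrogram(y, sr=16000, n_fft=512, hop=256, n_mels=40):
    # Frame the signal, apply a Hann window, take the magnitude STFT.
    window = np.hanning(n_fft)
    n_frames = 1 + (len(y) - n_fft) // hop
    frames = np.stack([y[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2
    # Project the power spectrum onto the mel filterbank, then log-compress.
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T
    return np.log(mel + 1e-10)

# Example input: one second of a 440 Hz sine tone at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
y = np.sin(2 * np.pi * 440.0 * t)
S = log_mel_spectrogram(y, sr=sr)
print(S.shape)  # (time frames, mel bands)
```

In the system described by the report, an image like `S` would then be fed to the CNN, whose output probability drives the percentage-based confidence meter; taking a DCT of these log-mel bands would yield the MFCCs the report names.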
Files

| Name | Size | MD5 |
|---|---|---|
| MohammadKanj_2026_DeepfakeAudioDetection.pdf | 769.7 kB | bbc657f0cf44f48e9cfab78de0e538c2 |