Synthetic Data Generation and Impact Analysis of Machine Learning Models for Enhanced Credit Card Fraud Detection

Islam, Shareeful; Ahmed Abdullah, Ahmed; Hasan, Md. Mahmudul; Papastergiou, Spyridon; Mouratidis, Haralambos

doi:10.5281/zenodo.14832599

Published June 21, 2024 | Version v1

Conference proceeding Open

1. Anglia Ruskin University
2. Quantonova
3. University of Piraeus
4. Maggioli S.p.A.
5. University of Essex

The financial industry is undergoing a substantial shift in its operating landscape as a result of the swift integration of technology. This transformation brings potential risks and challenges. The heightened occurrence of online fraud is one of the key concerns for this sector, and it has been exacerbated by the growing prevalence of online payment methods on e-commerce platforms and other websites. Identifying credit card fraud is a challenging task because transactional data is highly imbalanced, which makes fraudulent activity difficult to detect and predict. In this context, this paper provides a unique approach to creating a synthetic dataset to tackle the class imbalance issue in credit card fraud detection. The approach adopts the Synthetic Minority Over-sampling Technique (SMOTE) for data generation. An experiment is performed using a number of ML models, including SVM, KNN, and Random Forest, to demonstrate the feasibility of using synthetic data. In this study, we combine resampling techniques such as SMOTE, which oversamples the minority class, with ensemble methods and appropriate evaluation metrics such as the F1-score to address the data imbalance. The results of the experiment are compared with widely used public datasets to evaluate model performance. The analysis reveals a significant imbalance in the real ULB dataset, with the positive class (frauds) comprising a mere 0.172% of all transactions. The findings show that the Random Forest model performs better than the other models, with outstanding precision, recall, accuracy, and F1-score values, in detecting fraudulent transactions and reducing false positives.
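The two ideas the abstract relies on, SMOTE-style oversampling and the F1-score as an imbalance-aware metric, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `smote` and `precision_recall_f1`, the toy fraud points, and the 100,000-transaction count are hypothetical, chosen only so that the fraud rate matches the 0.172% quoted above. In practice a library implementation (for example the `SMOTE` class in imbalanced-learn) would likely be used instead.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of base, excluding base itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 for the positive (fraud = 1) class."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical toy minority class: four "fraud" points with two features.
fraud = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_points = smote(fraud, n_new=6, k=2)

# Hypothetical test set with the 0.172% fraud rate quoted in the abstract,
# scored against a trivial model that labels every transaction legitimate.
y_true = [1] * 172 + [0] * (100_000 - 172)
y_pred = [0] * 100_000
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"accuracy={accuracy:.5f} f1={f1:.1f}")
```

The trivial model scores above 99.8% accuracy while catching no fraud at all, so its F1-score is zero. This is why accuracy alone is uninformative on data this skewed, and why oversampling the minority class before training is a common remedy.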

Files (339.7 kB)

Name	Size
Paper 2_AI_Fraud_AIAI-Vr6.docm (md5:179fc919ae5b03240b70108efd87f28f)	339.7 kB

Additional details

DOI: 10.1007/978-3-031-63211-2_27
URL: https://link.springer.com/chapter/10.1007/978-3-031-63211-2_27

