Synthetic Data Generation and Impact Analysis of Machine Learning Models for Enhanced Credit Card Fraud Detection
Creators
Description
The financial industry is undergoing a substantial shift in its operating landscape as a result of the rapid integration of technology, and this transformation brings risks and challenges. A key concern for the sector is the rising incidence of online fraud, which has been exacerbated by the growing prevalence of online payment methods on e-commerce platforms and other websites. Identifying credit card fraud is a challenging task because transactional data is highly imbalanced, which makes fraudulent activity hard to detect and predict. In this context, this paper presents a unique approach to creating a synthetic dataset that addresses the class imbalance problem in credit card fraud detection. The approach adopts the Synthetic Minority Over-sampling Technique (SMOTE) for data generation. An experiment is performed using several ML models, including SVM, KNN, and Random Forest, to demonstrate the feasibility of using synthetic data. In this study, we combine resampling techniques such as SMOTE, which oversamples the minority class, with ensemble methods and appropriate evaluation metrics such as the F1-score to handle the imbalanced data. The experimental results are compared with those obtained on widely used public datasets to evaluate model performance. The analysis reveals a significant imbalance in the real ULB dataset, with the positive class (frauds) comprising a mere 0.172% of all transactions. The findings show that the Random Forest model outperforms the other models on precision, recall, accuracy, and F1 score in detecting fraudulent transactions while reducing false positives.
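The oversampling step described above can be illustrated with a minimal sketch of the core SMOTE idea: each synthetic minority sample is placed at a random point on the line segment between a minority sample and one of its nearest minority neighbours. This is not the authors' implementation; the function name `smote`, its parameters, and the toy data are all hypothetical and chosen only for illustration.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority neighbours.

    Hypothetical helper for illustration only, not the paper's code.
    """
    rng = random.Random(seed)
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy data (assumed values): 3 "fraud" points and 12 "legitimate" points.
fraud = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1)]
legit = [(float(i), float(i % 3) + 5.0) for i in range(12)]

# Oversample the minority class until both classes have the same size.
new_fraud = smote(fraud, n_synthetic=len(legit) - len(fraud), k=2)
balanced_fraud = fraud + new_fraud
print(len(balanced_fraud), len(legit))
```

In practice a maintained implementation, such as the `SMOTE` class in the imbalanced-learn package, would normally be used instead. Oversampling is usually applied to the training split only, so that evaluation metrics such as the F1-score are computed on real, untouched transactions.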
Files
Checksum | Size |
---|---|
md5:179fc919ae5b03240b70108efd87f28f | 339.7 kB |