Fraud Detection in Financial Transactions Using Machine Learning: Insights from the PaySim Mobile Money Dataset
Authors/Creators
- 1. Booth School of Business, University of Chicago, USA.
- 2. Department of Mathematics Statistical Analytics, Computing and Modeling, Texas A&M University, Kingsville, USA.
- 3. Department of Mathematics and Science Education, Middle Tennessee State University, USA.
- 4. Department of Computer Science, Predictive analytics, Austin Peay State University, Tennessee, USA.
- 5. School of Computing and Data Science, Wentworth Institute of Technology, Boston, USA.
- 6. Independent Researcher, USA.
Description
The rapid digital transformation of financial systems has increased the risk of fraud in mobile payment ecosystems. This paper analyzes fraudulent behavior in the PaySim mobile-money dataset using feature engineering and supervised classification. We trained and compared Logistic Regression, K-Nearest Neighbors (KNN), Decision Tree, and Random Forest classifiers, using a stratified 80:20 train-test split and class weighting to counter extreme class imbalance. On the test set, Decision Tree achieved the best overall balance between precision and recall (Precision = 0.6835, Recall = 0.9696, F1 = 0.8018, ROC-AUC = 0.9845). Random Forest produced very high recall (0.9838) and ROC-AUC (0.9990) but low precision (0.1576), meaning that most of its fraud alerts were false positives. These results indicate that tree-based and ensemble methods can detect most fraud events in this dataset, but that there is a trade-off between minimizing missed fraud (false negatives) and limiting false alarms for legitimate users. We recommend using precision–recall analysis, threshold tuning, and cost-sensitive methods in operational settings to control that trade-off.
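The trade-off described above can be illustrated with a minimal Python sketch. It is not the authors' code. It only relates the reported precision and recall figures to F1 and to the number of false alarms per detected fraud, then shows one simple form of cost-based threshold selection. The function names, the toy scores and labels, and the cost values are all hypothetical and chosen purely for illustration.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def false_alarms_per_hit(precision: float) -> float:
    """Expected false positives per correctly flagged fraud: (1 - p) / p."""
    return (1 - precision) / precision


def pick_threshold(scores, labels, fn_cost, fp_cost):
    """Return the candidate threshold with the lowest total misclassification cost.

    A transaction is flagged as fraud when its score is >= the threshold.
    Candidates are the distinct observed scores; ties go to the lowest one.
    """
    def cost(threshold):
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
        return fp * fp_cost + fn * fn_cost

    return min(sorted(set(scores)), key=cost)


# Reported test-set figures from the abstract.
dt_precision, dt_recall = 0.6835, 0.9696   # Decision Tree
rf_precision, rf_recall = 0.1576, 0.9838   # Random Forest

print("Decision Tree F1:", round(f1_score(dt_precision, dt_recall), 4))
print("Decision Tree false alarms per hit:", round(false_alarms_per_hit(dt_precision), 2))
print("Random Forest false alarms per hit:", round(false_alarms_per_hit(rf_precision), 2))

# Hypothetical toy data: model scores and true labels (1 = fraud).
toy_scores = [0.05, 0.2, 0.4, 0.6, 0.8, 0.95]
toy_labels = [0, 0, 0, 1, 0, 1]

# Assumed costs: a missed fraud is taken to be 10x as costly as a false alarm.
best = pick_threshold(toy_scores, toy_labels, fn_cost=10, fp_cost=1)
print("Lowest-cost threshold on toy data:", best)
```

In practice the cost ratio would come from business estimates of fraud losses versus the cost of reviewing or blocking a legitimate transaction, and the threshold would be chosen on a validation set rather than on the test set.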
Files
| Name | Size |
|---|---|
| WJARR-2025-4058.pdf (md5:a3f8f35d12459752a16b261f534276c3) | 819.8 kB |