Data and Code to reproduce results in paper "Leveraging network topology for credit risk assessment in P2P lending: A comparative study under the lens of machine learning"
Description
This repository contains the code needed to reproduce the results in the paper:
Yiting Liu, Lennart John Baals, Jörg Osterrieder, Branka Hadji-Misheva,
Leveraging network topology for credit risk assessment in P2P lending: A comparative study under the lens of machine learning,
Expert Systems with Applications,
Volume 252, Part B,
2024,
124100,
ISSN 0957-4174,
https://doi.org/10.1016/j.eswa.2024.124100.
(https://www.sciencedirect.com/science/article/pii/S0957417424009667)
Abstract: Peer-to-Peer (P2P) lending markets have witnessed remarkable growth, revolutionizing the way borrowers and lenders interact. Despite the increasing popularity of P2P lending, it poses significant challenges related to credit risk assessment and default prediction with meaningful implications for financial stability. Traditional credit risk models have been widely employed in the field of P2P lending; however, they may not be capable to capture latent factor information inherent to a loan network based on similarity distances. Thus, in this study we propose an enhanced two-step modeling approach for Machine Learning (ML) that utilizes insights from network analysis and subsequently combines derived network centrality metrics with traditional credit risk factors to improve the prediction accuracy in the credit default prediction process. Through a comparative analysis of three classical ML models with varying degrees of complexity, namely Elastic Net (EN), Random Forest (RF), and Multi-Layer Perceptron (MLP), we showcase novel evidence that the systematic inclusion of network topology features in the credit scoring process can significantly improve the prediction accuracy of the scoring models. Additional robustness tests via the inclusion of randomly shuffled centrality metrics in the analysis, and a further comparison of the graph-based models against a pertinent state-of-the-art credit scoring model in form of XGBoost, further confirm our results. The insights from this study bear valuable conclusions for P2P lending platforms to further improve their scoring systems with graph-enhanced metrics, thereby reducing default risk and facilitating greater access to credit.
Keywords: Peer-to-Peer-lending; Credit default prediction; Machine Learning; Network centrality
Raw data:
The raw dataset was downloaded on April 22nd, 2022, as a part of Bondora’s daily updated public report. Loan starting dates span from June 16th, 2009, to April 21st, 2022. The original dataset covers 231,039 individual borrowers characterized through 112 categorical and continuous variables. Among these loans, 79,424 have been recorded with delayed interest payments according to the platform, while 151,615 loans have no recorded delay on interest payments before the download date of the data. Specifically, the dataset details borrower demographics, financial attributes, and past credit market interactions.
The raw dataset cannot be made public due to the restrictions of the Bondora platform (https://bondora.com/en/terms/):
13.4 The Portal, Portal's website and the copyright of the contents thereof belong to the Company. The User does not have the right to save, copy, change, transfer, forward or disclose the pages of the Portal for a purpose other than personal use.
Data cleaning:
Bondora.R
1_data_washing.ipynb
These two files clean the data, as described in Section 4 of the paper.
Data metadata:
Description cleaned.xlsx
This file describes the meaning of features in the cleaned dataset.
The cleaned dataset, as second-hand data, cannot be made public either, due to the restrictions of the platform.
Modeling:
The remaining notebooks.
These notebooks generate the results presented in the paper.
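The abstract describes a two-step approach: derive centrality metrics from a loan network built on similarity distances, then combine them with the traditional credit risk factors. The following is only a minimal sketch of that idea, not the code from the notebooks. The Euclidean distance, the `threshold` value, the choice of degree centrality, and the toy feature rows are all assumptions made for illustration.

```python
import math
from itertools import combinations


def degree_centrality(features, threshold):
    """Link two loans when their Euclidean distance is below `threshold`
    (an illustrative rule), then return each loan's degree / (n - 1)."""
    n = len(features)
    degree = [0] * n
    for i, j in combinations(range(n), 2):
        if math.dist(features[i], features[j]) < threshold:
            degree[i] += 1
            degree[j] += 1
    if n < 2:
        return [0.0] * n
    return [d / (n - 1) for d in degree]


def augment(features, threshold):
    """Append the centrality score to each loan's feature row."""
    scores = degree_centrality(features, threshold)
    return [list(row) + [score] for row, score in zip(features, scores)]


# Made-up, already scaled feature rows; not taken from the Bondora data.
toy_loans = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0]]
augmented = augment(toy_loans, threshold=0.5)
```

In this toy example the first three loans are mutually linked and the fourth is isolated, so its appended centrality is zero. The augmented rows could then be passed to any classifier in place of the original features.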
Files
(24.4 MB)
| Checksum | Size |
|---|---|
| md5:c232a80cc95dd5863af5d9b57ff90902 | 143.5 kB |
| md5:577a74ae7dc0ca93d39968b117b8c995 | 53.8 kB |
| md5:2a694cbb3efda1165ef25d33b3c4ed45 | 380.5 kB |
| md5:0768c8a86183c22709c788fbe9181355 | 6.2 MB |
| md5:83652b4cd1379d7e34dc585039651ba4 | 2.3 MB |
| md5:36f5eebe16a4b74a35176407968d9fcb | 14.9 MB |
| md5:007aaf4d8c2755bb8c6f51f34268f48c | 331.8 kB |
| md5:89d34a85ed3e6ae78f61b2d252889620 | 29.5 kB |
| md5:7a79ddebd1f75985722deecd5fc88d76 | 24.6 kB |
Additional details
Funding
- Swiss National Science Foundation
- 100019E_205487
- European Cooperation in Science and Technology
- COST Action CA19130