Leveraging Network Topology for Credit Risk Assessment in P2P Lending: A Comparative Study under the Lens of Machine Learning
Description
Overview
This deposit provides the materials required to reproduce the empirical workflow, figures, and manuscript source for the study:
“Leveraging Network Topology for Credit Risk Assessment in P2P Lending: A Comparative Study under the Lens of Machine Learning.”
The study forms Chapter 3 of the doctoral dissertation “Risk Management in Digital Finance: Assessment and Pricing in an Emerging Fintech Era” by Lennart John Baals and is published as:
Liu, Y., Baals, L. J., Osterrieder, J., & Hadji-Misheva, B. (2024). Leveraging network topology for credit risk assessment in P2P lending: A comparative study under the lens of machine learning. Expert Systems with Applications, 252, 124100.
This deposit contains (i) the LaTeX source of the thesis chapter, (ii) Jupyter notebooks implementing data preprocessing, network construction, model training/evaluation, and explainability analyses, and (iii) figure outputs summarizing descriptive statistics, feature importance, ROC performance, and network-centrality characteristics.
Contents of this deposit (file-level summary)
- Manuscript / thesis chapter source: main_WP3_PhD_Lennart_Baals.tex, bibliography files (e.g., reference_paper_1.bib), and formatting assets (e.g., apa.bst).
- Jupyter notebooks (analysis pipeline):
  - Preprocessing & feature engineering: 0.1_data_preprocessing.ipynb, 0.2_descriptive_statistics.ipynb
  - Model training and evaluation workflow: 1_2023.01.05 Data_Pre-processing_&_Models_training.ipynb, 2_2023.07.05_Models_analysis.ipynb, 3_2023.06.28 Model Re-training and testing.ipynb, 4_2023.07.05_Models_analysis.ipynb
  - Explainability: 4_2023.06.28 SHAP Explainability.ipynb
  - Additional experimentation / automation: 2023.04.22 SNF P2P Credit Risk Auto ML.ipynb
- Key figures / outputs (PDF):
  - Descriptive statistics of raw variables (e.g., interest rate, loan amount, borrower characteristics, prior-loan measures): descriptive_stats_raw_data_full (...).pdf
  - Network-centrality descriptive statistics: descriptive_stats_pagerank.pdf, descriptive_stats_betweenness.pdf, descriptive_stats_closeness.pdf, descriptive_stats_katz.pdf, descriptive_stats_authority.pdf, descriptive_stats_hub.pdf
  - Model performance summaries: all_model_roc_curves.pdf
  - Feature importance / model diagnostics: rf_feature_importance.pdf, glm_feature_importance.pdf, dl_feature_importance.pdf, plus best-model summaries such as RF (best_result).pdf, GLM (best_result).pdf, DL (best_result).pdf
Methodological summary (what the code produces)
The workflow implements a network-enhanced credit risk assessment framework for P2P lending. In brief, the analysis:
- constructs a borrower/loan similarity graph using origination-time information and derives network representations,
- extracts multiple centrality measures (e.g., PageRank, betweenness, closeness, Katz, hub/authority) as additional predictors that encode structural information about similarity-based borrower position,
- trains and compares several machine-learning models for default prediction (including linear baselines and non-linear learners), and
- evaluates predictive performance using standard classification metrics and ROC curves, complemented by feature-importance and SHAP-based explainability analyses.
The included outputs summarize both the distributional characteristics of the raw data and the incremental predictive value of network-topology features across model classes.
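The first two steps of this workflow can be illustrated with a minimal sketch. It is not the authors' implementation: the distance-threshold similarity rule, the edge weighting, the toy feature values, and the function names `build_similarity_graph` and `pagerank` are all assumptions made for illustration, and only PageRank is shown out of the centrality measures listed above. The sketch uses the Python standard library only; in practice a graph library would typically be used.

```python
import math


def build_similarity_graph(features, threshold):
    """Link loans i and j when their Euclidean distance is below `threshold`.

    The threshold rule and the weight 1 / (1 + distance) are illustrative
    assumptions, not the construction used in the notebooks.
    Returns an adjacency dict: node -> {neighbour: weight}.
    """
    n = len(features)
    graph = {i: {} for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            dist = math.dist(features[i], features[j])
            if dist < threshold:
                weight = 1.0 / (1.0 + dist)
                graph[i][j] = weight
                graph[j][i] = weight
    return graph


def pagerank(graph, damping=0.85, tol=1e-10, max_iter=200):
    """Weighted PageRank by power iteration on an undirected graph."""
    n = len(graph)
    rank = {node: 1.0 / n for node in graph}
    strength = {node: sum(nbrs.values()) for node, nbrs in graph.items()}
    for _ in range(max_iter):
        # Rank held by isolated nodes is redistributed uniformly.
        dangling = sum(rank[node] for node in graph if strength[node] == 0.0)
        new_rank = {}
        for node in graph:
            inflow = sum(
                rank[nbr] * w / strength[nbr] for nbr, w in graph[node].items()
            )
            new_rank[node] = (1.0 - damping) / n + damping * (inflow + dangling / n)
        change = sum(abs(new_rank[node] - rank[node]) for node in graph)
        rank = new_rank
        if change < tol:
            break
    return rank


# Hypothetical standardized origination-time features for five loans.
toy_features = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (3.0, 3.0)]
toy_graph = build_similarity_graph(toy_features, threshold=0.5)
toy_ranks = pagerank(toy_graph)

# The centrality score becomes one extra predictor column per loan.
augmented = [row + (toy_ranks[i],) for i, row in enumerate(toy_features)]
print(augmented)
```

In this toy example the fifth loan is far from the others, stays isolated in the graph, and therefore receives the lowest PageRank score; the augmented rows show how a centrality value can sit alongside the original predictors before model training.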
Data sources and access conditions
The empirical component relies on loan-level P2P lending data (e.g., platform data such as Bondora and/or comparable sources, depending on the chapter configuration). Because redistribution may be restricted by data-provider terms and privacy constraints, this deposit emphasizes code, documentation, and figure outputs rather than raw data. Users who intend to reproduce all results should obtain the underlying raw data from the original provider(s) under their own access rights and then apply the preprocessing and variable-mapping steps documented in the notebooks.
Any included data descriptions are intended to facilitate transparent replication while respecting the applicable redistribution constraints.
Reproducibility (how to run)
A typical reproduction path is:
- Run the preprocessing notebooks (0.1_data_preprocessing.ipynb, 0.2_descriptive_statistics.ipynb) to generate cleaned features and descriptive tables.
- Execute the training/evaluation notebooks (1_..., 2_..., 3_..., 4_...) to reproduce model estimation, ROC curves, and feature-importance outputs.
- Run the explainability notebook (4_2023.06.28 SHAP Explainability.ipynb) to reproduce SHAP summaries and interpretability results.
- Compile the thesis chapter from main_WP3_PhD_Lennart_Baals.tex (using the included bibliography/style assets) if you wish to regenerate the manuscript PDF.
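The notebook steps above could be scripted roughly as follows. This is a sketch under several assumptions: it relies on the `jupyter nbconvert` command-line tool being installed, the notebook order is inferred from the file-level summary (the numeric prefixes are ambiguous, with two notebooks starting with `4_`), and the `executed` output directory and the helper names are hypothetical. The notebooks will also need the raw data in whatever location they expect.

```python
import subprocess

# File names as listed in this deposit; the ordering is an assumption.
NOTEBOOKS = [
    "0.1_data_preprocessing.ipynb",
    "0.2_descriptive_statistics.ipynb",
    "1_2023.01.05 Data_Pre-processing_&_Models_training.ipynb",
    "2_2023.07.05_Models_analysis.ipynb",
    "3_2023.06.28 Model Re-training and testing.ipynb",
    "4_2023.07.05_Models_analysis.ipynb",
    "4_2023.06.28 SHAP Explainability.ipynb",
]


def nbconvert_command(notebook, output_dir="executed"):
    """Build the command that executes one notebook and saves the result."""
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",
        "--execute",
        "--output-dir", output_dir,
        notebook,
    ]


def run_pipeline(notebooks=NOTEBOOKS, dry_run=True):
    """Run the notebooks in order; with dry_run=True only print the commands."""
    commands = [nbconvert_command(nb) for nb in notebooks]
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            # check=True stops the pipeline at the first failing notebook.
            subprocess.run(command, check=True)
    return commands


if __name__ == "__main__":
    run_pipeline(dry_run=True)
```

Passing the command as a list rather than a shell string avoids quoting problems with the spaces and ampersand in the notebook file names. Calling `run_pipeline(dry_run=False)` would execute the notebooks for real.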
Intended use
This deposit is intended for:
- replication of the published results (subject to data access constraints),
- reuse of the similarity-graph + centrality-feature construction approach for other P2P or retail-credit datasets, and
- benchmarking of network-enhanced models against conventional credit-scoring baselines.
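For the benchmarking use case, a comparison of two models on the same held-out loans might look like the sketch below. The labels and scores are made-up toy numbers, not results from the study, and `roc_auc` is a hypothetical helper that computes the area under the ROC curve through its pairwise-ranking interpretation.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen defaulter (label 1)
    receives a higher score than a randomly chosen non-defaulter (label 0);
    tied scores count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy default labels and predicted default probabilities (purely illustrative).
labels = [0, 0, 1, 1, 0, 1]
baseline_scores = [0.2, 0.6, 0.5, 0.7, 0.3, 0.4]   # conventional features only
network_scores = [0.2, 0.55, 0.5, 0.8, 0.3, 0.6]   # plus centrality features

auc_baseline = roc_auc(labels, baseline_scores)
auc_network = roc_auc(labels, network_scores)
print(f"baseline AUC: {auc_baseline:.3f}, network-enhanced AUC: {auc_network:.3f}")
```

The pairwise form is quadratic in the number of loans, which is fine for a small illustration; for full datasets a library routine based on sorting would be the usual choice.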
Licensing and reuse
Unless otherwise noted within individual files, the materials are intended for reuse in academic and non-commercial research with appropriate attribution. If different licenses apply to the code and to the manuscript text or figures, the license selected for this record should reflect that.
Files (24.6 MB)
| Checksum | Size |
|---|---|
| md5:815364d475d4e45b51070a347d30c851 | 169.3 kB |
| md5:aefc4e8266e804a5bbbcb81d10cbe6a9 | 43.5 kB |
| md5:da5381494952aeec1e2c9e705310ce51 | 192.7 kB |
| md5:e436e3a5fbe16b95118408880c1afaf9 | 555.9 kB |
| md5:1989f6083ad01c8112b722cdf2265961 | 6.7 MB |
| md5:85b54adbf6ea001a0f5d77cd9e0129cc | 492.3 kB |
| md5:007aaf4d8c2755bb8c6f51f34268f48c | 331.8 kB |
| md5:ad24c1fb76867297119114cba97f0a91 | 14.9 MB |
| md5:8d4de6cd8a0ad03dc0610ad7ff9609e7 | 24.6 kB |
| md5:2901ede3aa33932bb95d49a3744ed373 | 23.6 kB |
| md5:b0032ba4d26aa40bab3d3d9378770f33 | 43.9 kB |
| md5:f29cbba74121ac7131a3e973559c641c | 43.9 kB |
| md5:f63ae80dab5c73da170f5c2633f6a5ed | 45.1 kB |
| md5:79b90fca8dc9f78c8b22931f53829dff | 37.8 kB |
| md5:ea6cb20790191aae2990ed165ce8b16a | 49.9 kB |
| md5:1352af4493a08d4c3cf1ff78c6cd0208 | 53.9 kB |
| md5:802fef235b78d70bbf9355cdc7813ccf | 32.3 kB |
| md5:d22f04c3bb9780efdc567a1bb9aa3491 | 48.6 kB |
| md5:22b90d296598f09f9c1b6f2eaf6a3938 | 39.5 kB |
| md5:7b4840b75a5503bd4485011853dd037a | 32.2 kB |
| md5:0c4aa6dbcb92a8aefe4acfa78075324d | 37.6 kB |
| md5:ca5a04c5b14c3f0b86fbdb6bc72f1c12 | 40.5 kB |
| md5:4ce90a6989e81a408182269739b0883b | 36.7 kB |
| md5:acbc22266b46165fd1ac619fad65b2ab | 41.3 kB |
| md5:bccd4ff80c76a0dc4fb83f8cb7d6f8cf | 33.1 kB |
| md5:543a7dd50cfcb9808ed85750581bb080 | 26.5 kB |
| md5:75c53d4f63ed40b904205e89f71af3a9 | 15.8 kB |
| md5:72a6b65c7e61cabbd31effd5a81ef671 | 26.6 kB |
| md5:3a752217252a7105dadf46f10fb82cb7 | 16.2 kB |
| md5:200b308dd535f9edeec1dc21165123df | 106.9 kB |
| md5:8572b4f6a2914764064b9ad85caa6d28 | 217.0 kB |
| md5:da725b5f52173eb588cc6d8e0020bfab | 26.3 kB |
| md5:b1dfcb8be06c9fb0892cbc994e4748b8 | 16.2 kB |
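The MD5 checksums listed for the files can be used to confirm that a download is intact. The sketch below shows one way to do this with Python's standard `hashlib` module; the helper names `md5_of_file` and `verify_download` are hypothetical, and because this listing does not show which checksum belongs to which file, the pairing of file and checksum has to be taken from the record page itself.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected):
    """Compare a file against a checksum given as 'md5:<hex>' or plain '<hex>'."""
    expected_hex = expected.strip().lower().split(":")[-1]
    return md5_of_file(path) == expected_hex
```

A call such as `verify_download(local_path, "md5:<checksum from the record page>")` returns `True` when the local copy matches. MD5 is adequate for detecting accidental corruption but should not be treated as a security guarantee.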
Additional details
Related works
- Is published in: Publication: 10.1016/j.eswa.2024.124100 (DOI)
Dates
- Available: 2024-10-15 (Published in Expert Systems with Applications)
Software
- Programming language: Python