Published December 19, 2025 | Version v1
Journal article Open

Leveraging Network Topology for Credit Risk Assessment in P2P Lending: A Comparative Study under the Lens of Machine Learning

  • 1. University of Twente
  • 2. Bern University of Applied Sciences

Description

Overview
This deposit provides the materials required to reproduce the empirical workflow, figures, and manuscript source for the study:

“Leveraging Network Topology for Credit Risk Assessment in P2P Lending: A Comparative Study under the Lens of Machine Learning.”

The study forms Chapter 3 of the doctoral dissertation “Risk Management in Digital Finance: Assessment and Pricing in an Emerging Fintech Era” by Lennart John Baals and is published as:

Liu, Y., Baals, L. J., Osterrieder, J., & Hadji-Misheva, B. (2024). Leveraging network topology for credit risk assessment in P2P lending: A comparative study under the lens of machine learning. Expert Systems with Applications, 252, 124100.

This deposit contains (i) the LaTeX source of the thesis chapter, (ii) Jupyter notebooks implementing data preprocessing, network construction, model training/evaluation, and explainability analyses, and (iii) figure outputs summarizing descriptive statistics, feature importance, ROC performance, and network-centrality characteristics.

Contents of this deposit (file-level summary)

  • Manuscript / thesis chapter source: main_WP3_PhD_Lennart_Baals.tex, bibliography files (e.g., reference_paper_1.bib), and formatting assets (e.g., apa.bst).

  • Jupyter notebooks (analysis pipeline):

    • Preprocessing & feature engineering: 0.1_data_preprocessing.ipynb, 0.2_descriptive_statistics.ipynb

    • Model training and evaluation workflow: 1_2023.01.05 Data_Pre-processing_&_Models_training.ipynb, 2_2023.07.05_Models_analysis.ipynb, 3_2023.06.28 Model Re-training and testing.ipynb, 4_2023.07.05_Models_analysis.ipynb

    • Explainability: 4_2023.06.28 SHAP Explainability.ipynb

    • Additional experimentation / automation: 2023.04.22 SNF P2P Credit Risk Auto ML.ipynb

  • Key figures / outputs (PDF):

    • Descriptive statistics of raw variables (e.g., interest rate, loan amount, borrower characteristics, prior-loan measures): descriptive_stats_raw_data_full (...).pdf

    • Network-centrality descriptive statistics: descriptive_stats_pagerank.pdf, descriptive_stats_betweenness.pdf, descriptive_stats_closeness.pdf, descriptive_stats_katz.pdf, descriptive_stats_authority.pdf, descriptive_stats_hub.pdf

    • Model performance summaries: all_model_roc_curves.pdf

    • Feature importance / model diagnostics: rf_feature_importance.pdf, glm_feature_importance.pdf, dl_feature_importance.pdf, plus best-model summaries such as RF (best_result).pdf, GLM (best_result).pdf, DL (best_result).pdf

Methodological summary (what the code produces)
The workflow implements a network-enhanced credit risk assessment framework for P2P lending. In brief, the analysis:

  • constructs a borrower/loan similarity graph using origination-time information and derives network representations,

  • extracts multiple centrality measures (e.g., PageRank, betweenness, closeness, Katz, hub/authority) as additional predictors that encode structural information about similarity-based borrower position,

  • trains and compares several machine-learning models for default prediction (including linear baselines and non-linear learners), and

  • evaluates predictive performance using standard classification metrics and ROC curves, complemented by feature-importance and SHAP-based explainability analyses.
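The first two steps above can be sketched as follows. This is a minimal illustration rather than the notebooks' actual implementation: the k-nearest-neighbour linkage, the Euclidean distance on z-scored features, the function names `build_similarity_graph` and `centrality_features`, and the random input data are all assumptions made for the example. It relies on `numpy` and `networkx`. Because the sketch uses an undirected graph, hub and authority scores coincide.

```python
import numpy as np
import networkx as nx


def build_similarity_graph(features, k=3):
    """Connect each loan to its k most similar loans.

    Assumption: similarity is Euclidean distance on z-scored features;
    the deposit's notebooks may use a different metric or linkage rule.
    """
    x = np.asarray(features, dtype=float)
    std = x.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant columns
    z = (x - x.mean(axis=0)) / std
    # Pairwise Euclidean distances, shape (n, n)
    dist = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=2)

    graph = nx.Graph()
    graph.add_nodes_from(range(len(z)))
    for i in range(len(z)):
        # Sort by distance and drop the node itself before taking the top k
        neighbours = [int(j) for j in np.argsort(dist[i]) if j != i][:k]
        for j in neighbours:
            d = float(dist[i, j])
            # "distance" feeds path-based measures, "weight" feeds PageRank
            graph.add_edge(i, j, distance=d, weight=1.0 / (1.0 + d))
    return graph


def centrality_features(graph):
    """Return {measure name: {node: value}} for the six measures listed above."""
    adjacency = nx.to_numpy_array(graph, weight=None)
    # Katz centrality requires alpha < 1 / (largest adjacency eigenvalue)
    largest_eigenvalue = np.linalg.eigvalsh(adjacency)[-1]
    hubs, authorities = nx.hits(graph)
    return {
        "pagerank": nx.pagerank(graph, weight="weight"),
        "betweenness": nx.betweenness_centrality(graph, weight="distance"),
        "closeness": nx.closeness_centrality(graph, distance="distance"),
        "katz": nx.katz_centrality_numpy(graph, alpha=0.9 / largest_eigenvalue),
        "hub": hubs,
        "authority": authorities,
    }


# Hypothetical origination-time features for 30 loans (random placeholders)
rng = np.random.default_rng(0)
loans = rng.normal(size=(30, 4))
graph = build_similarity_graph(loans, k=3)
feats = centrality_features(graph)
```

Each entry of `feats` maps a node (loan) index to a score, so the six dictionaries can be joined to the loan table as additional predictor columns.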

The included outputs summarize both the distributional characteristics of the raw data and the incremental predictive value of network-topology features across model classes.
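One way to measure the incremental value of network features is to fit each model class twice, once on the conventional predictors and once with the centrality columns appended, and compare held-out ROC AUC. The sketch below does this on synthetic data with scikit-learn. The logistic regression and random forest are stand-ins chosen for the example, not the deposit's exact models, and the function name `compare_feature_sets`, the 70/30 split, and the simulated variables are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split


def compare_feature_sets(x_base, x_network, y, seed=0):
    """Held-out ROC AUC per (model, feature set) on one shared train/test split."""
    x_full = np.hstack([x_base, x_network])
    idx_train, idx_test = train_test_split(
        np.arange(len(y)), test_size=0.3, stratify=y, random_state=seed
    )
    model_factories = {
        "logistic": lambda: LogisticRegression(max_iter=1000),
        "random_forest": lambda: RandomForestClassifier(
            n_estimators=100, random_state=seed
        ),
    }
    results = {}
    for model_name, make_model in model_factories.items():
        for set_name, x in (("baseline", x_base), ("baseline+network", x_full)):
            model = make_model().fit(x[idx_train], y[idx_train])
            default_prob = model.predict_proba(x[idx_test])[:, 1]
            results[(model_name, set_name)] = roc_auc_score(y[idx_test], default_prob)
    return results


# Synthetic placeholders: 3 conventional predictors, 2 centrality-like predictors
rng = np.random.default_rng(0)
x_base = rng.normal(size=(400, 3))
x_network = rng.normal(size=(400, 2))
logit = 0.8 * x_base[:, 0] + 1.2 * x_network[:, 0]
y = (rng.random(400) < 1.0 / (1.0 + np.exp(-logit))).astype(int)

results = compare_feature_sets(x_base, x_network, y)
for (model_name, set_name), auc in results.items():
    print(f"{model_name:14s} {set_name:18s} AUC = {auc:.3f}")
```

Both feature sets use the same split, so any difference in AUC comes from the added columns and not from different test samples.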

Data sources and access conditions
The empirical component relies on loan-level P2P lending data (e.g., data from platforms such as Bondora or comparable sources, depending on the chapter configuration). Data-provider terms and privacy constraints may restrict redistribution, so this deposit emphasizes code, documentation, and figure outputs. To reproduce all results in full, users should obtain the underlying raw data from the original provider(s) under their own access rights and then apply the preprocessing and variable-mapping steps documented in the notebooks.

Any included data descriptions are intended to facilitate transparent replication while respecting the applicable redistribution constraints.

Reproducibility (how to run)
A typical reproduction path is:

  1. Run the preprocessing notebooks (0.1_data_preprocessing.ipynb, 0.2_descriptive_statistics.ipynb) to generate cleaned features and descriptive tables.

  2. Execute the training/evaluation notebooks (1_..., 2_..., 3_..., 4_...) to reproduce model estimation, ROC curves, and feature-importance outputs.

  3. Run the explainability notebook (4_2023.06.28 SHAP Explainability.ipynb) to reproduce SHAP summaries and interpretability results.

  4. Compile the thesis chapter from main_WP3_PhD_Lennart_Baals.tex (using the included bibliography/style assets) if you wish to regenerate the manuscript PDF.
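Steps 1 to 3 can be scripted with `jupyter nbconvert`. The sketch below only assembles the commands and prints them unless `dry_run=False` is passed. It assumes the notebooks sit in one directory, that `jupyter` and `nbconvert` are installed, and that the notebooks can find the raw data at the paths they expect. The execution order is inferred from the numbering in the file names, and `build_commands`, `run_all`, and the `executed` output folder are hypothetical names.

```python
import subprocess
from pathlib import Path

# Notebook names as listed in this record, in the order suggested by the steps above
NOTEBOOKS = [
    "0.1_data_preprocessing.ipynb",
    "0.2_descriptive_statistics.ipynb",
    "1_2023.01.05 Data_Pre-processing_&_Models_training.ipynb",
    "2_2023.07.05_Models_analysis.ipynb",
    "3_2023.06.28 Model Re-training and testing.ipynb",
    "4_2023.07.05_Models_analysis.ipynb",
    "4_2023.06.28 SHAP Explainability.ipynb",
]


def build_commands(deposit_dir, out_dir="executed"):
    """Return one nbconvert command (as an argument list) per notebook."""
    base = Path(deposit_dir)
    return [
        [
            "jupyter", "nbconvert",
            "--to", "notebook",
            "--execute",
            "--output-dir", str(base / out_dir),
            str(base / name),
        ]
        for name in NOTEBOOKS
    ]


def run_all(deposit_dir, dry_run=True):
    """Print the commands, or execute them in order when dry_run is False."""
    commands = build_commands(deposit_dir)
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            # Argument lists avoid shell-quoting problems with spaces and "&"
            subprocess.run(command, check=True)
    return commands


commands = run_all(".", dry_run=True)
```

Writing executed copies to a separate folder leaves the deposited notebooks unchanged, which makes it easy to compare regenerated outputs against the originals.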

Intended use
This deposit is intended for:

  • replication of the published results (subject to data access constraints),

  • reuse of the similarity-graph + centrality-feature construction approach for other P2P or retail-credit datasets, and

  • benchmarking of network-enhanced models against conventional credit-scoring baselines.

Licensing and reuse
Unless otherwise noted within individual files, the intent is to enable reuse for academic and non-commercial research with appropriate attribution. If different licenses apply to code vs. manuscript text/figures, this should be reflected in the record license choice.

Files (24.6 MB)

MD5 checksum                            Size
md5:815364d475d4e45b51070a347d30c851    169.3 kB
md5:aefc4e8266e804a5bbbcb81d10cbe6a9    43.5 kB
md5:da5381494952aeec1e2c9e705310ce51    192.7 kB
md5:e436e3a5fbe16b95118408880c1afaf9    555.9 kB
md5:1989f6083ad01c8112b722cdf2265961    6.7 MB
md5:85b54adbf6ea001a0f5d77cd9e0129cc    492.3 kB
md5:007aaf4d8c2755bb8c6f51f34268f48c    331.8 kB
md5:ad24c1fb76867297119114cba97f0a91    14.9 MB
md5:8d4de6cd8a0ad03dc0610ad7ff9609e7    24.6 kB
md5:2901ede3aa33932bb95d49a3744ed373    23.6 kB
md5:b0032ba4d26aa40bab3d3d9378770f33    43.9 kB
md5:f29cbba74121ac7131a3e973559c641c    43.9 kB
md5:f63ae80dab5c73da170f5c2633f6a5ed    45.1 kB
md5:79b90fca8dc9f78c8b22931f53829dff    37.8 kB
md5:ea6cb20790191aae2990ed165ce8b16a    49.9 kB
md5:1352af4493a08d4c3cf1ff78c6cd0208    53.9 kB
md5:802fef235b78d70bbf9355cdc7813ccf    32.3 kB
md5:d22f04c3bb9780efdc567a1bb9aa3491    48.6 kB
md5:22b90d296598f09f9c1b6f2eaf6a3938    39.5 kB
md5:7b4840b75a5503bd4485011853dd037a    32.2 kB
md5:0c4aa6dbcb92a8aefe4acfa78075324d    37.6 kB
md5:ca5a04c5b14c3f0b86fbdb6bc72f1c12    40.5 kB
md5:4ce90a6989e81a408182269739b0883b    36.7 kB
md5:acbc22266b46165fd1ac619fad65b2ab    41.3 kB
md5:bccd4ff80c76a0dc4fb83f8cb7d6f8cf    33.1 kB
md5:543a7dd50cfcb9808ed85750581bb080    26.5 kB
md5:75c53d4f63ed40b904205e89f71af3a9    15.8 kB
md5:72a6b65c7e61cabbd31effd5a81ef671    26.6 kB
md5:3a752217252a7105dadf46f10fb82cb7    16.2 kB
md5:200b308dd535f9edeec1dc21165123df    106.9 kB
md5:8572b4f6a2914764064b9ad85caa6d28    217.0 kB
md5:da725b5f52173eb588cc6d8e0020bfab    26.3 kB
md5:b1dfcb8be06c9fb0892cbc994e4748b8    16.2 kB
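A downloaded file can be checked against the MD5 values listed above before running anything. The following is a small sketch using only the Python standard library. The helper names `md5_of_file` and `matches_listing` are hypothetical, and the demonstration uses a throwaway temporary file rather than a file from this deposit, since the listing above does not pair checksums with file names.

```python
import hashlib
import tempfile
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed):
    """Compare a file against a listed value such as 'md5:815364d4...'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected


# Demonstration on a temporary file with known content
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.bin"
    sample.write_bytes(b"hello")
    ok = matches_listing(sample, "md5:5d41402abc4b2a76b9719d911017c592")
    bad = matches_listing(sample, "md5:" + "0" * 32)

print(ok, bad)
```

MD5 is suitable here for detecting corrupted or incomplete downloads. It is not a safeguard against deliberate tampering.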

Additional details

Related works

Is published in
Publication: 10.1016/j.eswa.2024.124100 (DOI)

Funding

Swiss National Science Foundation
Network-based credit risk models in P2P lending markets (grant 100019E_205487)

Dates

Available
2024-10-15
Published in Expert Systems with Applications

Software

Programming language
Python