
Published September 25, 2025 | Version v18
Software | Open access

PROPENSITY SCORE MATCHING (PSM) PYTHON-BASED CODE

  • 1. University of Cantabria
  • 2. Servicio Cántabro de Salud
  • 3. Instituto de Investigación Marqués de Valdecilla (IDIVAL)

Description

Summary 

This repository provides four variants of free, Python-based code for propensity score (PS) matching. It is an initiative of the Camargo Cohort Study (Cantabria, Spain), developed to share the tool and promote the use of PS matching.

The code avoids compatibility issues between R versions and R packages, and implements (i) logistic regression to compute the PS, (ii) 1:N matching using the K-nearest neighbour (KNN) algorithm with a customisable caliper, (iii) sampling with or without replacement, (iv) visualisations to assess matching quality and (v) statistics to evaluate covariate balance.
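The first two steps, PS estimation and 1:N caliper matching, can be sketched in plain NumPy. This is a minimal sketch under stated assumptions, not the repository's code: the PS is estimated by unpenalised logistic regression fitted with Newton-Raphson, and matching is a greedy nearest-neighbour search on the logit of the PS (a one-dimensional KNN), processing treated units in index order. The function names `fit_logistic` and `knn_caliper_match` are hypothetical, and expressing the caliper as a fraction of the standard deviation of the logit PS is an assumed convention; the repository's own caliper scale may differ.

```python
import numpy as np

def fit_logistic(X, y, max_iter=50, tol=1e-10):
    """Unpenalised logistic regression fitted by Newton-Raphson.

    X: (n, p) covariates without an intercept column; y: (n,) 0/1 treatment.
    Returns the coefficients (intercept first) and the fitted propensity scores.
    """
    Xd = np.column_stack([np.ones(len(X)), X])              # prepend intercept
    beta = np.zeros(Xd.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-Xd @ beta))
        hessian = Xd.T @ (Xd * (p * (1.0 - p))[:, None])    # X' W X
        step = np.linalg.solve(hessian, Xd.T @ (y - p))     # Newton step
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta, 1.0 / (1.0 + np.exp(-Xd @ beta))

def knn_caliper_match(ps, treated, ratio=1, caliper=0.2, replace=False):
    """Greedy 1:ratio nearest-neighbour matching on the logit of the PS.

    caliper: maximum distance, as a fraction of the SD of logit(PS) (assumed scale).
    replace: if False, each control can be used at most once.
    Returns a list of (treated_index, control_index) pairs.
    """
    logit = np.log(ps / (1.0 - ps))
    max_dist = caliper * np.std(logit)
    t_idx = np.flatnonzero(treated == 1)
    c_idx = np.flatnonzero(treated == 0)
    available = np.ones(len(c_idx), dtype=bool)
    pairs = []
    for t in t_idx:                                          # treated units in index order
        dist = np.where(available, np.abs(logit[c_idx] - logit[t]), np.inf)
        for j in np.argsort(dist, kind="stable")[:ratio]:    # k nearest controls
            if dist[j] > max_dist:                           # beyond caliper or already used
                break
            pairs.append((int(t), int(c_idx[j])))
            if not replace:
                available[j] = False
    return pairs

# Usage on simulated data (three standardised covariates).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
treated = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ np.array([0.8, -0.5, 0.3]))))
beta, ps = fit_logistic(X, treated)
pairs_1to1 = knn_caliper_match(ps, treated, ratio=1, caliper=0.2)
pairs_1to2 = knn_caliper_match(ps, treated, ratio=2, caliper=0.2, replace=True)
print(f"{len(pairs_1to1)} pairs (1:1), {len(pairs_1to2)} pairs (1:2 with replacement)")
```

An unpenalised fit is used deliberately in this sketch: scikit-learn's `LogisticRegression` applies an L2 penalty by default, which shrinks the coefficients and therefore changes the estimated PS.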

Outputs:

  • Matched pairs, stored as a .csv file.
  • Diagnostic plots, stored in the specified output folder, showing the SMDs and the PS distribution.
  • Statistics for matching validation: SMD, variance ratio (VR), McFadden's pseudo-R² and L1 multivariate imbalance, reported separately or combined in a Balance Assessment Report.
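The covariate-level statistics can be computed along these lines. This sketch assumes the common pooled-SD definition of the SMD (square root of the average of the two group variances) and defines the VR as treated variance over control variance; `smd`, `variance_ratio` and `balance_table` are hypothetical names, and the flag thresholds (|SMD| < 0.1, 0.5 < VR < 2) are commonly cited rules of thumb, not values taken from the repository. Binary covariates are handled here by applying the same formula to their 0/1 values, which closely approximates the proportion-based SMD in larger samples.

```python
import numpy as np

def smd(x_t, x_c):
    """Standardized mean difference: mean difference over the pooled SD,
    where pooled SD = sqrt((var_treated + var_control) / 2)."""
    x_t = np.asarray(x_t, dtype=float)
    x_c = np.asarray(x_c, dtype=float)
    pooled_sd = np.sqrt((x_t.var(ddof=1) + x_c.var(ddof=1)) / 2.0)
    return (x_t.mean() - x_c.mean()) / pooled_sd

def variance_ratio(x_t, x_c):
    """Variance of the treated group divided by variance of the control group."""
    return np.var(x_t, ddof=1) / np.var(x_c, ddof=1)

def balance_table(X, treated, names, smd_max=0.1, vr_range=(0.5, 2.0)):
    """Per-covariate SMD and VR with rule-of-thumb flags (thresholds are assumptions)."""
    X = np.asarray(X, dtype=float)
    treated = np.asarray(treated)
    rows = []
    for k, name in enumerate(names):
        x_t, x_c = X[treated == 1, k], X[treated == 0, k]
        d, vr = smd(x_t, x_c), variance_ratio(x_t, x_c)
        ok = abs(d) < smd_max and vr_range[0] < vr < vr_range[1]
        rows.append((name, d, vr, ok))
    return rows

# Usage on a toy sample: age differs between groups, sex (0/1) does not.
treated = np.array([1, 1, 1, 1, 0, 0, 0, 0])
X = np.column_stack([[61, 62, 63, 64, 60, 61, 62, 63],
                     [0, 1, 0, 1, 0, 1, 0, 1]])
rows = balance_table(X, treated, ["age", "sex"])
for name, d, vr, ok in rows:
    print(f"{name:>4}: SMD = {d:+.3f}, VR = {vr:.2f}, balanced = {ok}")
```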

The code was developed with reference to the Matplotlib, NumPy and Seaborn libraries, and with support and refinements from OpenAI's ChatGPT.

No funding was received for conducting this work and there are no financial or non-financial interests to disclose. 

 

Methodological notes on Propensity Score Matching (PSM)

These notes summarize key methodological insights derived from the implementation of a fully reproducible Python-based Propensity Score Matching (PSM) workflow within the Camargo Cohort. The objective is not only to document technical procedures, but also to clarify several structural and diagnostic aspects that emerged during applied use.

First, we describe a carry-over effect of the propensity score on non-included covariates. In our analyses, balance improvement was not restricted to variables explicitly included in the PS model. Standardized mean differences (SMDs) also decreased in certain correlated variables not directly modeled. This reflects the role of the PS as a low-dimensional representation of the multivariable structure underlying exposure assignment. While this phenomenon does not eliminate unmeasured confounding, it illustrates how a well-specified PS may capture broader latent susceptibility patterns within correlated clinical networks.

Second, we benchmarked the Python implementation against PSM performed in SPSS using established R-based packages. Although the estimated PS values and some matched pairs differed, likely because of algorithmic and computational differences, balance diagnostics were virtually identical across platforms. The concordant reduction in SMD supports the view that balance performance, rather than the intermediate numerical values, is the relevant criterion for comparing implementations.

Third, we introduce a structured Balance Assessment Report integrating covariate-level metrics (SMD, variance ratios) with global indicators (pseudo-R² and L1 imbalance) in a single automated output. While these diagnostics are individually well established, their combined and reproducible presentation aims to enhance transparency, methodological auditing, and ease of replication.

Together, these notes aim to support rigorous, transparent, and reproducible application of PSM in complex clinical research settings.

 

1. Carry-Over Effect of the Propensity Score on Non-Included Covariates

In our implementation of Propensity Score Matching (PSM), we observed that balance improvement was not limited to covariates explicitly included in the propensity score (PS) model. Standardized mean differences (SMDs) also decreased for certain variables that were not part of the logistic regression used to estimate the PS.

This phenomenon can be understood as a structural carry-over effect. The PS acts as a low-dimensional summary of the multivariable structure underlying exposure assignment. When upstream variables (e.g., age, BMI, cardiometabolic factors) are included in the model, they often represent central nodes within a correlated biological or clinical network.

If a non-included variable is correlated with included covariates, matching on the PS indirectly reduces imbalance in that variable through shared multivariable structure. This reflects propagation of balance across correlated dimensions of the covariate space rather than direct adjustment.

Importantly, this neither eliminates unmeasured confounding nor replaces appropriate model specification. However, it suggests that a well-specified PS may capture broader latent susceptibility patterns beyond the explicitly modeled predictors (see the SMD_Lineplot figure).
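The propagation mechanism described in this section can be illustrated with a small simulation. This is an assumption-laden sketch, not the Camargo Cohort data: treatment depends only on two included covariates `x1` and `x2`, while `z` is left out of the PS model but correlated with `x1` (correlation 0.8 by construction). To keep the example short, the true PS is used instead of a fitted one, and matching is greedy 1:1 nearest-neighbour on the logit PS without replacement, with a caliper of 0.2 SD of the logit PS (assumed convention).

```python
import numpy as np

rng = np.random.default_rng(42)
n = 2000
x1 = rng.normal(size=n)                    # included covariate (standardised)
x2 = rng.normal(size=n)                    # included covariate (standardised)
z = 0.8 * x1 + 0.6 * rng.normal(size=n)    # NOT in the PS model; corr(z, x1) = 0.8
logit_ps = -1.0 + 1.0 * x1 + 0.5 * x2      # assumed true PS model (known, not fitted)
treated = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit_ps)))

def smd(a, b):
    """SMD with pooled SD = sqrt((var_a + var_b) / 2)."""
    return (a.mean() - b.mean()) / np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2.0)

# Greedy 1:1 nearest-neighbour matching on the logit PS without replacement;
# caliper = 0.2 SD of the logit PS (assumed convention).
caliper = 0.2 * logit_ps.std()
controls = list(np.flatnonzero(treated == 0))   # controls still available
t_matched, c_matched = [], []
for t in np.flatnonzero(treated == 1):
    if not controls:
        break
    dist = np.abs(logit_ps[controls] - logit_ps[t])
    j = int(np.argmin(dist))
    if dist[j] <= caliper:
        t_matched.append(t)
        c_matched.append(controls.pop(j))       # remove the used control

t_idx, c_idx = np.array(t_matched), np.array(c_matched)
results = {}
for name, v in [("x1 (included)", x1), ("x2 (included)", x2), ("z (not included)", z)]:
    before = smd(v[treated == 1], v[treated == 0])
    after = smd(v[t_idx], v[c_idx])
    results[name] = (before, after)
    print(f"{name:<17} SMD before = {before:+.3f}   after = {after:+.3f}")
```

In this set-up the SMD of `z` shrinks only because `z` shares variance with the modelled covariate `x1`; a confounder unrelated to the modelled covariates would receive no such benefit.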

 

2. Comparison between the Python-based code and PSM performed in SPSS (based on R packages)

The Python-based implementation was benchmarked against a Propensity Score Matching (PSM) procedure conducted in SPSS using R-based packages (Propensity Score Matching for SPSS v1.0; Thoemmes F). Five covariates were used to estimate the propensity score (PS), applying a caliper of 0.20, 1:1 nearest-neighbour matching without replacement, and identical source data in both environments.

Although notable discrepancies were observed in the estimated PS values and in the exact composition of the matched pairs, such differences are expected when distinct statistical engines are used. They may arise from differences in PS estimation routines (e.g., optimisation algorithms, convergence criteria, numerical precision), in matching algorithms (tie-breaking rules, sorting procedures) and in the operational implementation of calipers. This divergence reflects algorithmic heterogeneity rather than methodological inconsistency.
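How matching-algorithm choices alone can change individual pairs is visible in a toy example. The sketch below uses a hypothetical `greedy_match` function, not the code of either package, to perform greedy 1:1 nearest-neighbour matching without replacement. With identical scores, changing the order in which treated units are processed, or the order of controls when two are exactly equidistant (NumPy's `argmin` returns the first minimum), changes which pairs are formed.

```python
import numpy as np

def greedy_match(score_t, score_c, order, caliper=np.inf):
    """Greedy 1:1 nearest-neighbour matching without replacement (hypothetical helper).

    score_t, score_c: PS (or logit PS) of treated and control units.
    order: treated positions in the order they are processed.
    Distance ties go to the control listed first (np.argmin returns the first minimum).
    Returns {treated_position: control_position}.
    """
    score_t = np.asarray(score_t, dtype=float)
    score_c = np.asarray(score_c, dtype=float)
    available = np.ones(len(score_c), dtype=bool)
    matches = {}
    for t in order:
        dist = np.where(available, np.abs(score_c - score_t[t]), np.inf)
        j = int(np.argmin(dist))
        if np.isfinite(dist[j]) and dist[j] <= caliper:
            matches[t] = j
            available[j] = False
    return matches

# 1) Processing order: two treated units compete for the same nearest control.
treated_ps = [0.50, 0.52]
control_ps = [0.51, 0.60]
by_index = greedy_match(treated_ps, control_ps, order=[0, 1])    # t0 -> c0, t1 -> c1
by_reverse = greedy_match(treated_ps, control_ps, order=[1, 0])  # t1 -> c0, t0 -> c1

# 2) Tie-breaking: two controls are exactly equidistant from the treated unit.
tie_a = greedy_match([0.5], [0.25, 0.75], order=[0])  # picks the control at 0.25
tie_b = greedy_match([0.5], [0.75, 0.25], order=[0])  # picks the control at 0.75
```

Both results in each case are valid nearest-neighbour solutions, which is why pair-level agreement is a weak criterion for comparing implementations and balance diagnostics are compared instead.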

Importantly, despite these differences at the level of individual matches, balance diagnostics were virtually identical across implementations. As documented in the comparative file Comparison_Rpackage_and_Python, standardized mean differences (SMDs) before and after matching showed equivalent reductions in imbalance across covariates. Because the SMD is the most widely used metric for assessing covariate balance, these concordant results indicate that the Python-based workflow achieves the same balance performance as the established R-based approach.

These findings indicate that the Python implementation is not merely technically functional but methodologically reliable, producing equivalent balance optimisation despite algorithmic variation in intermediate steps.

 

3. Balance Assessment

The structured Balance Assessment Report presented here was designed to provide an integrated and transparent diagnostic overview.

We developed a Balance Assessment script that calculates standardized mean differences (SMD), variance ratios (VR), McFadden's pseudo-R² from the treatment assignment model, and the multivariate L1 imbalance metric. Each of these diagnostics is individually well established in the literature, but their automated integration within a single, reproducible report is less commonly available in standard workflows. By combining these complementary measures in a unified, auditable output, the script aims to facilitate methodological transparency, peer review and replication, particularly in applied clinical research settings.
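The two global indicators can be sketched as follows, under stated assumptions. `mcfadden_r2` re-fits a logistic treatment model in the (matched) sample and returns McFadden's pseudo-R², 1 - LL_model / LL_null, which should approach zero when matching has removed the association between covariates and treatment. `l1_imbalance` follows the multivariate L1 definition of Iacus, King and Porro: half the sum of absolute differences between treated and control relative frequencies over the cells of a coarsened covariate space (0 = identical distributions, 1 = no overlap). The equal-width coarsening into `n_bins` bins per covariate is an assumption, and both function names are hypothetical.

```python
import numpy as np

def mcfadden_r2(X, y, max_iter=50, tol=1e-10):
    """McFadden's pseudo-R2 of an unpenalised logistic model of y on X."""
    Xd = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(Xd.shape[1])
    for _ in range(max_iter):                       # Newton-Raphson fit
        p = 1.0 / (1.0 + np.exp(-Xd @ beta))
        step = np.linalg.solve(Xd.T @ (Xd * (p * (1 - p))[:, None]), Xd.T @ (y - p))
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-Xd @ beta))
    ll_model = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
    p0 = y.mean()                                   # intercept-only MLE
    ll_null = np.sum(y * np.log(p0) + (1 - y) * np.log(1 - p0))
    return 1.0 - ll_model / ll_null

def l1_imbalance(X, treated, n_bins=5):
    """Multivariate L1 imbalance over equal-width coarsened cells (assumed coarsening)."""
    X = np.asarray(X, dtype=float)
    treated = np.asarray(treated)
    codes = []
    for col in X.T:
        inner_edges = np.linspace(col.min(), col.max(), n_bins + 1)[1:-1]
        codes.append(np.digitize(col, inner_edges))  # bin index 0..n_bins-1
    cells = np.stack(codes, axis=1)
    _, cell_id = np.unique(cells, axis=0, return_inverse=True)
    cell_id = cell_id.ravel()
    n_cells = int(cell_id.max()) + 1
    f = np.bincount(cell_id[treated == 1], minlength=n_cells) / np.sum(treated == 1)
    g = np.bincount(cell_id[treated == 0], minlength=n_cells) / np.sum(treated == 0)
    return 0.5 * np.abs(f - g).sum()

# Usage: treatment unrelated to X versus treatment driven by X[:, 0].
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 3))
t_random = rng.binomial(1, 0.5, size=1000)
t_confounded = (X[:, 0] + rng.normal(size=1000) > 0).astype(int)
print(mcfadden_r2(X, t_random), mcfadden_r2(X, t_confounded))
print(l1_imbalance(X, t_random), l1_imbalance(X, t_confounded))
```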

 

4. Beyond a matching tool

A related but distinct development by our group, the derivation of a prognostic tool (Fast Ossifier Stratification Index, FOSSI) from the propensity score framework, is described separately in another Zenodo record (https://zenodo.org/records/18600017).

Several studies have reported associations between PS values and disease severity, comorbidity burden and subclinical dysfunction, suggesting that the PS may capture latent pathophysiological risk beyond its original purpose, that is, latent biological risk factors that extend beyond its balancing function. Within current developments in PS methodology, the emerging field of causal-driven predictive modelling reinterprets the PS as a bridge between causal inference and clinical prediction.

 

Technical info

The four code variants differ in sampling (with or without replacement), in the code lines where the matching ratio and caliper can be customised, and in their matching-assessment outputs. All four store the matched pairs as a .csv file.

  • PS matching code 1. Replacement: without. Ratio: line 73; caliper: line 84. PSM assessment: SMD barplot and lineplot (.png).
  • PS matching code 2. Replacement: without. Ratio: line 88; caliper: line 89. PSM assessment: SMD, VR and pseudo-R² (.csv, .txt).
  • PS matching code 3. Replacement: without. Ratio: line 163; caliper: line 168. PSM assessment: lineplot with improvements (.png) and balance report with SMD, VR, pseudo-R² and L1 imbalance (.docx).
  • PS matching code 4. Replacement: with. Ratio: line 89; caliper: line 100. PSM assessment: SMD barplot and lineplot (.png).


Files

Files (233.9 kB, four files); preview: Propensity_Score_Distribution_Density.png

  • md5:70b61fcb51f103e28b77676f4a86a1ac (55.5 kB)
  • md5:b05166e598e481957dfa4688a2a143eb (35.5 kB)
  • md5:7dc26901967f508ebf1c29af71012e76 (80.1 kB)
  • md5:88a87b7ac0f257f8e89067919df9d1df (62.8 kB)

Additional details

  • Alternative title: SUMMARY
  • Updated: 2025-03-03 (Python-based code for implementing PSM)

References

  • Staffa SJ, Zurakowski D. Five Steps to Successfully Implement and Evaluate Propensity Score Matching in Clinical Research Studies. Anesth Analg. 2018;127:1066-1073. doi: 10.1213/ANE.0000000000002787.
  • Thoemmes F. Propensity score matching in SPSS. 2012. Available at: https://arxiv.org/pdf/1201.6385.
  • Stuart EA. Matching methods for causal inference: A review and a look forward. Stat Sci. 2010;25:1-21. doi: 10.1214/09-STS313.
  • Brookhart MA, Schneeweiss S, Rothman KJ, Glynn RJ, Avorn J, Stürmer T. Variable selection for propensity score models. Am J Epidemiol. 2006;163:1149-56. doi: 10.1093/aje/kwj149.
  • Austin PC. An Introduction to Propensity Score Methods for Reducing the Effects of Confounding in Observational Studies. Multivariate Behav Res. 2011 May;46(3):399-424. doi: 10.1080/00273171.2011.568786.
  • Zhang Z, Kim HJ, Lonjon G, Zhu Y; written on behalf of AME Big-Data Clinical Trial Collaborative Group. Balance diagnostics after propensity score matching. Ann Transl Med. 2019 Jan;7(1):16. doi: 10.21037/atm.2018.12.10.
  • Pariente E, Martín-Millán M, Maamar M, Pardo-Lledías J, Basterrechea H, Petitta B, Bianconi C, Ramos-Barrón C, Martínez-Taboada V, Hernández JL. Metabolic and osteogenic susceptibility in DISH: A prognostic index from propensity score modelling. Bone. 2026 Feb 4:117819. doi: 10.1016/j.bone.2026.117819. Epub ahead of print. PMID: 41651202.
  • Pariente-Rodrigo E, Martín-Millán M, Sgaramella G, Pardo-Lledías J, Fierro-Andrés P, Bonome M, Solares S, Ramos-Barrón C, Olmos-Martínez JM, Martínez-Taboada V, Hernández JL. 'Fast Ossifier' in diffuse idiopathic skeletal hyperostosis: a sex-modulated, heterogeneous phenotype with accelerated ossification and early trabecular decline. RMD Open. 2025 Sep 21;11(3):e006024. doi: 10.1136/rmdopen-2025-006024. PMID: 40983397; PMCID: PMC12458635.