Presentation Open Access

scorecal - Empirical score calibration under the microscope

Gayler, Ross W.

Presentation given at the Credit Scoring & Credit Control Conference XVI in Edinburgh, UK.


Score calibration is the process of empirically determining the relationship between a score and an outcome on some population of interest, and scaling is the process of expressing that relationship in agreed units. Calibration is often treated as a simple matter and attacked with simple tools – typically, either assuming the relationship between score and log-odds is linear and fitting a logistic regression with the score as the only covariate, or dividing the score range into bands and plotting the empirical log-odds as a function of score band.
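As a rough illustration of the two simple tools just described, here is a minimal sketch in base R. It does not use the scorecal package. The simulated data, the assumed linear truth, the decile banding, and all variable names are illustrative assumptions, not taken from the presentation.

```r
# Simulated data: score distribution and true relationship are assumptions.
set.seed(1)
n <- 20000
score <- round(rnorm(n, mean = 600, sd = 50))
true_log_odds <- (score - 600) / 40            # assumed linear truth
good <- rbinom(n, size = 1, prob = plogis(true_log_odds))

# Simple tool 1: logistic regression with the score as the only covariate.
fit <- glm(good ~ score, family = binomial())

# Simple tool 2: divide the score range into bands (deciles here) and
# compute the empirical log-odds of a good outcome in each band.
breaks <- unique(quantile(score, probs = seq(0, 1, by = 0.1)))
band <- cut(score, breaks = breaks, include.lowest = TRUE)
n_good <- tapply(good, band, sum)
n_bad <- tapply(good, band, length) - n_good
band_log_odds <- log(n_good / n_bad)
band_mean_score <- as.vector(tapply(score, band, mean))

# Put the banded estimates next to the fitted line (log-odds scale).
comparison <- data.frame(
  mean_score = band_mean_score,
  empirical_log_odds = as.vector(band_log_odds),
  fitted_log_odds = as.vector(
    predict(fit, newdata = data.frame(score = band_mean_score))
  )
)
print(comparison)
```

Because the simulated truth is exactly linear, the two columns of log-odds should agree closely here; the abstract's point is that real data often do not behave this way.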

Both approaches ignore some information in the data. The assumption of a linear score to log-odds relationship is too restrictive and score banding ignores the continuity of the scores. While a linear score to log-odds relationship is often an adequate approximation, the reality can be much more interesting, with noticeable deviations from the linear trend. These deviations include large-scale non-linearity, small-scale non-monotonicity, discrete discontinuities, and complete breakdown of the linear trend at extreme scores.

Detecting these effects requires a more sophisticated approach to empirically determining the score to outcome relationship. Taking a more sophisticated approach can be surprisingly tricky: the typically strong linear trend can obscure smaller deviations from linearity; detecting subtle trends requires exploiting the continuity of the scores, which can obscure discrete deviations; trends at extreme scores (out in the data-sparse tails of the distribution of scores) can be obscured by trends at less extreme scores (where there is more data); score distributions with some specific values that are relatively common can disrupt methods relying on continuity; and any modelling technique can introduce its own biases.
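One generic way to look past a strong linear trend is to fit a flexible smooth model alongside the linear one and examine the difference on the log-odds scale. The sketch below does this with a natural cubic spline from the `splines` package that ships with R. It is an assumption-laden illustration, not the method implemented in scorecal; the simulated curvature, the choice of `df = 5`, and the evaluation grid are all hypothetical.

```r
library(splines)  # part of the standard R distribution

# Simulated data: a linear trend plus mild curvature (assumed, illustrative).
set.seed(2)
n <- 50000
score <- rnorm(n, mean = 600, sd = 50)
z <- (score - 600) / 50
true_log_odds <- 1.25 * z - 0.15 * z^2
good <- rbinom(n, size = 1, prob = plogis(true_log_odds))

# Linear score to log-odds model, and a smooth alternative that contains it.
fit_linear <- glm(good ~ score, family = binomial())
fit_smooth <- glm(good ~ ns(score, df = 5), family = binomial())

# Likelihood-ratio test of the smooth model against the linear one.
lrt <- anova(fit_linear, fit_smooth, test = "Chisq")
print(lrt)

# Deviation from the linear trend on the log-odds scale. Subtracting the
# linear fit makes the curvature visible instead of buried in the trend.
grid <- data.frame(score = seq(500, 700, by = 25))
deviation <- predict(fit_smooth, newdata = grid) -
  predict(fit_linear, newdata = grid)
print(cbind(grid, deviation))
```

A smooth fit like this relies on the continuity of the scores, so it would blur a discrete discontinuity and is weakly constrained in the sparse tails, which are exactly the difficulties listed above.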

Over the years I have developed a personal approach to these issues in score calibration and implemented it as an open-source, publicly accessible R package for score calibration. I discuss these technical issues in empirical score calibration and show how they are addressed in the scorecal package.


This presentation is produced by an executable R notebook that generates simulated data, performs the analyses for the plots, and formats the results and text as the slides of the presentation. The notebook is publicly accessible so that readers can experiment with the analyses.

The notebook is available on GitHub at:  

The GitHub repository is archived on Zenodo at:

Cloud execution

The software has been set up so that it can be executed in the cloud for free, so readers can experiment with it in a web browser without installing any software locally. Instructions for executing the notebook are in the GitHub repository and the Zenodo archive.
