Published July 29, 2020 | Version v1
Conference paper Open

The Data First Programme: challenges of quantifying complex problems for justice research

  • 1. Ministry of Justice
  • 2. Nottingham Trent University and Academic Lead Data First Programme (ADR UK)

Description

The Ministry of Justice (MoJ) has received funding from ADR UK (Administrative Data Research UK) for an ambitious programme of work called Data First, which aims to improve the quality and accessibility of the department’s data to enable better research. The programme will improve the data made available in four ways: using modern data pipelines to extract data efficiently from management information systems; maximising the amount of collected information that comes into scope for analysis; applying record linkage techniques to allow user journeys to be understood; and creating a partnership between MoJ and academics to explore data quality and develop research-ready datasets that are structured effectively to facilitate key academic research.

As part of this work, a new team has been established to develop and implement cutting-edge approaches to data linkage at scale. This work led to the release of a new piece of probabilistic matching software called Splink, which is able to handle data at the scale of MoJ’s large administrative datasets. So far, this software has been used to deduplicate data from the magistrates’ courts, one of MoJ’s largest administrative datasets. The probabilistic approach enables the uncertainty of each link to be quantified, which is expected to lead to more robust research applications. The MoJ administrative data-linking work described in the paper enables the Data First programme to make new, higher-quality datasets available to researchers.
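
To illustrate how a probabilistic approach can quantify the uncertainty of a link, the following is a minimal sketch of a Fellegi-Sunter style calculation, the model on which probabilistic record linkage is commonly based. It does not use the Splink API and is not the paper's implementation. The field names, the m and u probabilities, and the prior are hypothetical values chosen purely for illustration.

```python
import math

# Hypothetical parameters for illustration only (not taken from MoJ data or Splink).
# m = P(field agrees | the two records refer to the same person)
# u = P(field agrees | the two records refer to different people)
PARAMS = {
    "surname": {"m": 0.95, "u": 0.01},
    "date_of_birth": {"m": 0.98, "u": 0.003},
    "postcode": {"m": 0.80, "u": 0.02},
}

# Assumed prior probability that a randomly chosen record pair is a match.
PRIOR = 0.001


def match_weight(agreements):
    """Return the total match weight (log2 odds) for one record pair.

    `agreements` maps each field name to True (values agree) or False.
    """
    weight = math.log2(PRIOR / (1 - PRIOR))  # start from the prior odds
    for field, agrees in agreements.items():
        m, u = PARAMS[field]["m"], PARAMS[field]["u"]
        # Bayes factor: how much more likely this outcome is among matches
        # than among non-matches.
        bayes_factor = m / u if agrees else (1 - m) / (1 - u)
        weight += math.log2(bayes_factor)
    return weight


def match_probability(agreements):
    """Convert the match weight into a probability between 0 and 1."""
    odds = 2.0 ** match_weight(agreements)
    return odds / (1 + odds)


if __name__ == "__main__":
    pair = {"surname": True, "date_of_birth": True, "postcode": False}
    print(f"match probability: {match_probability(pair):.4f}")
```

Because each candidate pair receives a probability rather than a yes/no decision, a researcher can choose a threshold suited to the analysis, or carry the linkage uncertainty forward into later modelling.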
