Published February 11, 2022 | Version 2.0
Report Open

Use of NHS Digital datasets as trial data in the UK: a position paper

  • 1. MRC Clinical Trials Unit at University College London; Health Data Research UK; NHS DigiTrials Programme, NHS Digital
  • 2. NHS DigiTrials Programme, NHS Digital
  • 3. Nuffield Department of Population Health, University of Oxford; Health Data Research UK
  • 4. University of Leeds; Data Services Directorate, NHS Digital
  • 5. MRC Clinical Trials Unit at University College London; London School of Hygiene and Tropical Medicine, University of London; Health Data Research UK
  • 6. Nuffield Department of Population Health, University of Oxford; Health Data Research UK; NHS DigiTrials Programme, NHS Digital
  • 7. MRC Clinical Trials Unit at University College London; Health Data Research UK
  • 8. MRC Clinical Trials Unit at University College London; Health Data Research UK; BHF Data Science Centre

Description

Background: Clinical trial teams increasingly want to use data from healthcare systems (“healthcare data”), particularly to enhance recruitment and follow-up of participants, to reduce time and cost, and to avoid duplication of effort. However, there is continued uncertainty about how regulators regard healthcare data used for trial purposes, in terms of provenance, quality and reliability.

Objectives: There were two key objectives: first, to demonstrate the data integrity of the two datasets held by NHS Digital (NHSD) that are most often requested by trial teams; and second, to set out an approach by which other healthcare systems datasets can be similarly evaluated.

Method: The data lifecycles of the datasets were documented, mapping the flow of data from the originating healthcare providers’ databases to NHSD warehouses and onwards to clinical trial teams. These lifecycles were assessed for evidence of whether the datasets are accurate, reliable, complete, contemporaneous, and well-governed.

Results: The assessment method was applied to (a) the Hospital Episode Statistics Admitted Patient Care (HES APC) dataset and (b) the Civil Registration of Deaths (CRD) dataset. This paper demonstrates that the collection and management of these datasets through NHSD systems ensure their integrity and reliability. The datasets are accurate representations of the data held by the originating providers (acute NHS trusts and local registrars).

Conclusion: Based on these findings, the HES APC and CRD datasets satisfy the assessment criteria, demonstrating that they are reliable transcribed copies of the original source data.

Implications: First, these datasets can be used directly as clinical trial data, with trial teams focusing on the accuracy of the algorithms and processes used to identify particular outcomes rather than on the integrity of the data flow. Second, this assessment approach should be used to determine whether other healthcare systems datasets are ready to be used as transcribed copies of source data, and data providers should take appropriate steps to remedy any shortcomings where they are not.

Notes

On behalf of the Healthcare Systems Data for Clinical Trials Collaborative Group. MLM is funded by HDR UK. SBL, JRC, MKBP, and MRS are funded by MRC grant MC_UU_00004/08.

Files

Position paper on NHSD data v2.0 20220211 final.pdf

Additional details

Related works

Is supplemented by
10.5281/zenodo.6047938 (DOI)