Published October 30, 2023 | Version v1
Report | Open Access

SARA: Semi-Automated Risk Assessment of Data Provenance and Clinical Free-text in trusted research environments

  • 1. ACRC, Usher Institute, University of Edinburgh
  • 2. DataLoch, Usher Institute, University of Edinburgh
  • 3. University of Edinburgh
  • 4. Grampian Data Safe Haven (DasH)
  • 5. University of Aberdeen
  • 6. West of Scotland Safe Haven, NHS Glasgow
  • 7. eDRIS, Public Health Scotland
  • 8. Brighton and Sussex Medical School

Description

Data are transforming health and social care, enabling life-changing discoveries, advancing healthcare services and improving lives. Yet health data providers face challenges in extracting and linking these complex data and in safeguarding their release for research. Risk assessments are key to ensuring that data access does not pose privacy risks, such as the release of identifiable patient information, and that patients' records are processed correctly. Current processes are ad hoc, manual and time-consuming, and can block data access altogether, ultimately limiting innovation in health and social care.

This project focused on delivering semi-automated tools and exploring approaches to improve two areas of risk assessment and monitoring: data provenance, improving the trustworthiness of data ingestion, processing and linking and ensuring that the resulting data are compliant for research use; and privacy assessment, minimising the risk of identifiable information in clinical free-text records (e.g., GP letters, discharge summaries). Public Involvement and Engagement (PIE) was central to our project, ensuring that risks were properly identified and addressed, that public perspectives were embedded in our outcomes, and that our methods were transparent and understandable.

The project delivered: 

• a comprehensive report on our public involvement and engagement, demonstrating how public views were incorporated into our design for both data provenance and the understanding of privacy risk in clinical free-text.

• an approach to exploring and understanding privacy risks in clinical free-text, which could be applied to future data in other trusted research environments (TREs), along with details of the privacy risk categories.

• a visualisation dashboard for exploring privacy risk in clinical free-text.

• an open-source framework for data provenance tracing within TREs, which covers the full data production workflow. 

• a front-end dashboard that allows TRE analysts, researchers, and information governance teams to inspect each step of the data workflow for quality assurance and to improve transparency in how data was produced.

Our collaborative team of TREs and academic partners developed approaches and an implementable toolkit, intended for health data providers across the UK, that address potential public concerns. Our tools offer significant benefits for data audit and release, with the potential to improve consistency between organisations, improve data accessibility for researchers, and enable access to data whose release has not previously been feasible.

This work was funded by UK Research & Innovation MC_PC_23005 as part of Phase 1 of the DARE UK (Data and Analytics Research Environments UK) programme, delivered in partnership with Health Data Research UK (HDR UK) and Administrative Data Research UK (ADR UK). 

Files

SARA_Final_Report.pdf (1.3 MB; md5:ccb35838d67bf18e26698d50b9df3756)