How to Deal with Privacy, Bias & Drift in Synthetic Primary Care Data
Description
Primary care data offers huge value in modelling disease and illness. However, this data holds extremely private information about individuals, and privacy concerns continue to limit the widespread use of such data, both by public research institutions and by the private health-tech sector.
One possible solution is synthetic data, which mimics the underlying correlational structure and distributions of real data while avoiding many of the privacy concerns. Brunel University London has been working in a long-term collaboration with the Medicines and Healthcare products Regulatory Agency (MHRA) in the UK to construct a high-fidelity synthetic data service using probabilistic models with complex underlying latent variable structures.
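The abstract mentions probabilistic models with latent variable structures but gives no implementation detail. As a minimal sketch, assuming a Gaussian mixture as a stand-in latent-variable model (the discrete mixture component plays the role of the latent variable) and hypothetical clinical measurements, generation could look like this:

```python
# Minimal sketch of latent-variable synthetic data generation, assuming a
# Gaussian mixture stands in for the richer models described in the talk.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Stand-in "real" data: two correlated clinical measurements (hypothetical).
real = rng.multivariate_normal([120.0, 80.0], [[90, 40], [40, 60]], size=5000)

# Fit the generative model on the real records.
model = GaussianMixture(n_components=3, random_state=0).fit(real)

# Sample synthetic records: only the model's parameters are needed, so the
# real rows never have to leave the secure environment.
synthetic, latent = model.sample(n_samples=5000)

# Fidelity check: compare the correlational structure of real vs. synthetic.
print(np.corrcoef(real.T)[0, 1], np.corrcoef(synthetic.T)[0, 1])
```

The same pattern applies to richer models: fit once inside the secure environment, then release only samples drawn from the fitted model.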
This work has led to multiple releases of synthetic data on a number of diseases, including COVID-19 and cardiovascular disease, which are available for research. Two major issues have arisen from our synthetic data work: bias, which persists even when working with comprehensive national data, and concept drift, where subsequent batches of data move away from current models, raising the question of what impact this may have on regulation.
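The abstract does not specify how the movement of subsequent batches away from current models is detected. One common approach, sketched here under the assumption of a per-feature two-sample Kolmogorov-Smirnov test and hypothetical batch data, is to compare each new batch against the reference data the current model was fitted on:

```python
# Minimal sketch of batch-level drift detection using a two-sample KS test.
# The regulatory setting in the talk would need more than this, but it
# illustrates flagging a batch that has moved away from the reference data.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(1)
reference = rng.normal(120.0, 10.0, size=2000)   # batch the model was fitted on
new_batch = rng.normal(124.0, 10.0, size=2000)   # later batch with a shifted mean

stat, p_value = ks_2samp(reference, new_batch)
if p_value < 0.01:
    print(f"Drift flagged (KS={stat:.3f}, p={p_value:.2e}); consider refitting.")
```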
In this talk, Allan Tucker discusses some of the key results of the collaboration: his experiences of synthetic data generation, the detection of bias and how to better represent the true underlying UK population, and how to handle concept drift when building models of healthcare data that evolves over time.
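The abstract likewise does not say how bias against the true underlying UK population is detected. A simple, commonly used starting point, sketched here with hypothetical age-band proportions and counts, is to compare a dataset's demographic marginals against reference population proportions with a goodness-of-fit test:

```python
# Minimal sketch of a marginal bias check. The census-style target
# proportions and observed counts below are hypothetical; the talk's actual
# approach to representing the UK population may differ.
import numpy as np
from scipy.stats import chisquare

# Hypothetical age-band proportions for the target population (assumed).
target = {"0-17": 0.21, "18-44": 0.35, "45-64": 0.25, "65+": 0.19}

# Observed counts in the dataset being audited (hypothetical numbers).
observed = {"0-17": 1500, "18-44": 4200, "45-64": 2600, "65+": 1700}

counts = np.array([observed[band] for band in target])
expected = counts.sum() * np.array(list(target.values()))

stat, p_value = chisquare(counts, f_exp=expected)
print(f"chi2={stat:.1f}, p={p_value:.2e}")  # small p suggests the marginals diverge
```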
Files

Name | Size
---|---
Turing_v1.pdf (md5:03a81dd99f07ec16ccb694f4605e0408) | 4.2 MB