pGAN Synthetic Dataset: A Deep Learning Approach to Private Data Sharing of Medical Images Using Conditional GANs
Creators
- 1. Novartis
- 2. Purdue University
- 3. Oxford Big Data Institute
Description
Synthetic dataset for the paper "A Deep Learning Approach to Private Data Sharing of Medical Images Using Conditional GANs"
Dataset specification:
- MRI images of Vertebral Units labelled based on region
- The dataset consists of 10,000 pairs of images and labels
- Image and label pair number k can be selected by: synthetic_dataset['images'][k] and synthetic_dataset['regions'][k]
- Each image is 3D with size (9, 64, 64)
- Each region is stored as an integer. The mapping is 0: cervical, 1: thoracic, 2: lumbar
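The access pattern described above can be sketched in Python. This is a minimal illustration, not the authors' loading code: the on-disk file format is not stated here, so the example builds a small random stand-in with the same layout (an 'images' entry of shape (N, 9, 64, 64) and a 'regions' entry of N integers). The helper names make_stand_in and get_pair, the use of NumPy, and the float32 dtype are all assumptions made for illustration.

```python
import numpy as np

# Integer-to-region mapping as given in the dataset specification.
REGION_NAMES = {0: "cervical", 1: "thoracic", 2: "lumbar"}


def make_stand_in(n_pairs=4, seed=0):
    """Build a tiny random stand-in with the documented layout.

    Hypothetical helper: the real dataset holds 10,000 pairs, and its
    storage format and dtype are not specified in this record.
    """
    rng = np.random.default_rng(seed)
    return {
        "images": rng.random((n_pairs, 9, 64, 64), dtype=np.float32),
        "regions": rng.integers(0, 3, size=n_pairs),
    }


def get_pair(dataset, k):
    """Return image k, its integer label, and the matching region name."""
    image = np.asarray(dataset["images"][k])
    region = int(dataset["regions"][k])
    return image, region, REGION_NAMES[region]


synthetic_dataset = make_stand_in()
image, region, name = get_pair(synthetic_dataset, 2)
print(image.shape, region, name)
```

With the real data loaded into synthetic_dataset, the same get_pair call should apply as long as the object supports the indexing shown in the specification.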
arXiv paper: https://arxiv.org/abs/2106.13199
GitHub code: https://github.com/tcoroller/pGAN/
Abstract:
Sharing data from clinical studies can facilitate innovative data-driven research and ultimately lead to better public health. However, sharing biomedical data can put sensitive personal information at risk. This is usually solved by anonymization, which is a slow and expensive process. An alternative to anonymization is sharing a synthetic dataset that behaves similarly to the real data but preserves privacy. As part of the collaboration between Novartis and the Oxford Big Data Institute, we generate a synthetic dataset based on the COSENTYX Ankylosing Spondylitis (AS) clinical study. We apply an Auxiliary Classifier GAN (ac-GAN) to generate synthetic magnetic resonance images (MRIs) of vertebral units (VUs). The images are conditioned on the VU location (cervical, thoracic and lumbar). In this paper, we present a method for generating a synthetic dataset and conduct an in-depth analysis of its properties along three key metrics: image fidelity, sample diversity and dataset privacy.
Files
- Size: 2.9 GB
- Checksum: md5:cb18474edbce34713ec0424197a9ff73
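Because the record lists an md5 checksum for the 2.9 GB file, a download can be checked against it before use. The following is a minimal sketch using Python's standard hashlib module. The function names md5_of_file and matches_record are hypothetical, and the file is read in chunks so it never has to fit in memory at once.

```python
import hashlib

# Checksum copied from the file listing of this record.
EXPECTED_MD5 = "cb18474edbce34713ec0424197a9ff73"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the md5 hex digest of a file, reading 1 MiB at a time."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record(path, expected=EXPECTED_MD5):
    """Return True if the file at path has the expected md5 digest."""
    return md5_of_file(path) == expected
```

Calling matches_record with the local path of the downloaded file (the file name is not shown in this record, so substitute your own) returns True only when the download is intact.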