Published June 25, 2021 | Version v1
Dataset Open

pGAN Synthetic Dataset: A Deep Learning Approach to Private Data Sharing of Medical Images Using Conditional GANs

Description

Synthetic dataset for the paper "A Deep Learning Approach to Private Data Sharing of Medical Images Using Conditional GANs".

 Dataset specification:

  • MRI images of vertebral units (VUs), each labelled by spinal region
  • The dataset comprises 10,000 image-label pairs
  • Image and label pair number k can be selected with synthetic_dataset['images'][k] and synthetic_dataset['regions'][k]
  • Each image is 3D with shape (9, 64, 64)
  • Each region is stored as an integer. The mapping is 0: cervical, 1: thoracic, 2: lumbar
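
The access pattern and label mapping above can be sketched in Python as follows. This is a minimal illustration, not the authors' loading code: the on-disk file format is not stated here, so the sketch assumes the loaded object behaves like a mapping with 'images' and 'regions' keys, and it uses random stand-in data rather than the real dataset. The helper name get_pair and the REGION_NAMES constant are hypothetical.

```python
import numpy as np

# Integer-to-region mapping taken from the dataset specification.
REGION_NAMES = {0: "cervical", 1: "thoracic", 2: "lumbar"}


def get_pair(dataset, k):
    """Return (image, region_name) for pair number k.

    `dataset` is assumed to be a mapping with 'images' and 'regions' keys.
    """
    image = np.asarray(dataset["images"][k])
    region = int(dataset["regions"][k])
    if image.shape != (9, 64, 64):
        raise ValueError(f"unexpected image shape {image.shape}")
    return image, REGION_NAMES[region]


# Stand-in data (NOT the real dataset): 5 random pairs in the documented layout.
rng = np.random.default_rng(0)
synthetic_dataset = {
    "images": rng.random((5, 9, 64, 64), dtype=np.float32),
    "regions": np.array([0, 1, 2, 1, 0]),
}

image, region_name = get_pair(synthetic_dataset, 2)
print(image.shape, region_name)  # (9, 64, 64) lumbar
```

With the real file, only the construction of synthetic_dataset would change; the indexing and the label lookup follow the specification directly.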

arXiv paper: https://arxiv.org/abs/2106.13199
GitHub code: https://github.com/tcoroller/pGAN/

Abstract:

Sharing data from clinical studies can facilitate innovative data-driven research and ultimately lead to better public health. However, sharing biomedical data can put sensitive personal information at risk. This is usually solved by anonymization, which is a slow and expensive process. An alternative to anonymization is sharing a synthetic dataset that behaves similarly to the real data but preserves privacy. As part of the collaboration between Novartis and the Oxford Big Data Institute, we generate a synthetic dataset based on the COSENTYX Ankylosing Spondylitis (AS) clinical study. We apply an Auxiliary Classifier GAN (ac-GAN) to generate synthetic magnetic resonance images (MRIs) of vertebral units (VUs). The images are conditioned on the VU location (cervical, thoracic and lumbar). In this paper, we present a method for generating a synthetic dataset and conduct an in-depth analysis of its properties along three key metrics: image fidelity, sample diversity and dataset privacy.

Files

Files (2.9 GB)

md5:cb18474edbce34713ec0424197a9ff73 (2.9 GB)
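
The MD5 checksum listed above can be used to confirm that a download completed without corruption. The following is a minimal sketch using Python's standard hashlib module; the function names are hypothetical, and the file path is left to the caller because the file name is not given on this page. The file is read in chunks so that a 2.9 GB download does not need to fit in memory.

```python
import hashlib

# Checksum as listed in the Files section of this record.
EXPECTED_MD5 = "cb18474edbce34713ec0424197a9ff73"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_MD5):
    """Return True if the file at `path` has the expected MD5 digest."""
    return md5_of_file(path) == expected.lower()
```

Calling matches_expected with the path of the downloaded file returns True when the digest agrees with the value published here.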