Pixel-level Protected Health Information (PHI) - Supplement to Exploring AI-Based System Design for Pixel-Level Protected Health Information Detection in Medical Images
Description
This dataset comprises two collections, RadPHI-test and MIDI. Both are derived datasets created by overlaying synthetically generated text on publicly available medical images. All source images originate from the open-access resources cited in the References section [1]–[6]. The overlaid text forms synthetic imprints representing Protected Health Information (PHI) categories, intended for research on medical image de-identification and related tasks.
If you plan to use this dataset, please cite the following paper:
Truong, T., Baltruschat, I.M., Klemens, M. et al. Exploring AI-Based System Design for Pixel-Level Protected Health Information Detection in Medical Images. J Digit Imaging. Inform. med. (2025). https://doi.org/10.1007/s10278-025-01619-y
References
[1] Wasserthal J, Breit HC, Meyer MT, Pradella M, Hinck D, Sauter AW, Heye T, Boll DT, Cyriac J, Yang S, et al.: TotalSegmentator: robust segmentation of 104 anatomic structures in CT images. Radiol Artif Intell 5(5), 2023.
[2] Huang Z, Pu X, Tang G, Ping M, Jiang G, Wang M, Wei X, Ren Y: BS-80K: The first large open-access dataset of bone scan images. Comput Biol Med 151:106221, 2022.
[3] Wang X, Peng Y, Lu L, Lu Z, Bagheri M, Summers RM: ChestX-ray8: Hospital-scale chest x-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 2097–2106, 2017.
[4] Antonelli M, Reinke A, Bakas S, Farahani K, Kopp-Schneider A, Landman BA, Litjens G, Menze B, Ronneberger O, Summers RM, et al.: The medical segmentation decathlon. Nat Commun 13(1):4128, 2022.
[5] Farahani K, Clunie D, Klenk J, Kopchick B, Diaz M, Pan Q, Pei L, Prior F, Rutherford M, Singh A, Sutton G, Wagner U: Medical Image De-Identification Benchmark (MIDI-B). Available at https://www.synapse.org/Synapse:syn53065760. Accessed 16 April 2025.
[6] Rutherford MW, Nolan T, Pei L, Wagner U, Pan Q, Farmer P, Smith K, Kopchick B, Opsahl-Ong L, Sutton G, Clunie DA, Farahani K, Prior F: Data in support of the MIDI-B Challenge (MIDI-B-Synthetic-Validation, MIDI-B-Curated-Validation, MIDI-B-Synthetic-Test, MIDI-B-Curated-Test) (Version 1) [Data set]. The Cancer Imaging Archive, https://doi.org/10.7937/cf2p-aw56, 2025.
Files
| Name | Size | MD5 checksum |
|---|---|---|
| data.zip | 337.2 MB | md5:e0500555301475e89c9fe0d6bc35086a |
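The MD5 checksum listed for data.zip can be used to confirm that a download is complete and uncorrupted. The following is a minimal sketch of such a check in Python, using only the standard library. The local path `data.zip` and the helper names `md5_of_file` and `verify_download` are assumptions chosen for illustration; they are not part of the dataset or of any official tooling.

```python
import hashlib
from pathlib import Path

# MD5 digest as listed on the record page for data.zip.
EXPECTED_MD5 = "e0500555301475e89c9fe0d6bc35086a"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected value (case-insensitive)."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    archive = Path("data.zip")  # assumed download location; adjust as needed
    if archive.exists():
        print("checksum OK" if verify_download(archive) else "checksum MISMATCH")
    else:
        print(f"{archive} not found")
```

A mismatch usually indicates an interrupted or corrupted download, in which case fetching the archive again is the simplest remedy.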
Additional details
Related works
- Is supplement to
- Journal: 10.1007/s10278-025-01619-y (DOI)