Published September 7, 2025 | Version v2
Dataset Open

GraSCCo_PII_V2 - Graz Synthetic Clinical text Corpus with PII Annotations

Description

GraSCCo_PII_V2 - Graz Synthetic Clinical text Corpus with Personally Identifiable Information

Supplementary resource for:

  • Lohr C, Faller J, Riedel A, Nguyen HM, Wolfien M, Hofenbitzer J, Modersohn L, Romberg J, Prasser F, Omeirat J, Wen Y, Galusch O, Hahn U, Seiferling M, Dieterich C, Klügl P, Matthies F, Kind J, Boeker M, Löffler M, Meineke F. GeMTeX's De-Identification in Action: Lessons Learned & Devil's Details. Stud Health Technol Inform. 2025 Sep 3;331:274-282. doi: 10.3233/SHTI251406. PMID: 40899551. (https://pubmed.ncbi.nlm.nih.gov/40899551/)

GraSCCo is a collection of artificially generated semi-structured and unstructured German-language clinical summaries. These summaries are written as letters from the hospital to the patient's GP after in-patient or out-patient care. Details:

  • Stefan Schulz. (2022). GraSCCo (Version v1) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.6539131
  • Modersohn L, Schulz S, Lohr C, Hahn U. GRASCCO - The First Publicly Shareable, Multiply-Alienated German Clinical Text Corpus. Stud Health Technol Inform. 2022;296:66-72. doi:10.3233/SHTI220805

First version: GraSCCo with PHI (now called PII) annotations, an external resource for:

  • Lohr C, Matthies F, Faller J, et al. De-Identifying GRASCCO - A Pilot Study for the De-Identification of the German Medical Text Project (GeMTeX) Corpus. Stud Health Technol Inform. 2024;317:171-179. doi:10.3233/SHTI240853 (https://pubmed.ncbi.nlm.nih.gov/39234720/)

This repository contains the annotations as XMI and JSON exports created with the INCEpTION annotation platform (https://inception-project.github.io/), together with the annotation guideline document, TypeSystem.xml and layer.json (needed for import into INCEpTION). For reading the exports programmatically, see also https://github.com/dkpro/dkpro-cassis.
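For a first look at an XMI export before setting up INCEpTION or dkpro-cassis, the following minimal sketch uses only the Python standard library to count the top-level elements of one file by tag name. It assumes the usual XMI layout, in which annotations and bookkeeping entries (such as the document text) are direct children of the root element. The function name and the file name in the usage note are hypothetical and not part of this record.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def count_top_level_elements(xmi_path):
    """Count the direct children of the XMI root element by local tag name.

    Assumption: annotations are stored as direct children of the root,
    so each child corresponds to one annotation or bookkeeping entry.
    """
    root = ET.parse(xmi_path).getroot()
    counts = Counter()
    for element in root:
        # ElementTree writes namespaced tags as "{namespace-uri}LocalName";
        # keep only the local name so the counts are easy to read.
        local_name = element.tag.rsplit("}", 1)[-1]
        counts[local_name] += 1
    return counts
```

A call such as `count_top_level_elements("some_document.xmi")` (hypothetical file name) returns a `Counter` keyed by element name. This gives only a rough overview; typed access to annotations and their features needs the TypeSystem.xml file and a CAS library such as dkpro-cassis.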

Files

GeMTeX_Annoguide_DeID_2_202509.pdf

Files (2.4 MB)

  • md5:925d39cffff9499514ae29a8b4d632c4 (700.3 kB)
  • md5:8c640f358f028e8d0615f2b814b18d19 (5.4 kB)
  • md5:ce9444e76c246c7562f9384e72d73e46 (143.6 kB)
  • md5:472112d9a8177715bc5ad3fe520b9720 (827.6 kB)
  • md5:d8a67c976696b207941ac49d9e867dd1 (746.1 kB)
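
The MD5 checksums listed above can be used to check that a downloaded file arrived intact. The following is a minimal sketch using only the Python standard library; the function names are illustrative, and which checksum belongs to which file has to be taken from the record itself.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path, listed):
    """Compare a file against a checksum written as 'md5:<hex digest>'."""
    # Accept the value with or without the leading "md5:" label.
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

For example, `matches_listed_checksum(local_path, "md5:8c640f358f028e8d0615f2b814b18d19")` returns `True` only if the file at `local_path` (a placeholder for the downloaded file) has that digest. MD5 is adequate for detecting transfer errors but is not a safeguard against deliberate tampering.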