Published September 9, 2024 | Version v1
Dataset Open

GraSCCo_PHI - Graz Synthetic Clinical text Corpus with Protected Health Information Annotations

Description

GraSCCo is a collection of artificially generated semi-structured and unstructured German-language clinical summaries. These summaries are formulated as letters from the hospital to the patient's GP after in-patient or out-patient care. Details:

  • Stefan Schulz. (2022). GraSCCo (Version v1) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.6539131
  • Modersohn L, Schulz S, Lohr C, Hahn U. GRASCCO - The First Publicly Shareable, Multiply-Alienated German Clinical Text Corpus. Stud Health Technol Inform. 2022;296:66-72. doi:10.3233/SHTI220805

This is GraSCCo with annotations of Protected Health Information (PHI). It serves as an external source for:

  • Lohr C, Matthies F, Faller J, et al. De-Identifying GRASCCO - A Pilot Study for the De-Identification of the German Medical Text Project (GeMTeX) Corpus. Stud Health Technol Inform. 2024;317:171-179. doi:10.3233/SHTI240853 (https://pubmed.ncbi.nlm.nih.gov/39234720/)

This repository contains the annotations as XMI and JSON exports created with the INCEpTION annotation platform (https://inception-project.github.io/). It also contains the annotation guideline document, as well as the TypeSystem.xml and layer.json files needed for import into INCEpTION.
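As a rough illustration of working with the XMI exports, the sketch below lists every annotation span in one document using only the Python standard library. It assumes the files follow the usual UIMA CAS XMI layout (a cas:Sofa element holding the document text, and annotation elements carrying begin and end attributes). The sample document, the custom:PHI element name, and the kind attribute are hypothetical placeholders; the actual layer and feature names are defined in TypeSystem.xml and the annotation guideline.

```python
import xml.etree.ElementTree as ET

# Namespace used by UIMA for CAS-level elements such as cas:Sofa.
CAS_NS = "http:///uima/cas.ecore"

# Hypothetical miniature export; the real layer and feature names are
# defined in TypeSystem.xml and will differ from "custom:PHI" / "kind".
SAMPLE = """<xmi:XMI xmlns:xmi="http://www.omg.org/XMI"
    xmlns:cas="http:///uima/cas.ecore"
    xmlns:custom="http:///custom.ecore" xmi:version="2.0">
  <cas:NULL xmi:id="0"/>
  <custom:PHI xmi:id="8" sofa="1" begin="5" end="15" kind="NAME"/>
  <cas:Sofa xmi:id="1" sofaNum="1" sofaID="_InitialView" mimeType="text"
      sofaString="Herr Max Muster wurde entlassen."/>
</xmi:XMI>"""


def read_spans(xmi_text):
    """Return (document_text, [(layer, begin, end, covered_text), ...])."""
    root = ET.fromstring(xmi_text)
    sofa = root.find(f"{{{CAS_NS}}}Sofa")
    text = sofa.get("sofaString", "") if sofa is not None else ""
    spans = []
    for elem in root:
        begin, end = elem.get("begin"), elem.get("end")
        if begin is None or end is None:
            continue  # not a span annotation (e.g. cas:NULL, cas:Sofa)
        layer = elem.tag.rsplit("}", 1)[-1]  # drop the XML namespace part
        # Caveat: UIMA offsets count UTF-16 code units, so slices can be
        # off for text containing characters outside the BMP.
        b, e = int(begin), int(end)
        spans.append((layer, b, e, text[b:e]))
    return text, spans


text, spans = read_spans(SAMPLE)
for layer, b, e, covered in spans:
    print(layer, b, e, covered)
```

For real files, the same function can be applied to the content of each exported XMI document. A dedicated CAS library that reads TypeSystem.xml would be the more robust choice when feature values need to be typed.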

Files

_Annoguide____GeMTeX___DeID.pdf

Files (2.7 MB)

Checksum (MD5)                          Size
md5:21b296e42c0e96a3d16c587281d23e26    920.4 kB
md5:116c9bde2b94b78427639bf2358531ae    857.9 kB
md5:066707894b0dd39d269537f6d176a0eb    814.2 kB
md5:68356e5b132073e27a0637b2d927242c    4.7 kB
md5:14de000838dc5e201c7b1303566fb927    132.6 kB