GGPONC 2.0 (Major Release incl. Gold Standard Annotations)

Borchert, Florian; Lohr, Christina; Modersohn, Luise; Witt, Jonas; Langer, Thomas; Follmann, Markus; Gietzelt, Matthias; Arnrich, Bert; Hahn, Udo; Schapranow, Matthieu-P.

doi:10.5281/zenodo.12518458

Published March 24, 2024 | Version 2.0

Dataset | Restricted access

1. Hasso Plattner Institute
2. Friedrich Schiller University Jena
3. German Cancer Society
4. Medizinische Hochschule Hannover

About this Release

Version 2.0 (major release) contains the gold-standard annotations described in the GGPONC 2.0 paper.

Once you have downloaded the data, the most convenient way to access them is through our Hugging Face BigBIO dataloader.

⚠️ More recent versions of GGPONC (with new / updated guidelines, but without gold-standard annotations) are published at:
https://zenodo.org/records/12520623

Project Description

The GGPONC project aims to provide a freely distributable corpus of German medical text for NLP researchers. Clinical guidelines are particularly suitable to create such corpora, as they contain no protected health information (PHI), which distinguishes them from other kinds of medical text.

The second major release of the corpus (GGPONC 2.0, 2024/03) consists of 30 German oncology guidelines with 1.87 million tokens. It was annotated completely by hand at the entity level by 7 medical students using the INCEpTION platform over 6 months, amounting to more than 1,200 hours of work. This makes GGPONC 2.0 the largest annotated, freely distributable corpus of German medical text currently available.

Annotated entities are Findings (Diagnosis / Pathology, Other Finding), Substances (Clinical Drug, Nutrients / Body Substances, External Substances) and Procedures (Therapeutic, Diagnostic), as well as Specifications of these entities. In total, annotators created more than 200,000 entity annotations. In addition, fragment relationships were annotated to explicitly mark elliptical coordinated noun phrases, a common phenomenon in German text.
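The entity scheme above is a two-level hierarchy (coarse classes with fine-grained subclasses, plus Specifications) together with a separate fragment relation. The Python sketch below shows one possible in-memory representation; it is not the format of the released files. The label strings, the `Entity` and `Fragment` classes, the `coarse_counts` helper and the direction of the fragment link are all hypothetical choices made for illustration, as is the example annotation of "Brust- und Eierstockkrebs".

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical label identifiers mirroring the entity classes named above;
# the exact strings used in the released annotation files may differ.
HIERARCHY = {
    "Finding": ["Diagnosis_or_Pathology", "Other_Finding"],
    "Substance": ["Clinical_Drug", "Nutrient_or_Body_Substance", "External_Substance"],
    "Procedure": ["Therapeutic", "Diagnostic"],
}

# Invert the hierarchy so each fine-grained label maps to its coarse class.
FINE_TO_COARSE = {fine: coarse for coarse, fines in HIERARCHY.items() for fine in fines}
# Specifications qualify entities of any class, so they get their own group here.
FINE_TO_COARSE["Specification"] = "Specification"


@dataclass(frozen=True)
class Entity:
    start: int  # character offset of the first character
    end: int    # character offset one past the last character
    label: str  # fine-grained label, a key of FINE_TO_COARSE

    def text(self, doc: str) -> str:
        return doc[self.start:self.end]


@dataclass(frozen=True)
class Fragment:
    # Assumed direction: from the truncated conjunct to the complete
    # conjunct that carries the elided head noun.
    source: Entity
    target: Entity


def coarse_counts(entities: list[Entity]) -> Counter:
    """Count annotations per coarse class (Finding, Substance, ...)."""
    return Counter(FINE_TO_COARSE[e.label] for e in entities)


# "Brust- und Eierstockkrebs" ("breast and ovarian cancer"): "Brust-" is an
# elliptical conjunct standing for "Brustkrebs".
doc = "Brust- und Eierstockkrebs"
breast = Entity(0, 6, "Diagnosis_or_Pathology")
ovarian = Entity(11, 25, "Diagnosis_or_Pathology")
link = Fragment(source=breast, target=ovarian)

print(link.source.text(doc), "->", link.target.text(doc))  # Brust- -> Eierstockkrebs
print(coarse_counts([breast, ovarian]))  # Counter({'Finding': 2})
```

Keeping the fragment link as a separate object, rather than merging "Brust-" into a single discontinuous span, leaves both conjuncts as ordinary entities, so coarse- and fine-grained counts stay unaffected by the relation layer.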

Files

Restricted

The record is publicly accessible, but files are restricted to users with access.

Request access

If you would like to request access to these files, please fill out the access request form on the Zenodo record page.

Instructions

Please briefly describe your research project to gain access to GGPONC.

Terms of Use

GGPONC may be used for non-commercial research activities only.
GGPONC may not be distributed by the corpus users to any other third party (including any project collaborators). All prospective users of the corpus must apply for access individually.
All parts of the corpus are protected by copyright. Any use beyond the limits of copyright law is prohibited without the written permission of the German Guideline Program in Oncology (GGPO). No part of the corpus may be reproduced in any form without prior written permission of the GGPO.
GGPONC is provided free of charge.
The corpus comes with absolutely no warranties including (but not limited to) the correctness of the information provided in the text corpus itself. The latest version of the clinical guidelines used for the corpus can be found at: https://www.leitlinienprogramm-onkologie.de/english-language/
Contributions that are based on the corpus must cite the following publication:
- Florian Borchert, Christina Lohr, Luise Modersohn, Jonas Witt, Thomas Langer, Markus Follmann, Matthias Gietzelt, Bert Arnrich, Udo Hahn, and Matthieu-P. Schapranow. 2022. GGPONC 2.0 - The German Clinical Guideline Corpus for Oncology: Curation Workflow, Annotation Policy, Baseline NER Taggers. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 3650–3660, Marseille, France. European Language Resources Association.

Additional details

Is original form of: Dataset: 10.5281/zenodo.12520368 (DOI)
Is published in: Conference paper: https://aclanthology.org/2022.lrec-1.389 (URL)
References: Software: 10.5281/zenodo.6473122 (DOI)

Available: 2023-03-24

Repository URL: https://github.com/hpi-dhc/ggponc_annotation
