MEDDOCAN corpus: gold standard annotations for Medical Document Anonymization on Spanish clinical case reports
Creators
- 1. Barcelona Supercomputing Center
- 2. Centro Nacional de Investigaciones Oncológicas
- 3. Hospital 12 de Octubre
Description
Intro:
This record contains the training, development and test sets of the MEDDOCAN shared task, with Gold Standard annotations.
In addition, it contains the documents of the MEDDOCAN background set, without annotations.
Annotation quality
Inter-annotator agreement: 98%
For more information, see the paper.
Format:
Annotations are distributed in Brat format. See the Brat webpage for more information.
Annotations are also distributed in XML format (based on the i2b2 XML format).
The MEDDOCAN webpage provides a script to convert between the MEDDOCAN-Brat, MEDDOCAN-XML, and i2b2 formats.
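As a rough illustration of what the Brat files look like to a program, the sketch below reads text-bound annotation lines (those whose ID starts with `T`) from Brat standoff content and checks that each character offset matches the covered text. It is not the conversion script from the MEDDOCAN webpage. The sample sentence, the labels `PERSON` and `DATE`, and the names `Span` and `parse_brat_ann` are made up for this example; the real corpus uses its own label set, described in the annotation guidelines.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Span:
    ann_id: str  # Brat annotation ID, e.g. "T1"
    label: str   # entity type
    start: int   # character offset of the first character
    end: int     # character offset one past the last character
    text: str    # text covered by the annotation


def parse_brat_ann(ann_text: str) -> List[Span]:
    """Parse text-bound annotations from the content of a Brat .ann file."""
    spans = []
    for line in ann_text.splitlines():
        # Only "T" lines carry entity spans; notes, relations, etc. are skipped.
        if not line.startswith("T"):
            continue
        parts = line.split("\t")
        if len(parts) < 3:
            continue
        ann_id, type_and_offsets, covered = parts[0], parts[1], parts[2]
        label, offsets = type_and_offsets.split(" ", 1)
        # Discontinuous spans are written "s1 e1;s2 e2"; keep the outer bounds.
        fragments = [fragment.split() for fragment in offsets.split(";")]
        start = int(fragments[0][0])
        end = int(fragments[-1][1])
        spans.append(Span(ann_id, label, start, end, covered))
    return spans


# Hypothetical document text and annotations, not taken from the corpus.
SAMPLE_TEXT = "Patient: Ana Ruiz. Date: 2019-01-01."
SAMPLE_ANN = "T1\tPERSON 9 17\tAna Ruiz\nT2\tDATE 25 35\t2019-01-01\n"

if __name__ == "__main__":
    for span in parse_brat_ann(SAMPLE_ANN):
        # Offsets index into the plain text file that accompanies the .ann file.
        assert SAMPLE_TEXT[span.start:span.end] == span.text
        print(span.label, span.start, span.end, span.text)
```

In practice the annotation content would be read from an `.ann` file and the text from the matching `.txt` file, both opened with UTF-8 encoding.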
Shared task goal:
In all three subtasks, the goal is to predict the annotations given only the plain text files.
Resources:
- Web
- Citation: Montserrat Marimon et al. “Automatic De-identification of Medical Texts in Spanish: the MEDDOCAN Track, Corpus, Guidelines, Methods and Evaluation of Results.” In: IberLEF@SEPLN. 2019, pp. 618–638.
- Silver Standard corpus
- Annotation guidelines
For further information, please visit https://temu.bsc.es/meddocan/ or email us at encargo-pln-life@bsc.es
Copyright (c) 2019 Secretaría de Estado para el Avance Digital (SEAD)
Files
meddocan.zip (11.7 MB)
md5:6a09eb975580fdf56bc7041eadc9c921
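The md5 value listed for meddocan.zip can be used to check that a download is complete and uncorrupted. A minimal sketch using only the Python standard library follows; it assumes the archive has been saved locally as `meddocan.zip`, and the function names `md5_of_file` and `archive_is_intact` are illustrative.

```python
import hashlib

# Checksum published on this record for meddocan.zip.
EXPECTED_MD5 = "6a09eb975580fdf56bc7041eadc9c921"


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_is_intact(path: str, expected: str = EXPECTED_MD5) -> bool:
    """Compare the file's MD5 digest with the expected value."""
    return md5_of_file(path) == expected.lower()
```

Calling `archive_is_intact("meddocan.zip")` should return `True` for an unmodified download; `False` suggests the file should be downloaded again.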