Dataset Open Access

PharmaCoNER corpus: gold standard annotations of Pharmacological Substances, Compounds and proteins in Spanish clinical case reports

Gonzalez-Agirre, Aitor; Miranda-Escalada, Antonio; Rabal, Obdulia; Krallinger, Martin

Intro:

This record contains the PharmaCoNER shared task dataset, divided into train, development and test sets for the two subtasks (subtask-1 and subtask-2), with Gold Standard annotations.

In addition, it contains the documents of the PharmaCoNER background set, without annotations.

 

Please cite:

A. Gonzalez-Agirre, M. Marimon, A. Intxaurrondo, O. Rabal, M. Villegas, M. Krallinger, PharmaCoNER: Pharmacological substances, compounds and proteins named entity recognition track, in: Proceedings of The 5th Workshop on BioNLP Open Shared Tasks, 2019, pp. 1–10.

 

Annotation quality

Inter-annotator agreement: 93% for annotation, 73% for mapping.

For more information, see the paper.

 

Format

For subtask-1, annotations are distributed in Brat standoff format (more information on the Brat webpage: https://brat.nlplab.org/standoff.html).
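As an illustration of how the Brat standoff files could be read, here is a minimal Python sketch that extracts text-bound annotations (the lines whose ID starts with "T") from the contents of an ANN file. The names Entity and parse_brat_entities, the label SUBSTANCE and the sample lines are hypothetical and are not taken from the corpus; the line layout follows the Brat standoff documentation.

```python
from typing import List, NamedTuple, Tuple


class Entity(NamedTuple):
    ann_id: str                    # e.g. "T1"
    label: str                     # entity type
    spans: List[Tuple[int, int]]   # one (start, end) pair per fragment
    text: str                      # surface text covered by the spans


def parse_brat_entities(ann_text: str) -> List[Entity]:
    """Return the text-bound annotations found in the contents of an ANN file."""
    entities = []
    for line in ann_text.splitlines():
        # Text-bound annotations have IDs starting with "T"; notes ("#")
        # and other annotation kinds are skipped in this sketch.
        if not line.startswith("T"):
            continue
        ann_id, type_and_offsets, surface = line.split("\t", 2)
        label, offsets = type_and_offsets.split(" ", 1)
        # Discontinuous annotations separate their fragments with ";".
        spans = [tuple(int(x) for x in frag.split()) for frag in offsets.split(";")]
        entities.append(Entity(ann_id, label, spans, surface))
    return entities


# Invented sample in standoff layout (not real corpus content).
sample_ann = (
    "T1\tSUBSTANCE 10 19\tibuprofen\n"
    "#1\tAnnotatorNotes T1\tsome note\n"
    "T2\tSUBSTANCE 30 37;45 48\tvitamin B12\n"
)

for entity in parse_brat_entities(sample_ann):
    print(entity.ann_id, entity.label, entity.spans, entity.text)
```

Character offsets refer to the accompanying plain text file, so the same spans can be used to slice the document text when checking an annotation.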

For subtask-2, the codes associated with each document are given in a TSV file with the following columns:

filename    code
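A possible way to load the subtask-2 file is sketched below in Python: it groups the codes by document. The function name read_codes and the sample rows are hypothetical. The sketch assumes one filename/code pair per row and tolerates an optional header row, since the description does not say whether a header is present.

```python
import csv
import io
from collections import defaultdict


def read_codes(tsv_file):
    """Map each filename to the list of codes given for it."""
    codes = defaultdict(list)
    reader = csv.reader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        # Skip blank or incomplete rows and an optional header row
        # (assumption: a header, if any, reads "filename<TAB>code").
        if len(row) < 2 or row[:2] == ["filename", "code"]:
            continue
        codes[row[0]].append(row[1])
    return dict(codes)


# Invented rows for illustration (not real corpus content).
sample_tsv = io.StringIO("doc1\t111\ndoc1\t222\ndoc2\t333\n")
print(read_codes(sample_tsv))
```

For a file on disk, the same function can be called on a handle opened with open(path, newline="", encoding="utf-8"); the encoding is an assumption.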

 

Shared task goal:

In both subtasks, the goal is to predict the annotations of the test files (either the ANN files or the TSV with the codes) given only the plain text files.

 

Resources:

 

For further information, please visit https://temu.bsc.es/pharmaconer/ or email us at encargo-pln-life@bsc.es

Funded by the Plan de Impulso de las Tecnologías del Lenguaje (Plan TL).
Files (6.6 MB)
pharmaconer.zip (6.6 MB)
md5:e511314c4468c2655ae3355ce590a2c0
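After downloading, the archive can be checked against the md5 value listed above. The following Python sketch shows one way to do it; the function names md5_of_stream and verify_archive are hypothetical, and the local path passed to verify_archive is an assumption.

```python
import hashlib
import io

# Checksum taken from the file listing for pharmaconer.zip.
EXPECTED_MD5 = "e511314c4468c2655ae3355ce590a2c0"


def md5_of_stream(stream, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a binary stream, reading it in chunks."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path, expected=EXPECTED_MD5):
    """Return True if the file at `path` has the expected MD5 digest."""
    with open(path, "rb") as handle:
        return md5_of_stream(handle) == expected


# The digest helper also works on in-memory data.
print(md5_of_stream(io.BytesIO(b"hello")))
```

A call such as verify_archive("pharmaconer.zip") returns True only when the downloaded file matches the published checksum.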