Dataset Open Access

PharmaCoNER corpus: gold standard annotations of Pharmacological Substances, Compounds and proteins in Spanish clinical case reports

Gonzalez-Agirre, Aitor; Miranda-Escalada, Antonio; Rabal, Obdulia; Krallinger, Martin


This is the PharmaCoNER shared task dataset, divided into train, development and test sets, together with the PharmaCoNER background set.

It contains the train, development and test sets of the two subtasks (subtask-1 and subtask-2) with Gold Standard annotations.

In addition, it contains the documents of the background set, without annotations.


Please cite:

A. G. Agirre, M. Marimon, A. Intxaurrondo, O. Rabal, M. Villegas, M. Krallinger, Pharmaconer: Pharmacological substances, compounds and proteins named entity recognition track, in: Proceedings of The 5th Workshop on BioNLP Open Shared Tasks, 2019, pp. 1–10.


Annotation quality

Inter-annotator agreement: 93% for annotation, 73% for mapping.

For more information, see the paper.



For subtask-1, annotations are distributed in Brat format (more information is available on the Brat webpage).
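As a rough illustration of how the Brat annotations could be read, the sketch below parses the text-bound ("T") lines of a Brat standoff .ann file into entity records. It assumes the standard standoff layout (an identifier, then a label with character offsets, then the covered text, separated by tabs) and ignores every other line type. The function name parse_ann, the Entity record and the sample lines (including their labels and offsets) are invented for illustration and are not taken from the corpus.

```python
from typing import List, NamedTuple, Tuple


class Entity(NamedTuple):
    ann_id: str                   # e.g. "T1"
    label: str                    # entity type as written in the .ann file
    spans: List[Tuple[int, int]]  # one (start, end) pair per fragment
    text: str                     # covered text as stored in the .ann file


def parse_ann(content: str) -> List[Entity]:
    """Parse the text-bound (T) lines of a Brat standoff file."""
    entities = []
    for line in content.splitlines():
        # Only text-bound annotations are handled; notes, relations,
        # attributes and blank lines are skipped.
        if not line.startswith("T"):
            continue
        ann_id, header, text = line.split("\t", 2)
        label, offsets = header.split(" ", 1)
        spans = []
        # Discontinuous annotations list several fragments separated by ";".
        for part in offsets.split(";"):
            start, end = part.split()
            spans.append((int(start), int(end)))
        entities.append(Entity(ann_id, label, spans, text))
    return entities


# Hypothetical sample content, not real corpus data.
sample_ann = (
    "T1\tPROTEINAS 10 18\talbúmina\n"
    "T2\tNORMALIZABLES 42 53\tparacetamol\n"
    "#1\tAnnotatorNotes T1\texample note\n"
)

for entity in parse_ann(sample_ann):
    print(entity.ann_id, entity.label, entity.spans, entity.text)
```

In practice the content would come from reading each .ann file next to its plain text file; the offsets refer to character positions in that text file.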

For subtask-2, the codes associated with each document are given in a TSV file with the following columns:

filename    code
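A minimal sketch of loading such a file is shown below. It groups the codes by document, assuming the file is tab-separated, has no header row, and may list the same document on several lines. The function name read_codes and the sample rows are hypothetical; codes are kept as strings so that nothing is lost to numeric conversion.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Set, TextIO


def read_codes(handle: TextIO) -> Dict[str, Set[str]]:
    """Group the codes of a two-column TSV (filename, code) by filename."""
    codes = defaultdict(set)
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank or malformed lines (assumption: valid rows have
        # at least two fields).
        if len(row) < 2:
            continue
        filename, code = row[0], row[1]
        codes[filename].add(code)
    return dict(codes)


# Hypothetical sample rows, not real corpus data.
sample_tsv = io.StringIO(
    "doc_001\t11111\n"
    "doc_001\t22222\n"
    "doc_002\t11111\n"
)

codes_by_file = read_codes(sample_tsv)
print(codes_by_file)
```

For a file on disk, the same function could be called with open(path, newline="", encoding="utf-8"), where path is whichever TSV file is being read.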


Shared task goal:

In the two subtasks, the goal is to predict the annotations of the test files (either the ANN files or the TSV with the codes) given only the plain text files. 
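To make the prediction setting more concrete, the sketch below compares predicted codes against gold codes for subtask-2 using micro-averaged precision, recall and F1 over (filename, code) pairs. This is only an illustration: the description above does not state how submissions are scored, so the choice of metric and the function name micro_prf are assumptions, and the sample values are invented. It is not the official evaluation script.

```python
from typing import Dict, Set, Tuple


def micro_prf(
    gold: Dict[str, Set[str]], pred: Dict[str, Set[str]]
) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 over (filename, code) pairs."""
    gold_pairs = {(name, code) for name, codes in gold.items() for code in codes}
    pred_pairs = {(name, code) for name, codes in pred.items() for code in codes}
    true_positives = len(gold_pairs & pred_pairs)
    # Guard against division by zero when either side is empty.
    precision = true_positives / len(pred_pairs) if pred_pairs else 0.0
    recall = true_positives / len(gold_pairs) if gold_pairs else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical gold and predicted codes, not real corpus data.
gold_codes = {"doc_001": {"11111", "22222"}, "doc_002": {"33333"}}
pred_codes = {"doc_001": {"11111"}, "doc_002": {"33333", "44444"}}

precision, recall, f1 = micro_prf(gold_codes, pred_codes)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```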




For further information, please visit or email us at

Funded by the Plan de Impulso de las Tecnologías del Lenguaje (Plan TL).

