PharmaCoNER shared task dataset, divided into train, development and test sets. The PharmaCoNER background set is also included here.
It contains the train, development and test sets of the two subtasks (subtask-1 and subtask-2) with Gold Standard annotations.
In addition, it contains the documents of the background set, without annotations.
A. G. Agirre, M. Marimon, A. Intxaurrondo, O. Rabal, M. Villegas, M. Krallinger, PharmaCoNER: Pharmacological substances, compounds and proteins named entity recognition track, in: Proceedings of The 5th Workshop on BioNLP Open Shared Tasks, 2019, pp. 1–10.
Inter-annotator agreement: 93% for annotation, 73% for mapping.
For more information, see the paper.
For subtask-1, annotations are distributed in Brat standoff format (more information on the Brat webpage: https://brat.nlplab.org/standoff.html).
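As a rough illustration of the Brat standoff layout, the sketch below reads the text-bound entity lines (those whose ID starts with "T") from the contents of an ANN file. The helper name `parse_brat_entities`, the `Entity` record and the sample lines are assumptions made for this example; they are not taken from the dataset or from any official PharmaCoNER tooling.

```python
from typing import List, NamedTuple, Tuple


class Entity(NamedTuple):
    """One text-bound annotation from a Brat .ann file (illustrative record)."""
    ann_id: str                   # e.g. "T1"
    label: str                    # entity type as written in the file
    spans: List[Tuple[int, int]]  # character offsets; several if discontinuous
    text: str                     # surface text covered by the annotation


def parse_brat_entities(ann_text: str) -> List[Entity]:
    """Parse the text-bound ("T") lines of a Brat standoff file.

    Other line types (notes starting with "#", relations, etc.) are skipped.
    """
    entities = []
    for line in ann_text.splitlines():
        if not line.startswith("T"):
            continue
        # Layout: ID <TAB> TYPE START END[;START END...] <TAB> TEXT
        ann_id, type_and_offsets, text = line.split("\t", 2)
        label, offsets = type_and_offsets.split(" ", 1)
        spans = []
        for fragment in offsets.split(";"):
            start, end = fragment.split()
            spans.append((int(start), int(end)))
        entities.append(Entity(ann_id, label, spans, text))
    return entities


# Hypothetical sample content, made up for illustration only.
sample = (
    "T1\tPROTEINAS 10 18\talbúmina\n"
    "#1\tAnnotatorNotes T1\tnota\n"
    "T2\tNORMALIZABLES 30 35;40 45\tabc def\n"
)
for entity in parse_brat_entities(sample):
    print(entity.ann_id, entity.label, entity.spans, entity.text)
```

In practice the string would come from reading an ANN file with UTF-8 encoding, and the offsets index into the matching plain text file.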
For subtask-2, the codes associated with each document are given in a TSV file with the following columns:
Shared task goal:
In both subtasks, the goal is to predict the annotations of the test files (the ANN files for subtask-1, the TSV file with the codes for subtask-2) given only the plain text files.
For further information, please visit https://temu.bsc.es/pharmaconer/ or email us at email@example.com