PharmaCoNER shared task dataset (divided into train, dev and test sets). In addition, we include here the PharmaCoNER background set.

It contains the train, development and test sets of the two subtasks (subtask-1 and subtask-2) with Gold Standard annotations.

In addition, it contains the background document sets, without annotations.

For subtask-1, annotations are distributed in Brat standoff format. See the Brat webpage for more information: https://brat.nlplab.org/standoff.html
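
As an illustration of the Brat standoff format, the following is a minimal sketch of how the text-bound annotations ("T" lines) of an ANN file could be read in Python. The function name, the Entity type and the ENTITY_TYPE label in the sample are hypothetical and chosen only for this example; the sketch assumes the tab-separated layout described on the Brat webpage and ignores every other kind of line.

```python
from typing import NamedTuple


class Entity(NamedTuple):
    ann_id: str   # e.g. "T1"
    label: str    # entity type as written in the file
    spans: list   # list of (start, end) character offsets
    text: str     # surface text covered by the annotation


def parse_brat_entities(ann_content: str) -> list:
    """Return the text-bound (T) annotations found in the content of an ANN file."""
    entities = []
    for line in ann_content.splitlines():
        # Only text-bound annotations are kept; notes ("#") and other lines are skipped.
        if not line.startswith("T"):
            continue
        ann_id, type_and_offsets, text = line.split("\t", 2)
        label, offsets = type_and_offsets.split(" ", 1)
        # Discontinuous annotations separate their fragments with ";".
        spans = [tuple(int(n) for n in frag.split()) for frag in offsets.split(";")]
        entities.append(Entity(ann_id, label, spans, text))
    return entities


# Hypothetical sample content, not taken from the dataset.
sample = "T1\tENTITY_TYPE 10 18\taspirina\n#1\tAnnotatorNotes T1\texample note\n"
for entity in parse_brat_entities(sample):
    print(entity.ann_id, entity.label, entity.spans, entity.text)
```

To use it on a real file, read the ANN file as UTF-8 text and pass its content to parse_brat_entities.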

For subtask-2, the codes associated with each document are given in a TSV file with the following columns:
filename	code
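
A minimal sketch of how such a TSV file could be loaded in Python is given below. It is only an illustration: the function name is hypothetical, and the sketch assumes that the file has no header row and that a document with several codes appears on several lines, neither of which is stated above. If the file does have a header, the first row should be skipped.

```python
import csv
import io
from collections import defaultdict


def read_document_codes(tsv_file) -> dict:
    """Map each filename to the list of codes found for it in a filename/code TSV."""
    codes = defaultdict(list)
    # QUOTE_NONE keeps any quote characters in the data as plain text.
    reader = csv.reader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        # Skip blank or malformed lines that do not have both columns.
        if len(row) < 2:
            continue
        filename, code = row[0], row[1]
        codes[filename].append(code)
    return dict(codes)


# Hypothetical sample content, not taken from the dataset.
sample = io.StringIO("doc_a\t111\ndoc_a\t222\ndoc_b\t333\n")
print(read_document_codes(sample))
```

For a file on disk, open it with open(path, encoding="utf-8", newline="") and pass the file object to read_document_codes.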

In both subtasks, the goal is to predict the annotations of the test files (the ANN files for subtask-1, or the TSV file with the codes for subtask-2) given only the plain text files.

For further information, please visit https://temu.bsc.es/pharmaconer/ or email us at encargo-pln-life@bsc.es


Please cite: A. G. Agirre, M. Marimon, A. Intxaurrondo, O. Rabal, M. Villegas, M. Krallinger, PharmaCoNER: Pharmacological substances, compounds and proteins named entity recognition track, in: Proceedings of The 5th Workshop on BioNLP Open Shared Tasks, 2019, pp. 1–10.
