There is a newer version of this record available.

Dataset Open Access

Cantemist corpus: gold standard of oncology clinical cases annotated with CIE-O 3 terminology

Antonio Miranda-Escalada; Eulàlia Farré; Martin Krallinger

Cantemist shared task train and development sets.

This record contains the train and development sets for the three subtasks: cantemist-ner, cantemist-norm and cantemist-coding.

For the cantemist-ner and cantemist-norm subtasks, annotations are distributed in Brat format (plain text files paired with ANN annotation files). See the Brat webpage for more information.
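As a rough illustration of reading such annotations, the sketch below parses the two line types of the general Brat standoff format: text-bound annotations (IDs starting with `T`) and notes (IDs starting with `#`). The names `Entity`, `parse_brat_entities` and `parse_brat_notes` are hypothetical, and the sample label and note value are made up; whether this corpus stores normalization codes in note lines is an assumption to check against the actual ANN files.

```python
from typing import NamedTuple


class Entity(NamedTuple):
    ann_id: str  # annotation ID, e.g. "T1"
    label: str   # entity type as written in the ANN file
    start: int   # character offset where the mention starts
    end: int     # character offset where the mention ends
    text: str    # mention text copied from the document


def parse_brat_entities(ann_text: str) -> list[Entity]:
    """Collect text-bound annotations (lines whose ID starts with 'T')."""
    entities = []
    for line in ann_text.splitlines():
        parts = line.split("\t")
        if not line.startswith("T") or len(parts) != 3:
            continue
        ann_id, type_and_span, text = parts
        label, _, offsets = type_and_span.partition(" ")
        # Discontinuous spans look like "s1 e1;s2 e2"; keep the outer bounds.
        fragments = [frag.split() for frag in offsets.split(";")]
        start, end = int(fragments[0][0]), int(fragments[-1][1])
        entities.append(Entity(ann_id, label, start, end, text))
    return entities


def parse_brat_notes(ann_text: str) -> dict[str, str]:
    """Map each annotated ID to its note text (lines whose ID starts with '#')."""
    notes = {}
    for line in ann_text.splitlines():
        parts = line.split("\t")
        if not line.startswith("#") or len(parts) != 3:
            continue
        _, type_and_target, note = parts
        target = type_and_target.split()[-1]  # e.g. "AnnotatorNotes T1" -> "T1"
        notes[target] = note
    return notes


# Made-up sample content, not taken from the corpus.
sample_ann = "T1\tEXAMPLE_LABEL 10 17\tsarcoma\n#1\tAnnotatorNotes T1\texample-note\n"
entities = parse_brat_entities(sample_ann)
notes = parse_brat_notes(sample_ann)
```

Keeping the two parsers separate lets a named-entity pipeline ignore notes entirely, while a normalization pipeline can join `notes` to `entities` on the annotation ID.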

For the cantemist-coding subtask, codes are grouped in a TSV file with the following tab-separated columns (the same format used in the CodiEsp shared task):

filename    code
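A minimal sketch of loading such a file into a per-document code list follows. The function name `load_codes` is hypothetical and the sample rows are made up; the sketch assumes one filename/code pair per row and skips a header row only if one happens to be present, since the description above does not say whether the file has one.

```python
import csv
from collections import defaultdict
from typing import Iterable


def load_codes(lines: Iterable[str]) -> dict[str, list[str]]:
    """Group codes by filename, keeping first-seen order and dropping repeats."""
    codes: dict[str, list[str]] = defaultdict(list)
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 2:
            continue  # skip blank or malformed rows
        filename, code = row[0].strip(), row[1].strip()
        if (filename, code) == ("filename", "code"):
            continue  # assumption: tolerate an optional header row
        if code not in codes[filename]:
            codes[filename].append(code)
    return dict(codes)


# Made-up sample rows, not taken from the corpus.
sample_rows = [
    "filename\tcode\n",
    "doc1\t8000/6\n",
    "doc1\t8140/3\n",
    "doc2\t8000/6\n",
    "doc1\t8000/6\n",
]
codes_by_file = load_codes(sample_rows)
```

An open file object can be passed directly, for example `load_codes(open(path, encoding="utf-8", newline=""))`, where `path` points at the TSV file.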

In all three subtasks, the goal is to predict the annotations (either the ANN files or the TSV file with the codes) given only the plain text files.

For further information, please visit or email us at

Funded by the Plan de Impulso de las Tecnologías del Lenguaje (Plan TL).
Files (6.9 MB)
                   All versions   This version
Views              2,695          805
Downloads          564            139
Data volume        7.5 GB         955.9 MB
Unique views       2,082          665
Unique downloads   489            122

