
There is a newer version of the record available.

Published April 28, 2020 | Version 1.0
Dataset Open

Cantemist corpus: gold standard of oncology clinical cases annotated with CIE-O 3 terminology

  • 1. Barcelona Supercomputing Center

Description

Cantemist shared task sample set.

It contains the sample set of the three subtasks: cantemist-ner, cantemist-norm and cantemist-coding.

For the cantemist-ner and cantemist-norm subtasks, annotations are distributed in Brat format. See the Brat webpage for more information.
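As a rough illustration of reading such annotations, the sketch below parses the text-bound ("T") lines of a Brat standoff .ann file. It assumes the usual Brat layout of tab-separated ID, "label start end", and covered text. The function name, the `EXAMPLE_LABEL` label, and the sample line are hypothetical and not taken from the corpus.

```python
from typing import List, NamedTuple


class TextBound(NamedTuple):
    ann_id: str
    label: str
    start: int
    end: int
    text: str


def parse_brat_textbounds(ann_content: str) -> List[TextBound]:
    """Parse text-bound ("T") lines from the content of a Brat .ann file."""
    entities = []
    for line in ann_content.splitlines():
        if not line.startswith("T"):
            continue  # skip notes ("#"), relations, and other annotation types
        ann_id, type_and_span, text = line.split("\t", 2)
        label, span = type_and_span.split(" ", 1)
        # Discontinuous spans are separated by ';'; keep only the overall extent.
        fragments = [frag.split() for frag in span.split(";")]
        start = int(fragments[0][0])
        end = int(fragments[-1][1])
        entities.append(TextBound(ann_id, label, start, end, text))
    return entities


# Hypothetical annotation content, made up for illustration only.
sample = "T1\tEXAMPLE_LABEL 10 19\tcarcinoma\n#1\tAnnotatorNotes T1\tsome note\n"
entities = parse_brat_textbounds(sample)
```

Only "T" lines are handled here. How the sample set stores normalization codes is not specified in this description, so the sketch skips every other line type.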

For the cantemist-coding subtask, codes are grouped in a TSV file with the following columns (the same format used in the CodiEsp shared task):

filename    code
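A minimal sketch of loading this two-column TSV into a mapping from filename to its list of codes might look like the following. Whether the file carries a header row is an assumption, so the sketch tolerates both cases. The function name and the sample rows are hypothetical, not taken from the corpus.

```python
import csv
import io
from collections import defaultdict


def read_codes(tsv_file) -> dict:
    """Group codes by filename from a two-column (filename, code) TSV stream."""
    codes = defaultdict(list)
    reader = csv.reader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 2:
            continue  # skip blank or malformed lines
        filename, code = row[0].strip(), row[1].strip()
        if (filename, code) == ("filename", "code"):
            continue  # skip a header row, if one is present (assumption)
        codes[filename].append(code)
    return dict(codes)


# Hypothetical rows, made up for illustration only.
sample_tsv = io.StringIO("doc_a\t8000/1\ndoc_a\t8140/3\ndoc_b\t8000/6\n")
codes_by_file = read_codes(sample_tsv)
```

For a file on disk, the same function can be given a handle opened with `open(path, newline="", encoding="utf-8")`.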

In the three subtasks, the goal will be to predict the annotations (either the ANN files or the TSV with the codes) given only the plain text files. 

For further information, please visit https://temu.bsc.es/cantemist/ or email us at encargo-pln-life@bsc.es.

Notes

Funded by the Plan de Impulso de las Tecnologías del Lenguaje (Plan TL).

Files

sample-set-to-publish.zip

Files (278.5 kB)

Name: sample-set-to-publish.zip
Size: 278.5 kB
md5:a3afb4f7509d5a855b50f4fe7c65a87b
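
The MD5 checksum listed for the archive can be used to check a download for corruption. The sketch below uses Python's standard `hashlib`. The function names are illustrative, and the local path passed to them is an assumption about where the file was saved.

```python
import hashlib

# Checksum as listed in the Files section of this record.
EXPECTED_MD5 = "a3afb4f7509d5a855b50f4fe7c65a87b"


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str, expected: str = EXPECTED_MD5) -> bool:
    """Check whether the file at `path` matches the expected MD5 checksum."""
    return md5_of_file(path) == expected.lower()
```

For example, `verify_download("sample-set-to-publish.zip")` returns `True` only if the local copy matches the listed checksum.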