Published September 1, 2020 | Version 1.0
Dataset Open

Cantemist Silver Standard: Participant predictions in SEPLN IberLEF2020 - Spanish oncology clinical cases coded in ICD-O

  • 1. Barcelona Supercomputing Center

Description

Introduction

Predictions submitted by Cantemist participants on the background set.

 

Zip structure

The zip file contains one directory per Cantemist subtask. Each subtask directory contains one directory per team, which holds that team's prediction runs.
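As a minimal sketch of how this layout could be explored after extraction, the Python function below maps each subtask directory to the team directories inside it. The function name `list_team_dirs` and the extraction path are hypothetical, and the code assumes only the subtask/team nesting described above; the actual directory names are not listed here.

```python
from pathlib import Path


def list_team_dirs(root):
    """Map each subtask directory name to the sorted team directory names inside it.

    Assumes the layout described above: <root>/<subtask>/<team>/...
    """
    root = Path(root)
    layout = {}
    for subtask_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        # Only directories are treated as teams; stray files are ignored.
        layout[subtask_dir.name] = sorted(
            p.name for p in subtask_dir.iterdir() if p.is_dir()
        )
    return layout
```

For example, `list_team_dirs("silver-standard")` would return a dictionary keyed by subtask, assuming the archive was extracted into a directory of that (hypothetical) name.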

 

Format

The text documents are distributed as plain text files in UTF-8 encoding.
The Cantemist Silver Standard annotations have the following format:

For the sub-tracks Cantemist-NER and Cantemist-Norm, the files are in Brat format.
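A hedged sketch of reading the entity lines of a Brat annotation file in Python follows. It relies on the general Brat standoff convention (an ID starting with `T`, then a tab, then the label and character offsets, then a tab, then the covered text) and skips every other line type. The function name `parse_brat_entities` is hypothetical, and how normalization codes are attached in the Cantemist-Norm files is not covered here.

```python
def parse_brat_entities(lines):
    """Collect text-bound annotations (lines whose ID starts with "T")."""
    entities = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.startswith("T"):
            continue  # notes, relations, etc. are ignored in this sketch
        parts = line.split("\t")
        if len(parts) < 3:
            continue  # malformed line
        ann_id, label_and_offsets, text = parts[0], parts[1], parts[2]
        label, _, offsets = label_and_offsets.partition(" ")
        # Discontinuous spans are separated by ";" in Brat standoff files.
        spans = [tuple(int(n) for n in frag.split()) for frag in offsets.split(";")]
        entities.append({"id": ann_id, "label": label, "spans": spans, "text": text})
    return entities
```

Each returned entry keeps the annotation ID, its label, a list of `(start, end)` offset pairs, and the annotated text.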

For the sub-track Cantemist-Coding, the files have the following fields:

articleID  ICDO-code 
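The two-field layout above could be read with a short Python function such as the one below. This is a sketch under assumptions: it treats each line as one `articleID` / `ICDO-code` pair separated by whitespace (a tab or spaces), and it does not special-case a header row, so a header, if present, would be read as a record. The function name `parse_coding_lines` is hypothetical, and the IDs and codes in the usage example are made up for illustration.

```python
from collections import defaultdict


def parse_coding_lines(lines):
    """Group ICD-O codes by article ID from two-column prediction lines."""
    codes = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        parts = line.split(None, 1)  # split on the first run of whitespace
        if len(parts) != 2:
            continue  # skip lines that lack one of the two fields
        article_id, code = parts
        codes[article_id].append(code.strip())
    return dict(codes)
```

Calling `parse_coding_lines(["doc1\t8000/6", "doc1\t8140/3"])` groups both codes under the key `"doc1"`.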

 

Resources:

  • Web
  • Citation: Miranda-Escalada, A., Farré, E., & Krallinger, M. (2020). Named entity recognition, concept normalization and clinical coding: Overview of the Cantemist track for cancer text mining in Spanish, corpus, guidelines, methods and results. In Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2020), CEUR Workshop Proceedings.
  • Gold Standard corpus
  • Annotation guidelines
  • YouTube presentations
  • Participant codes

 

All credit to Cantemist participants. 

 

For more information, visit the track webpage: http://temu.bsc.es/cantemist/ or email us at encargo-pln-life@bsc.es

Notes

Funded by the Plan de Impulso de las Tecnologías del Lenguaje (Plan TL).

Files

silver-standard.zip (690.0 MB)
md5:81ad70fb1a45d7d5d465a1fd8e3ddce5
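
The md5 value listed above can be used to check that a download completed intact. The following Python sketch computes the digest in chunks so the 690.0 MB archive does not need to fit in memory. The helper name `md5_of_file` and the local path in the usage note are assumptions.

```python
import hashlib

# Checksum published on this record for silver-standard.zip.
EXPECTED_MD5 = "81ad70fb1a45d7d5d465a1fd8e3ddce5"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Assuming the archive was saved as `silver-standard.zip` in the working directory, `md5_of_file("silver-standard.zip") == EXPECTED_MD5` should evaluate to `True` for an uncorrupted copy.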