There is a newer version of the record available.

Published November 23, 2020 | Version v2
Dataset Open

The Chilean Waiting List Corpus

  • 1. Center for Medical Informatics and Telemedicine, University of Chile
  • 2. Center for Mathematical Modeling & Center for Medical Informatics and Telemedicine, University of Chile
  • 3. Department of Computer Science, University of Chile

Description

In this work we describe the Waiting List Corpus, which consists of de-identified referrals for several specialty consultations from the waiting list in Chilean public hospitals. A subset of 3,000 referrals was manually annotated with 27,892 entities, 1,272 attributes, and 762 pairs of relations with clinical relevance.
The corpus is 68% medical and 32% dental. A trained medical doctor or dentist annotated these referrals and then, together with three other researchers, consolidated each of the annotations. The annotated corpus has nested entities, with 35% of entities embedded in other entities. We use this annotated corpus to obtain preliminary results for Named Entity Recognition (NER). The best results were achieved with a biLSTM-CRF architecture that uses word embeddings trained on Spanish Wikipedia together with clinical embeddings computed by the group. NER models trained on this corpus can be used to derive statistics on diseases and pending procedures within this waiting list. This work constitutes the first annotated corpus using clinical narratives from Chile, and one of the few for the Spanish language. The annotated corpus, the clinical word embeddings, and the annotation guidelines are freely released to the research community.
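
To illustrate what "embedded in other entities" means, the sketch below counts the entities whose span lies inside another entity's span. It is only an illustration under stated assumptions: the (start, end, label) tuple layout, the character offsets, and the label names are hypothetical and are not taken from the corpus files or the annotation guidelines.

```python
from typing import List, Tuple

# Hypothetical representation: (start offset, end offset, label).
Span = Tuple[int, int, str]


def nested_fraction(spans: List[Span]) -> float:
    """Return the fraction of spans contained in a different, larger span."""
    if not spans:
        return 0.0
    nested = 0
    for i, (start, end, _) in enumerate(spans):
        for j, (o_start, o_end, _) in enumerate(spans):
            if i == j:
                continue
            # Contained in the other span, but not identical to it.
            contained = o_start <= start and end <= o_end
            if contained and (o_start, o_end) != (start, end):
                nested += 1
                break  # count each embedded entity once
    return nested / len(spans)


# Made-up spans: the second entity sits inside the first one.
example = [
    (0, 30, "Disease"),
    (10, 20, "Body_Part"),
    (35, 45, "Medication"),
]
print(nested_fraction(example))
```

In this made-up example, one of the three entities is embedded in another. The same containment test applied to real annotations would need to follow whatever span format the released files actually use.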

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Files

cwlc.zip (2.3 MB)
md5:abf08300fbff90e7adddeaf19304e64b
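
After downloading the archive, its integrity can be checked against the MD5 checksum listed above. The following is a minimal sketch using Python's standard hashlib module; the function names and the local path "cwlc.zip" are assumptions for illustration, not part of the record.

```python
import hashlib

# Checksum as listed on this record for cwlc.zip.
EXPECTED_MD5 = "abf08300fbff90e7adddeaf19304e64b"


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str, expected: str = EXPECTED_MD5) -> bool:
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()
```

Calling verify_download("cwlc.zip"), with the path adjusted to wherever the file was saved, should return True for an intact download of this version. A mismatch suggests a corrupted download or a different version of the record.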