Published July 1, 2020 | Version v1
Other Open

Chilean Waiting List Corpus Embeddings

  • 1. Center for Mathematical Modeling, University of Chile

Contributors

  • 1. Centro de Modelamiento Matemático, Universidad de Chile
  • 2. Centro de Informática Médica y Telemedicina, Universidad de Chile

Description

 

The Chilean Waiting List Corpus Embeddings is a set of Word2Vec word embeddings trained on 11 million unstructured free-text diagnoses obtained from the Chilean Waiting List through the Transparency Law. The training corpus comprises 56 million word tokens, with a vocabulary of 252 thousand distinct words.

The embeddings were computed with Mikolov's original implementation of the Word2Vec algorithm using the default hyperparameters, except for the vector size, which was set to 300.
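As an illustration of how vectors of this kind are typically consumed, the following is a minimal sketch that parses the plain-text output format of the word2vec tool and compares two words by cosine similarity. This record does not state whether the published file is text or binary, so the text format is an assumption here. The function names and the three-dimensional toy vectors are invented for the example and are not taken from the actual embeddings.

```python
import io
import math


def load_word2vec_text(handle):
    """Parse word2vec text format from an open text handle.

    Assumed layout: a header line "vocab_size dim", then one line per
    word holding the word followed by `dim` space-separated floats.
    """
    header = handle.readline().split()
    dim = int(header[1])
    vectors = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors, dim


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Toy stand-in for the real file: 3 dimensions instead of 300,
# with made-up values chosen only to exercise the code.
sample = io.StringIO(
    "3 3\n"
    "caries 1.0 0.0 0.0\n"
    "dental 0.8 0.6 0.0\n"
    "rodilla 0.0 0.0 1.0\n"
)
vectors, dim = load_word2vec_text(sample)
print(cosine_similarity(vectors["caries"], vectors["dental"]))
```

For the real file, the same function could be given a handle from `open(path, encoding="utf-8")`, provided the file turns out to be in text format. A binary file would need a different reader.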

Files

Files (163.3 MB)

md5:f262e5e09313cc04f2115c759909662f
163.3 MB
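The MD5 checksum listed above can be used to confirm that a download completed intact. The following is a minimal sketch using only the Python standard library. The function names are illustrative, and the local path passed to them is whatever name the downloaded file has on disk, since this record does not show the file name.

```python
import hashlib

# Checksum as listed on this record.
EXPECTED_MD5 = "f262e5e09313cc04f2115c759909662f"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record(path):
    """True if the file at `path` has the checksum listed on this record."""
    return md5_of_file(path) == EXPECTED_MD5
```

Reading in chunks keeps memory use flat, which matters for a file of this size.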