There is a newer version of this record available.

Dataset Open Access

FastText and Word2Vec Spanish Medical Embeddings

Felipe Soares; Marta Villegas; Aitor Gonzalez-Agirre; Jordi Armengol-Estapé; Martin Krallinger

This version throws an error while loading the file. A newer version of this record is available.

[Plan TL/medicine/word embeddings] Word embeddings generated from Spanish corpora comprising: (a) the Spanish full texts available on Scielo.org (up to December 2018), (b) all articles from the Wikipedia categories Pharmacology, Pharmacy, Medicine and Biology (as of December 2018), and (c) the concatenation of the previous two corpora.

Two different approaches were used to generate the word embeddings: Word2Vec and fastText.

For more information, we refer to the corresponding article: https://www.aclweb.org/anthology/W19-1916/

Funded by the Plan de Impulso de las Tecnologías del Lenguaje (Plan TL).
Files (8.6 GB)
Name: Embeddings_2019-01-01.zip
Size: 8.6 GB
md5: e7a3dce00bcc156e150d45ae85e02be9
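Because the archive is large, it may be worth checking the download against the md5 value listed above before unpacking it. The following is a minimal sketch, not part of the record itself: the helper names `md5_of_file` and `matches_expected` are hypothetical, and the script assumes the archive was saved under its listed name in the current working directory.

```python
import hashlib
from pathlib import Path

# MD5 value as shown in the record's file listing.
EXPECTED_MD5 = "e7a3dce00bcc156e150d45ae85e02be9"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks
    so that a multi-gigabyte archive never has to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_MD5):
    """Compare the file's digest with the expected hex string."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    # Assumed local path: the archive saved under its listed name.
    archive = Path("Embeddings_2019-01-01.zip")
    if archive.exists():
        print("checksum OK" if matches_expected(archive) else "checksum MISMATCH")
    else:
        print(f"{archive} not found; download it first")
```

A mismatch usually indicates an incomplete or corrupted download, in which case fetching the file again is the simplest remedy.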
  • Soares F, Villegas M, Gonzalez-Agirre A, Krallinger M, Armengol-Estapé J. Medical Word Embeddings for Spanish: Development and Evaluation. In: Proceedings of the 2nd Clinical Natural Language Processing Workshop; June 2019. pp. 124-133.

Statistics          All versions    This version
Views               1,907           977
Downloads           16,904          279
Data volume         681.8 TB        2.4 TB
Unique views        1,552           868
Unique downloads    4,845           207
