Dataset Open Access

FastText Spanish Medical Embeddings

Felipe Soares; Marta Villegas; Aitor Gonzalez-Agirre; Jordi Armengol-Estapé; Siamak Barzegar; Martin Krallinger

[Plan TL/medicine/word embeddings] Word embeddings trained on three Spanish corpora: (a) the full text in Spanish available in (until December 2018), (b) all articles from the Wikipedia categories Pharmacology, Pharmacy, Medicine and Biology (as of December 2018), and (c) the concatenation of the previous two corpora.

We used fastText to train the word embeddings.
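As a rough illustration of how such embeddings might be used, the sketch below reads vectors from fastText's plain-text `.vec` layout (a header line with the vocabulary size and dimension, then one token and its values per line) and compares two words by cosine similarity. It assumes a `.vec` file is available, which this record does not state; the function names `load_vec` and `cosine` and the `limit` parameter are hypothetical helpers, not part of fastText.

```python
import math
from typing import Dict, List, TextIO


def load_vec(handle: TextIO, limit: int = 50000) -> Dict[str, List[float]]:
    """Read up to `limit` vectors from a fastText-style .vec text stream.

    Assumed layout: first line "<vocab_size> <dim>", then one line per
    token: the token followed by <dim> space-separated floats.
    """
    header = handle.readline().split()
    dim = int(header[1])
    vectors: Dict[str, List[float]] = {}
    for line in handle:
        if len(vectors) >= limit:
            break
        parts = line.rstrip().split(" ")
        # The last `dim` fields are the vector; anything before is the token.
        word = " ".join(parts[:-dim])
        vectors[word] = [float(x) for x in parts[-dim:]]
    return vectors


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0
```

With a real file, one would open it with `encoding="utf-8"`, pass the handle to `load_vec`, and call `cosine` on two looked-up words. The `limit` cap is there because loading the whole vocabulary into Python lists can be memory-hungry.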

For more information, we refer to the corresponding article:

Funded by the Plan de Impulso de las Tecnologías del Lenguaje (Plan TL) and the ICTUSnet project (
Files (41.1 GB)
  • Soares F, Villegas M, Gonzalez-Agirre A, Krallinger M, Armengol-Estapé J. Medical Word Embeddings for Spanish: Development and Evaluation. In Proceedings of the 2nd Clinical Natural Language Processing Workshop, 2019 Jun (pp. 124-133).
