Published January 23, 2020 | Version 2020-01-01
Dataset
Open
FastText and Word2Vec Spanish Medical Embeddings
Description
[Plan TL/medicine/word embeddings] Word embeddings generated from three Spanish corpora: (a) the Spanish full text available in Scielo.org (through December 2018), (b) all articles in the Wikipedia categories Pharmacology, Pharmacy, Medicine, and Biology (as of December 2018), and (c) the concatenation of the previous two corpora.
Two approaches were used to generate the word embeddings: Word2Vec and fastText.
For more information, see the corresponding article: https://www.aclweb.org/anthology/W19-1916/
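As an illustration of how embeddings like these are typically consumed, the following minimal sketch parses vectors in the plain-text word2vec/fastText format (a header line with vocabulary size and dimension, then one word and its values per line) and compares two words by cosine similarity. Whether the files in this archive use that text format is an assumption; the record does not say. The function names `read_text_vectors` and `cosine` are hypothetical helpers, and the two toy vectors are invented for the demonstration, not taken from the dataset.

```python
import io
import math
from typing import Dict, List, TextIO


def read_text_vectors(handle: TextIO, limit: int = 0) -> Dict[str, List[float]]:
    """Parse text-format vectors from an open file; limit=0 reads all rows."""
    header = handle.readline().split()
    dim = int(header[1])  # header line is "<vocab_size> <dimension>"
    vectors: Dict[str, List[float]] = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip rows that do not match the declared dimension
        vectors[parts[0]] = [float(x) for x in parts[1:]]
        if limit and len(vectors) >= limit:
            break
    return vectors


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Toy data in the assumed format (values are invented, not from the dataset).
toy = io.StringIO("2 3\nfiebre 1.0 0.0 0.0\ndolor 0.0 1.0 0.0\n")
vecs = read_text_vectors(toy)
similarity = cosine(vecs["fiebre"], vecs["dolor"])
```

For a real file, the same function could be called on `open(path, encoding="utf-8")`, with `limit` set to keep memory use manageable, since the archive is about 20 GB.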
Files
Name | Size |
---|---|
Embeddings_2020-01-23.zip (md5:c9ae3ba85307f7a965006fd8dd5dea06) | 20.4 GB |
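Because the archive is large, it can be worth checking a download against the MD5 checksum listed above before unpacking it. The sketch below streams the file in chunks so it never has to fit in memory; the helper name `md5_of_file` and the local path in the usage note are assumptions for illustration.

```python
import hashlib

# Checksum as listed in the file table of this record.
EXPECTED_MD5 = "c9ae3ba85307f7a965006fd8dd5dea06"


def md5_of_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Assuming the archive was saved under its original name in the working directory, `md5_of_file("Embeddings_2020-01-23.zip") == EXPECTED_MD5` should be true for an intact download.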
Additional details
References
- Soares F, Villegas M, Gonzalez-Agirre A, Krallinger M, Armengol-Estapé J. Medical Word Embeddings for Spanish: Development and Evaluation. In Proceedings of the 2nd Clinical Natural Language Processing Workshop, June 2019 (pp. 124-133).
Subjects
- Natural language processing
- http://id.loc.gov/authorities/subjects/sh88002425