FastText and Word2Vec Spanish Medical Embeddings

Felipe Soares; Marta Villegas; Aitor Gonzalez-Agirre; Jordi Armengol-Estapé; Martin Krallinger

doi:10.5281/zenodo.2542722

There is a newer version of the record available.

Published January 17, 2019 | Version 2019-01-01

Dataset Open

1. BSC

This version throws an error when the file is loaded. A newer version of this record is available.

[Plan TL/medicine/word embeddings] Word embeddings generated from three Spanish corpora: (a) the Spanish full text available on Scielo.org (up to December 2018), (b) all articles from the Wikipedia categories Pharmacology, Pharmacy, Medicine and Biology (as of December 2018) and (c) the concatenation of the previous two corpora.

To generate the word embeddings, two different approaches were used: Word2Vec and fastText.
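
As an illustration of how embeddings of this kind are commonly consumed, the sketch below parses vectors stored in the plain-text word2vec layout (a "count dimension" header followed by one "word v1 v2 ..." row per word) and compares two words by cosine similarity. That the archive contains files in this layout is an assumption, not something this record states. The function names and the toy vectors are hypothetical and are not taken from the dataset.

```python
import math
from typing import Dict, Iterable, List


def parse_text_vectors(lines: Iterable[str]) -> Dict[str, List[float]]:
    """Parse the plain-text word2vec layout (assumed, not confirmed by the record).

    The first line is "<count> <dim>"; every following line is
    "<word> <v1> ... <vdim>", separated by single spaces.
    """
    rows = iter(lines)
    header = next(rows).split()
    dim = int(header[1])
    vectors: Dict[str, List[float]] = {}
    for row in rows:
        parts = row.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip blank or malformed rows
        vectors[parts[0]] = [float(value) for value in parts[1:]]
    return vectors


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Toy two-dimensional vectors, invented for illustration only.
toy_lines = [
    "3 2",
    "fiebre 1.0 0.0",
    "febril 0.8 0.6",
    "hueso 0.0 1.0",
]
vecs = parse_text_vectors(toy_lines)
print(round(cosine(vecs["fiebre"], vecs["febril"]), 3))
print(round(cosine(vecs["fiebre"], vecs["hueso"]), 3))
```

For the real files, which are several gigabytes in size, a dedicated library is likely a better fit than a pure-Python parser; the sketch only shows the idea.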

For more information, we refer to the corresponding article: https://www.aclweb.org/anthology/W19-1916/

Notes

Funded by the Plan de Impulso de las Tecnologías del Lenguaje (Plan TL).

Files

Name	Size	MD5
Embeddings_2019-01-01.zip	8.6 GB	e7a3dce00bcc156e150d45ae85e02be9
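
The MD5 checksum listed for the archive can be used to confirm that a download completed intact. The following is a minimal sketch using only the Python standard library. The helper names `md5_of_file` and `matches_record` are hypothetical, and the local path passed to them is whatever location the archive was saved to.

```python
import hashlib

# Checksum published on this record for Embeddings_2019-01-01.zip.
EXPECTED_MD5 = "e7a3dce00bcc156e150d45ae85e02be9"


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks
    so that a multi-gigabyte archive never has to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record(path: str, expected: str = EXPECTED_MD5) -> bool:
    """True if the file at `path` has the expected MD5 digest."""
    return md5_of_file(path) == expected.lower()
```

Calling `matches_record("Embeddings_2019-01-01.zip")` on the downloaded file should return `True` if the transfer was not corrupted. MD5 is adequate here as an integrity check against transfer errors, not as protection against deliberate tampering.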

Additional details

Soares F, Villegas M, Gonzalez-Agirre A, Krallinger M, Armengol-Estapé J. Medical Word Embeddings for Spanish: Development and Evaluation. In Proceedings of the 2nd Clinical Natural Language Processing Workshop, June 2019 (pp. 124-133).

Natural language processing: http://id.loc.gov/authorities/subjects/sh88002425

	All versions	This version
Views	5,525	1,848
Downloads	5,440	360
Data volume	702.7 TB	3.8 TB

Resource type

Dataset

Publisher

Zenodo

Languages

Spanish

License: Creative Commons Attribution 4.0 International

The Creative Commons Attribution license allows re-distribution and re-use of a licensed work on the condition that the creator is appropriately credited.

Technical metadata

Created: January 17, 2019
Modified: November 5, 2022