FastText and Word2Vec Spanish Medical Embeddings

doi:10.5281/zenodo.3626806

Medical NLP (maintained by the NLP4BIA unit at BSC): language technology resources for clinical and biomedical documents in multiple languages

There is a newer version of the record available.

Published January 23, 2020 | Version 2020-01-01

Dataset | Open access

Creator: BSC

[Plan TL/medicine/word embeddings] Word embeddings generated from three Spanish corpora: (a) the full Spanish-language text available in Scielo.org (up to December 2018); (b) all articles in the following Wikipedia categories: Pharmacology, Pharmacy, Medicine and Biology (retrieved December 2018); and (c) the concatenation of the previous two corpora.

Two approaches were used to generate the word embeddings: Word2Vec and fastText.
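As a rough illustration of how such embeddings are commonly consumed, the sketch below parses vectors in the plain-text format that both the original word2vec tool and fastText can write (a header line giving the vocabulary size and dimension, then one word per line followed by its vector components) and ranks neighbours by cosine similarity. It assumes the archive contains text-format vector files; the actual file names and formats inside Embeddings_2020-01-23.zip are not documented on this page, the four-word vocabulary below is invented toy data, and the helper names (load_vec, cosine, most_similar) are hypothetical.

```python
import io
import math
from typing import Dict, List, TextIO, Tuple

Vectors = Dict[str, List[float]]


def load_vec(stream: TextIO) -> Vectors:
    """Parse word vectors in the plain-text word2vec/.vec format (assumed layout).

    Header line: "<vocab_size> <dim>"; then one line per word:
    "<word> <v1> ... <v_dim>". A trailing space, if present, is dropped by rstrip.
    """
    dim = int(stream.readline().split()[1])
    vectors: Vectors = {}
    for line in stream:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def most_similar(vectors: Vectors, query: str, topn: int = 3) -> List[Tuple[str, float]]:
    """Rank every other word by cosine similarity to `query` (brute force)."""
    q = vectors[query]
    scored = [(w, cosine(q, v)) for w, v in vectors.items() if w != query]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]


# Toy 3-dimensional vectors, made up for illustration (NOT from the released files).
TOY_VEC = """4 3
fiebre 0.9 0.1 0.0
temperatura 0.8 0.2 0.1
insulina 0.0 0.1 0.9
diabetes 0.1 0.0 0.8
"""

vectors = load_vec(io.StringIO(TOY_VEC))
print(most_similar(vectors, "fiebre", topn=1)[0][0])  # prints: temperatura
```

For real files of this size, a library loader such as gensim's KeyedVectors.load_word2vec_format(path, binary=False) is likely a better fit than this pure-Python parser, which is only meant to show the format and the similarity computation.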

For more information, see the corresponding article: https://www.aclweb.org/anthology/W19-1916/

Notes

Funded by the Plan de Impulso de las Tecnologías del Lenguaje (Plan TL).

Files (20.4 GB)

Name	Size	MD5 checksum
Embeddings_2020-01-23.zip	20.4 GB	c9ae3ba85307f7a965006fd8dd5dea06
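Because the archive is large, it is worth checking that a download completed intact before unpacking it. The record lists an MD5 checksum for the file; the minimal sketch below (standard-library hashlib only; the function names md5_of_file and verify are illustrative, not part of the dataset) computes the digest in fixed-size chunks so memory use stays flat, and compares it with the published value.

```python
import hashlib

# Checksum published for Embeddings_2020-01-23.zip on this record.
EXPECTED_MD5 = "c9ae3ba85307f7a965006fd8dd5dea06"


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in fixed-size chunks.

    Reading 1 MiB at a time keeps memory use constant even for a 20.4 GB archive.
    """
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps calling fh.read until it returns b"" (EOF).
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected: str = EXPECTED_MD5) -> bool:
    """True if the file's MD5 digest matches `expected` (case-insensitive)."""
    return md5_of_file(path) == expected.lower()
```

For example, verify("Embeddings_2020-01-23.zip") should return True for an intact download. On most Unix-like systems, md5sum Embeddings_2020-01-23.zip reports the same digest.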

Additional details

References

Soares F, Villegas M, Gonzalez-Agirre A, Krallinger M, Armengol-Estapé J. Medical Word Embeddings for Spanish: Development and Evaluation. In Proceedings of the 2nd Clinical Natural Language Processing Workshop, 2019 Jun (pp. 124-133).

Subjects

Natural language processing: http://id.loc.gov/authorities/subjects/sh88002425

DOI: 10.5281/zenodo.3626806
Resource type: Dataset
Publisher: Zenodo
Languages: Spanish

License: Creative Commons Attribution 4.0 International

The Creative Commons Attribution license allows redistribution and reuse of a licensed work on the condition that the creator is appropriately credited.

Technical metadata

Created: January 26, 2020
Modified: November 5, 2022
