Spanish 3B words Word2Vec Embeddings

Aitor Almeida; Aritz Bilbao

doi:10.5281/zenodo.1410403

Published September 6, 2018 | Version 1.0

Dataset Open

1. DeustoTech

Ready-to-use gensim Word2Vec embedding models for the Spanish language. The models were trained with a context window of ±5 words, discarding words with fewer than 5 occurrences and producing a 400-dimensional vector for each word. The training text was collected from news, Wikipedia, the Spanish BOE (Boletín Oficial del Estado), web crawling and open literary sources. It totals 3,257,329,900 words and 18,852,481,207 characters.
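The stated hyperparameters map onto gensim's Word2Vec constructor roughly as sketched below. Only the window, minimum count and vector dimensionality come from the description; the helper names (`word2vec_kwargs`, `train`) are hypothetical, and every other setting (training algorithm, epochs, tokenisation) is left at gensim's defaults as an assumption, since the description does not specify it.

```python
# Hyperparameters stated in the description. Anything not listed here is
# left at gensim's defaults, which is an assumption, not a documented fact.
STATED_PARAMS = {"window": 5, "min_count": 5, "vector_size": 400}


def word2vec_kwargs(gensim_major=4):
    """Return constructor keyword arguments for the given gensim major version."""
    params = dict(STATED_PARAMS)
    if gensim_major < 4:
        # gensim 3.x named the dimensionality parameter `size`.
        params["size"] = params.pop("vector_size")
    return params


def train(sentences, gensim_major=4):
    """Train on an iterable of tokenised sentences (each a list of str)."""
    # Imported lazily so the helper above can be used without gensim installed.
    from gensim.models import Word2Vec
    return Word2Vec(sentences=sentences, **word2vec_kwargs(gensim_major))
```

This is not the authors' training script; it only shows where the three published values would go if a comparable model were trained.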

Two types of models are provided: full gensim models (complete_model.zip) and KeyedVectors (keyed_vectors.zip). The differences between them are described at: https://radimrehurek.com/gensim/models/keyedvectors.html

To load the full model use:

    from gensim.models import Word2Vec
    model = Word2Vec.load("complete.model")

To load the KeyedVectors use:

    from gensim.models import KeyedVectors
    word_vectors = KeyedVectors.load('complete.kv', mmap='r')
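Once loaded, the vectors can be queried for nearest neighbours. The following is a minimal sketch, assuming keyed_vectors.zip has been extracted so that complete.kv (and any companion files from the archive) sits in the working directory. The helper names `load_vectors` and `describe` are hypothetical, and whether a given token is in the vocabulary depends on the authors' preprocessing, which is not described here.

```python
from pathlib import Path


def load_vectors(path="complete.kv"):
    """Load the KeyedVectors read-only and memory-mapped."""
    path = Path(path)
    if not path.exists():
        raise FileNotFoundError(f"{path} not found; extract keyed_vectors.zip first")
    # Imported lazily so the existence check works without gensim installed.
    from gensim.models import KeyedVectors
    return KeyedVectors.load(str(path), mmap="r")


def describe(word_vectors, word, topn=5):
    """Return the vector size and nearest neighbours of `word`, or None if unknown."""
    if word not in word_vectors:
        return None
    return {
        "dimensions": len(word_vectors[word]),
        "neighbours": word_vectors.most_similar(word, topn=topn),
    }
```

For example, `describe(load_vectors(), "rey")` would return the neighbours of an illustrative query word, or None if that token was not kept in the vocabulary.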

More info about the models can be found in: https://github.com/aitoralmeida/spanish_word2vec

Files

Files (11.4 GB)

Name	MD5	Size
complete_model.zip	d8f7542f0f22dc248538e7a0a45d8141	8.5 GB
keyed_vectors.zip	e336f4423e3e85658d69bf0984d8e361	2.9 GB
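Because the archives are several gigabytes each, it can be worth checking a download against the MD5 checksums listed above before unzipping. A minimal sketch using only the Python standard library follows; the function names are hypothetical, and the files are assumed to have been saved under their original names.

```python
import hashlib

# MD5 checksums as published in the file listing.
EXPECTED_MD5 = {
    "complete_model.zip": "d8f7542f0f22dc248538e7a0a45d8141",
    "keyed_vectors.zip": "e336f4423e3e85658d69bf0984d8e361",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path, name):
    """Return True if the file at `path` matches the published checksum for `name`."""
    return md5_of_file(path) == EXPECTED_MD5[name]
```

Reading in chunks keeps memory use flat regardless of archive size; `is_intact("keyed_vectors.zip", "keyed_vectors.zip")` returns False for a truncated or corrupted download.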

	All versions	This version
Views	6,527	4,964
Downloads	4,525	4,009
Data volume	35.6 TB	32.2 TB

Resource type

Dataset

Publisher

Zenodo

Languages

Spanish

License: Creative Commons Attribution Share Alike 4.0 International

Permits almost any use subject to providing credit and license notice. Frequently used for media assets and educational materials. The most common license for Open Access scientific publications. Not recommended for software.

Technical metadata

Created: September 7, 2018
Modified: January 24, 2020
