Spanish 3B words Word2Vec Embeddings
Description
Ready-to-use gensim Word2Vec embedding models for the Spanish language. The models were trained with a window of ±5 words, discarding words with fewer than 5 occurrences, and produce a 400-dimensional vector for each word. The training text was collected from news, Wikipedia, the Spanish BOE, web crawling and open literary sources. It contains a total of 3,257,329,900 words and 18,852,481,207 characters.
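As a rough illustration of the settings described above, the sketch below shows how they map onto gensim's Word2Vec constructor. Only the window, minimum count and dimensionality come from the description. The helper name train_embeddings is hypothetical, every other hyperparameter is left at gensim's defaults as an assumption, and the authors' actual training script may differ.

```python
# Hyperparameters stated in the description. Anything not listed here falls
# back to gensim's defaults, which is an assumption, not a documented choice.
W2V_PARAMS = {
    "vector_size": 400,  # vector dimensionality (named `size` in gensim < 4.0)
    "window": 5,         # context of +/- 5 words around the target word
    "min_count": 5,      # drop words seen fewer than 5 times
}


def train_embeddings(sentences):
    """Train a Word2Vec model on an iterable of tokenized sentences.

    Hypothetical helper: `sentences` is expected to be an iterable of
    lists of string tokens.
    """
    # Imported lazily so this module can be imported without gensim installed.
    from gensim.models import Word2Vec

    return Word2Vec(sentences=sentences, **W2V_PARAMS)
```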
Two types of models are provided: gensim full models (complete_model.zip) and KeyedVectors (keyed_vectors.zip). The differences between them are described at: https://radimrehurek.com/gensim/models/keyedvectors.html
To load the full model use: model = Word2Vec.load("complete.model")
To load the KeyedVectors use: word_vectors = KeyedVectors.load('complete.kv', mmap='r')
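The two loading calls can be wrapped in a small helper that returns the word vectors from either file type and then queries them. This is a minimal sketch: the function names load_vectors and nearest_words are hypothetical, and it assumes the archives have been unzipped so that complete.model or complete.kv sits in the working directory.

```python
def load_vectors(path, full_model=False):
    """Return a KeyedVectors object from either file type."""
    # Imported lazily so this module can be imported without gensim installed.
    from gensim.models import KeyedVectors, Word2Vec

    if full_model:
        # The full model also stores training state; `.wv` exposes its vectors.
        return Word2Vec.load(path).wv
    # mmap='r' memory-maps the arrays read-only instead of copying them into RAM.
    return KeyedVectors.load(path, mmap="r")


def nearest_words(word_vectors, word, topn=5):
    """Return the `topn` most similar words, or [] if `word` is out of vocabulary."""
    if word not in word_vectors:
        return []
    return word_vectors.most_similar(word, topn=topn)
```

For example, nearest_words(load_vectors("complete.kv"), "madrid") would return a list of (word, similarity) pairs if that token is in the vocabulary. Whether the vocabulary is lowercased is not stated in this description, so the out-of-vocabulary check is there to avoid a KeyError.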
More information about the models can be found at: https://github.com/aitoralmeida/spanish_word2vec
Files
(11.4 GB)
| Checksum | Size |
|---|---|
| md5:d8f7542f0f22dc248538e7a0a45d8141 | 8.5 GB |
| md5:e336f4423e3e85658d69bf0984d8e361 | 2.9 GB |
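Because the archives are several gigabytes each, it can be worth checking a download against the MD5 checksums listed above before unzipping it. The sketch below uses only the Python standard library. The function names are hypothetical, and since the listing does not say which checksum belongs to which archive, the check only confirms that a file matches one of the two.

```python
import hashlib

# MD5 checksums copied from the file listing. Which checksum belongs to
# which archive is not stated there, so they are kept as an unordered set.
EXPECTED_MD5 = {
    "d8f7542f0f22dc248538e7a0a45d8141",
    "e336f4423e3e85658d69bf0984d8e361",
}


def file_md5(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Chunked reads keep memory use flat even for multi-gigabyte files.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_listed_archive(path):
    """Return True if the file's MD5 matches one of the listed checksums."""
    return file_md5(path) in EXPECTED_MD5
```

A call such as is_listed_archive("keyed_vectors.zip") returning False suggests a truncated or corrupted download.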