Spanish word2vec embeddings trained on OpenSubtitles Part 1
Description
This dataset contains the subs2vec embeddings for Spanish, as presented in https://zenodo.org/records/17243814. The embeddings were trained on large-scale subtitle corpora from the OpenSubtitles 2018 release (https://opus.nlpl.eu/OpenSubtitles/corpus/version/OpenSubtitles). The resulting semantic vector spaces therefore reflect naturalistic language use in film and television.
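Semantic vector spaces like these are typically queried by cosine similarity. Words that occur in similar contexts tend to have vectors pointing in similar directions.

The sketch below shows the idea in plain Python. The vectors are hypothetical toy values, not taken from these files. `cosine_similarity` and `most_similar` are illustrative helpers, not part of the word2manylanguages code.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimensionality")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)

def most_similar(query, vectors, topn=3):
    """Rank the other words in `vectors` by cosine similarity to `query`."""
    q = vectors[query]
    scores = [(w, cosine_similarity(q, v)) for w, v in vectors.items() if w != query]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:topn]

# Hypothetical 3-dimensional toy vectors, for illustration only.
toy = {
    "perro": [0.9, 0.1, 0.0],
    "gato": [0.8, 0.2, 0.1],
    "coche": [0.0, 0.2, 0.9],
}
print(most_similar("perro", toy, topn=2))  # 'gato' ranks above 'coche'
```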
For this language, we provide all embedding variants explored in the study. Specifically, the dataset includes vectors generated under different combinations of:
- Dimensionality: multiple vector sizes (e.g., 100, 200, 300, …)
- Window size: varying context windows (e.g., 2, 5, 10, …)
Each file corresponds to a unique configuration (dimensionality × window size).
Each row of a file holds one word and its vector. The word from the vocabulary for that language is in column 1, and its embedding values follow in columns 2 through d + 1, where d is the vector dimensionality of that file.
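A minimal loader for one of these files might look like the following sketch. The description does not state the exact file encoding or layout, so this makes three assumptions:

- The files are UTF-8 plain text.
- Fields are separated by whitespace.
- An optional word2vec-style header line giving vocabulary size and dimensionality may precede the rows.

`load_embeddings` is a hypothetical helper.

```python
import io

def load_embeddings(lines, dim=None):
    """Parse 'word v1 v2 ... vd' rows into a {word: [floats]} dict.

    Assumes whitespace-separated plain text. If the very first line
    consists of two integers, it is treated as a word2vec-style
    'vocabulary_size dimensionality' header and skipped.
    """
    vectors = {}
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        if lineno == 1 and len(fields) == 2 and all(f.isdigit() for f in fields):
            continue  # assumed header line, not a word
        word, values = fields[0], [float(x) for x in fields[1:]]
        if dim is None:
            dim = len(values)  # infer d from the first data row
        if len(values) != dim:
            raise ValueError(f"line {lineno}: expected {dim} values, got {len(values)}")
        vectors[word] = values
    return vectors

# Tiny in-memory sample standing in for a real file (values are made up).
sample = io.StringIO("2 3\nhola 0.1 0.2 0.3\nadiós -0.4 0.5 0.6\n")
emb = load_embeddings(sample)
print(len(emb), len(emb["hola"]))  # 2 3
```

To read a downloaded file, pass an open handle, for example `with open(path, encoding="utf-8") as f: emb = load_embeddings(f)`. Storing a full vocabulary as Python lists is memory-hungry. If the files turn out to follow the word2vec text format, gensim's `KeyedVectors.load_word2vec_format` is likely a more compact alternative.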
If you use this dataset, please cite:
- Manuscript: https://doi.org/10.5281/zenodo.17243812
- Data: This Zenodo dataset (using the DOI provided here)
Files
(48.8 GB)
| MD5 checksum | Size |
|---|---|
| 8f3a0fb1d2b5ae93a83206ab6648fc84 | 859.5 MB |
| 47007f921c4140e77c84ebbfe1fd6512 | 857.1 MB |
| f35b5929bfbb7a981cdc31ffcdb241f8 | 859.0 MB |
| c9e53b0930f6bc73670d89be29ec607f | 858.5 MB |
| 37afb9478affb351ec41cfe04f7a6e2e | 859.6 MB |
| aabc98ec64342843543eb5e070c6f1fb | 858.9 MB |
| 73fbc70394bad498ec658d6250244b6f | 859.0 MB |
| 3c9752559530d4c04dc9697b87644370 | 859.7 MB |
| 6f5ff98ffc7a37fa57428c3814390129 | 859.1 MB |
| c199cf5a3345a44802f033e6974e3064 | 859.6 MB |
| 4ce18d373ad962edcba524d8c52948ee | 859.0 MB |
| eaea6519cdddea0fd127943912e85677 | 859.9 MB |
| a8fadde3c6bf649cf55700c0714a97ad | 1.7 GB |
| 3069bfcb58cc1a4683ebdfd24592443c | 1.7 GB |
| d256eab4bcdc493a8734ac09cc27b988 | 1.7 GB |
| 1bf12835ac70389cb8d087c21b16bc1b | 1.7 GB |
| e23b36d13a51e1d6db53b6521582c3d8 | 1.7 GB |
| 44b8a29ee69dc8892d802a60dd4d1aa9 | 1.7 GB |
| 4ccc89735df00f92f91b00c17fd9da34 | 1.7 GB |
| 39604d8759ac3acd6697c2efa0147c7c | 1.7 GB |
| f5c3d6843a45b0c78eb45c357f14f845 | 1.7 GB |
| 966646288e2b94bc8cca782394c2ae1e | 1.7 GB |
| 18397d4c5871ee7b286c2500f5185548 | 1.7 GB |
| 27c56e9958e2b0ee5eddce269e30957a | 1.7 GB |
| 2e690b57d9efead83ef84be19cd877cc | 2.6 GB |
| 363340cd86ca926159ddc39ca62727df | 2.5 GB |
| ccce3220214b015870db71894485c7eb | 2.6 GB |
| 1984f5d415b43af0621bb0dcd3c9dfd6 | 2.6 GB |
| da1ffda12901f822345e1c32e32feed6 | 2.6 GB |
| 676f0ac06adff3f9988703bbf3fcc252 | 434.0 MB |
| e9639c8960dd80d6d84aa0b81afefcae | 434.4 MB |
| 758ad8daafba6755f1b735e7260d4836 | 433.9 MB |
| 528ebcb7deae04dc885181c9cdb3d83c | 435.6 MB |
| a466fa01968843a99414b37330b248af | 434.8 MB |
| 6153e299d76ef29034837ed584ec4339 | 436.3 MB |
| 00e8bb3af1c6f39db1f7bcbad4488f3b | 434.6 MB |
| 746d921ac40dbc0c4ccb0df9c8187bb7 | 436.2 MB |
| 7593f43286ea5f424fb9ff63b4ee3152 | 435.1 MB |
| d41aae5a63a5269b6164dcac85081a9b | 436.8 MB |
| c1b0ec87427abd97c0e524592e026a2b | 435.0 MB |
| 549751d3af3feecd87bbc66ea3ce1ff4 | 436.6 MB |
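Given the file sizes, it is worth checking each download against its MD5 checksum listed above. The following is a minimal sketch using Python's standard `hashlib`. `md5sum` and `verify` are illustrative helper names. The demo hashes a temporary file rather than one of the dataset files.

```python
import hashlib
import os
import tempfile

def md5sum(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify(path, expected_md5):
    """Return True if the file's MD5 digest matches the published checksum.

    Accepts the checksum with or without an 'md5:' prefix, in any case.
    """
    expected = expected_md5.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5sum(path) == expected

# Demo on a temporary file containing b"abc" (a well-known MD5 test vector).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"abc")
print(verify(tmp.name, "md5:900150983cd24fb0d6963f7d28e17f72"))  # True
os.remove(tmp.name)
```

Reading in fixed-size chunks keeps memory use constant, which matters for the multi-gigabyte files above.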
Additional details
Related works
- Is supplement to
  - Standard: 10.5281/zenodo.17243812 (DOI)
Software
- Repository URL
  - https://github.com/SemanticPriming/word2manylanguages
- Programming language
  - Python, R