Esperanto word2vec embeddings trained on OpenSubtitles
Description
This dataset contains the subs2vec embeddings for Esperanto, as presented in https://zenodo.org/records/17243814. The embeddings were trained on large-scale subtitle corpora from the OpenSubtitles 2018 datasets (https://opus.nlpl.eu/OpenSubtitles/corpus/version/OpenSubtitles), so the resulting semantic vector spaces reflect naturalistic language use in film and television.
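Because the vectors form a semantic space, relatedness between two words is usually measured as the cosine similarity of their vectors. The pure-Python sketch below illustrates the idea. The helper names `cosine_similarity` and `nearest_neighbours` and the three toy vectors are hypothetical illustrations and are not taken from this dataset or from the word2manylanguages repository.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimensionality")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)


def nearest_neighbours(word, embeddings, k=5):
    """Return the k words whose vectors are most cosine-similar to `word`."""
    target = embeddings[word]
    scores = [
        (other, cosine_similarity(target, vec))
        for other, vec in embeddings.items()
        if other != word
    ]
    # Highest similarity first.
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


# Hypothetical 3-dimensional toy vectors, invented purely for illustration;
# the real files in this dataset have far more dimensions.
toy = {
    "hundo": [0.9, 0.1, 0.0],   # "dog"
    "kato": [0.8, 0.2, 0.1],    # "cat"
    "aŭto": [0.0, 0.1, 0.95],   # "car"
}
print(nearest_neighbours("hundo", toy, k=2))
```

With real vectors loaded from one of the files, the same query ranks the whole Esperanto vocabulary. A linear scan like this is slow for large vocabularies, so a matrix library is usually preferable in practice.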
For this language, we provide all embedding variants explored in the study. Specifically, the dataset includes vectors generated under different combinations of:
- Dimensionality: multiple vector sizes (e.g., 100, 200, 300, …)
- Window size: varying context windows (e.g., 2, 5, 10, …)
Each file corresponds to one unique configuration (dimension × window size).
Each row of a file holds one word from the Esperanto vocabulary in column 1, followed by that word's embedding values in columns 2 through d + 1, where d is the file's vector dimensionality.
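The row layout described above (word in column 1, then d values) can be parsed with a short loader. This is a minimal sketch under stated assumptions. It assumes plain UTF-8 text with whitespace-separated fields, and it also assumes that a word2vec-style `<vocab_size> <dim>` header line may or may not be present. Neither the delimiter nor the presence of a header is specified here. The function name `load_embeddings` and the file name in the comment are hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def load_embeddings(
    lines: Iterable[str], delimiter: Optional[str] = None
) -> Tuple[Dict[str, List[float]], int]:
    """Parse rows laid out as: word, v1, v2, ..., vd.

    Column 1 holds the word; columns 2..d+1 hold its d embedding values.
    ASSUMPTION: fields are whitespace-separated (delimiter=None); pass
    "," or "\t" if a file turns out to be CSV or TSV instead.
    ASSUMPTION: a word2vec-style "<vocab_size> <dim>" header may be
    present; it is skipped if the first line looks like one.
    Returns the word -> vector mapping and the dimensionality d.
    """
    embeddings: Dict[str, List[float]] = {}
    dim: Optional[int] = None
    for line_no, line in enumerate(lines):
        fields = line.rstrip("\r\n").split(delimiter)
        if not fields or fields == [""]:
            continue  # skip blank lines
        if line_no == 0 and len(fields) == 2 and all(f.isdigit() for f in fields):
            continue  # looks like a "<vocab_size> <dim>" header
        word, values = fields[0], [float(x) for x in fields[1:]]
        if dim is None:
            dim = len(values)  # d = number of columns minus the word column
        elif len(values) != dim:
            raise ValueError(
                f"line {line_no + 1}: expected {dim} values, got {len(values)}"
            )
        embeddings[word] = values
    if dim is None:
        raise ValueError("no embedding rows found")
    return embeddings, dim


# Usage with a real file (the file name is hypothetical):
# with open("esperanto_vectors.txt", encoding="utf-8") as fh:
#     vectors, d = load_embeddings(fh)
sample = ["3 2\n", "kaj 0.1 0.2\n", "la -0.3 0.4\n", "estas 0.5 -0.6\n"]
vectors, d = load_embeddings(sample)
print(d, len(vectors))
```

Plain Python lists of floats use considerably more memory than the file itself, which matters for the larger files (over 700 MB). In that case, a NumPy array or gensim's `KeyedVectors.load_word2vec_format` may be a better choice, provided the file layout matches what that function expects.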
If you use this dataset, please cite:
- Manuscript: https://doi.org/10.5281/zenodo.17243812
- Data: This Zenodo dataset (using the DOI provided here)
Files
(19.7 GB)
| MD5 checksum | Size |
|---|---|
| f7c4ef0b03d103594e780eebf6e9f1d2 | 143.5 MB |
| 838c23295e88991b6883a2ae6ea94f42 | 143.4 MB |
| b209b8c724055df7f9f86b0cd3619a0c | 143.4 MB |
| 3f7073f84160b7f476bb23870b87873e | 143.4 MB |
| bc2a8899be6efd98307703b01e1d89cd | 143.3 MB |
| 27b4bd1b8257e0e545c1018604324efb | 143.5 MB |
| 14766a003889cca600a79b7a7808b673 | 143.3 MB |
| 40ee4a2f66d13fc26ad8273adeac9678 | 143.6 MB |
| f43e797a49a2f44e57c079a3fa353e75 | 143.4 MB |
| a6de6460c152b8543a8b4c7f9691758a | 143.5 MB |
| 5a32b7be18f53ed86f71f0297a5eb04a | 143.3 MB |
| 931138e426a9e64c66d8ad83b899f422 | 143.6 MB |
| 757bae3fdfce644836008cee8170c474 | 285.5 MB |
| df4ce3b41bfb1751cec588730327b736 | 285.9 MB |
| 63ba5134544cc90626c5a1b8330322da | 285.3 MB |
| 51d4a6b79739e759b442d5bd98d43596 | 285.9 MB |
| 65e709bf7add19eec2fe699608c10fd7 | 285.2 MB |
| a6562a63a6c7192706786e12eab7d513 | 286.0 MB |
| 6e317ccb113ad2e905967288cd928021 | 285.1 MB |
| f0444049a11128bcdacb5dc6989d198a | 286.0 MB |
| 38cf39d04a18564789a86a3949d15942 | 285.0 MB |
| e6b83c29f941293d975d9ee6ca2f642b | 286.0 MB |
| 7461e27b197d644a56def10917f4ca26 | 285.0 MB |
| 46d8fe53dd570d9bd12683275e0189af | 286.0 MB |
| f148cb437028d8be0f6a525781333e60 | 427.7 MB |
| 688c782dde68b18eb096140bd8fc9b00 | 428.7 MB |
| 17d7b4a72299fc2d25926342606b16da | 427.4 MB |
| fb847d3a36d27c4554b47fd600b7e9c6 | 428.7 MB |
| a38f6e85b4580aeb1a1c6dda49cc3de5 | 427.1 MB |
| 3c860289de30c2e47ff4e807dafb0c61 | 428.7 MB |
| dc04cf59c25f7e33f587bd25f97d4c6d | 427.1 MB |
| 1d76717ece71ca99bf83daf0d0887bfe | 428.9 MB |
| f52676096ce8c7b7284aac65f3e5a53d | 426.9 MB |
| 4e2ec02b564a3aaa4a0e8e9fcdbbd793 | 428.9 MB |
| 04285985b040a9a56d3db196fb18b04f | 426.9 MB |
| ea13b6889b8442a09127d4b4ff9c1c12 | 428.8 MB |
| e643e910e68c21a1d8d25fecc7fb3b2c | 713.6 MB |
| d9ad55d0a82e150b235266f7836dda2b | 715.2 MB |
| d7f721148fd0df5649ed403bcc1a75c0 | 712.3 MB |
| fa65105a073b2fd01fb35cd47813638c | 715.8 MB |
| 8a9943cdaaa99d2308cbae954690c41c | 711.7 MB |
| 91f597340cede69c907322731f7f5bc8 | 715.3 MB |
| 3c5648b1e1038163cce20b8612d8d250 | 711.4 MB |
| 87265754488a707661c9617ece7001a2 | 715.2 MB |
| fc4658b1a3d556312e9af5c0aa6fdce8 | 711.1 MB |
| e7076b19f1871bcdecfffa17b0d9c4f3 | 715.4 MB |
| 61b6e10db853283681916da35414abc8 | 711.0 MB |
| fc3aa1d9382ba1293ee8fb81a27feaf6 | 715.1 MB |
| ab00b892cb6ee521dec3d35b0768065b | 72.6 MB |
| 9e8047f7a91a484d1230f07f40ff90e4 | 72.7 MB |
| 7c4c7a481907d21d6be752886f29ceb8 | 72.6 MB |
| 1f7b1df9a2270ebf19792e069d44e03e | 72.6 MB |
| f381420f447cfb93a50565c05ca12a2e | 72.7 MB |
| a8c0911099cfac8c80fb6abcfa9960ea | 72.7 MB |
| dd0952ae5e848328e387d260f42405b7 | 72.6 MB |
| 8af3b3ee467e39c1d5f3b1c49e55d808 | 72.6 MB |
| 41fe39555b67dd1043141f07d37be2df | 72.5 MB |
| 54ca3a4d7eadcb767d9cce2bbb12470c | 72.6 MB |
| cba5a12ab4030ce00d01afbdadd5eb33 | 72.5 MB |
| 901ff0577b880ab70ccb423387cacd7f | 72.7 MB |
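Every file is listed with an MD5 checksum, so a download can be checked for corruption before use. The following is a minimal sketch using Python's standard `hashlib` module. The function names `md5_of_file` and `verify_download` and the local file path in the comment are hypothetical.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks
    so that multi-hundred-MB embedding files never have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_md5):
    """Return True if the file's MD5 matches a checksum copied from the table.

    Accepts the checksum with or without an "md5:" prefix, in any case.
    """
    expected = expected_md5.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected


# Example (hypothetical local file name; checksum from the first table row):
# verify_download("downloaded_file.txt", "f7c4ef0b03d103594e780eebf6e9f1d2")
```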
Additional details
Related works
- Is supplement to: 10.5281/zenodo.17243812 (DOI; resource type: Standard)
Software
- Repository URL: https://github.com/SemanticPriming/word2manylanguages
- Programming language: Python, R