Telugu word2vec embeddings trained on OpenSubtitles
Description
This dataset contains the subs2vec embeddings for Telugu, as presented in https://zenodo.org/records/17243814. The embeddings were trained on large-scale subtitle corpora and represent semantic vector spaces derived from naturalistic language use in films and television. The corpora come from the OpenSubtitles 2018 dataset: https://opus.nlpl.eu/OpenSubtitles/corpus/version/OpenSubtitles.
For this language, we provide all embedding variants explored in the study. Specifically, the dataset includes vectors generated under different combinations of:
- Dimensionality: multiple vector sizes (e.g., 100, 200, 300, …)
- Window size: varying context windows (e.g., 2, 5, 10, …)
Each file corresponds to a unique configuration (dimension × window size).
In each file, column 1 holds the vocabulary item and columns 2 through d + 1 hold that word's embedding values, where d is the dimensionality of the file.
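The column layout described above can be read with a few lines of Python. This is a minimal sketch, not the authors' loader: the function name `load_embeddings`, the whitespace delimiter, and the toy sample rows are all assumptions made for illustration, since the record does not state the delimiter or whether a header row is present.

```python
import io

def load_embeddings(lines, delimiter=None):
    """Parse rows of `word value_1 ... value_d` into a dict of word -> vector.

    `delimiter=None` splits on any whitespace; pass "," or "\t" if the
    files turn out to be comma- or tab-separated (an assumption to check).
    """
    embeddings = {}
    for line in lines:
        fields = line.strip().split(delimiter)
        if len(fields) < 2:
            continue  # skip blank lines and rows without any values
        word, values = fields[0], fields[1:]
        try:
            embeddings[word] = [float(v) for v in values]
        except ValueError:
            continue  # skip rows with non-numeric cells, e.g. a column-name header
    return embeddings

# Toy stand-in for a 3-dimensional file (hypothetical words and values).
sample = io.StringIO("word_a 0.1 0.2 0.3\nword_b 0.4 0.5 0.6\n")
embeddings = load_embeddings(sample)
dimension = len(next(iter(embeddings.values())))
```

For a downloaded file, the same function can be given an open file handle, for example `open(path, encoding="utf-8")`, where `path` is whatever local name the file was saved under. It is worth confirming that `dimension` matches the configuration the file is supposed to represent before using the vectors.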
If you use this dataset, please cite:
- Manuscript: https://doi.org/10.5281/zenodo.17243812
- Data: This Zenodo dataset (using the DOI provided here)
Files
(9.0 GB)
| MD5 checksum | Size |
|---|---|
| 742635911911cc945308aa403f64f992 | 65.4 MB |
| 834d5727a83b0323ee4f3b41434cbe7a | 65.4 MB |
| ae82a8ef3b670dd4963586eeafce1e00 | 65.4 MB |
| 156a822ab4378d3827a64027741fa141 | 65.5 MB |
| 98aeca970a7c3442148c2e78d01ffb22 | 65.4 MB |
| bc0863506e9cb6f5702bbb315dfbb968 | 65.5 MB |
| 6bee04356f32ccefc49a90298c15748b | 65.4 MB |
| b24969659996d3ef13e4b4fbac6874f8 | 65.5 MB |
| 42c066b6d84b020f74281eb34d0416ec | 65.3 MB |
| 7aa68377d1030b1ed12032671d0b95f1 | 65.4 MB |
| 4f7598e63690521399d964ac1abb44df | 65.3 MB |
| e53a98466e718710f35340c258132405 | 65.4 MB |
| 9baf0c67a2d2a80bdfb4c45e2e893467 | 130.2 MB |
| b7bd658405169e8078f44a60ad41f37d | 130.4 MB |
| 688f1f57988d0fa5e8083dbd81de7ddc | 130.1 MB |
| 57d59ef46fefb81985f4902347750ab8 | 130.4 MB |
| 4c386c81e5b6d69186bf6fcddf6d24e5 | 130.0 MB |
| 778e8d352de6e56f8a1da76cfe58a00c | 130.4 MB |
| a1b04a66c13a7b8917ce77b213751afe | 130.0 MB |
| b3e9b38b4eb8e1c36b8fcd0bcfd2cf2a | 130.4 MB |
| b41d639d13bef9cd9a93b10b512c2e60 | 130.0 MB |
| dbfce4aa1efd1fd573631735a93ba31a | 130.4 MB |
| 0b63c8be7ce19ac2ca7de8e0fd920d4b | 130.0 MB |
| 1e5bd6085ab5a526a13e606d58ecab8c | 130.4 MB |
| dac3537530d8b286bff468dd3f7e6ee4 | 195.2 MB |
| add853d5556e74d89cf720eafa43b828 | 195.7 MB |
| df536844c34c5db23865a1e197262acc | 195.1 MB |
| 91339bfb54e4fee5568ad573da456d3b | 195.8 MB |
| 927964341f6aca05281b1855de5ac405 | 195.0 MB |
| cf37c2153652e1d266129f2f6f486e6f | 195.7 MB |
| 030ddd2446c50e9ea4a2969df154d766 | 194.9 MB |
| e799dda72233606af77e52762c478ecb | 195.7 MB |
| d2537547f4773737f80af35058ecca05 | 194.9 MB |
| 4da7d7a982e9a37055df1549c1939a9e | 195.7 MB |
| 327f8de3e8810a66d430d4135a8e10da | 194.8 MB |
| 4f3570ce5f134c49e8e357acb2ad4497 | 195.6 MB |
| 6ff3dcba67b9f87b16e806116a6596d0 | 326.0 MB |
| 5198b0e213c34065a59f14238dd8a2bc | 326.9 MB |
| 0922c24d464d632e034c823619a50ad0 | 325.7 MB |
| ed250314a5715af0f5915f004124b4e1 | 326.9 MB |
| 067abc4df87992945f500b72fb94226b | 325.4 MB |
| 6bcf4c6b69eb723ff1dc46c4f841755d | 326.9 MB |
| be1fa3dfb51c3b27a8659dc204de2f5b | 325.3 MB |
| ad9fcf293d02d7fc244040521408e23a | 326.9 MB |
| 8fcd9193018432595d39e7a4e7d4ac4f | 325.0 MB |
| 246cfc0f6667ec424ede09c829aadeb3 | 326.8 MB |
| c266942f1dbdbf1ccd41ea58bbb20743 | 325.0 MB |
| 8287ec3b821808acb1d75ea362e92f7e | 326.6 MB |
| ff9607ed5e5ce79e2f5868e1c9dec17e | 33.0 MB |
| 62a2aa7bb0d4b5e467e8d580f0c29192 | 33.0 MB |
| 7332c9b65b303ee71b265c67d5d9d29d | 33.0 MB |
| dac2444b808b49a3cc405805e3f0e2a8 | 33.0 MB |
| bd418f07e13a4eac2cc16d1ccab8c6e4 | 33.0 MB |
| af605132f37b8bc67feed7d3f77ade04 | 33.0 MB |
| 81933d9a4a4ced5c3511fbb14235dfde | 33.0 MB |
| 423e3c371f611c6eccd17aa460dfb4ca | 33.1 MB |
| 769eed4d34e7e9806834860c5252d4c5 | 33.0 MB |
| e4d9b448d4e0f423245935a0120baf98 | 33.1 MB |
| 11d9427a405c225be627c6d5a10e9978 | 33.0 MB |
| 11dd4b3d68cf39f563af9e491c87719c | 33.0 MB |
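Because every file is listed with an MD5 checksum, a download can be checked for corruption before use. The following is a minimal sketch using Python's standard `hashlib`; the helper names `md5_hex` and `verify_download` are hypothetical and not part of the dataset or its repository.

```python
import hashlib
import io

def md5_hex(stream, chunk_size=1 << 20):
    """Return the hex MD5 digest of a binary stream, read in chunks."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()

def verify_download(path, expected_md5):
    """Compare a local file's MD5 against the checksum listed for it."""
    with open(path, "rb") as fh:
        return md5_hex(fh) == expected_md5.lower()

# Self-check on in-memory bytes, so the sketch runs without any download.
demo_digest = md5_hex(io.BytesIO(b"hello"))
```

With a real download, the call would look like `verify_download(local_path, "742635911911cc945308aa403f64f992")`, where `local_path` is wherever the corresponding file was saved. Reading in chunks keeps memory use flat even for the files that exceed 300 MB.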
Additional details
Related works
- Is supplement to: Standard, 10.5281/zenodo.17243812 (DOI)
Software
- Repository URL: https://github.com/SemanticPriming/word2manylanguages
- Programming language: Python, R