English word2vec embeddings trained on OpenSubtitles Part 3
Description
This dataset contains the subs2vec embeddings for English, as presented in https://zenodo.org/records/17243814. The embeddings were trained on large-scale subtitle corpora and represent semantic vector spaces derived from naturalistic language use in films and television. The corpora come from the OpenSubtitles 2018 datasets: https://opus.nlpl.eu/OpenSubtitles/corpus/version/OpenSubtitles.
For this language, we provide all embedding variants explored in the study. Specifically, the dataset includes vectors generated under different combinations of:
- Dimensionality: multiple vector sizes (e.g., 100, 200, 300, …)
- Window size: varying context windows (e.g., 2, 5, 10, …)

Each file corresponds to a unique configuration (dimension × window size).
Each file contains the vocabulary for that language (column 1) and then the embedding values (columns 2 through dimension size + 1).
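As an illustration of the layout described above, the following is a minimal sketch of reading one file with the Python standard library. The function name `load_embeddings` and its `limit` parameter are hypothetical, and the sketch assumes comma-separated rows; it also assumes that a first row whose values are not numeric is a header. Neither assumption is confirmed by this record, so it is worth inspecting the first few lines of a file before relying on it.

```python
import bz2
import csv


def load_embeddings(path, limit=None):
    """Read a bz2-compressed CSV of word vectors into a dict.

    Assumption: each row holds the word in column 1 and the vector
    components in the remaining columns, separated by commas.
    """
    vectors = {}
    # Text mode decompresses on the fly; newline="" is what the csv module expects.
    with bz2.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        for row_number, row in enumerate(csv.reader(handle)):
            if not row:
                continue  # skip blank lines
            word, values = row[0], row[1:]
            try:
                vector = [float(value) for value in values]
            except ValueError:
                if row_number == 0:
                    continue  # assumption: a non-numeric first row is a header
                raise
            vectors[word] = vector
            if limit is not None and len(vectors) >= limit:
                break  # stop early to avoid holding a whole file in memory
    return vectors
```

For example, `load_embeddings("en_200_3_sg_wxd.csv.bz2", limit=1000)` would return at most the first 1000 rows of that file as a dictionary mapping each word to a list of floats. The `limit` argument is there because each compressed file is around 1 GB, so loading everything into plain Python lists may need a lot of memory.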
If you use this dataset, please cite:
- Manuscript: https://doi.org/10.5281/zenodo.17243812
- Data: this Zenodo dataset (using the DOI provided here)
sha256-hashes:
- en_200_3_sg_wxd.csv.bz2 d87de6051c5005ae1c0b2cef9892c43a4186c65e9b936a76b7aef09be3f05d6b
- en_200_4_cbow_wxd.csv.bz2 d5b4af384791fae7d240977dffcc21f21648849fac4e8e3d70de4d00a398f732
- en_200_4_sg_wxd.csv.bz2 7b05d89bc357763b19a8bda5d6f905e02cfd5974c4c3879761635ed2a528ad21
- en_200_5_cbow_wxd.csv.bz2 e39f5e3e119e892a2b0ac6c715b72679ece2f1dcacf370ea6de25baf6d4a37ae
- en_200_5_sg_wxd.csv.bz2 8cf198b75697cbfab10d3614b894e0ae52c220759ffcff0ec4ed320b739dac66
- en_200_6_cbow_wxd.csv.bz2 c08e7bb1ddd2faed730f801bc6ed297683ec66404b02109da8eab3019079a1ef
- en_200_6_sg_wxd.csv.bz2 0fabc3289a5759f0513907400bd05751bcaeab92338ddd66fcc4829b1e6f0172
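A downloaded file can be checked against the hashes listed above. The following is a minimal sketch using Python's `hashlib`; the helper names `file_digest` and `verify` are hypothetical and not part of the dataset or its repository.

```python
import hashlib


def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Read until an empty bytes object signals end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_hex, algorithm="sha256"):
    """Compare a file's digest with a published hex string."""
    return file_digest(path, algorithm) == expected_hex.strip().lower()
```

For example, `verify("en_200_3_sg_wxd.csv.bz2", "d87de6051c5005ae1c0b2cef9892c43a4186c65e9b936a76b7aef09be3f05d6b")` should return `True` for an intact download of that file. Passing `algorithm="md5"` lets the same helper check the md5 values shown in the file listing, with the `md5:` prefix removed.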
Files (43.7 GB)
| File (md5) | Size |
|---|---|
| md5:75d8941d5f904cac58c1d74574e296d4 | 1.1 GB |
| md5:8934b81b27fddf866807829529e492bc | 1.1 GB |
| md5:56e08f8e6e264db27a67fc0501ded0ad | 1.1 GB |
| md5:6b67cd611161e68aa9bcd7889c07ef3a | 1.1 GB |
| md5:a2223977a51671507a6fce8283a3d637 | 1.1 GB |
| md5:cb05b862afc3ffe3337bf9827608c967 | 875.8 MB |
| md5:aad055e5baece47709cd394346c6427b | 1.1 GB |
| md5:d30cbd256d1c4359063043d59ae5f023 | 1.1 GB |
| md5:8c083e94afca66f11ef454398f2cfa06 | 1.1 GB |
| md5:ca9d2b08929b92bd0cf5cb8f42ed0bf3 | 1.1 GB |
| md5:9e944bbe43dbd4f1c7df6682f9cfdab4 | 1.1 GB |
| md5:1b774863a3263460c993038ce7cca7ea | 866.3 MB |
| md5:297ba9e65c7ee579959882675121804c | 1.1 GB |
| md5:31374a620980a0babd2d164bb58eba0b | 1.1 GB |
| md5:4c9711df73d73d95599cfaef2c86eae5 | 1.1 GB |
| md5:d886b72dce2de02aa5ff16f8f88de08f | 1.1 GB |
| md5:c3298ccb7e37499f6d1031f3e0f2f522 | 1.1 GB |
| md5:6ce41657207eec67e2a98850b6f8421a | 864.0 MB |
| md5:f0dd52eb15c96ed6d97b819f075626a6 | 1.1 GB |
| md5:f318b80e328a72c3012817d9f576fbfd | 1.1 GB |
| md5:1ccb949156fd46411e5da2c6f4d1f45b | 1.1 GB |
| md5:c74c7c1aefa9a577812f15545a34d03e | 1.1 GB |
| md5:051323b990bb8424b8e321c5ed6432a7 | 1.1 GB |
| md5:c54cd421601d76f8d484c72aa8a23b1a | 880.1 MB |
| md5:93e431f44ec19973bb415a65e3377054 | 1.1 GB |
| md5:482e752e2ebf41d8c02e9f9daa44f74b | 1.1 GB |
| md5:bc53981862724acba56b5cd1fb70f5cd | 1.1 GB |
| md5:568780d8f637ba5dc3fc95a1c139fc06 | 1.1 GB |
| md5:4efe59260aac111eff145216d953c139 | 1.1 GB |
| md5:5dae183fc733e9d988ea9a18c0ff01d5 | 856.7 MB |
| md5:54552f8b3f028283cdceac14f135b51b | 1.1 GB |
| md5:9640cea51b42d0ba6f8186758f9ca838 | 1.1 GB |
| md5:fc57781748367d701db900d74edf710d | 1.1 GB |
| md5:cc48247a0494ee88b1c3832f6cbbd204 | 1.1 GB |
| md5:6f2de822aadd656d368bb64f7d17517c | 1.1 GB |
| md5:0529ef69ca90cc40fbe4ce93cf17119e | 862.7 MB |
| md5:c97bd578a731a3e03c2eb4e4fac55b9b | 1.1 GB |
| md5:4a5f8695187eb4407a1d90681e4f3bae | 1.1 GB |
| md5:abdd6b09860119d3261ef32566442e8f | 1.1 GB |
| md5:b481ef2966ce9d1edfa045f6b00e496c | 1.1 GB |
| md5:999a4d16cac2d56748ab7469a9593cc2 | 1.1 GB |
| md5:bdbdd5ba169fff8bdfa1688c1974db97 | 866.6 MB |
| md5:826f5465e694cf140b7a48209d422620 | 7.1 kB |
| md5:8864201e5e8f85f9bb348ad1be636f17 | 2.7 kB |
Additional details
Related works
- Is supplement to: Publication: 10.5281/zenodo.17243812 (DOI)
Software
- Repository URL: https://github.com/SemanticPriming/word2manylanguages
- Programming language: Python, R