English word2vec embeddings trained on OpenSubtitles Part 8
Description
This dataset contains the subs2vec embeddings for English, as presented in https://zenodo.org/records/17243814. The embeddings were trained on large-scale subtitle corpora from the OpenSubtitles 2018 dataset (https://opus.nlpl.eu/OpenSubtitles/corpus/version/OpenSubtitles) and represent semantic vector spaces derived from naturalistic language use in films and television.
For this language, we provide all embedding variants explored in the study. Specifically, the dataset includes vectors generated under different combinations of:
- Dimensionality: multiple vector sizes (e.g., 100, 200, 300, …)
- Window size: varying context windows (e.g., 2, 5, 10, …)
Each file corresponds to a unique configuration (dimension × window size).
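As a rough illustration of how a configuration could be recovered from a file name, the sketch below splits names such as `en_500_3_cbow_wxd.csv.bz2` into their parts. The field order (language, dimension, window size, training algorithm) is an assumption inferred from the file names listed on this page, not something the description states, and `EmbeddingConfig` and `parse_embedding_filename` are hypothetical helper names.

```python
import re
from typing import NamedTuple


class EmbeddingConfig(NamedTuple):
    """Hypothetical container for the parts of a file name."""
    language: str
    dimension: int
    window: int
    algorithm: str


# Assumed pattern: <language>_<dimension>_<window>_<algorithm>_wxd.csv.bz2
_NAME_RE = re.compile(r"^([a-z]+)_(\d+)_(\d+)_([a-z]+)_wxd\.csv\.bz2$")


def parse_embedding_filename(name: str) -> EmbeddingConfig:
    """Split a file name into its assumed configuration fields."""
    match = _NAME_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognized file name: {name!r}")
    language, dimension, window, algorithm = match.groups()
    return EmbeddingConfig(language, int(dimension), int(window), algorithm)


print(parse_embedding_filename("en_500_3_cbow_wxd.csv.bz2"))
```

If the naming scheme differs from this assumption, only the regular expression needs to change.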
In each file, column 1 holds the vocabulary word and columns 2 through d + 1 hold the embedding values, where d is the dimensionality of that file.
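A minimal sketch of reading one file with this column layout, using only the Python standard library. It assumes the files are comma-delimited and UTF-8 encoded, and it treats any row whose value columns are not numeric as a header and skips it; none of these details are confirmed by the description. `load_embeddings` is a hypothetical helper name.

```python
import bz2
import csv
from typing import Dict, List


def load_embeddings(path: str) -> Dict[str, List[float]]:
    """Read a bz2-compressed CSV into a word -> vector mapping."""
    vectors: Dict[str, List[float]] = {}
    # Mode "rt" decompresses on the fly and yields text for csv.reader.
    with bz2.open(path, "rt", encoding="utf-8", newline="") as handle:
        for row in csv.reader(handle):
            if len(row) < 2:
                continue  # skip blank or malformed lines
            try:
                values = [float(cell) for cell in row[1:]]
            except ValueError:
                continue  # assumption: a non-numeric row is a header
            vectors[row[0]] = values  # column 1 is the word
    return vectors
```

Most files here are about 1.1 GB compressed, so loading a whole file into Python lists may need a lot of memory; reading row by row and keeping only the words of interest is one way around that.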
If you use this dataset, please cite:
- Manuscript: https://doi.org/10.5281/zenodo.17243812
- Data: This Zenodo dataset (using the DOI provided here)
sha256sums:
- en_500_3_cbow_wxd.csv.bz2 19485a54b2249c0897da166814c90437ce4f49ef56e7b54c8ed1444161f29e3b
- en_500_3_sg_wxd.csv.bz2 c298bc9468a71ec3c6e99f3cafe670cce91fe3e85d968fa5a9cc44769d7c5795
- en_500_4_cbow_wxd.csv.bz2 ab46058fa7339f3305ee68b9fa10be2fb0469318d89fdb2e819c60d1681dcfd1
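One way to check a downloaded file against the SHA-256 digests listed above is sketched here with the standard `hashlib` module. The helper names `sha256_of_file` and `verify_download` are hypothetical, and the local path passed in is whatever location the file was saved to.

```python
import hashlib

# Digests copied from the sha256sums list above.
EXPECTED_SHA256 = {
    "en_500_3_cbow_wxd.csv.bz2": "19485a54b2249c0897da166814c90437ce4f49ef56e7b54c8ed1444161f29e3b",
    "en_500_3_sg_wxd.csv.bz2": "c298bc9468a71ec3c6e99f3cafe670cce91fe3e85d968fa5a9cc44769d7c5795",
    "en_500_4_cbow_wxd.csv.bz2": "ab46058fa7339f3305ee68b9fa10be2fb0469318d89fdb2e819c60d1681dcfd1",
}


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large downloads are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str, name: str) -> bool:
    """Return True if the file at `path` matches the listed digest for `name`."""
    return sha256_of_file(path) == EXPECTED_SHA256[name]
```

For example, `verify_download("downloads/en_500_3_sg_wxd.csv.bz2", "en_500_3_sg_wxd.csv.bz2")` would return `False` for a truncated or corrupted download.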
Files
(46.6 GB)
| MD5 checksum | Size |
|---|---|
| 2ac9871106fdabfeaf55d88edbdcc366 | 1.1 GB |
| d4b97bdad1d8675997920f6b12083c92 | 1.1 GB |
| dbc23fe48d82d11d9182ac5bb9647feb | 1.1 GB |
| d3bbdb0ec753407c9b99d8d1e1711c91 | 1.1 GB |
| 68b15d436d9497a81d3ad8107fd87d90 | 1.1 GB |
| cf48f2161fd5f82fcba0df63accb08b5 | 1.1 GB |
| 128f389d91b9405cc61263abc0c1c8eb | 1.1 GB |
| 2750a11fc14dd5116b773f4877a4470a | 1.1 GB |
| 580420a41a6ff9b1325acee2bcacf88e | 1.1 GB |
| cffdc192b218b214643bc6bfa818b0c2 | 1.1 GB |
| 34bae60f6c8ef87b2c0bfbbda0566645 | 1.1 GB |
| 7ec0186c0668f49355f00776a19cb9bc | 1.1 GB |
| 99cd45202f440b874b196659f6e5fb02 | 1.1 GB |
| b797ef118f075401670db693bc0a279b | 1.1 GB |
| a7e751b27b8ec495a5ddeb7a3418eef5 | 507.7 MB |
| fb69ffcf54650300ae6f4466fbc549f8 | 1.1 GB |
| 9184da31abbb1132877797cf124cb272 | 1.1 GB |
| a4e742dc8c327bb199264cce59c25784 | 1.1 GB |
| 6a976faee00d0be6cd44d0a796382b45 | 1.1 GB |
| fcfed114a7a9c444454fc50568dc7e72 | 1.1 GB |
| 9fb51521328d11d0d216ad9f61b0fe0e | 1.1 GB |
| 35b5fdeca8f75cfe0a73804eb1fc072c | 1.1 GB |
| fd57e3dc1266e708ea8eeac374c1eb74 | 1.1 GB |
| 2a8ae21a59b1fc95b74542ecc7d06444 | 1.1 GB |
| 5899ca03ba64760828cb4e2260ecb8ba | 1.1 GB |
| ebf8737c3bd2e122fe49ee3df39e8b18 | 1.1 GB |
| 1d1d557e129452589f48c92048e14315 | 1.1 GB |
| 4954dc8e552cf4e6d9c86591e2bb215f | 1.1 GB |
| 452726238668322898be4c552f26c8ec | 1.1 GB |
| 13265982b71e0479de4e3d0cca89ee14 | 486.5 MB |
| ea9043c6736552725146e4463e0273d9 | 1.1 GB |
| dfc72faee7f316edd35953b36f4657bc | 1.1 GB |
| 9382f3472a445a9f3fa42707c752cc0c | 1.1 GB |
| 42e53a252d9ab5c3c60203554442ae96 | 1.1 GB |
| 0f4356311ca8c89947f0a173c6a9289c | 1.1 GB |
| a0a0beb32e69961a3726ed53af2231f1 | 1.1 GB |
| 6e43e7424d24943a6ca8b4be2012557b | 1.1 GB |
| a2caa3793ce1c0b4760ddcd46875a37d | 1.1 GB |
| 0d81fe276fa14d807faf456ddd060681 | 1.1 GB |
| bedac9aa1caa4dff9675710991e78f06 | 1.1 GB |
| 9e5209834068375a2bc0aafd5c88f750 | 1.1 GB |
| 69e451ed1917d1bf4fee0c5c85fbdb93 | 1.1 GB |
| 23a0e5792d229a744d72c03d20c26e29 | 1.1 GB |
| 82a59fd68393717d1e032a75ce00b4b9 | 1.1 GB |
| 054bb724edcb7ca34cad4404cd0f6814 | 490.5 MB |
| 826f5465e694cf140b7a48209d422620 | 7.1 kB |
| 8864201e5e8f85f9bb348ad1be636f17 | 2.7 kB |
Additional details
Related works
- Is supplement to: Publication 10.5281/zenodo.17243812 (DOI)
Software
- Repository URL: https://github.com/SemanticPriming/word2manylanguages
- Programming language: Python, R