English word2vec embeddings trained on OpenSubtitles Part 6
Description
This dataset contains the subs2vec embeddings for English, as presented in https://zenodo.org/records/17243814. The embeddings were trained on large-scale subtitle corpora from the OpenSubtitles 2018 dataset (https://opus.nlpl.eu/OpenSubtitles/corpus/version/OpenSubtitles) and represent semantic vector spaces derived from naturalistic language use in films and television.
For this language, we provide all embedding variants explored in the study. Specifically, the dataset includes vectors generated under different combinations of:
- Dimensionality: multiple vector sizes (e.g., 100, 200, 300, …)
- Window size: varying context windows (e.g., 2, 5, 10, …)

Each file corresponds to a unique configuration (dimension × window size).
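
Since each file is one configuration, the settings can be read back from the file name. The sketch below assumes the naming pattern `<language>_<dimension>_<window>_<algorithm>_wxd.csv.bz2`, inferred from the file names in the checksum list on this page (for example `en_300_6_cbow_wxd.csv.bz2`). The field order, the reading of `cbow` and `sg` as the two word2vec training algorithms, and the helper names `EmbeddingConfig` and `parse_config` are assumptions for illustration, not part of the dataset documentation.

```python
import re
from typing import NamedTuple


class EmbeddingConfig(NamedTuple):
    """Configuration fields assumed to be encoded in a file name."""
    language: str
    dimension: int
    window: int
    algorithm: str  # assumed to be "cbow" or "sg"


# Assumed pattern: <language>_<dimension>_<window>_<algorithm>_wxd.csv.bz2
_NAME_RE = re.compile(r"^([a-z]+)_(\d+)_(\d+)_(cbow|sg)_wxd\.csv\.bz2$")


def parse_config(filename: str) -> EmbeddingConfig:
    """Split a file name into its configuration fields, or raise ValueError."""
    match = _NAME_RE.match(filename)
    if match is None:
        raise ValueError(f"unexpected file name: {filename!r}")
    language, dimension, window, algorithm = match.groups()
    return EmbeddingConfig(language, int(dimension), int(window), algorithm)


config = parse_config("en_300_6_cbow_wxd.csv.bz2")
print(config.dimension, config.window, config.algorithm)
```

A helper like this may be useful for selecting one configuration out of a directory of downloaded files without opening any of them.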
Each file contains the vocabulary for that language (column 1) and then the embedding values (columns 2 through dimension size + 1).
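
The column layout described above can be read with the Python standard library alone. This is a minimal sketch, assuming the files are comma-delimited, UTF-8 encoded text inside the bz2 archive; whether a header row is present is not stated here, so the hypothetical `has_header` flag is left to the reader to set after inspecting a file. The function name `load_embeddings` and its parameters are illustrative.

```python
import bz2
import csv
from typing import Dict, List, Optional


def load_embeddings(
    path: str,
    has_header: bool = False,   # assumption: set to True if the first row is a header
    limit: Optional[int] = None,  # read only the first `limit` words, to save memory
) -> Dict[str, List[float]]:
    """Map each word (column 1) to its embedding values (remaining columns)."""
    vectors: Dict[str, List[float]] = {}
    # bz2.open in text mode decompresses on the fly, so the file is never
    # fully expanded on disk.
    with bz2.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        reader = csv.reader(handle)
        if has_header:
            next(reader, None)  # skip the header row
        for row in reader:
            if not row:
                continue  # ignore blank lines
            word, values = row[0], row[1:]
            vectors[word] = [float(value) for value in values]
            if limit is not None and len(vectors) >= limit:
                break
    return vectors
```

Calling `load_embeddings("en_300_6_cbow_wxd.csv.bz2", limit=1000)` on a downloaded file would then return the first 1000 entries. Given the file sizes listed on this page, passing a `limit` while exploring is likely to be more practical than loading a whole file into a dictionary.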
If you use this dataset, please cite:
- Manuscript: https://doi.org/10.5281/zenodo.17243812
- Data: This Zenodo dataset (using the DOI provided here)
sha256sums:
- en_300_6_cbow_wxd.csv.bz2 72a94830d81ebbe28e7fa78465e02ad2bd7771ef5414f8f30f6de94565050167
- en_300_6_sg_wxd.csv.bz2 b0d7db822f181a124758e55b0c33b47e0a249c37f8a778806f6166a5baf96cb3
- en_500_1_cbow_wxd.csv.bz2 4196d84670045dc3cb65195f4045e543c78c1c35d28530b6c24e7711b8cbf23b
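
A downloaded file can be checked against the SHA-256 values above before use. The following is a minimal sketch using `hashlib`; the helper names `sha256_of` and `verify` are illustrative, and the local path passed to them is whatever location the file was saved to.

```python
import hashlib

# Expected digests, copied from the sha256sums list above.
EXPECTED_SHA256 = {
    "en_300_6_cbow_wxd.csv.bz2": "72a94830d81ebbe28e7fa78465e02ad2bd7771ef5414f8f30f6de94565050167",
    "en_300_6_sg_wxd.csv.bz2": "b0d7db822f181a124758e55b0c33b47e0a249c37f8a778806f6166a5baf96cb3",
    "en_500_1_cbow_wxd.csv.bz2": "4196d84670045dc3cb65195f4045e543c78c1c35d28530b6c24e7711b8cbf23b",
}


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected_hex: str) -> bool:
    """Return True if the file's SHA-256 digest matches the expected value."""
    return sha256_of(path) == expected_hex.lower()
```

For example, `verify("en_300_6_sg_wxd.csv.bz2", EXPECTED_SHA256["en_300_6_sg_wxd.csv.bz2"])` should return `True` for an intact download.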
Files
(34.2 GB)
| Checksum | Size |
|---|---|
| md5:0f8b4d5cb647029bb63b74a63eb9f9e7 | 1.1 GB |
| md5:6d32afc9d8e06621e618fd7a1cd959b8 | 1.1 GB |
| md5:54786e7ef540247c5894a4f7df18ce7b | 1.1 GB |
| md5:9af908c91bfaeb85c72c740e8ff37aed | 1.1 GB |
| md5:1c34014d61e2fad1cee329870280732c | 1.1 GB |
| md5:43f4c6e65af39662a5c35ba91df1afca | 1.1 GB |
| md5:ae155f92c575eb1d9b851fd2c1c9ae59 | 1.1 GB |
| md5:1d17e86a324a3f405ada39b0bb325268 | 1.1 GB |
| md5:8db28826d6b0fb4760b1be58e298e04f | 767.0 MB |
| md5:b1e14f00e8a4094d14af3a3567573d86 | 1.1 GB |
| md5:eb9381b14109bf5eb1a46975a86405e2 | 1.1 GB |
| md5:d7ea5450d8c820bca194a6709be0426d | 1.1 GB |
| md5:0702f20ee8e8bcd0fa46fc451ad931cb | 1.1 GB |
| md5:af3146314b32780f47659df8d3e75caf | 1.1 GB |
| md5:4271252da0e1a79431230a93c8e74516 | 1.1 GB |
| md5:68ed622610da52e8e9ca39eed5dd313d | 1.1 GB |
| md5:65f79415cfc6d0f7a070befc9ec56467 | 1.1 GB |
| md5:c5836c20c9f904192b22821a5a2fe760 | 750.0 MB |
| md5:5790e50ae0091c2d3f39e27a7e57eaa4 | 1.1 GB |
| md5:193a4e19b3813a43bf2dd26e7d4d564a | 1.1 GB |
| md5:6f329da8efeb78dd346ead7843ebcfbb | 1.1 GB |
| md5:6b63b5879cf4ddcfc0199bd12d23b7f7 | 1.1 GB |
| md5:887f2d21b70a050307b26e319e7519f9 | 1.1 GB |
| md5:9685475303cf0ddd2a481e875a52d15a | 1.1 GB |
| md5:183c93d00695405c29ae5c64d8a0aa76 | 1.1 GB |
| md5:dc04e0593d00100daac20560e03c5b03 | 1.1 GB |
| md5:56679287f3ca026937e4fe1ceb8cc7d1 | 1.1 GB |
| md5:2967b468fc1e74eee98fdb04bf43cbbd | 1.1 GB |
| md5:bec53ee2b1d80899c34697fb27ab02c7 | 1.1 GB |
| md5:f325d18300ab8e26d14e5c3cbdfc9c9b | 1.1 GB |
| md5:611b8e5098899d08f21a460d47383a4a | 1.1 GB |
| md5:e76b1787bfba6a490f6b224ebefdb02d | 1.1 GB |
| md5:95360629230eabdfa7f791daa1abd273 | 515.0 MB |
| md5:826f5465e694cf140b7a48209d422620 | 7.1 kB |
| md5:8864201e5e8f85f9bb348ad1be636f17 | 2.7 kB |
Additional details
Related works
- Is supplement to: Publication, 10.5281/zenodo.17243812 (DOI)
Software
- Repository URL: https://github.com/SemanticPriming/word2manylanguages
- Programming language: Python, R