Published November 15, 2025 | Version v1.0.0
Dataset Open

English word2vec embeddings trained on OpenSubtitles Part 6

  • 1. Harrisburg University of Science and Technology

Description

This dataset contains the subs2vec embeddings for English, as presented in https://zenodo.org/records/17243814. The embeddings were trained on the OpenSubtitles 2018 subtitle corpus (https://opus.nlpl.eu/OpenSubtitles/corpus/version/OpenSubtitles) and represent semantic vector spaces derived from naturalistic language use in film and television.

For this language, we provide all embedding variants explored in the study. Specifically, the dataset includes vectors generated under different combinations of:

  • Dimensionality: multiple vector sizes (e.g., 100, 200, 300, …)

  • Window size: varying context windows (e.g., 2, 5, 10, …)

  • Each file corresponds to a unique configuration (dimension × window size). 
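The configuration of a file can be read off its name. The sketch below assumes, judging only from the file names listed under sha256sums, that names follow the pattern `<language>_<dimension>_<window>_<algorithm>_wxd.csv.bz2` (for example, `en_300_6_cbow_wxd.csv.bz2` would be 300 dimensions, window size 6, CBOW). That reading of the fields is an assumption, not something the record states, and `parse_config` and `EmbeddingConfig` are hypothetical helper names.

```python
import re
from typing import NamedTuple


class EmbeddingConfig(NamedTuple):
    language: str
    dimension: int
    window: int
    algorithm: str


# Assumed naming pattern (inferred from the listed file names, not documented):
#   <language>_<dimension>_<window>_<algorithm>_wxd.csv.bz2
_NAME_RE = re.compile(
    r"^(?P<lang>[a-z]+)_(?P<dim>\d+)_(?P<win>\d+)_(?P<algo>[a-z]+)_wxd\.csv\.bz2$"
)


def parse_config(filename: str) -> EmbeddingConfig:
    """Split a file name into its assumed configuration fields."""
    match = _NAME_RE.match(filename)
    if match is None:
        raise ValueError(f"unrecognised file name: {filename!r}")
    return EmbeddingConfig(
        language=match["lang"],
        dimension=int(match["dim"]),
        window=int(match["win"]),
        algorithm=match["algo"],
    )
```

If the naming assumption holds, `parse_config("en_500_1_cbow_wxd.csv.bz2").dimension` gives the vector size to expect when reading that file.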

Each file contains the vocabulary for that language (column 1), followed by the embedding values (columns 2 through d + 1, where d is the embedding dimension).
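Given that layout, a file can be streamed row by row without decompressing it to disk first. The following is a minimal sketch using only the Python standard library. It assumes the files are comma-separated and UTF-8 encoded; whether a header row is present is not stated here, so that is left as a parameter. `iter_embeddings` is a hypothetical helper name.

```python
import bz2
import csv
from typing import Iterator, List, Tuple


def iter_embeddings(
    path: str, dimension: int, has_header: bool = False
) -> Iterator[Tuple[str, List[float]]]:
    """Yield (word, vector) pairs from a bz2-compressed CSV file.

    Assumptions: comma delimiter, UTF-8 text, word in column 1 and
    `dimension` float values in the remaining columns.
    """
    with bz2.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        reader = csv.reader(handle)
        if has_header:
            next(reader, None)  # skip the header row if the file has one
        for row in reader:
            if len(row) != dimension + 1:
                raise ValueError(
                    f"expected {dimension + 1} columns, got {len(row)}"
                )
            yield row[0], [float(value) for value in row[1:]]
```

Because the function is a generator, only one row is held in memory at a time, which matters for files of roughly 1 GB compressed. Collect the pairs into a dictionary or array only if the full vocabulary is needed at once.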

If you use this dataset, please cite:

sha256sums:

  • en_300_6_cbow_wxd.csv.bz2 72a94830d81ebbe28e7fa78465e02ad2bd7771ef5414f8f30f6de94565050167
  • en_300_6_sg_wxd.csv.bz2 b0d7db822f181a124758e55b0c33b47e0a249c37f8a778806f6166a5baf96cb3
  • en_500_1_cbow_wxd.csv.bz2 4196d84670045dc3cb65195f4045e543c78c1c35d28530b6c24e7711b8cbf23b
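A downloaded file can be checked against the checksums above before use. The sketch below hashes a file in chunks with Python's `hashlib`, so large archives do not need to fit in memory. `file_digest` and `verify` are hypothetical helper names, and the local path in the usage note is an assumption about where the file was saved.

```python
import hashlib


def file_digest(path: str, algorithm: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Return the hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected: str, algorithm: str = "sha256") -> bool:
    """Compare a file's digest with a published checksum (case-insensitive)."""
    return file_digest(path, algorithm) == expected.strip().lower()
```

For example, `verify("en_300_6_cbow_wxd.csv.bz2", "72a94830d81ebbe28e7fa78465e02ad2bd7771ef5414f8f30f6de94565050167")` should return `True` for an intact download saved under that name. Passing `algorithm="md5"` allows the same check against the md5 values in the file listing.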

Files (34.2 GB)

Checksum                              Size
md5:0f8b4d5cb647029bb63b74a63eb9f9e7  1.1 GB
md5:6d32afc9d8e06621e618fd7a1cd959b8  1.1 GB
md5:54786e7ef540247c5894a4f7df18ce7b  1.1 GB
md5:9af908c91bfaeb85c72c740e8ff37aed  1.1 GB
md5:1c34014d61e2fad1cee329870280732c  1.1 GB
md5:43f4c6e65af39662a5c35ba91df1afca  1.1 GB
md5:ae155f92c575eb1d9b851fd2c1c9ae59  1.1 GB
md5:1d17e86a324a3f405ada39b0bb325268  1.1 GB
md5:8db28826d6b0fb4760b1be58e298e04f  767.0 MB
md5:b1e14f00e8a4094d14af3a3567573d86  1.1 GB
md5:eb9381b14109bf5eb1a46975a86405e2  1.1 GB
md5:d7ea5450d8c820bca194a6709be0426d  1.1 GB
md5:0702f20ee8e8bcd0fa46fc451ad931cb  1.1 GB
md5:af3146314b32780f47659df8d3e75caf  1.1 GB
md5:4271252da0e1a79431230a93c8e74516  1.1 GB
md5:68ed622610da52e8e9ca39eed5dd313d  1.1 GB
md5:65f79415cfc6d0f7a070befc9ec56467  1.1 GB
md5:c5836c20c9f904192b22821a5a2fe760  750.0 MB
md5:5790e50ae0091c2d3f39e27a7e57eaa4  1.1 GB
md5:193a4e19b3813a43bf2dd26e7d4d564a  1.1 GB
md5:6f329da8efeb78dd346ead7843ebcfbb  1.1 GB
md5:6b63b5879cf4ddcfc0199bd12d23b7f7  1.1 GB
md5:887f2d21b70a050307b26e319e7519f9  1.1 GB
md5:9685475303cf0ddd2a481e875a52d15a  1.1 GB
md5:183c93d00695405c29ae5c64d8a0aa76  1.1 GB
md5:dc04e0593d00100daac20560e03c5b03  1.1 GB
md5:56679287f3ca026937e4fe1ceb8cc7d1  1.1 GB
md5:2967b468fc1e74eee98fdb04bf43cbbd  1.1 GB
md5:bec53ee2b1d80899c34697fb27ab02c7  1.1 GB
md5:f325d18300ab8e26d14e5c3cbdfc9c9b  1.1 GB
md5:611b8e5098899d08f21a460d47383a4a  1.1 GB
md5:e76b1787bfba6a490f6b224ebefdb02d  1.1 GB
md5:95360629230eabdfa7f791daa1abd273  515.0 MB
md5:826f5465e694cf140b7a48209d422620  7.1 kB
md5:8864201e5e8f85f9bb348ad1be636f17  2.7 kB

Additional details

Related works

Is supplement to
Publication: 10.5281/zenodo.17243812 (DOI)
