Published November 15, 2025 | Version v1.0.0
Dataset Open

English word2vec embeddings trained on OpenSubtitles Part 8

  • 1. Harrisburg University of Science and Technology

Description

This dataset contains the subs2vec embeddings for English, as presented in https://zenodo.org/records/17243814. The embeddings were trained on large-scale subtitle corpora from the OpenSubtitles 2018 datasets (https://opus.nlpl.eu/OpenSubtitles/corpus/version/OpenSubtitles) and represent semantic vector spaces derived from naturalistic language use in films and television.

For this language, we provide all embedding variants explored in the study. Specifically, the dataset includes vectors generated under different combinations of:

  • Dimensionality: multiple vector sizes (e.g., 100, 200, 300, …)

  • Window size: varying context windows (e.g., 2, 5, 10, …)

Each file corresponds to a unique configuration (dimension × window size).

Each file lists the vocabulary for the language in column 1, followed by the embedding values in columns 2 through d + 1, where d is the embedding dimensionality.
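The column layout described above can be read with a few lines of Python. This is a minimal sketch, not the authors' loading code: it assumes the files are comma-separated UTF-8 text, and it treats a non-numeric first row as a header (the description does not say whether one is present). The function name `load_embeddings` and the `max_rows` parameter are hypothetical.

```python
import bz2
import csv


def load_embeddings(path, max_rows=None):
    """Read a bz2-compressed CSV of word vectors into a dict: word -> list of floats.

    Assumptions (not confirmed by the dataset description): comma-separated,
    UTF-8 encoded, and possibly starting with a header row.
    """
    vectors = {}
    first_row = True
    with bz2.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        for row in csv.reader(handle):
            if len(row) < 2:
                continue  # skip blank lines or rows without any values
            is_first, first_row = first_row, False
            try:
                # Column 1 is the word; the remaining columns are the vector.
                values = [float(v) for v in row[1:]]
            except ValueError:
                if is_first:
                    continue  # assumption: a non-numeric first row is a header
                raise
            vectors[row[0]] = values
            if max_rows is not None and len(vectors) >= max_rows:
                break  # stop early; useful for inspecting a large file
    return vectors
```

Because each archive is about 1 GB compressed, a call such as `load_embeddings("en_500_3_cbow_wxd.csv.bz2", max_rows=1000)` is a reasonable way to inspect the first rows before loading the full vocabulary.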

If you use this dataset, please cite:

sha256sums:

  • en_500_3_cbow_wxd.csv.bz2 19485a54b2249c0897da166814c90437ce4f49ef56e7b54c8ed1444161f29e3b
  • en_500_3_sg_wxd.csv.bz2 c298bc9468a71ec3c6e99f3cafe670cce91fe3e85d968fa5a9cc44769d7c5795
  • en_500_4_cbow_wxd.csv.bz2 ab46058fa7339f3305ee68b9fa10be2fb0469318d89fdb2e819c60d1681dcfd1
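A downloaded archive can be checked against the sums above with Python's standard `hashlib` module. The sketch below is one way to do this; the helper names `file_digest` and `matches_checksum` are hypothetical, and the file is read in chunks so that a 1 GB archive does not need to fit in memory.

```python
import hashlib


def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected_hex, algorithm="sha256"):
    """Compare a file's digest with a published hex string (case-insensitive)."""
    return file_digest(path, algorithm) == expected_hex.strip().lower()
```

For example, `matches_checksum("en_500_3_cbow_wxd.csv.bz2", "19485a54b2249c0897da166814c90437ce4f49ef56e7b54c8ed1444161f29e3b")` should return `True` for an intact download. Passing `algorithm="md5"` allows the same helper to check the md5 values shown in the file listing, with the `md5:` prefix removed.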

Files

README.md

Files (46.6 GB)

  • md5:2ac9871106fdabfeaf55d88edbdcc366 (1.1 GB)
  • md5:d4b97bdad1d8675997920f6b12083c92 (1.1 GB)
  • md5:dbc23fe48d82d11d9182ac5bb9647feb (1.1 GB)
  • md5:d3bbdb0ec753407c9b99d8d1e1711c91 (1.1 GB)
  • md5:68b15d436d9497a81d3ad8107fd87d90 (1.1 GB)
  • md5:cf48f2161fd5f82fcba0df63accb08b5 (1.1 GB)
  • md5:128f389d91b9405cc61263abc0c1c8eb (1.1 GB)
  • md5:2750a11fc14dd5116b773f4877a4470a (1.1 GB)
  • md5:580420a41a6ff9b1325acee2bcacf88e (1.1 GB)
  • md5:cffdc192b218b214643bc6bfa818b0c2 (1.1 GB)
  • md5:34bae60f6c8ef87b2c0bfbbda0566645 (1.1 GB)
  • md5:7ec0186c0668f49355f00776a19cb9bc (1.1 GB)
  • md5:99cd45202f440b874b196659f6e5fb02 (1.1 GB)
  • md5:b797ef118f075401670db693bc0a279b (1.1 GB)
  • md5:a7e751b27b8ec495a5ddeb7a3418eef5 (507.7 MB)
  • md5:fb69ffcf54650300ae6f4466fbc549f8 (1.1 GB)
  • md5:9184da31abbb1132877797cf124cb272 (1.1 GB)
  • md5:a4e742dc8c327bb199264cce59c25784 (1.1 GB)
  • md5:6a976faee00d0be6cd44d0a796382b45 (1.1 GB)
  • md5:fcfed114a7a9c444454fc50568dc7e72 (1.1 GB)
  • md5:9fb51521328d11d0d216ad9f61b0fe0e (1.1 GB)
  • md5:35b5fdeca8f75cfe0a73804eb1fc072c (1.1 GB)
  • md5:fd57e3dc1266e708ea8eeac374c1eb74 (1.1 GB)
  • md5:2a8ae21a59b1fc95b74542ecc7d06444 (1.1 GB)
  • md5:5899ca03ba64760828cb4e2260ecb8ba (1.1 GB)
  • md5:ebf8737c3bd2e122fe49ee3df39e8b18 (1.1 GB)
  • md5:1d1d557e129452589f48c92048e14315 (1.1 GB)
  • md5:4954dc8e552cf4e6d9c86591e2bb215f (1.1 GB)
  • md5:452726238668322898be4c552f26c8ec (1.1 GB)
  • md5:13265982b71e0479de4e3d0cca89ee14 (486.5 MB)
  • md5:ea9043c6736552725146e4463e0273d9 (1.1 GB)
  • md5:dfc72faee7f316edd35953b36f4657bc (1.1 GB)
  • md5:9382f3472a445a9f3fa42707c752cc0c (1.1 GB)
  • md5:42e53a252d9ab5c3c60203554442ae96 (1.1 GB)
  • md5:0f4356311ca8c89947f0a173c6a9289c (1.1 GB)
  • md5:a0a0beb32e69961a3726ed53af2231f1 (1.1 GB)
  • md5:6e43e7424d24943a6ca8b4be2012557b (1.1 GB)
  • md5:a2caa3793ce1c0b4760ddcd46875a37d (1.1 GB)
  • md5:0d81fe276fa14d807faf456ddd060681 (1.1 GB)
  • md5:bedac9aa1caa4dff9675710991e78f06 (1.1 GB)
  • md5:9e5209834068375a2bc0aafd5c88f750 (1.1 GB)
  • md5:69e451ed1917d1bf4fee0c5c85fbdb93 (1.1 GB)
  • md5:23a0e5792d229a744d72c03d20c26e29 (1.1 GB)
  • md5:82a59fd68393717d1e032a75ce00b4b9 (1.1 GB)
  • md5:054bb724edcb7ca34cad4404cd0f6814 (490.5 MB)
  • md5:826f5465e694cf140b7a48209d422620 (7.1 kB)
  • md5:8864201e5e8f85f9bb348ad1be636f17 (2.7 kB)

Additional details

Related works

Is supplement to
Publication: 10.5281/zenodo.17243812 (DOI)

Software