Published October 24, 2025 | Version v1.0.0
Dataset Open

Bulgarian word2vec embeddings trained on OpenSubtitles

  • Harrisburg University of Science and Technology

Description

This dataset contains the subs2vec embeddings for Bulgarian, as presented in https://zenodo.org/records/17243814. The embeddings were trained on the film and television subtitles of the OpenSubtitles 2018 corpus (https://opus.nlpl.eu/OpenSubtitles/corpus/version/OpenSubtitles) and represent semantic vector spaces derived from naturalistic language use.

For this language, we provide all embedding variants explored in the study. Specifically, the dataset includes vectors generated under different combinations of:

  • Dimensionality: multiple vector sizes (e.g., 100, 200, 300, …)
  • Window size: varying context windows (e.g., 2, 5, 10, …)

Each file corresponds to one unique configuration (dimensionality × window size). Within a file, column 1 holds the vocabulary word and columns 2 through d + 1 hold the embedding values, where d is the dimensionality of that file.
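The column layout described above can be read with a few lines of Python. The following is a minimal sketch, not the authors' loader: the function names, the delimiter (whitespace by default; pass "," if the files turn out to be comma-separated), the optional header skip, and the sample values are all assumptions made for illustration.

```python
import io
import math


def load_embeddings(handle, delimiter=None, skip_header=False):
    """Read rows of `word value_1 ... value_d` into a dict of word -> vector.

    `delimiter=None` splits on any whitespace; this is an assumption about
    the file format, so adjust it to match the downloaded files.
    """
    vectors = {}
    for index, line in enumerate(handle):
        if skip_header and index == 0:
            continue
        parts = line.rstrip("\r\n").split(delimiter)
        if len(parts) < 2:
            continue  # skip blank or malformed rows
        word = parts[0]                                # column 1: vocabulary word
        vectors[word] = [float(x) for x in parts[1:]]  # columns 2..d+1: values
    return vectors


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Tiny in-memory stand-in for a real file (made-up values, d = 3).
sample = io.StringIO("куче 0.1 0.2 0.3\nкотка 0.1 0.2 0.25\n")
vectors = load_embeddings(sample)
similarity = cosine(vectors["куче"], vectors["котка"])
```

For a downloaded file, pass an open text handle instead of the sample, for example `open(path, encoding="utf-8")`, where `path` is wherever the file was saved. The largest files are about 1.5 GB, so loading every row into a dict may need several gigabytes of memory.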

If you use this dataset, please cite:

Files (40.9 GB)

MD5 checksum                          Size
md5:fb6ab4d81edb6ed69214b0413467b398  297.8 MB
md5:73e7f3ca7cefff0c7e9bfd29fa3c51e1  298.0 MB
md5:fc81a47b3440c3be402f355261e9b07b  297.1 MB
md5:a8216cac0ffcd87fe08f6ee971482a17  298.0 MB
md5:41f9867c6684c61eab4338bf65943eaf  297.0 MB
md5:c18e2bffd11a5ccdda870a6e9c092823  298.1 MB
md5:bb23155ecf52dbd5a9e28f00ae0ff89d  296.5 MB
md5:a919f3a2c437f86b45acf1ad2e290bac  298.2 MB
md5:7bd93f70840fd5c2694a0bc216cce4d2  296.5 MB
md5:bf4e187a293328071545b33406a56229  298.1 MB
md5:6027f31b81b39f9b672c03408af15009  296.3 MB
md5:86acf72c57f5164087b50e68266fcaa5  298.2 MB
md5:1a1b8b558bcd4b5f8e2611acf12e8ed6  592.0 MB
md5:23d9bfa3f2fd2d4c552732978c1f3a6f  592.8 MB
md5:998c6b49c47907f668b475dffcee11ff  591.6 MB
md5:6b601a62f4d0308ab00d86a3c7270050  593.0 MB
md5:35dc4e9e77542445b422c814a378a690  591.2 MB
md5:9629163c18015e5a45e41cc831400f05  593.2 MB
md5:6caee20ef7614146e8bdd6936e49fa54  591.3 MB
md5:23d6cc91228bfd4f5a1a121d2e6a08f7  593.3 MB
md5:34a2697bb6a6109b62a0881d2e37bb28  591.0 MB
md5:4137e8b0f8494db3c7d17212aaa20121  593.3 MB
md5:bae51300107e882a8f6c134b30b9cd6c  591.0 MB
md5:dde088637268dfd26c6c50917db241bb  593.4 MB
md5:4ea345c10dda2af0d41e0895947b43b4  889.2 MB
md5:23e704da9ceaf2090a21ddce57a3fac1  888.5 MB
md5:0cb1dba42641c7517fe271deafa96cb0  886.8 MB
md5:0a7ca05b8d2902afd941eff09f2fbf12  888.6 MB
md5:487be75a1a53edd3cbd577868bf66d45  886.0 MB
md5:031a456c1738e238862c79d984e881cf  889.0 MB
md5:a919043aedfeb6b9fad133463881a424  885.6 MB
md5:db2b240a533366c550a089858f451f36  889.1 MB
md5:913a4ceeaccec8d7e72dac6d39c74cd6  885.1 MB
md5:e738ed7e8feab6cd9d27c627287b24e7  889.0 MB
md5:2680b702a0c8089e24c8278a90370304  885.3 MB
md5:64727d5b1927390f4cc646e8933b5cc3  889.0 MB
md5:2db3e786587475b428327bf1f07f6205  1.5 GB
md5:6cbe7fd54e05db9c3f423f735a8502d2  1.5 GB
md5:455d34aae07cd603ae663cdffdc757ee  1.5 GB
md5:7532b132730ee817ee1777844ee2915a  1.5 GB
md5:866a6a4eb67d8816d9dea97137a3774c  1.5 GB
md5:8ce798e94547d3162c1be731ad038688  1.5 GB
md5:6476dcc2948a491afd0db369119c8c0b  1.5 GB
md5:a6f7e8f8184419b082d413b606e16096  1.5 GB
md5:8568172631a7e31a81d70368e992d395  1.5 GB
md5:40aed4275807871430b29e9d8b54d4b2  1.5 GB
md5:e912dca8a58c9900bba4eb0df7406aef  1.5 GB
md5:ba1125807b11cf8ea1502499283c8a7f  1.5 GB
md5:f3c476d535a49f9e823f3984c6c0d6b4  150.8 MB
md5:aabbf08007619a8841ca3f226e92d5c2  151.1 MB
md5:330b828a1acb9f1329d637323fd84a0b  150.8 MB
md5:a335808d17736150efbb0ddf3be0514f  151.0 MB
md5:d1c6190bb0e81b2037c382355fbf7c52  150.6 MB
md5:92280bc29fd9c8df47b1b0430953b42e  150.9 MB
md5:ffe61f5a9b7d3078cd342702fb1b79bb  150.6 MB
md5:3a52d56a45478449b6603ba3412ac10b  151.0 MB
md5:ee2e7b7289c590c73b2474e3d0478a57  150.6 MB
md5:230f14d4f1234cb7dcd5c9d088493772  151.0 MB
md5:3ae934d3c1d7195420cd672a0a798a07  150.6 MB
md5:05ed41696a00f451ff22be9068d200b7  150.9 MB
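
The md5 values listed above can be used to check that a download completed without corruption. The following is a minimal sketch using Python's standard hashlib module; the helper names and the temporary demo file are assumptions for illustration and are not part of the dataset.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed):
    """Compare a file against a listed value such as 'md5:<hex digest>'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected


# Demo on a small temporary file rather than a real download.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
    demo_path = tmp.name

demo_listed = "md5:" + hashlib.md5(b"hello").hexdigest()
demo_ok = matches_listing(demo_path, demo_listed)
demo_bad = matches_listing(demo_path, "md5:" + "0" * 32)
os.remove(demo_path)
```

For a real file, call `matches_listing` with the local path of the download and the checksum string from the matching row of the listing. Reading in chunks keeps memory use flat even for the 1.5 GB files.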

Additional details

Related works

Is supplement to
Standard: 10.5281/zenodo.17243812 (DOI)

Software