Bulgarian word2vec embeddings trained on OpenSubtitles
Description
This dataset contains the subs2vec embeddings for Bulgarian, as presented in https://zenodo.org/records/17243814. The embeddings were trained on the OpenSubtitles 2018 corpus (https://opus.nlpl.eu/OpenSubtitles/corpus/version/OpenSubtitles), a large-scale collection of film and television subtitles. The resulting vector spaces therefore capture semantic relations as they appear in naturalistic language use.
For Bulgarian, we provide every embedding variant explored in the study. Specifically, the dataset includes vectors trained under different combinations of:
- Dimensionality: multiple vector sizes (e.g., 100, 200, 300, …)
- Window size: varying context windows (e.g., 2, 5, 10, …)

Each file corresponds to one unique configuration (dimensionality × window size).
Each row of a file contains one vocabulary word (column 1) followed by its embedding values (columns 2 through d + 1, where d is the file's dimensionality).
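The column layout described above can be parsed with a short Python sketch. Two details are assumptions, because the description does not state them: that fields are separated by whitespace (configurable via `delimiter`), and that a word2vec-style header line (`<vocab_size> <dim>`) may or may not appear first. The helpers `load_embeddings` and `cosine` are hypothetical names for illustration, not functions from the word2manylanguages repository.

```python
import io
import math
from typing import Dict, List, Optional, TextIO


def load_embeddings(stream: TextIO, delimiter: Optional[str] = None) -> Dict[str, List[float]]:
    """Parse rows of the form `word v1 v2 ... vd` into a dict.

    Assumptions (not confirmed by the dataset description): fields are
    separated by `delimiter` (None means any run of whitespace), and an
    optional word2vec-style header "<vocab_size> <dim>" may be the first line.
    """
    vectors: Dict[str, List[float]] = {}
    dim: Optional[int] = None
    for line_no, line in enumerate(stream):
        if not line.strip():
            continue  # skip blank lines
        fields = line.rstrip("\r\n").split(delimiter)
        # Skip a possible "<vocab_size> <dim>" header on the first line.
        if line_no == 0 and len(fields) == 2 and all(f.isdigit() for f in fields):
            continue
        word, values = fields[0], [float(v) for v in fields[1:]]
        if dim is None:
            dim = len(values)  # the first data row fixes d for the whole file
        elif len(values) != dim:
            raise ValueError(f"line {line_no + 1}: expected {dim} values, got {len(values)}")
        vectors[word] = values
    return vectors


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    # Hypothetical two-word sample in the assumed format (d = 3).
    demo = io.StringIO("котка 0.1 0.2 0.3\nкуче 0.1 0.25 0.3\n")
    emb = load_embeddings(demo)
    print(len(emb["котка"]))  # number of embedding columns, i.e. d
    print(round(cosine(emb["котка"], emb["куче"]), 3))
```

For a downloaded file, open it with `open(path, encoding="utf-8")` and pass the handle to `load_embeddings`. Reading everything into plain Python lists keeps the sketch dependency-free, but for the larger (multi-GB) variants a NumPy-based loader will use far less memory.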
If you use this dataset, please cite:
- Manuscript: https://doi.org/10.5281/zenodo.17243812
- Data: This Zenodo dataset (using the DOI provided here)
Files (40.9 GB total)
| MD5 checksum | Size |
|---|---|
| fb6ab4d81edb6ed69214b0413467b398 | 297.8 MB |
| 73e7f3ca7cefff0c7e9bfd29fa3c51e1 | 298.0 MB |
| fc81a47b3440c3be402f355261e9b07b | 297.1 MB |
| a8216cac0ffcd87fe08f6ee971482a17 | 298.0 MB |
| 41f9867c6684c61eab4338bf65943eaf | 297.0 MB |
| c18e2bffd11a5ccdda870a6e9c092823 | 298.1 MB |
| bb23155ecf52dbd5a9e28f00ae0ff89d | 296.5 MB |
| a919f3a2c437f86b45acf1ad2e290bac | 298.2 MB |
| 7bd93f70840fd5c2694a0bc216cce4d2 | 296.5 MB |
| bf4e187a293328071545b33406a56229 | 298.1 MB |
| 6027f31b81b39f9b672c03408af15009 | 296.3 MB |
| 86acf72c57f5164087b50e68266fcaa5 | 298.2 MB |
| 1a1b8b558bcd4b5f8e2611acf12e8ed6 | 592.0 MB |
| 23d9bfa3f2fd2d4c552732978c1f3a6f | 592.8 MB |
| 998c6b49c47907f668b475dffcee11ff | 591.6 MB |
| 6b601a62f4d0308ab00d86a3c7270050 | 593.0 MB |
| 35dc4e9e77542445b422c814a378a690 | 591.2 MB |
| 9629163c18015e5a45e41cc831400f05 | 593.2 MB |
| 6caee20ef7614146e8bdd6936e49fa54 | 591.3 MB |
| 23d6cc91228bfd4f5a1a121d2e6a08f7 | 593.3 MB |
| 34a2697bb6a6109b62a0881d2e37bb28 | 591.0 MB |
| 4137e8b0f8494db3c7d17212aaa20121 | 593.3 MB |
| bae51300107e882a8f6c134b30b9cd6c | 591.0 MB |
| dde088637268dfd26c6c50917db241bb | 593.4 MB |
| 4ea345c10dda2af0d41e0895947b43b4 | 889.2 MB |
| 23e704da9ceaf2090a21ddce57a3fac1 | 888.5 MB |
| 0cb1dba42641c7517fe271deafa96cb0 | 886.8 MB |
| 0a7ca05b8d2902afd941eff09f2fbf12 | 888.6 MB |
| 487be75a1a53edd3cbd577868bf66d45 | 886.0 MB |
| 031a456c1738e238862c79d984e881cf | 889.0 MB |
| a919043aedfeb6b9fad133463881a424 | 885.6 MB |
| db2b240a533366c550a089858f451f36 | 889.1 MB |
| 913a4ceeaccec8d7e72dac6d39c74cd6 | 885.1 MB |
| e738ed7e8feab6cd9d27c627287b24e7 | 889.0 MB |
| 2680b702a0c8089e24c8278a90370304 | 885.3 MB |
| 64727d5b1927390f4cc646e8933b5cc3 | 889.0 MB |
| 2db3e786587475b428327bf1f07f6205 | 1.5 GB |
| 6cbe7fd54e05db9c3f423f735a8502d2 | 1.5 GB |
| 455d34aae07cd603ae663cdffdc757ee | 1.5 GB |
| 7532b132730ee817ee1777844ee2915a | 1.5 GB |
| 866a6a4eb67d8816d9dea97137a3774c | 1.5 GB |
| 8ce798e94547d3162c1be731ad038688 | 1.5 GB |
| 6476dcc2948a491afd0db369119c8c0b | 1.5 GB |
| a6f7e8f8184419b082d413b606e16096 | 1.5 GB |
| 8568172631a7e31a81d70368e992d395 | 1.5 GB |
| 40aed4275807871430b29e9d8b54d4b2 | 1.5 GB |
| e912dca8a58c9900bba4eb0df7406aef | 1.5 GB |
| ba1125807b11cf8ea1502499283c8a7f | 1.5 GB |
| f3c476d535a49f9e823f3984c6c0d6b4 | 150.8 MB |
| aabbf08007619a8841ca3f226e92d5c2 | 151.1 MB |
| 330b828a1acb9f1329d637323fd84a0b | 150.8 MB |
| a335808d17736150efbb0ddf3be0514f | 151.0 MB |
| d1c6190bb0e81b2037c382355fbf7c52 | 150.6 MB |
| 92280bc29fd9c8df47b1b0430953b42e | 150.9 MB |
| ffe61f5a9b7d3078cd342702fb1b79bb | 150.6 MB |
| 3a52d56a45478449b6603ba3412ac10b | 151.0 MB |
| ee2e7b7289c590c73b2474e3d0478a57 | 150.6 MB |
| 230f14d4f1234cb7dcd5c9d088493772 | 151.0 MB |
| 3ae934d3c1d7195420cd672a0a798a07 | 150.6 MB |
| 05ed41696a00f451ff22be9068d200b7 | 150.9 MB |
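Because the individual files range from about 150 MB to 1.5 GB, it is worth checking each download against its MD5 checksum in the table above. The following is a minimal Python sketch using only the standard library's `hashlib`; it reads the file in chunks so that even the 1.5 GB variants never have to fit in memory. The helper names `md5sum` and `verify`, and the command-line usage, are hypothetical.

```python
import hashlib
from pathlib import Path


def md5sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks by default."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        # iter() with a sentinel keeps calling fh.read until it returns b"" (EOF).
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected_md5: str) -> bool:
    """Compare a file's digest with a checksum copied from the table (an "md5:" prefix is allowed)."""
    expected = expected_md5.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5sum(path) == expected


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python check_md5.py <downloaded-file> <md5-from-table>
    if len(sys.argv) == 3:
        print("OK" if verify(Path(sys.argv[1]), sys.argv[2]) else "MISMATCH")
```

A mismatch usually means the download was truncated and should be repeated before the file is loaded.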
Additional details
Related works
- Is supplement to
  - Standard: 10.5281/zenodo.17243812 (DOI)
Software
- Repository URL: https://github.com/SemanticPriming/word2manylanguages
- Programming language: Python, R