Russian word2vec embeddings trained on OpenSubtitles Part 1
Description
This dataset contains the subs2vec embeddings for Russian, as presented in https://zenodo.org/records/17243814. The embeddings were trained on large-scale subtitle corpora from the OpenSubtitles 2018 dataset (https://opus.nlpl.eu/OpenSubtitles/corpus/version/OpenSubtitles) and represent semantic vector spaces derived from naturalistic language use in films and television.
For this language, we provide all embedding variants explored in the study. Specifically, the dataset includes vectors generated under different combinations of:
- Dimensionality: multiple vector sizes (e.g., 100, 200, 300, …)
- Window size: varying context windows (e.g., 2, 5, 10, …)
Each file corresponds to a unique configuration (dimension × window size).
Each file contains the vocabulary for Russian in column 1, followed by the embedding values in columns 2 through d + 1, where d is the dimensionality of that file.
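The column layout described above can be read with a few lines of Python. The following is a minimal sketch, not the authors' loader: the function name `iter_embeddings`, the whitespace delimiter, and the sample rows are assumptions made for illustration, and the actual delimiter or the presence of a header row should be checked against a downloaded file.

```python
import io


def iter_embeddings(lines, dim, delimiter=None):
    """Yield (word, vector) pairs from rows shaped `word v1 ... v_dim`.

    `delimiter=None` splits on any whitespace; this is an assumption,
    so pass "," or "\t" if the files turn out to use another separator.
    Rows that do not have exactly dim + 1 fields, or whose value columns
    are not numeric (for example a header row), are skipped.
    """
    for line in lines:
        fields = line.rstrip("\r\n").split(delimiter)
        if len(fields) != dim + 1:
            continue
        try:
            vector = [float(value) for value in fields[1:]]
        except ValueError:
            continue
        yield fields[0], vector


# Made-up sample rows with dim = 3, standing in for a real file.
sample = io.StringIO("кот 0.1 0.2 0.3\nсобака 0.4 0.5 0.6\n")
vectors = dict(iter_embeddings(sample, dim=3))
```

For a real file, the same generator can be fed an open handle, for instance `open(path, encoding="utf-8")`, where `path` is whichever file was downloaded. Streaming rows this way avoids holding a multi-gigabyte file in memory unless the caller chooses to collect it.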
If you use this dataset, please cite:
- Manuscript: https://doi.org/10.5281/zenodo.17243812
- Data: This Zenodo dataset (using the DOI provided here)
Files
(48.8 GB)
| MD5 checksum | Size |
|---|---|
| md5:f9cdea06ce0ef8ac5e4de6e59280b387 | 1.0 GB |
| md5:3a9843f00d85a8ac4b890faa1df36e4b | 1.0 GB |
| md5:a82acf0c2a5f0da241c758b876164051 | 1.0 GB |
| md5:23b5aeeda5cbbe913da53e3ff9d41738 | 1.0 GB |
| md5:274cd62562242754f5d583cf5421f61c | 1.0 GB |
| md5:a606afccd716123c148cfa31e22a260f | 1.0 GB |
| md5:0fbb1a87e717a18b0c525d91d57ba5cf | 1.0 GB |
| md5:b708412c0e436c0c94af8cd05193b4c1 | 1.0 GB |
| md5:0e8aa447cc491448db64f961a642b608 | 1.0 GB |
| md5:9b4328ad029e56cea10248a25a6ec606 | 1.0 GB |
| md5:557ec51138b82ff5f9ec808538f70092 | 1.0 GB |
| md5:95a10e309f052c49bc84946c54d86dc1 | 1.0 GB |
| md5:2c01dddb94a95c319151e96c4cbb6448 | 2.0 GB |
| md5:e2e13ad1e530fbc098fbebdbe21ac355 | 2.0 GB |
| md5:6197a71f034dab397f985cb05dd00b4b | 2.0 GB |
| md5:0fca7ec8379a9d29620ab31170997e60 | 2.0 GB |
| md5:21d481a9bfbbd3732ed1698e1a9f759c | 2.0 GB |
| md5:002aec01cd197e2cec1ec718eb648f64 | 2.0 GB |
| md5:97a4f8fbddddedece26000cb523aa148 | 2.0 GB |
| md5:f78e271991c302ff35a1cdfd41bdc7f6 | 2.0 GB |
| md5:e86f2ffd3170af29364b534be5bd3656 | 2.0 GB |
| md5:b213be934dec3ba49b5158e7bc3e1cca | 2.0 GB |
| md5:20b7ad07f7cec24180a0b07e8085ef7a | 2.0 GB |
| md5:ab2309d07f2d1f8f0ca1a29bd8700a5c | 2.0 GB |
| md5:23aa388bb60b6c3097d2ea29bbb1ef4d | 3.0 GB |
| md5:d362aad6423236c21c0b9074f66bdca1 | 3.0 GB |
| md5:583386d36023e50c76c1551aa75848c7 | 516.8 MB |
| md5:b5002639d607a9cebd7432a4b2765551 | 517.1 MB |
| md5:76e9491c0cc66e0cbc8d95f15b8747f1 | 516.4 MB |
| md5:1ce28d4096251e99dbc1446583c040ad | 517.5 MB |
| md5:d25b1c23b28f137337ce5a48805ad939 | 516.9 MB |
| md5:2327deefecc632ed7f06b80229975113 | 518.0 MB |
| md5:4be20489b1130deffdf96826e57dbd6e | 517.3 MB |
| md5:45f279a1e0418cf5608aaa38c9f46682 | 518.7 MB |
| md5:db68eac56731f539f73b6b5bcb323570 | 518.0 MB |
| md5:58cd5f2911869ab8d6faf9faf1a3dbd5 | 518.3 MB |
| md5:565f3092435f66e1469b6959875a75d4 | 518.4 MB |
| md5:7f0d6ecef7a2c7e706c916793724a11f | 518.9 MB |
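Because the files are large, it may be worth checking a download against its listed MD5 checksum before use. The sketch below uses Python's standard `hashlib` module; the helper names `md5_of_file` and `matches_checksum` are hypothetical, and the path passed in is whatever local file was downloaded.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so multi-gigabyte files are never
        # loaded into memory in one piece.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file against a checksum written as 'md5:<hex>' or '<hex>'."""
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected_hex
```

A call such as `matches_checksum(local_path, "md5:f9cdea06ce0ef8ac5e4de6e59280b387")` returns `True` only if the local file is byte-identical to the published one; a mismatch usually indicates an interrupted or corrupted download.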
Additional details
Related works
- Is supplement to
- Standard: 10.5281/zenodo.17243812 (DOI)
Software
- Repository URL
- https://github.com/SemanticPriming/word2manylanguages
- Programming language
- Python, R