Georgian word2vec embeddings trained on OpenSubtitles
Description
This dataset contains the subs2vec embeddings for Georgian, as presented in https://zenodo.org/records/17243814. The embeddings were trained on the large-scale subtitle corpora of the OpenSubtitles 2018 datasets (https://opus.nlpl.eu/OpenSubtitles/corpus/version/OpenSubtitles) and represent semantic vector spaces derived from naturalistic language use in films and television.
For this language, we provide all embedding variants explored in the study. Specifically, the dataset includes vectors generated under different combinations of:
- Dimensionality: multiple vector sizes (e.g., 100, 200, 300, …)
- Window size: varying context windows (e.g., 2, 5, 10, …)
Each file corresponds to a unique configuration (dimension × window size).
Each file has one row per word: column 1 holds the Georgian vocabulary item, and columns 2 through d + 1 hold its embedding values, where d is the dimensionality of that file.
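The column layout described above can be read with a few lines of Python. The following is a minimal sketch, not the authors' loading code: the field delimiter and the presence of a header row are not stated here, so both are exposed as parameters (`delimiter`, `skip_header`) and should be checked against a downloaded file. The function names and the three-dimensional sample rows are made up for illustration.

```python
import io
import math


def load_embeddings(handle, delimiter=None, skip_header=False):
    """Parse rows of `word v1 v2 ... vd` into a dict of word -> list of floats.

    delimiter=None splits on any whitespace; pass "," or "\t" if the
    downloaded file turns out to use a different separator (assumption).
    """
    vectors = {}
    for i, line in enumerate(handle):
        if skip_header and i == 0:
            continue  # drop a header row if the file has one
        parts = line.rstrip("\n").split(delimiter)
        if len(parts) < 2:
            continue  # skip blank or malformed rows
        vectors[parts[0]] = [float(v) for v in parts[1:]]
    return vectors


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Tiny in-memory stand-in for one embedding file (made-up values, d = 3).
sample = io.StringIO(
    "კატა 0.1 0.2 0.3\n"
    "ძაღლი 0.1 0.2 0.25\n"
    "სახლი -0.3 0.0 0.4\n"
)
vectors = load_embeddings(sample)
dim = len(next(iter(vectors.values())))  # number of columns minus the word column
print(dim, round(cosine(vectors["კატა"], vectors["ძაღლი"]), 3))
```

For a real file, replace the `io.StringIO` object with `open(path, encoding="utf-8")`. The largest files are around 500 MB, so a plain dict of Python lists may be memory-hungry; an array library would be a reasonable swap.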
If you use this dataset, please cite:
- Manuscript: https://doi.org/10.5281/zenodo.17243812
- Data: This Zenodo dataset (using the DOI provided here)
Files (13.9 GB)
| MD5 checksum | Size |
|---|---|
| c962001cf0146e0fe75c8d015fbbaae3 | 100.9 MB |
| dd31368e30f84740bc8d4a28d0368418 | 100.9 MB |
| 98dc539f6cb9a88312dcd8009439075e | 100.9 MB |
| 74e8e7bd93387cfeea70073590d050fb | 100.9 MB |
| 00f7ede5e2a72661c549c418a4e3bda9 | 100.9 MB |
| 4ed2438b809f85ab5390b500fe023d38 | 100.9 MB |
| 948ee765760a49537594868e8fe67452 | 100.8 MB |
| bfaf37e098162b7aaf0aca108029fb21 | 100.9 MB |
| 0c9d3e52892b68ba8263bd30c171b4cb | 100.9 MB |
| bc00abffb82aa5900526c946edd3da77 | 100.9 MB |
| b6051349aa3691a8b2267db92f3017a3 | 100.8 MB |
| 384914a886a26a905bf962ef99be5fc8 | 100.9 MB |
| e588513c64a172cd5c46540761ebf25b | 200.7 MB |
| 75c0bedf0c52c9f70b5d11f3046c6360 | 201.0 MB |
| 19fd90d28a4e13cfcfe366497d245b2e | 200.6 MB |
| fde606fd0c41f5c2590c47ae076796f5 | 201.0 MB |
| 276baf7e420d17eb32db2f4eb0d67b43 | 200.6 MB |
| fbb3e35a21b680ba7424634e97440d3f | 201.0 MB |
| 5e1ef075eeb83e0365a9ccf6dda37367 | 200.6 MB |
| 483b82eeefe0d34455e34711ecd5602f | 201.0 MB |
| 5f8007d96b87c16adcf64975d1de648d | 200.6 MB |
| 7b6314659195e6e678b4c2c46809e3f7 | 200.9 MB |
| 035208e0710135156aae039ffae13d3d | 200.6 MB |
| 2bd79d7a6ea4bb4a25c72d9af069534e | 201.0 MB |
| 02966e70c74bfab54f3f9d9e0d992e5e | 300.7 MB |
| 10fb466d5e11e33fc4794f954768249b | 301.2 MB |
| ac774f86b7d7f8731a07277b105d106c | 300.6 MB |
| df4f4e5477a33400f93cb1b93b1b4ced | 301.3 MB |
| ee564a234d36aef2453179f1a4e77d30 | 300.5 MB |
| 8f2ab20006ad73c38676c814df5a506d | 301.2 MB |
| 2646fde3b5c719f6db618f50cbbb22ea | 300.5 MB |
| 52bdf93bf6da005a28899f0361865bc7 | 301.3 MB |
| 3f7c33bbd27f58d244e79eafc84011c1 | 300.5 MB |
| 3d707d8b54a915bda2db3bc0b93c4066 | 301.4 MB |
| 2bebc879530f64d82717cf25a6b8c34b | 300.5 MB |
| 8eb797796c0a23b883808d0437cdc729 | 301.3 MB |
| e90a9680662f6f14656ed3575c305075 | 501.7 MB |
| 02f39ff642490828dcb02a86374709fd | 502.8 MB |
| 72b4ef27875355586caa2d10f07ab810 | 500.9 MB |
| dd77a01080a52f418ed39c9677ffb21a | 502.7 MB |
| 90c4ad0ab17ce272f13b793158eeeffb | 500.7 MB |
| bae5d690bb5f4fd367cd3a368416fbd1 | 502.8 MB |
| d4e2dd06f36f18573be60ecf5471f37f | 500.8 MB |
| 65b800e0eeeb6e5634805d465643ab86 | 502.6 MB |
| 2565e985bc005632f2d5cea922201087 | 500.5 MB |
| 97df06881b515133fe2609f17f5c0dd5 | 502.5 MB |
| 676b832ee85b562ab3b2fbc8204f8f90 | 500.5 MB |
| b813117712582be183a8361d9ebce45b | 502.5 MB |
| e1d401989278973c20e5a63d4a7d642f | 51.1 MB |
| c1c2267f9b9931ccd87c094ef82d6128 | 51.0 MB |
| b08b8a03eccbc0c227cebd3602488eb1 | 51.0 MB |
| 4e577466131727002620af550ee9fd50 | 51.0 MB |
| 5660fd5b8c049ea943d48903dec126d0 | 51.1 MB |
| 7d341d1347fb717769aebc1a48aa54a0 | 51.1 MB |
| 5cb6302aae10d65c8d97af505eef7059 | 51.0 MB |
| 04ec350502c4c1a53f698042a521defe | 51.0 MB |
| 98f01eb9b26d462995436d25ab907192 | 51.0 MB |
| bf150bf775a2cd41be667424e0b07aae | 51.0 MB |
| 92feb192ca723ad0cd622fdc3dd5319f | 51.1 MB |
| e352a8c12c2f67e7489f70c45a38a608 | 51.0 MB |
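The MD5 values listed for each file can be used to confirm that a download arrived intact. The sketch below shows one way to do this with Python's standard `hashlib` module; the helper names `md5_of_file` and `verify` and the file path in the usage comment are hypothetical, and the file is read in chunks because the largest files are around 500 MB.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_md5):
    """Compare a file's MD5 with a listed value; an 'md5:' prefix is accepted."""
    expected = expected_md5.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected


# Usage (hypothetical local path; pair it with the checksum listed for that file):
# ok = verify("downloaded_embedding_file", "md5:c962001cf0146e0fe75c8d015fbbaae3")
```

A mismatch usually means a truncated or corrupted download, so the file should be fetched again before use.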
Additional details
Related works
- Is supplement to
  - Standard: 10.5281/zenodo.17243812 (DOI)
Software
- Repository URL: https://github.com/SemanticPriming/word2manylanguages
- Programming language: Python, R