Lithuanian word2vec embeddings trained on OpenSubtitles
Description
This dataset contains the subs2vec embeddings for Lithuanian, as presented in https://zenodo.org/records/17243814. The embeddings were trained on the OpenSubtitles 2018 subtitle corpus (https://opus.nlpl.eu/OpenSubtitles/corpus/version/OpenSubtitles) and represent semantic vector spaces derived from naturalistic language use in films and television.
For this language, we provide all embedding variants explored in the study. Specifically, the dataset includes vectors generated under different combinations of:
- Dimensionality: multiple vector sizes (e.g., 100, 200, 300, …)
- Window size: varying context windows (e.g., 2, 5, 10, …)
Each file corresponds to one unique configuration (dimension × window size). In every file, column 1 holds the vocabulary for Lithuanian and columns 2 through (dimension size + 1) hold the embedding values, one word per row.
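The column layout described above can be read with a few lines of Python. This is a minimal sketch, not the authors' loading code: the delimiter and the presence of a header row are not stated in this record, so both are exposed as parameters, and the function names `load_embeddings` and `cosine` as well as the two sample rows are made up for illustration.

```python
import csv
import io
import math


def load_embeddings(handle, delimiter=",", has_header=False):
    """Read rows of `word, v1, ..., vd` into a dict of word -> list of floats.

    ASSUMPTION: the delimiter and header settings are guesses; inspect the
    first lines of a downloaded file and adjust them as needed.
    """
    reader = csv.reader(handle, delimiter=delimiter)
    if has_header:
        next(reader, None)  # skip a header row if the file has one
    vectors = {}
    for row in reader:
        if len(row) < 2:
            continue  # skip blank or malformed lines
        # Column 1 is the word; the remaining columns are the vector.
        vectors[row[0]] = [float(x) for x in row[1:]]
    return vectors


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Made-up 3-dimensional sample standing in for a real file.
sample = io.StringIO("katė,0.1,0.2,0.3\nšuo,0.1,0.2,0.25\n")
emb = load_embeddings(sample)
print(len(emb["katė"]))  # number of dimensions in the sample
print(cosine(emb["katė"], emb["šuo"]))
```

For a downloaded file, pass `open(path, newline="", encoding="utf-8")` instead of the in-memory sample. The larger files in this dataset run to several hundred megabytes, so a plain dictionary of Python lists may use a lot of memory; an array library may be a better fit there.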
If you use this dataset, please cite:
- Manuscript: https://doi.org/10.5281/zenodo.17243812
- Data: This Zenodo dataset (using the DOI provided here)
Files
(19.6 GB)
| MD5 checksum | Size |
|---|---|
| md5:bd6c18cad7888331f6f9c4f2493355f6 | 142.8 MB |
| md5:13ece037efe65a20cd4b37b56fc9927a | 142.9 MB |
| md5:201f6d759cff9fa18b6cd6b38d631543 | 142.8 MB |
| md5:11d4daf8bd9e11e07855a0f4514068e3 | 143.0 MB |
| md5:edf503e9d5b45057fe4367b52e777c2b | 142.7 MB |
| md5:c48c86c5dab68635d0d8b56d2c7d92f8 | 142.9 MB |
| md5:a240ece58841dc1a0bdb756cc7e37133 | 142.7 MB |
| md5:9b6fe8dfb30b395c4de7359eba956114 | 142.8 MB |
| md5:b8ab1a9a745f9ad05a9cbf36cc543422 | 142.6 MB |
| md5:1c76c171816bbbec808c727cf5ffa1b0 | 142.9 MB |
| md5:fcfc081da710ac26a9da5fd0617bbe66 | 142.6 MB |
| md5:8b9a677cb0c3d01a44e8a3bda7dd1905 | 142.9 MB |
| md5:6c763c8c0ad3d8a2304383b264e11536 | 283.9 MB |
| md5:b3e0207677b56d219622007320765035 | 284.3 MB |
| md5:592bd9b59993e04e07a1023d15828ccd | 283.8 MB |
| md5:b5d94c74fa5f45807280cdf9a80311bb | 284.3 MB |
| md5:2699c14c845e63f6539e28793eb138d8 | 283.8 MB |
| md5:1c26828bfdc989caba14b6d6b66be61b | 284.4 MB |
| md5:fabfd0165921c22d646ee769bd10ec06 | 283.6 MB |
| md5:834212c12cc263145d85a77445d59c20 | 284.4 MB |
| md5:ed2e7b4251c156dad69379b2797ff43d | 283.6 MB |
| md5:672f1c1f7d34d0d50e16d7874fd7c749 | 284.4 MB |
| md5:3d7ea2209e68aa6fd81ea8d11f0dfe14 | 283.5 MB |
| md5:791946634723c47675ef127366c8c6d3 | 284.4 MB |
| md5:d79ddd5d8d53f0fd5b90b17553f6712f | 425.4 MB |
| md5:62bc562e65bd3a85ccb8de975c6859d1 | 426.2 MB |
| md5:0c956ce9f63473effe881b6ade7f1d10 | 425.1 MB |
| md5:26f6de39d9decfed7901290eb02e5a31 | 426.3 MB |
| md5:e590dfcc341b51d7242df8ac18e0f215 | 425.0 MB |
| md5:4de0434cebe5a7856361ab10a00412e2 | 426.4 MB |
| md5:f15d9dbf3fd4f1add86d3d6e19fd0cc0 | 424.9 MB |
| md5:8484c1326f65b9d0ea01635fbe84450a | 426.3 MB |
| md5:e5c35aedbf9b409a8cdf5823aba83e30 | 424.8 MB |
| md5:5b5fcb42f9ea632970ff0d2b75c8839d | 426.3 MB |
| md5:d4711d61e2a4af7dc5a8075494d96e39 | 424.6 MB |
| md5:2fb11fa1cd440651559df501ddc2fc6a | 426.3 MB |
| md5:c477b1efcef32f0eecd438b83494eac6 | 709.3 MB |
| md5:42f550d7961dbf0ab392a049b324c941 | 711.3 MB |
| md5:6b9d62488369a11acfaa431a4e9e9f6a | 708.3 MB |
| md5:ff55b79a535fb9d2239f85c7914aea07 | 711.2 MB |
| md5:6177b2a1b142422abc527314e43dcf24 | 707.8 MB |
| md5:6aef2c07e32babb0a12129f789341f94 | 711.1 MB |
| md5:bd64855d77b6d2c729952ee6b2381316 | 707.7 MB |
| md5:c467701de4ac8fd1fb490d53da3f681b | 711.1 MB |
| md5:94d72e3b1704603e72411181d3e586fe | 707.3 MB |
| md5:5d29d9f5eca3e0855d555932cf0c2eb5 | 710.9 MB |
| md5:170cb96db7119d84d2ea4e4660ab5cab | 707.3 MB |
| md5:0d60ba4544902fa9a7b14faac9003661 | 710.8 MB |
| md5:2862782e0f4e6c7bf1d0fc0754fc3ca5 | 72.3 MB |
| md5:5d394b7dfa71e730e36ce62973d2bd0c | 72.4 MB |
| md5:117d40e22744075b084a9d115fabcfe4 | 72.2 MB |
| md5:9cfc5ca5ce7b31cb8a23462260805f4c | 72.4 MB |
| md5:b9684b4ee2c96c5772bb85eb978acf60 | 72.3 MB |
| md5:e964b0b409a7b4ed601ee7c583ee2cea | 72.4 MB |
| md5:86feba1b55e834ea230c32fff1a8696e | 72.3 MB |
| md5:bdf94a4e5a4941e9b05ef1500928757f | 72.4 MB |
| md5:8940cab748e40602e88496a88299728f | 72.2 MB |
| md5:48dfa76b2d896b8212382359797dfed3 | 72.3 MB |
| md5:658eaf5b636fdee17ed9debf06d17811 | 72.3 MB |
| md5:93a8a66d84504a2f41d2cd7f45eff874 | 72.3 MB |
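Each file in the listing above is published with an MD5 checksum, which can be used to confirm that a download completed intact. The following is a minimal sketch using Python's standard `hashlib`; the helper names `md5_of_file` and `matches_listing` are hypothetical, and the demonstration runs on a small temporary file rather than on one of the dataset files.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed):
    """Compare a file against a listed value such as 'md5:bd6c18...'."""
    # Drop an optional 'md5:' prefix and normalise case before comparing.
    expected = listed.strip().lower().split(":", 1)[-1]
    return md5_of_file(path) == expected


# Demonstration on a throwaway file (not one of the dataset files).
payload = b"example bytes"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
demo_path = tmp.name

ok = matches_listing(demo_path, "md5:" + hashlib.md5(payload).hexdigest())
bad = matches_listing(demo_path, "md5:" + "0" * 32)
os.remove(demo_path)
print(ok, bad)
```

To check a real download, call `matches_listing` with the local file path and the `md5:` value shown next to that file in the listing.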
Additional details
Related works
- Is supplement to: Standard: 10.5281/zenodo.17243812 (DOI)
Software
- Repository URL: https://github.com/SemanticPriming/word2manylanguages
- Programming language: Python, R