Arabic word2vec embeddings trained on OpenSubtitles Part 1
Description
This dataset contains the subs2vec embeddings for Arabic, as presented in https://zenodo.org/records/17243814. The embeddings were trained on the large-scale subtitle corpora of the OpenSubtitles 2018 dataset (https://opus.nlpl.eu/OpenSubtitles/corpus/version/OpenSubtitles) and represent semantic vector spaces derived from naturalistic language use in films and television.
For this language, we provide all embedding variants explored in the study. Specifically, the dataset includes vectors generated under different combinations of:
- Dimensionality: multiple vector sizes (e.g., 100, 200, 300, …)
- Window size: varying context windows (e.g., 2, 5, 10, …)
Each file corresponds to a unique configuration (dimension × window size).
Within each file, column 1 holds the vocabulary entries (one word per row), and columns 2 through d + 1 hold the embedding values for that word, where d is the dimensionality of that file's configuration.
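The column layout described above can be read with a few lines of Python. This is a minimal sketch, not the authors' loader: it assumes the files are plain UTF-8 text with one word per row, no header row, and whitespace-separated columns. None of these details are stated in this record, so inspect a file first and pass a different `delimiter` if needed. The names `load_embeddings` and `cosine` and the toy 3-dimensional rows are hypothetical and used only for illustration.

```python
import io
import math


def load_embeddings(lines, delimiter=None):
    """Parse rows of `word v1 v2 ... vd` into a dict mapping word -> list of floats.

    Assumption: no header row; columns are separated by `delimiter`
    (None means any run of whitespace).
    """
    vectors = {}
    for line in lines:
        fields = line.split(delimiter)
        if len(fields) < 2:
            continue  # skip blank lines or rows without any values
        # Column 1 is the word; the remaining columns are the embedding values.
        vectors[fields[0]] = [float(x) for x in fields[1:]]
    return vectors


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Toy stand-in for a real file: two words with made-up 3-dimensional vectors.
sample = io.StringIO("كتاب 0.1 0.2 0.3\nقلم 0.3 0.2 0.1\n")
vectors = load_embeddings(sample)
similarity = cosine(vectors["كتاب"], vectors["قلم"])
```

For a downloaded file, the same function accepts an open file handle, for example `with open(path, encoding="utf-8") as f: vectors = load_embeddings(f)`, where `path` is whichever file you downloaded. The larger files run to more than 2 GB on disk, so a plain dict of Python lists may need several times that in memory; a NumPy array would be a more compact choice at that scale.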
If you use this dataset, please cite:
- Manuscript: https://doi.org/10.5281/zenodo.17243812
- Data: This Zenodo dataset (using the DOI provided here)
Files
(49.7 GB)
| Checksum | Size |
|---|---|
| md5:94cd3796dc115bea1ebe501a643ac55c | 721.2 MB |
| md5:ba52817503d051cf75127d0b96f70e8c | 723.1 MB |
| md5:6247002e0a36bc6fd27f4961bbc92dde | 720.7 MB |
| md5:d66687af1353bb614fda89331260a063 | 723.1 MB |
| md5:db6d61131a1bdd355bf273761aad3e3c | 720.6 MB |
| md5:547e54267e576581f8d26605a5addd20 | 723.5 MB |
| md5:c339516616cd1f8b6e3fa62383ec96aa | 721.0 MB |
| md5:fedd4b41e860171b287fd614c6319655 | 723.9 MB |
| md5:66e75426a581c058afbdb2ca41a90bd7 | 721.8 MB |
| md5:8dd5a277e70264daa0f5d7a8259b30a1 | 723.0 MB |
| md5:1f9430492de0b110011aeb856919dcb5 | 721.8 MB |
| md5:22b8b3fd2ce8b543fb43936b46a768f1 | 723.9 MB |
| md5:88a749954080fba024a25483a3de763b | 1.4 GB |
| md5:c13ee883e5505995575691a96ced5d92 | 1.4 GB |
| md5:4ba1b19617dff57fc633b53507b91c7f | 1.4 GB |
| md5:c5022a4f6526ca42be2e393492da5b0f | 1.4 GB |
| md5:5c5f5fca04b3b2d82e9c151b1ce979ae | 1.4 GB |
| md5:4699cd017eaf676211c609463c32944e | 1.4 GB |
| md5:1c9f08819a413d11911855d24e9d0b38 | 1.4 GB |
| md5:01961aa338c1626d3f4275b973711df0 | 1.4 GB |
| md5:689f9f0db5ddcf7b19eb48c001edbcf0 | 1.4 GB |
| md5:bd8f0d0679b1a8f860320797cd30ce0c | 1.4 GB |
| md5:6dc01cd380a050fcf53fd452f2da7073 | 1.4 GB |
| md5:62a66b9dc66d7bd78af4b30f319dec61 | 1.4 GB |
| md5:c6e625e48276b94160501b4bea43b3ba | 2.2 GB |
| md5:e170b3ebc0ac05336f3268e0c76196a3 | 2.2 GB |
| md5:3332f7916f9369fc9299ef800781ecf0 | 2.2 GB |
| md5:a82a668ae7df2d572371b28f4431c883 | 2.2 GB |
| md5:941834297a9bf87090bf5c227ff3c914 | 2.2 GB |
| md5:0497a383055351d34f2815acf94d639f | 2.2 GB |
| md5:ab2256a79b406b3e47e993e152e1bd04 | 2.2 GB |
| md5:91f3ad9281b6f1e9bec883a42a4a0231 | 2.2 GB |
| md5:e58e374f834e7a2deb1f6e36f5b5f4ef | 2.2 GB |
| md5:1e11f2eff8efe632c02a6c3f96a7cbe0 | 365.4 MB |
| md5:1b7fb82c84d440b5c11a0ce7f5c0d264 | 366.5 MB |
| md5:445b82bc18cdcd81667d42283f92f450 | 365.7 MB |
| md5:5d4b3947e496bc00b541b132537adde6 | 366.5 MB |
| md5:5b295546ad31b5b37743f05bbb38b7ed | 365.8 MB |
| md5:919978954204e01839fe699a135bcaa1 | 366.8 MB |
| md5:04b2107d3456b5906564e7610fd1fc58 | 365.7 MB |
| md5:abb7c84abd5288d8061e94417b2a63d5 | 366.6 MB |
| md5:e8b161d3413d8ebee4c6e7214edb5a1a | 365.7 MB |
| md5:fd9db8d2415ff6853ce91cc89343c33c | 366.6 MB |
| md5:6924cc6f9e7d089b92a32d43139f8e1f | 366.0 MB |
| md5:8ea65d89e0a6935a2578a7cddb3582eb | 366.4 MB |
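The md5 values listed with each file can be used to confirm that a multi-gigabyte download arrived intact. The following is a minimal sketch using Python's standard `hashlib`; the helper names `md5_of_stream` and `verify_md5` are hypothetical, and the local file path is whatever you saved the download as.

```python
import hashlib
import io


def md5_of_stream(stream, chunk_size=1 << 20):
    """Return the hex MD5 digest of a binary stream, read in 1 MiB chunks."""
    digest = hashlib.md5()
    # Reading in chunks keeps memory use flat even for multi-gigabyte files.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def verify_md5(path, expected):
    """Compare a local file's MD5 against a listed value such as 'md5:94cd...'."""
    expected_hex = expected.split(":")[-1].strip().lower()
    with open(path, "rb") as handle:
        return md5_of_stream(handle) == expected_hex


# Small in-memory demonstration; a real check would call verify_md5 on a download.
demo_digest = md5_of_stream(io.BytesIO(b"hello"))
```

A call such as `verify_md5(local_path, "md5:94cd3796dc115bea1ebe501a643ac55c")` returns `True` only if the file on disk matches the listed checksum. MD5 is adequate here for detecting corrupted or truncated transfers, though it is not a defence against deliberate tampering.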
Additional details
Related works
- Is supplement to
- Standard: 10.5281/zenodo.17243812 (DOI)
Software
- Repository URL: https://github.com/SemanticPriming/word2manylanguages
- Programming language: Python, R