Thai word2vec embeddings trained on OpenSubtitles
Description
This dataset contains the Thai subs2vec embeddings presented in https://zenodo.org/records/17243814. The embeddings were trained on large-scale subtitle corpora from the OpenSubtitles 2018 release (https://opus.nlpl.eu/OpenSubtitles/corpus/version/OpenSubtitles), so the resulting semantic vector spaces reflect naturalistic language use in films and television.
For this language, we provide all embedding variants explored in the study. Specifically, the dataset includes vectors generated under different combinations of:
- Dimensionality: multiple vector sizes (e.g., 100, 200, 300, …)
- Window size: varying context windows (e.g., 2, 5, 10, …)
Each file corresponds to a single configuration (dimensionality × window size).
Each row of a file holds one vocabulary item: the word in column 1, followed by its embedding values in columns 2 through d + 1, where d is the file's dimensionality.
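A minimal Python sketch for reading one of these files follows. It assumes several things the description does not state, so check them against the file you download: the file is UTF-8 text, the columns are separated by whitespace, words contain no whitespace, and there may be a word2vec-style header line giving vocabulary size and dimensionality. `load_embeddings` and `cosine` are hypothetical helper names. They are not part of the word2manylanguages repository.

```python
import math
from typing import Dict, List


def load_embeddings(path: str) -> Dict[str, List[float]]:
    """Load a text embedding file into a {word: vector} dict.

    Assumed row layout: word in column 1, then d float values
    (columns 2..d+1), separated by whitespace. An optional
    word2vec-style header ("<vocab_size> <d>") on the first line is skipped;
    whether these files contain one is an assumption.
    """
    vectors: Dict[str, List[float]] = {}
    dim = None
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f):
            fields = line.split()
            if not fields:
                continue  # ignore blank lines
            # Skip an optional header line such as "50000 300".
            if lineno == 0 and len(fields) == 2 and all(x.isdigit() for x in fields):
                continue
            word, values = fields[0], [float(x) for x in fields[1:]]
            # Every row in one file should have the same dimensionality d.
            if dim is None:
                dim = len(values)
            elif len(values) != dim:
                raise ValueError(
                    f"line {lineno + 1}: expected {dim} values, got {len(values)}"
                )
            vectors[word] = values
    return vectors


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0
```

For example, after `emb = load_embeddings(path)`, the call `cosine(emb["แมว"], emb["สุนัข"])` ("cat" vs. "dog") compares two words, provided both occur in that file's vocabulary. Plain Python lists keep the sketch dependency-free. For the larger files, a NumPy array would use far less memory.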
If you use this dataset, please cite:
- Manuscript: https://doi.org/10.5281/zenodo.17243812
- Data: This Zenodo dataset (using the DOI provided here)
Files
Total size: 22.5 GB (60 files)

| MD5 checksum | Size |
|---|---|
| 693cbb4b5ed3a0939408366a1e1f92e5 | 164.1 MB |
| 2ffe7c44b8bbf0c91da2cf83b3c418d8 | 164.1 MB |
| 3f8b622d79caddf8094c97c410771bbc | 164.2 MB |
| 4d27f594afa5f6eb753848f72283f5da | 164.2 MB |
| 4f3c83f15cb3b63852b905d73b885019 | 164.2 MB |
| fb39df235246460349666eb4e3bf6afd | 164.3 MB |
| 8046839312744b946066214ad22e4aa3 | 164.2 MB |
| 1d50c55e08d1dc3ec93a4941ad4b5aed | 164.0 MB |
| 2ad80e17697457c819a92c234276b09a | 164.3 MB |
| c462d91ee42cb2ca7ffffc1abea03185 | 164.3 MB |
| 9ccd6ec0dc839238068c6be14f7eeed4 | 164.2 MB |
| d2f0da50d64aea7c1b416df3e304ac35 | 163.9 MB |
| 3fca23e02829ec74a620f41ea4fd45d0 | 325.9 MB |
| a350ae0acc979c0aa5857f01cffeefcb | 325.9 MB |
| 65e85db98e3a0acd9f19ae506945222a | 325.9 MB |
| 2f050789ac22869a246f2aad41099a70 | 325.6 MB |
| 425d395a9da47f8a72879e166d656d5b | 326.0 MB |
| bf0953d1fdbd58a74b66c8f906ec87db | 325.9 MB |
| e1818c1e8a3deb9204e65c05ede1dc7a | 326.2 MB |
| d5c90552d01e59ce37827c1aa7187422 | 325.9 MB |
| 91ea8ea813d7f2de3347773eb9c1fbf2 | 326.2 MB |
| 601ff1648b0ceb3ab2c5924d3815f159 | 325.8 MB |
| dd42ffc46372dcd40d2f6c542ec51f65 | 326.1 MB |
| 05e2c3ff65756640358c8e6413519582 | 326.1 MB |
| d9d6af4b3b0d0081368c9ef13ec4156a | 488.0 MB |
| 2009184cc37128abb6002251cae858c1 | 488.3 MB |
| bbd66ef3aa499cc2f291111abed01c94 | 488.1 MB |
| b58b836b42f22a97da2df3d02723558a | 488.1 MB |
| 2e57087b08f2ab579fa1397a0712a4e7 | 487.9 MB |
| 4d51387ea7691cbbe5ede7a554eaa5bd | 488.1 MB |
| 2d4aee9fbdf335aa2333abf4a965dbb1 | 488.1 MB |
| a77365b4c32a50576552de65e0537942 | 488.3 MB |
| 22c3ed9534222551fb44983d562bb22d | 488.0 MB |
| b548c2e3eff0f787adedbd19e99e75c8 | 488.0 MB |
| 1686042b41b371104e6ed1cced3eb9ed | 488.3 MB |
| aec04b082b4c2ec04f08a287caa1b426 | 488.0 MB |
| 58ccaaebdbae0dfd618de54567768e4c | 813.4 MB |
| b88c497d9a033fc63f7e02bb6166786c | 813.4 MB |
| 585f9ed903ba9c697dea086eaa32fcc8 | 813.3 MB |
| bb090b78f6199ee19d2399f0c0ee1ffa | 814.1 MB |
| fd7b47f63708d1ad161a9877e9304f94 | 813.4 MB |
| f71c670ccb162a6759179024156c2a6b | 814.1 MB |
| 7b258d2539bd5eac11500f2b4bd88f70 | 813.4 MB |
| 261a3596da4ca77be52e86da98bcebc2 | 814.3 MB |
| 647121e11f6858a4b42d90368ee3157c | 813.4 MB |
| ff6caec9377f017c6ca07e4058cc1f15 | 814.5 MB |
| 8f15a8225b5ce02bde76655f9a710c9f | 813.2 MB |
| 8bab6cafaa515d4909d70ee9c8c8bbe3 | 814.4 MB |
| 8cfe4f06be240591d0984cfe2d4a525d | 83.3 MB |
| fbd2ee5d43c084a3df2f240cf3a37d4b | 83.1 MB |
| 3baa50a1fd099457043cec385e70e2fc | 83.5 MB |
| 353c688b8bd548a08fb681b39dabdc52 | 83.2 MB |
| c19687750db82292bc035122fc4becda | 83.4 MB |
| 69d77a58b48359a3b40a0bd5eb30d86d | 83.2 MB |
| a6ee8b4be1e19dd76361485f2268f9c9 | 83.3 MB |
| 428823ca047b95e2b39dc074c3860bd9 | 83.3 MB |
| a86964e36d0371bc49cce6cbec793903 | 83.3 MB |
| 344e2107271a1359e15f6f9135cdd41a | 83.3 MB |
| eedf41aeccd7354c0626a4ccaaf6410a | 83.3 MB |
| 0e39d1acb53f31e1a3a9019fa6107a0d | 83.2 MB |
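The MD5 checksums listed for each file can confirm that a large download arrived intact. Below is a minimal Python sketch using the standard library's `hashlib`. `md5_of_file` and `verify` are hypothetical helper names. The file is read in chunks so that even the ~814 MB files are never loaded into memory all at once.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read until it returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected_md5: str) -> bool:
    """Compare a file's MD5 with a listed checksum (an optional 'md5:' prefix is accepted)."""
    expected = expected_md5.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

For example, `verify("downloaded_file.txt", "693cbb4b5ed3a0939408366a1e1f92e5")` (with a hypothetical local file name) returns `True` only if the local copy matches the first checksum in the table.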
Additional details
Related works
- Is supplement to (Standard): 10.5281/zenodo.17243812 (DOI)
Software
- Repository URL: https://github.com/SemanticPriming/word2manylanguages
- Programming languages: Python, R