Published November 16, 2025 | Version v1.0.0
Dataset Open

Thai word2vec embeddings trained on OpenSubtitles

  • 1. Harrisburg University of Science and Technology

Description

This dataset contains the subs2vec embeddings for Thai, as presented in https://zenodo.org/records/17243814. The embeddings were trained on the large-scale subtitle corpora of the OpenSubtitles 2018 dataset (https://opus.nlpl.eu/OpenSubtitles/corpus/version/OpenSubtitles) and represent semantic vector spaces derived from naturalistic language use in films and television.

For this language, we provide all embedding variants explored in the study. Specifically, the dataset includes vectors generated under different combinations of:

  • Dimensionality: multiple vector sizes (e.g., 100, 200, 300, …)
  • Window size: varying context windows (e.g., 2, 5, 10, …)

Each file corresponds to a unique configuration (dimension × window size).

In each file, column 1 holds the Thai vocabulary item and columns 2 through d + 1 hold its embedding values, where d is the dimensionality of that file.
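As an illustration of the column layout described above, here is a minimal Python sketch that reads one embedding file into a dictionary. The record does not state the delimiter or whether the files carry a header row, so both are treated as assumptions: the hypothetical `delimiter` parameter defaults to splitting on whitespace, and any row that does not have exactly `dim` numeric values after the word is skipped. The function name `load_embeddings` and the in-memory sample are made up for the example.

```python
import io


def load_embeddings(handle, dim, delimiter=None):
    """Read rows shaped like: word, v1, v2, ..., v_dim.

    delimiter=None splits on any whitespace; pass "," or "\t" if the
    files turn out to be CSV or TSV (the delimiter is an assumption).
    """
    vectors = {}
    for line in handle:
        parts = line.rstrip("\n").split(delimiter)
        if len(parts) != dim + 1:
            continue  # blank line, or a row that does not match the layout
        try:
            values = [float(x) for x in parts[1:]]
        except ValueError:
            continue  # e.g. a header row with column names
        vectors[parts[0]] = values
    return vectors


# Tiny in-memory stand-in for a real file (3 dimensions, comma-delimited).
sample = io.StringIO("word,d1,d2,d3\nแมว,0.1,0.2,0.3\nหมา,0.4,0.5,0.6\n")
vectors = load_embeddings(sample, dim=3, delimiter=",")
```

For a downloaded file, the same function could be called on `open(path, encoding="utf-8")`, with `dim` set to the dimensionality of that configuration. The larger files run to several hundred megabytes, so a dictionary of Python lists may be memory-hungry; it is used here only to keep the sketch dependency-free.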

If you use this dataset, please cite:

Files (22.5 GB)

Checksum                              Size
md5:693cbb4b5ed3a0939408366a1e1f92e5  164.1 MB
md5:2ffe7c44b8bbf0c91da2cf83b3c418d8  164.1 MB
md5:3f8b622d79caddf8094c97c410771bbc  164.2 MB
md5:4d27f594afa5f6eb753848f72283f5da  164.2 MB
md5:4f3c83f15cb3b63852b905d73b885019  164.2 MB
md5:fb39df235246460349666eb4e3bf6afd  164.3 MB
md5:8046839312744b946066214ad22e4aa3  164.2 MB
md5:1d50c55e08d1dc3ec93a4941ad4b5aed  164.0 MB
md5:2ad80e17697457c819a92c234276b09a  164.3 MB
md5:c462d91ee42cb2ca7ffffc1abea03185  164.3 MB
md5:9ccd6ec0dc839238068c6be14f7eeed4  164.2 MB
md5:d2f0da50d64aea7c1b416df3e304ac35  163.9 MB
md5:3fca23e02829ec74a620f41ea4fd45d0  325.9 MB
md5:a350ae0acc979c0aa5857f01cffeefcb  325.9 MB
md5:65e85db98e3a0acd9f19ae506945222a  325.9 MB
md5:2f050789ac22869a246f2aad41099a70  325.6 MB
md5:425d395a9da47f8a72879e166d656d5b  326.0 MB
md5:bf0953d1fdbd58a74b66c8f906ec87db  325.9 MB
md5:e1818c1e8a3deb9204e65c05ede1dc7a  326.2 MB
md5:d5c90552d01e59ce37827c1aa7187422  325.9 MB
md5:91ea8ea813d7f2de3347773eb9c1fbf2  326.2 MB
md5:601ff1648b0ceb3ab2c5924d3815f159  325.8 MB
md5:dd42ffc46372dcd40d2f6c542ec51f65  326.1 MB
md5:05e2c3ff65756640358c8e6413519582  326.1 MB
md5:d9d6af4b3b0d0081368c9ef13ec4156a  488.0 MB
md5:2009184cc37128abb6002251cae858c1  488.3 MB
md5:bbd66ef3aa499cc2f291111abed01c94  488.1 MB
md5:b58b836b42f22a97da2df3d02723558a  488.1 MB
md5:2e57087b08f2ab579fa1397a0712a4e7  487.9 MB
md5:4d51387ea7691cbbe5ede7a554eaa5bd  488.1 MB
md5:2d4aee9fbdf335aa2333abf4a965dbb1  488.1 MB
md5:a77365b4c32a50576552de65e0537942  488.3 MB
md5:22c3ed9534222551fb44983d562bb22d  488.0 MB
md5:b548c2e3eff0f787adedbd19e99e75c8  488.0 MB
md5:1686042b41b371104e6ed1cced3eb9ed  488.3 MB
md5:aec04b082b4c2ec04f08a287caa1b426  488.0 MB
md5:58ccaaebdbae0dfd618de54567768e4c  813.4 MB
md5:b88c497d9a033fc63f7e02bb6166786c  813.4 MB
md5:585f9ed903ba9c697dea086eaa32fcc8  813.3 MB
md5:bb090b78f6199ee19d2399f0c0ee1ffa  814.1 MB
md5:fd7b47f63708d1ad161a9877e9304f94  813.4 MB
md5:f71c670ccb162a6759179024156c2a6b  814.1 MB
md5:7b258d2539bd5eac11500f2b4bd88f70  813.4 MB
md5:261a3596da4ca77be52e86da98bcebc2  814.3 MB
md5:647121e11f6858a4b42d90368ee3157c  813.4 MB
md5:ff6caec9377f017c6ca07e4058cc1f15  814.5 MB
md5:8f15a8225b5ce02bde76655f9a710c9f  813.2 MB
md5:8bab6cafaa515d4909d70ee9c8c8bbe3  814.4 MB
md5:8cfe4f06be240591d0984cfe2d4a525d  83.3 MB
md5:fbd2ee5d43c084a3df2f240cf3a37d4b  83.1 MB
md5:3baa50a1fd099457043cec385e70e2fc  83.5 MB
md5:353c688b8bd548a08fb681b39dabdc52  83.2 MB
md5:c19687750db82292bc035122fc4becda  83.4 MB
md5:69d77a58b48359a3b40a0bd5eb30d86d  83.2 MB
md5:a6ee8b4be1e19dd76361485f2268f9c9  83.3 MB
md5:428823ca047b95e2b39dc074c3860bd9  83.3 MB
md5:a86964e36d0371bc49cce6cbec793903  83.3 MB
md5:344e2107271a1359e15f6f9135cdd41a  83.3 MB
md5:eedf41aeccd7354c0626a4ccaaf6410a  83.3 MB
md5:0e39d1acb53f31e1a3a9019fa6107a0d  83.2 MB
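
The md5 values listed above can be used to check that a download completed intact. The following is a minimal Python sketch using the standard `hashlib` module; the helper names `md5_of_file` and `matches_listing` are hypothetical, and because the file names are not shown in this listing, the path in the usage comment is a placeholder rather than a real file name.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed):
    """Compare a file against a listed value such as 'md5:693c...'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected


# Usage (placeholder path; substitute the file you actually downloaded):
# matches_listing("downloaded_file", "md5:693cbb4b5ed3a0939408366a1e1f92e5")
```

Reading in chunks keeps memory use flat, which matters for the larger files in this record.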

Additional details

Related works

Is supplement to
Standard: 10.5281/zenodo.17243812 (DOI)
