Published August 6, 2020 | Version 1.0.0
Dataset Open

Word2Vec model - Czech wikipedia

  • 1. Faculty of Information Technology, Czech Technical University in Prague + Faculty of Mathematics and Physics, Charles University
  • 2. Faculty of Mathematics and Physics, Charles University

Description

A Word2Vec embedding model trained on a Czech Wikipedia corpus (from April 2020) using the gensim implementation. The following parameters were set explicitly; all other settings were left at their defaults:

  • vector dimension = \(400\),
  • window size = \(10\),
  • word minimum count = \(10\),
  • sample = \(10^{-5}\).
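As a rough illustration, the parameters above might be passed to gensim as follows. This is a minimal sketch, not the authors' training script: the record does not state the gensim version or how the corpus was tokenized, so the keyword names assume gensim 4.x (where the dimension is `vector_size`; gensim 3.x called it `size`), and `train_word2vec` and its `sentences` argument are hypothetical names.

```python
# Parameters listed in the description; everything else stays at gensim defaults.
W2V_PARAMS = {
    "vector_size": 400,  # vector dimension (named `size` in gensim 3.x)
    "window": 10,        # window size
    "min_count": 10,     # word minimum count
    "sample": 1e-5,      # downsampling threshold for frequent words
}


def train_word2vec(sentences):
    """Train a model on an iterable of tokenized sentences (lists of str).

    Hypothetical helper: how the Wikipedia text was extracted and tokenized
    is not described in the record, so that step is left to the caller.
    """
    # Imported lazily so the parameter dict can be inspected without gensim.
    from gensim.models import Word2Vec

    return Word2Vec(sentences=sentences, **W2V_PARAMS)
```

Keeping the settings in one dictionary makes it easy to compare them against the description before starting a long training run.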

Notes

This work was supported by the Czech Science Foundation (GAČR), grant number 19-01641S.

Files

cswiki-latest-pages-articles.word2vec.model.txt
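The file name suggests a plain-text export, but the record does not describe its format. The sketch below assumes the common word2vec text layout, where the first line holds the vocabulary size and the vector dimension, and each later line holds a word followed by its vector components. `read_word2vec_text_header` and `load_vectors` are hypothetical helper names; `KeyedVectors.load_word2vec_format` is the gensim loader for that layout.

```python
def read_word2vec_text_header(path):
    """Return (vocab_size, vector_dim) from the first line of the file.

    Assumes the word2vec text layout; raises ValueError if the first line
    is not two whitespace-separated integers.
    """
    with open(path, "r", encoding="utf-8") as fh:
        vocab_size, vector_dim = fh.readline().split()
    return int(vocab_size), int(vector_dim)


def load_vectors(path):
    """Load the vectors with gensim, assuming the word2vec text layout."""
    # Imported lazily so the header check above works without gensim.
    from gensim.models import KeyedVectors

    return KeyedVectors.load_word2vec_format(path, binary=False)
```

Checking the header first is cheap. If the file matches the description, the second number should be 400. A mismatch or a `ValueError` would indicate that the format assumption is wrong.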

Files (3.7 GB)

Size, checksum:

  • 31.0 MB, md5:8d35145d97833619a88527512bae325e
  • 2.2 GB, md5:eef77a96da5195173a7d19cb0fa4abf9
  • 743.1 MB, md5:5277b04cf4d52654122c34cc906fe8b9
  • 743.1 MB, md5:9a96884ff43de99419aebc8d8631ec40
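After downloading, a file can be checked against the md5 values listed above. The following is a minimal sketch using only the Python standard library; `md5_of_file` and `matches_listing` are hypothetical helper names, and the file is read in chunks because some of the files are several gigabytes in size.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed):
    """Compare a file against a listed checksum such as 'md5:8d35...'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

For example, `matches_listing(local_path, "md5:eef77a96da5195173a7d19cb0fa4abf9")` should return `True` for an intact copy of the 2.2 GB file, where `local_path` is wherever the download was saved.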