Published August 6, 2020 | Version 1.0.0
Dataset | Open Access

Word2Vec model - Czech legislation

  • 1. Faculty of Information Technology, Czech Technical University in Prague + Faculty of Mathematics and Physics, Charles University
  • 2. Faculty of Mathematics and Physics, Charles University

Description

A Word2Vec embedding model trained on a corpus of Czech legislation (as of April 2020) using the gensim implementation. The following parameters were set explicitly; all others were left at gensim's default settings:

  • vector dimension = \(400\),
  • window size = \(10\),
  • word minimum count = \(10\),
  • sample = \(10^{-5}\).
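The settings above can be written as gensim `Word2Vec` keyword arguments, as in the minimal sketch below. It assumes gensim 4.x parameter names (`vector_size`; the 3.x releases current in 2020 called it `size`) and a hypothetical `train_word2vec` helper fed with an iterable of tokenized sentences. The authors' actual preprocessing and training script are not part of this record.

```python
# Non-default settings from the description; everything else stays at
# gensim's defaults. Names assume gensim 4.x (gensim 3.x used `size`).
W2V_PARAMS = {
    "vector_size": 400,  # vector dimension
    "window": 10,        # window size
    "min_count": 10,     # word minimum count
    "sample": 1e-5,      # downsampling threshold for frequent words
}


def train_word2vec(sentences):
    """Hypothetical helper: train on an iterable of token lists.

    gensim is imported lazily so the parameter dict can be inspected
    without gensim installed.
    """
    from gensim.models import Word2Vec

    return Word2Vec(sentences=sentences, **W2V_PARAMS)
```

Keeping the parameters in one dict makes it easy to compare them against a different gensim version's argument names before training.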

Notes

This work was supported by the Czech Science Foundation (GAČR), grant number 19-01641S.

Files

law.word2vec.model.txt
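Assuming `law.word2vec.model.txt` uses the plain-text word2vec format (as written by gensim's `save_word2vec_format(..., binary=False)`: a header line `<vocab_size> <dimension>`, then one word per line followed by its vector), it can be loaded with gensim via `KeyedVectors.load_word2vec_format("law.word2vec.model.txt", binary=False)`. The stdlib-only sketch below illustrates the same format; the helper names are hypothetical, and `limit` exists because the full file is large.

```python
import math
from typing import Dict, List, Optional


def load_word2vec_text(path: str, limit: Optional[int] = None) -> Dict[str, List[float]]:
    """Read vectors from a word2vec text-format file (format assumed, see above).

    `limit` caps how many words are read, counting from the top of the file.
    """
    vectors: Dict[str, List[float]] = {}
    with open(path, encoding="utf-8") as f:
        _, dim = map(int, f.readline().split())  # header: vocab size, dimension
        for line in f:
            if limit is not None and len(vectors) >= limit:
                break
            parts = line.split()
            if not parts:
                continue  # skip blank lines
            # The last `dim` fields are the vector; anything before is the word.
            word, values = " ".join(parts[:-dim]), parts[-dim:]
            vectors[word] = [float(v) for v in values]
    return vectors


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))
```

For example, `vecs = load_word2vec_text("law.word2vec.model.txt", limit=50000)` followed by `cosine(vecs[w1], vecs[w2])` compares two words that appear among the first 50,000 entries.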

Files (804.1 MB)

  • 6.6 MB, md5:e40dc6c2fb765f8df70531e9df3539db
  • 480.9 MB, md5:f417e84c31f8d70946136080d5f5f82c
  • 158.3 MB, md5:20afb85b38a14a2100f426a688f569b7
  • 158.3 MB, md5:4a0c0967180020d6b8838fb8684d9dc5
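Each file is listed with an MD5 checksum, so a download can be checked with Python's `hashlib`. The minimal sketch below reads the file in 1 MiB chunks so the larger files need not fit in memory; `md5_of_file` and `verify` are illustrative names, not part of this record.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected: str) -> bool:
    """Compare a file to a listed checksum, with or without the "md5:" prefix."""
    expected = expected.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

Pass the checksum shown next to the downloaded file's size; a `False` result indicates a corrupted or incomplete download.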