Published May 10, 2019
| Version v1.0.0
Dataset
Open
The Knesset Meetings Corpus 2004-2005
Creators
- 1. Technion – Israel Institute of Technology
- 2. University of Haifa
Description
The Knesset Meetings Corpus 2004-2005 is made up of two components:
- Raw texts - 282 files made up of 867,725 lines together. These can be downloaded in two formats:
  - As doc files, encoded using windows-1255 encoding:
    - kneset16.zip - Contains 164 text files made up of 543,228 lines together. [MILA host] [Github Mirror]
    - kneset17.zip - Contains 118 text files made up of 324,497 lines together. [MILA host] [Github Mirror]
  - As txt files, encoded using utf8 encoding:
    - kneset.tar.gz - An archive of all the raw text files, divided into two folders: [Github mirror]
      - 16 - Contains 164 text files made up of 543,228 lines together.
      - 17 - Contains 118 text files made up of 324,497 lines together.
    - knesset_txt_16.tar.gz - Contains 164 text files made up of 543,228 lines together. [MILA host] [Github Mirror]
    - knesset_txt_17.zip - Contains 118 text files made up of 324,497 lines together. [MILA host] [Github Mirror]
- Tokenized and morphologically tagged texts - Tagged versions exist only for the files in the 16 folder. The texts are represented using MILA's XML schema for corpora. These can be downloaded in two ways:
  - knesset_tagged_16.tar.gz - An archive of all tokenized and tagged files. [MILA host] [Archive.org mirror]
  - By cloning this repository, which holds the unarchived version of these files under the knesset_tagged folder.
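Since the raw texts come in two encodings, a reader who downloads the windows-1255 variant may want to convert it to UTF-8 before processing. The following is a minimal sketch, not part of the corpus tooling: it assumes each file is plain text stored as windows-1255 bytes (Python's `cp1255` codec), and the function name `reencode_to_utf8` is hypothetical.

```python
from pathlib import Path


def reencode_to_utf8(src: Path, dst: Path, src_encoding: str = "cp1255") -> int:
    """Decode `src` from `src_encoding`, write it to `dst` as UTF-8.

    Returns the number of lines as counted by str.splitlines(), which may
    differ slightly from however the corpus authors counted lines.
    Assumption: the input is plain text, not a binary document format.
    """
    # Undecodable bytes become U+FFFD instead of raising, so one bad byte
    # does not abort a long batch conversion.
    text = src.read_bytes().decode(src_encoding, errors="replace")
    # Writing encoded bytes directly keeps the original line endings intact.
    dst.write_bytes(text.encode("utf-8"))
    return len(text.splitlines())
```

Summing the returned counts over an extracted folder gives a rough check against the totals stated in the list above, with the caveat about line-counting conventions noted in the docstring.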
Files
(575.2 MB)
MD5 checksum | Size |
---|---|
md5:07eb15134a4d6ea4bfbdfd560431058b | 29.2 MB |
md5:5fc7424978fe1e2848c89a29679c066b | 17.9 MB |
md5:895d7efb6384c4d913a03ce5c99c6a01 | 495.5 MB |
md5:9edb769b5e5a670717255f76d440b82e | 20.9 MB |
md5:49ba3cd3cbe8ce35ce5915eeb2f653e9 | 11.6 MB |
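The MD5 values in the file listing can be used to check a downloaded archive for corruption. The sketch below is illustrative only: the helper names `md5_of_file` and `matches_listed_checksum` are hypothetical, and because the listing here does not show which file name belongs to which checksum, that pairing has to be confirmed on the record page itself.

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        # Chunked reads keep memory use flat even for the largest archive.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path: Path, listed: str) -> bool:
    """Compare a file against a listed value such as 'md5:<hex digest>'."""
    # Accept the value with or without the 'md5:' prefix used in the listing.
    expected = listed.strip().split(":", 1)[-1].lower()
    return md5_of_file(path) == expected
```

A call would look like `matches_listed_checksum(Path("<downloaded archive>"), "md5:<value from the listing>")`, where both arguments are placeholders to be filled in by the reader.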