Dataset Open Access

The Knesset Meetings Corpus 2004-2005

Itai, Alon; Wintner, Shuly

Data curator(s)
Palachy, Shay

The Knesset Meetings Corpus 2004-2005 is made up of two components:

  • Raw texts - 282 files totaling 867,725 lines. These can be downloaded in two formats:
    • As doc files, encoded in Windows-1255:
    • As txt files, encoded in UTF-8:
      • kneset.tar.gz - An archive of all the raw text files, divided into two folders: [GitHub mirror]
        • 16 - Contains 164 text files totaling 543,228 lines.
        • 17 - Contains 118 text files totaling 324,497 lines.
      • knesset_txt_16.tar.gz - Contains 164 text files totaling 543,228 lines. [MILA host] [GitHub mirror]
      • knesset_txt_17.zip - Contains 118 text files totaling 324,497 lines. [MILA host] [GitHub mirror]
  • Tokenized and morphologically tagged texts - Tagged versions exist only for the files in the 16 folder. The texts are represented using MILA's XML schema for corpora. These can be obtained in two ways:
    • knesset_tagged_16.tar.gz - An archive of all tokenized and tagged files. [MILA host] [Archive.org mirror]
    • By cloning this repository: the unarchived files are located in its knesset_tagged folder.
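
The file and line counts quoted above can be checked locally after download. The following is a minimal sketch, not part of the corpus tooling: it assumes that knesset_txt_16.tar.gz has been downloaded to the working directory and that the extracted files carry a .txt extension (an assumption; adjust the pattern if they do not). The helper names count_lines and extract_and_count are hypothetical. Totals may differ slightly from the published figures depending on how a missing final newline is counted.

```python
import tarfile
from pathlib import Path


def count_lines(root: Path, pattern: str = "*.txt", encoding: str = "utf-8") -> tuple[int, int]:
    """Return (number of files, total number of lines) found under root."""
    n_files = 0
    n_lines = 0
    for path in sorted(root.rglob(pattern)):
        # errors="replace" keeps the count going if a file has stray bytes.
        with path.open(encoding=encoding, errors="replace") as handle:
            n_lines += sum(1 for _ in handle)
        n_files += 1
    return n_files, n_lines


def extract_and_count(archive: Path, dest: Path) -> tuple[int, int]:
    """Extract a .tar.gz archive into dest, then count its text files and lines."""
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(dest)
    return count_lines(dest)


# Example call (paths are assumptions):
# files, lines = extract_and_count(Path("knesset_txt_16.tar.gz"), Path("knesset_txt_16"))
# print(files, lines)
```

For the 16 folder, the listing above reports 164 files and 543,228 lines, which gives a reference point for the result.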

The Open Natural Language Processing in Hebrew (NLPH) initiative is a joint effort by members of DataHack and The Public Knowledge Workshop to promote open tools and resources for Natural Language Processing in Hebrew. As part of the NLPH project, this community collects resources for NLP in Hebrew, including corpora, lexicons, dictionaries, treebanks, embeddings, code, services, applications, papers, course materials and presentations, among others. A full list of these resources is maintained at https://github.com/NLPH/NLPH_Resources. If you have a resource to contribute under an open license, please submit a pull request or contact us at contact@nlph.org.il.
Files (575.2 MB)
Name                      Size       MD5
kneset16.zip              29.2 MB    07eb15134a4d6ea4bfbdfd560431058b
kneset17.zip              17.9 MB    5fc7424978fe1e2848c89a29679c066b
knesset_tagged_16.tar.gz  495.5 MB   895d7efb6384c4d913a03ce5c99c6a01
knesset_txt_16.tar.gz     20.9 MB    9edb769b5e5a670717255f76d440b82e
knesset_txt_17.zip        11.6 MB    49ba3cd3cbe8ce35ce5915eeb2f653e9
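
The MD5 checksums listed for these files can be used to confirm that a download completed intact. The sketch below is one way to do this with Python's standard hashlib module; the function names md5_of and verify are hypothetical, and it assumes the files were saved under their listed names in a single directory.

```python
import hashlib
from pathlib import Path

# MD5 checksums as listed for the files of this record.
EXPECTED_MD5 = {
    "kneset16.zip": "07eb15134a4d6ea4bfbdfd560431058b",
    "kneset17.zip": "5fc7424978fe1e2848c89a29679c066b",
    "knesset_tagged_16.tar.gz": "895d7efb6384c4d913a03ce5c99c6a01",
    "knesset_txt_16.tar.gz": "9edb769b5e5a670717255f76d440b82e",
    "knesset_txt_17.zip": "49ba3cd3cbe8ce35ce5915eeb2f653e9",
}


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(directory: Path) -> dict[str, bool]:
    """Map each listed file present in directory to whether its MD5 matches."""
    return {
        name: md5_of(directory / name) == expected
        for name, expected in EXPECTED_MD5.items()
        if (directory / name).is_file()
    }


# Example call (the directory name is an assumption):
# print(verify(Path("downloads")))
```

Reading in chunks keeps memory use low, which matters for the 495.5 MB tagged archive. Files that are not present in the directory are skipped rather than reported as failures.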
Statistics        All versions   This version
Views             117            117
Downloads         64             64
Data volume       8.1 GB         8.1 GB
Unique views      103            103
Unique downloads  32             32
