Dataset Open Access

The Knesset Meetings Corpus 2004-2005

Itai, Alon; Wintner, Shuly


JSON-LD (schema.org) Export

{
  "inLanguage": {
    "alternateName": "heb", 
    "@type": "Language", 
    "name": "Hebrew"
  }, 
  "description": "<p>The Knesset Meetings Corpus 2004-2005 is made up of two components:</p>\n\n<ul>\n\t<li>Raw texts - 282 files made up of 867,725 lines together. These can be downloaded in two formats:\n\t<ul>\n\t\t<li>As&nbsp;<code>doc</code>&nbsp;files, encoded using&nbsp;<code>windows-1255</code>&nbsp;encoding:\n\n\t\t<ul>\n\t\t\t<li><code>kneset16.zip</code>&nbsp;- Contains 164 text files made up of 543,228 lines together.&nbsp;<a href=\"http://yeda.cs.technion.ac.il:8088/corpus/software/corpora/knesset/txt/docs/kneset16.zip\">[MILA host]</a>&nbsp;<a href=\"https://github.com/NLPH/knesset-2004-2005/blob/master/kneset16.zip?raw=true\">[Github Mirror]</a></li>\n\t\t\t<li><code>kneset17.zip</code>&nbsp;- Contains 118 text files made up of 324,497 lines together.&nbsp;<a href=\"http://yeda.cs.technion.ac.il:8088/corpus/software/corpora/knesset/txt/docs/kneset17.zip\">[MILA host]</a>&nbsp;<a href=\"https://github.com/NLPH/knesset-2004-2005/blob/master/kneset17.zip?raw=true\">[Github Mirror]</a></li>\n\t\t</ul>\n\t\t</li>\n\t\t<li>As&nbsp;<code>txt</code>&nbsp;files, encoded using&nbsp;<code>utf8</code>&nbsp;encoding:\n\t\t<ul>\n\t\t\t<li><code>kneset.tar.gz</code>&nbsp;- An archive of all the raw text files, divided into two folders:&nbsp;<a href=\"https://github.com/NLPH/knesset-2004-2005/blob/master/kneset.tar.gz\">[Github mirror]</a>\n\t\t\t<ul>\n\t\t\t\t<li><code>16</code>&nbsp;- Contains 164 text files made up of 543,228 lines together.</li>\n\t\t\t\t<li><code>17</code>&nbsp;- Contains 118 text files made up of 324,497 lines together.</li>\n\t\t\t</ul>\n\t\t\t</li>\n\t\t\t<li><code>knesset_txt_16.tar.gz</code>- Contains 164 text files made up of 543,228 lines together.&nbsp;<a href=\"http://yeda.cs.technion.ac.il:8088/corpus/software/corpora/knesset/txt/utf8/knesset_txt_16.tar.gz\">[MILA host]</a>&nbsp;<a href=\"https://github.com/NLPH/knesset-2004-2005/blob/master/knesset_txt_16.tar.gz?raw=true\">[Github 
Mirror]</a></li>\n\t\t\t<li><code>knesset_txt_17.zip</code>&nbsp;- Contains 118 text files made up of 324,497 lines together.&nbsp;<a href=\"http://yeda.cs.technion.ac.il:8088/corpus/software/corpora/knesset/txt/utf8/knesset_txt_17.zip\">[MILA host]</a>&nbsp;<a href=\"https://github.com/NLPH/knesset-2004-2005/blob/master/knesset_txt_17.zip?raw=true\">[Github Mirror]</a></li>\n\t\t</ul>\n\t\t</li>\n\t</ul>\n\t</li>\n\t<li>Tokenized and morphologically tagged texts - Tagged versions exist only for the files in the&nbsp;<code>16</code>&nbsp;folder. The text are represented using&nbsp;<a href=\"http://www.mila.cs.technion.ac.il/eng/resources_standards.html\">MILA&#39;s XML schema for corpora</a>. These can be downloaded in two ways:\n\t<ul>\n\t\t<li><code>knesset_tagged_16.tar.gz</code>&nbsp;- An archive of all tokenized and tagged files.&nbsp;<a href=\"http://yeda.cs.technion.ac.il:8088/corpus/software/corpora/knesset/tagged/knesset_tagged_16.tar.gz\">[MILA host]</a>&nbsp;<a href=\"https://archive.org/details/knesset_transcripts_2004_2005\">[Archive.org mirror]</a></li>\n\t\t<li>By cloning this repository, as the unarchived version of these files can be found in this repository, under the&nbsp;<code>knesset_tagged</code>&nbsp;folder.</li>\n\t</ul>\n\t</li>\n</ul>", 
  "license": "https://opendatacommons.org/licenses/pddl/", 
  "creator": [
    {
      "affiliation": "Technion \u2013 Israel Institute of Technology", 
      "@type": "Person", 
      "name": "Itai, Alon"
    }, 
    {
      "affiliation": "University of Haifa", 
      "@type": "Person", 
      "name": "Wintner, Shuly"
    }
  ], 
  "url": "https://zenodo.org/record/2707356", 
  "datePublished": "2019-05-10", 
  "keywords": [
    "NLP", 
    "Hebrew", 
    "Knesset", 
    "Transcripts", 
    "Tokenization", 
    "morphologically tagged text", 
    "NLPH"
  ], 
  "version": "v1.0.0", 
  "contributor": [
    {
      "affiliation": "NLPH", 
      "@id": "https://orcid.org/0000-0002-1459-9320", 
      "@type": "Person", 
      "name": "Palachy, Shay"
    }
  ], 
  "@context": "https://schema.org/", 
  "distribution": [
    {
      "contentUrl": "https://zenodo.org/api/files/9881583a-f1fb-4f9c-a264-7d6c887cc405/kneset16.zip", 
      "encodingFormat": "zip", 
      "@type": "DataDownload"
    }, 
    {
      "contentUrl": "https://zenodo.org/api/files/9881583a-f1fb-4f9c-a264-7d6c887cc405/kneset17.zip", 
      "encodingFormat": "zip", 
      "@type": "DataDownload"
    }, 
    {
      "contentUrl": "https://zenodo.org/api/files/9881583a-f1fb-4f9c-a264-7d6c887cc405/knesset_tagged_16.tar.gz", 
      "encodingFormat": "gz", 
      "@type": "DataDownload"
    }, 
    {
      "contentUrl": "https://zenodo.org/api/files/9881583a-f1fb-4f9c-a264-7d6c887cc405/knesset_txt_16.tar.gz", 
      "encodingFormat": "gz", 
      "@type": "DataDownload"
    }, 
    {
      "contentUrl": "https://zenodo.org/api/files/9881583a-f1fb-4f9c-a264-7d6c887cc405/knesset_txt_17.zip", 
      "encodingFormat": "zip", 
      "@type": "DataDownload"
    }
  ], 
  "identifier": "https://doi.org/10.5281/zenodo.2707356", 
  "@id": "https://doi.org/10.5281/zenodo.2707356", 
  "@type": "Dataset", 
  "name": "The Knesset Meetings Corpus 2004-2005"
}
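
The export above can be consumed programmatically. The following is a minimal Python sketch, not part of the record itself, that lists the download URLs from the "distribution" array and re-encodes windows-1255 text as UTF-8. The function names (distribution_urls, cp1255_to_utf8) and the SAMPLE constant are hypothetical and chosen for illustration only; SAMPLE is a trimmed copy of the record with a single distribution entry. The sketch also assumes that the files described as doc files are plain text in windows-1255, as the description suggests.

```python
import json
from typing import List, Tuple

# Trimmed copy of the JSON-LD record above (one distribution entry only).
SAMPLE = """
{
  "@type": "Dataset",
  "name": "The Knesset Meetings Corpus 2004-2005",
  "distribution": [
    {
      "contentUrl": "https://zenodo.org/api/files/9881583a-f1fb-4f9c-a264-7d6c887cc405/kneset16.zip",
      "encodingFormat": "zip",
      "@type": "DataDownload"
    }
  ]
}
"""


def distribution_urls(record_json: str) -> List[Tuple[str, str]]:
    """Return (encodingFormat, contentUrl) pairs for every DataDownload entry."""
    record = json.loads(record_json)
    return [
        (item.get("encodingFormat", ""), item["contentUrl"])
        for item in record.get("distribution", [])
        if item.get("@type") == "DataDownload"
    ]


def cp1255_to_utf8(raw: bytes) -> bytes:
    """Re-encode windows-1255 (Python codec name: cp1255) bytes as UTF-8.

    Assumption: the input is plain text; a binary Word document would not
    decode meaningfully this way.
    """
    return raw.decode("cp1255").encode("utf-8")


if __name__ == "__main__":
    for fmt, url in distribution_urls(SAMPLE):
        print(fmt, url)
```

Passing the full record text instead of SAMPLE should yield all five distribution entries. Downloading and unpacking the archives is left out on purpose, since it depends on the host being reachable.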