Dataset Open Access

The Knesset Meetings Corpus 2004-2005

Itai, Alon; Wintner, Shuly


JSON Export

{
  "files": [
    {
      "links": {
        "self": "https://zenodo.org/api/files/9881583a-f1fb-4f9c-a264-7d6c887cc405/kneset16.zip"
      }, 
      "checksum": "md5:07eb15134a4d6ea4bfbdfd560431058b", 
      "bucket": "9881583a-f1fb-4f9c-a264-7d6c887cc405", 
      "key": "kneset16.zip", 
      "type": "zip", 
      "size": 29178143
    }, 
    {
      "links": {
        "self": "https://zenodo.org/api/files/9881583a-f1fb-4f9c-a264-7d6c887cc405/kneset17.zip"
      }, 
      "checksum": "md5:5fc7424978fe1e2848c89a29679c066b", 
      "bucket": "9881583a-f1fb-4f9c-a264-7d6c887cc405", 
      "key": "kneset17.zip", 
      "type": "zip", 
      "size": 17938683
    }, 
    {
      "links": {
        "self": "https://zenodo.org/api/files/9881583a-f1fb-4f9c-a264-7d6c887cc405/knesset_tagged_16.tar.gz"
      }, 
      "checksum": "md5:895d7efb6384c4d913a03ce5c99c6a01", 
      "bucket": "9881583a-f1fb-4f9c-a264-7d6c887cc405", 
      "key": "knesset_tagged_16.tar.gz", 
      "type": "gz", 
      "size": 495545698
    }, 
    {
      "links": {
        "self": "https://zenodo.org/api/files/9881583a-f1fb-4f9c-a264-7d6c887cc405/knesset_txt_16.tar.gz"
      }, 
      "checksum": "md5:9edb769b5e5a670717255f76d440b82e", 
      "bucket": "9881583a-f1fb-4f9c-a264-7d6c887cc405", 
      "key": "knesset_txt_16.tar.gz", 
      "type": "gz", 
      "size": 20875926
    }, 
    {
      "links": {
        "self": "https://zenodo.org/api/files/9881583a-f1fb-4f9c-a264-7d6c887cc405/knesset_txt_17.zip"
      }, 
      "checksum": "md5:49ba3cd3cbe8ce35ce5915eeb2f653e9", 
      "bucket": "9881583a-f1fb-4f9c-a264-7d6c887cc405", 
      "key": "knesset_txt_17.zip", 
      "type": "zip", 
      "size": 11618230
    }
  ], 
  "owners": [
    66968
  ], 
  "doi": "10.5281/zenodo.2707356", 
  "stats": {
    "version_unique_downloads": 32.0, 
    "unique_views": 105.0, 
    "views": 119.0, 
    "version_views": 119.0, 
    "unique_downloads": 32.0, 
    "version_unique_views": 105.0, 
    "volume": 8073872072.0, 
    "version_downloads": 64.0, 
    "downloads": 64.0, 
    "version_volume": 8073872072.0
  }, 
  "links": {
    "doi": "https://doi.org/10.5281/zenodo.2707356", 
    "conceptdoi": "https://doi.org/10.5281/zenodo.2707355", 
    "bucket": "https://zenodo.org/api/files/9881583a-f1fb-4f9c-a264-7d6c887cc405", 
    "conceptbadge": "https://zenodo.org/badge/doi/10.5281/zenodo.2707355.svg", 
    "html": "https://zenodo.org/record/2707356", 
    "latest_html": "https://zenodo.org/record/2707356", 
    "badge": "https://zenodo.org/badge/doi/10.5281/zenodo.2707356.svg", 
    "latest": "https://zenodo.org/api/records/2707356"
  }, 
  "conceptdoi": "10.5281/zenodo.2707355", 
  "created": "2019-05-10T12:53:41.007705+00:00", 
  "updated": "2020-01-24T19:25:03.435255+00:00", 
  "conceptrecid": "2707355", 
  "revision": 4, 
  "id": 2707356, 
  "metadata": {
    "access_right_category": "success", 
    "doi": "10.5281/zenodo.2707356", 
    "description": "<p>The Knesset Meetings Corpus 2004-2005 is made up of two components:</p>\n\n<ul>\n\t<li>Raw texts - 282 files made up of 867,725 lines together. These can be downloaded in two formats:\n\t<ul>\n\t\t<li>As&nbsp;<code>doc</code>&nbsp;files, encoded using&nbsp;<code>windows-1255</code>&nbsp;encoding:\n\n\t\t<ul>\n\t\t\t<li><code>kneset16.zip</code>&nbsp;- Contains 164 text files made up of 543,228 lines together.&nbsp;<a href=\"http://yeda.cs.technion.ac.il:8088/corpus/software/corpora/knesset/txt/docs/kneset16.zip\">[MILA host]</a>&nbsp;<a href=\"https://github.com/NLPH/knesset-2004-2005/blob/master/kneset16.zip?raw=true\">[Github Mirror]</a></li>\n\t\t\t<li><code>kneset17.zip</code>&nbsp;- Contains 118 text files made up of 324,497 lines together.&nbsp;<a href=\"http://yeda.cs.technion.ac.il:8088/corpus/software/corpora/knesset/txt/docs/kneset17.zip\">[MILA host]</a>&nbsp;<a href=\"https://github.com/NLPH/knesset-2004-2005/blob/master/kneset17.zip?raw=true\">[Github Mirror]</a></li>\n\t\t</ul>\n\t\t</li>\n\t\t<li>As&nbsp;<code>txt</code>&nbsp;files, encoded using&nbsp;<code>utf8</code>&nbsp;encoding:\n\t\t<ul>\n\t\t\t<li><code>kneset.tar.gz</code>&nbsp;- An archive of all the raw text files, divided into two folders:&nbsp;<a href=\"https://github.com/NLPH/knesset-2004-2005/blob/master/kneset.tar.gz\">[Github mirror]</a>\n\t\t\t<ul>\n\t\t\t\t<li><code>16</code>&nbsp;- Contains 164 text files made up of 543,228 lines together.</li>\n\t\t\t\t<li><code>17</code>&nbsp;- Contains 118 text files made up of 324,497 lines together.</li>\n\t\t\t</ul>\n\t\t\t</li>\n\t\t\t<li><code>knesset_txt_16.tar.gz</code>&nbsp;- Contains 164 text files made up of 543,228 lines together.&nbsp;<a href=\"http://yeda.cs.technion.ac.il:8088/corpus/software/corpora/knesset/txt/utf8/knesset_txt_16.tar.gz\">[MILA host]</a>&nbsp;<a href=\"https://github.com/NLPH/knesset-2004-2005/blob/master/knesset_txt_16.tar.gz?raw=true\">[Github Mirror]</a></li>\n\t\t\t<li><code>knesset_txt_17.zip</code>&nbsp;- Contains 118 text files made up of 324,497 lines together.&nbsp;<a href=\"http://yeda.cs.technion.ac.il:8088/corpus/software/corpora/knesset/txt/utf8/knesset_txt_17.zip\">[MILA host]</a>&nbsp;<a href=\"https://github.com/NLPH/knesset-2004-2005/blob/master/knesset_txt_17.zip?raw=true\">[Github Mirror]</a></li>\n\t\t</ul>\n\t\t</li>\n\t</ul>\n\t</li>\n\t<li>Tokenized and morphologically tagged texts - Tagged versions exist only for the files in the&nbsp;<code>16</code>&nbsp;folder. The texts are represented using&nbsp;<a href=\"http://www.mila.cs.technion.ac.il/eng/resources_standards.html\">MILA&#39;s XML schema for corpora</a>. These can be downloaded in two ways:\n\t<ul>\n\t\t<li><code>knesset_tagged_16.tar.gz</code>&nbsp;- An archive of all tokenized and tagged files.&nbsp;<a href=\"http://yeda.cs.technion.ac.il:8088/corpus/software/corpora/knesset/tagged/knesset_tagged_16.tar.gz\">[MILA host]</a>&nbsp;<a href=\"https://archive.org/details/knesset_transcripts_2004_2005\">[Archive.org mirror]</a></li>\n\t\t<li>By cloning the&nbsp;<code>NLPH/knesset-2004-2005</code>&nbsp;GitHub repository, which contains the unarchived versions of these files under the&nbsp;<code>knesset_tagged</code>&nbsp;folder.</li>\n\t</ul>\n\t</li>\n</ul>", 
    "contributors": [
      {
        "orcid": "0000-0002-1459-9320", 
        "affiliation": "NLPH", 
        "type": "DataCurator", 
        "name": "Palachy, Shay"
      }
    ], 
    "title": "The Knesset Meetings Corpus 2004-2005", 
    "language": "heb", 
    "notes": "The Open Natural Language Processing in Hebrew (NLPH) initiative is a joint effort by members of\u00a0DataHack\u00a0and\u00a0The Public Knowledge Workshop\u00a0to promote\u00a0open tools and resources for Natural Language Processing in Hebrew.\n\nThis community collects resources for NLP in Hebrew, as part of the\u00a0NLPH project, which you can\u00a0read more about here. These include corpora, lexicons, dictionaries, treebanks, embeddings, code, services, applications, papers, course materials and presentations, among others.\n\nA full list of these resources is maintained here:\u00a0https://github.com/NLPH/NLPH_Resources\n\nIf you have a resource you can contribute, to be released under some open license, please submit a pull request, or contact us at\u00a0contact@nlph.org.il.", 
    "relations": {
      "version": [
        {
          "count": 1, 
          "index": 0, 
          "parent": {
            "pid_type": "recid", 
            "pid_value": "2707355"
          }, 
          "is_last": true, 
          "last_child": {
            "pid_type": "recid", 
            "pid_value": "2707356"
          }
        }
      ]
    }, 
    "access_right": "open", 
    "communities": [
      {
        "id": "nlph"
      }
    ], 
    "version": "v1.0.0", 
    "keywords": [
      "NLP", 
      "Hebrew", 
      "Knesset", 
      "Transcripts", 
      "Tokenization", 
      "morphologically tagged text", 
      "NLPH"
    ], 
    "publication_date": "2019-05-10", 
    "creators": [
      {
        "affiliation": "Technion \u2013 Israel Institute of Technology", 
        "name": "Itai, Alon"
      }, 
      {
        "affiliation": "University of Haifa", 
        "name": "Wintner, Shuly"
      }
    ], 
    "license": {
      "id": "PDDL-1.0"
    }, 
    "resource_type": {
      "type": "dataset", 
      "title": "Dataset"
    }, 
    "related_identifiers": [
      {
        "scheme": "doi", 
        "identifier": "10.5281/zenodo.2707355", 
        "relation": "isVersionOf"
      }
    ]
  }
}
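The "files" array above lists an MD5 checksum for each archive. The following is a minimal Python sketch for checking local downloads against those values. It assumes the archives were saved under their original "key" names in a single directory; the helper names `md5_of` and `verify` are hypothetical and are not part of any Zenodo tooling.

```python
import hashlib
from pathlib import Path
from typing import Dict, Optional

# MD5 digests copied from the "files" array of the JSON export above.
EXPECTED_MD5 = {
    "kneset16.zip": "07eb15134a4d6ea4bfbdfd560431058b",
    "kneset17.zip": "5fc7424978fe1e2848c89a29679c066b",
    "knesset_tagged_16.tar.gz": "895d7efb6384c4d913a03ce5c99c6a01",
    "knesset_txt_16.tar.gz": "9edb769b5e5a670717255f76d440b82e",
    "knesset_txt_17.zip": "49ba3cd3cbe8ce35ce5915eeb2f653e9",
}


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks so large
    archives (the tagged archive is about 495 MB) are not loaded at once."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(directory: Path,
           expected: Dict[str, str] = EXPECTED_MD5) -> Dict[str, Optional[bool]]:
    """Map each expected file name to True (match), False (mismatch),
    or None (file not found in `directory`)."""
    results: Dict[str, Optional[bool]] = {}
    for name, checksum in expected.items():
        path = directory / name
        results[name] = md5_of(path) == checksum if path.is_file() else None
    return results


if __name__ == "__main__":
    # Assumption: the archives were downloaded into the current directory.
    for name, ok in verify(Path(".")).items():
        status = "missing" if ok is None else ("OK" if ok else "MISMATCH")
        print(f"{name}: {status}")
```

Reading in fixed-size chunks keeps memory use flat regardless of archive size; a `None` result distinguishes an archive that was never downloaded from one that is corrupted.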
                  All versions  This version
Views             119           119
Downloads         64            64
Data volume       8.1 GB        8.1 GB
Unique views      105           105
Unique downloads  32            32
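According to the record description, the files in kneset16.zip and kneset17.zip are encoded in windows-1255, whereas knesset_txt_16.tar.gz and knesset_txt_17.zip already contain UTF-8 text. For anyone working from the windows-1255 archives, the following is a minimal Python sketch that re-encodes their members to UTF-8. It assumes each member is plain text, as the description's "text files" wording suggests, rather than a binary Word document. The helper name `reencode_zip_to_utf8` is hypothetical.

```python
import zipfile
from pathlib import Path, PurePosixPath
from typing import List


def reencode_zip_to_utf8(archive: Path, out_dir: Path,
                         errors: str = "strict") -> List[Path]:
    """Decode every file in `archive` as windows-1255 and write it to
    `out_dir` as UTF-8, returning the paths written.

    Assumptions: members are plain text, and member base names are unique,
    because the archive's directory structure is flattened into `out_dir`.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue  # skip directory entries
            # Python's "cp1255" codec is the windows-1255 Hebrew code page.
            text = zf.read(info).decode("cp1255", errors=errors)
            target = out_dir / PurePosixPath(info.filename).name
            # write_bytes keeps the original line endings unchanged.
            target.write_bytes(text.encode("utf-8"))
            written.append(target)
    return written
```

For example, `reencode_zip_to_utf8(Path("kneset16.zip"), Path("kneset16_utf8"))` writes the converted files to `kneset16_utf8`. If a file contains bytes that windows-1255 leaves undefined, passing `errors="replace"` substitutes U+FFFD instead of raising `UnicodeDecodeError`.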
