Dataset Open Access

PrevDistro - Preverb Distributions in Hungarian

Kalivoda, Ágnes

Citation Style Language JSON Export

  "publisher": "Zenodo", 
  "DOI": "10.5281/zenodo.6349410", 
  "language": "hun", 
  "title": "PrevDistro - Preverb Distributions in Hungarian", 
  "issued": {
    "date-parts": [
  "abstract": "<p>PrevDistro (Preverb Distributions) is an open-source dataset containing 41.5 million corpus occurrences of 49 preverb-verb construction types. It consists of the following columns:</p>\n\n<ul>\n\t<li>1 <em>sid</em>: ID</li>\n\t<li>2 <em>constype</em>: construction type</li>\n\t<li>3 <em>subtype</em>: construction subtype</li>\n\t<li>4 <em>prevpos</em>: preverb position</li>\n\t<li>5 <em>prev</em>: preverb</li>\n\t<li>6 <em>verb</em>: verb lemma</li>\n\t<li>7 <em>intervening</em>: intervening words (as lemmas)</li>\n\t<li>8 <em>actform</em>: actual form (the same content as in column 10, but this column is lowercase)</li>\n\t<li>9 <em>left</em>: left context</li>\n\t<li>10 <em>kwic</em>: keyword in context</li>\n\t<li>11 <em>right</em>: right context</li>\n\t<li>12 <em>docid</em>: document ID from the Hungarian Gigaword Corpus</li>\n\t<li>13 <em>title</em>: document title</li>\n\t<li>14 <em>style</em>: document style (e.g. official, press, ...)</li>\n\t<li>15 <em>region</em>: document region (e.g. Transylvania, Subcarpathia, ...)</li>\n\t<li>16 <em>year</em>: year of publication (sometimes several years can be found in one document)</li>\n</ul>\n\n<p>The first row stands for the header. If a cell&#39;s value is unspecified, it is marked with underscore (_).</p>", 
  "author": [
      "family": "Kalivoda, \u00c1gnes"
  "note": "PrevDistro 1.0.0 (deprecated) can be found at\nIn PrevDistro 2.0.0, several new columns were added and the already existing data has undergone some fixes as well.", 
  "version": "2.0.0", 
  "type": "dataset", 
  "id": "6349410"
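Views: 39 (all versions), 39 (this version)
Downloads: 3 (all versions), 3 (this version)
Data volume: 39.7 GB (all versions), 39.7 GB (this version)
Unique views: 28 (all versions), 28 (this version)
Unique downloads: 3 (all versions), 3 (this version)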
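
The column layout given in the abstract (16 named columns, a header in the first row, and an underscore for unspecified cells) could be read along the following lines. This is a minimal sketch, not the dataset's official loader: the tab delimiter is an assumption, since the description does not name one, and the function name read_prevdistro and the two-line sample are invented for illustration.

```python
import csv
import io

# Column names as listed in the dataset description (16 columns).
COLUMNS = [
    "sid", "constype", "subtype", "prevpos", "prev", "verb",
    "intervening", "actform", "left", "kwic", "right",
    "docid", "title", "style", "region", "year",
]


def read_prevdistro(stream, delimiter="\t"):
    """Yield one dict per data row, mapping '_' (unspecified) to None.

    The tab delimiter is an assumption; the description does not name one.
    """
    reader = csv.reader(stream, delimiter=delimiter, quoting=csv.QUOTE_NONE)
    header = next(reader)  # the first row is the header
    if header != COLUMNS:
        raise ValueError(f"unexpected header: {header}")
    for row in reader:
        yield {
            name: (None if value == "_" else value)
            for name, value in zip(header, row)
        }


# Invented sample (header plus one row), for illustration only;
# this is not real corpus data.
SAMPLE = "\t".join(COLUMNS) + "\n" + "\t".join([
    "1", "_", "_", "_", "meg", "csinál", "_", "megcsinálta",
    "_", "Megcsinálta", "_", "_", "_", "_", "_", "_",
]) + "\n"

rows = list(read_prevdistro(io.StringIO(SAMPLE)))
```

Because the function is a generator, it can be pointed at an open file handle and consumed row by row, which matters for a dataset of this size. Converting the underscore to None keeps unspecified cells distinct from empty strings.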

