Dataset

PrevDistro - Preverb Distributions in Hungarian

Kalivoda, Ágnes

  "description": "<p>PrevDistro (Preverb Distributions) is an open-source dataset containing 41.5 million corpus occurrences of 49 preverb-verb construction types. It consists of the following columns:</p>\n\n<ul>\n\t<li>1 <em>sid</em>: ID</li>\n\t<li>2 <em>constype</em>: construction type</li>\n\t<li>3 <em>subtype</em>: construction subtype</li>\n\t<li>4 <em>prevpos</em>: preverb position</li>\n\t<li>5 <em>prev</em>: preverb</li>\n\t<li>6 <em>verb</em>: verb lemma</li>\n\t<li>7 <em>intervening</em>: intervening words (as lemmas)</li>\n\t<li>8 <em>actform</em>: actual form (the same content as column 10, but lowercased)</li>\n\t<li>9 <em>left</em>: left context</li>\n\t<li>10 <em>kwic</em>: keyword in context</li>\n\t<li>11 <em>right</em>: right context</li>\n\t<li>12 <em>docid</em>: document ID from the Hungarian Gigaword Corpus</li>\n\t<li>13 <em>title</em>: document title</li>\n\t<li>14 <em>style</em>: document style (e.g. official, press, ...)</li>\n\t<li>15 <em>region</em>: document region (e.g. Transylvania, Subcarpathia, ...)</li>\n\t<li>16 <em>year</em>: year of publication (a single document may list several years)</li>\n</ul>\n\n<p>The first row is the header. If a cell&#39;s value is unspecified, it is marked with an underscore (_).</p>", 
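The column layout described above can be read with a short script. The following is a minimal sketch, not the author's tooling. It assumes the data is delimiter-separated text (the tab default is a guess, since the description does not name the file format) and that the header row uses the column names listed above. The function name `read_prevdistro` and the sample row are made up for illustration and are not taken from the dataset.

```python
import csv
import io

# Column names as listed in the dataset description.
COLUMNS = [
    "sid", "constype", "subtype", "prevpos", "prev", "verb",
    "intervening", "actform", "left", "kwic", "right",
    "docid", "title", "style", "region", "year",
]


def read_prevdistro(handle, delimiter="\t"):
    """Yield one dict per data row, mapping "_" (unspecified) to None.

    Assumption: delimiter-separated text; the tab default is a guess.
    """
    # DictReader takes the field names from the first row (the header).
    # QUOTE_NONE keeps quotation marks in the corpus text as plain characters.
    reader = csv.DictReader(handle, delimiter=delimiter, quoting=csv.QUOTE_NONE)
    for row in reader:
        yield {name: (None if value == "_" else value)
               for name, value in row.items()}


# Invented one-row sample for illustration only (not real corpus data).
sample_row = ["1", "_", "_", "_", "meg", "néz", "_", "megnézte",
              "_", "Megnézte", "_", "_", "_", "_", "_", "_"]
SAMPLE = "\t".join(COLUMNS) + "\n" + "\t".join(sample_row) + "\n"

records = list(read_prevdistro(io.StringIO(SAMPLE)))
```

For a real file, the same function could be called on an open file handle, for example `open(path, encoding="utf-8", newline="")`, where `path` is wherever the downloaded data lives. Streaming rows through a generator avoids loading all 41.5 million occurrences into memory at once.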
