10.5281/zenodo.6349410
https://zenodo.org/records/6349410
oai:zenodo.org:6349410
Kalivoda, Ágnes
Ágnes
Kalivoda
0000-0003-2520-5523
Hungarian Research Centre for Linguistics
PrevDistro - Preverb Distributions in Hungarian
Zenodo
2021
linguistics
Hungarian
preverb constructions
preverb
verbal prefix
verbal particle
construction
2021-06-21
hun
10.15774/PPKE.BTK.2021.019
10.5281/zenodo.6349409
2.0.0
GNU General Public License v3.0 or later
PrevDistro (Preverb Distributions) is an open-source dataset containing 41.5 million corpus occurrences of 49 preverb-verb construction types. It consists of the following columns:
1 sid: ID
2 constype: construction type
3 subtype: construction subtype
4 prevpos: preverb position
5 prev: preverb
6 verb: verb lemma
7 intervening: intervening words (as lemmas)
8 actform: actual form (same content as column 10, kwic, but lowercased)
9 left: left context
10 kwic: keyword in context
11 right: right context
12 docid: document ID from the Hungarian Gigaword Corpus
13 title: document title
14 style: document style (e.g. official, press, ...)
15 region: document region (e.g. Transylvania, Subcarpathia, ...)
16 year: year of publication (a single document may list several years)
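The 16 columns above map naturally onto a fixed record type. The sketch below is illustrative only: the names `COLUMNS`, `PrevDistroRow` and `to_row` are hypothetical, and it assumes each line has already been split into its 16 fields. Every value stays a plain string. Casting `year` to a number would be unsafe because one document may list several years.

```python
from typing import List, NamedTuple

# Column names in the order given in the PrevDistro description (columns 1-16).
COLUMNS = (
    "sid", "constype", "subtype", "prevpos", "prev", "verb",
    "intervening", "actform", "left", "kwic", "right",
    "docid", "title", "style", "region", "year",
)


class PrevDistroRow(NamedTuple):
    """One corpus occurrence; every field is kept as a raw string."""
    sid: str
    constype: str
    subtype: str
    prevpos: str
    prev: str
    verb: str
    intervening: str  # intervening words, given as lemmas
    actform: str      # lowercased copy of kwic
    left: str
    kwic: str
    right: str
    docid: str        # Hungarian Gigaword Corpus document ID
    title: str
    style: str
    region: str
    year: str         # may hold several years, so it is not cast to int


def to_row(fields: List[str]) -> PrevDistroRow:
    """Map an already-split line of 16 fields onto the named columns."""
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}")
    return PrevDistroRow(*fields)
```

Using a `NamedTuple` keeps the column order explicit and enables attribute access such as `row.kwic`. It also keeps memory per row low, which matters at 41.5 million rows.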
The first row is the header. Unspecified cell values are marked with an underscore (_).
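The header and underscore conventions are easy to handle when streaming the file, so the full dataset never has to be held in memory. This is a minimal sketch under a stated assumption: the description does not give the field delimiter, and this code assumes tab-separated values with no quoting. The functions `read_prevdistro` and `count_by` are hypothetical helpers, not part of the dataset.

```python
import csv
import io
from collections import Counter
from typing import Dict, Iterable, Iterator, Optional, TextIO

MISSING = "_"  # marker PrevDistro uses for unspecified cell values


def read_prevdistro(stream: TextIO, delimiter: str = "\t") -> Iterator[Dict[str, Optional[str]]]:
    """Yield one dict per data row, turning '_' cells into None.

    ASSUMPTION: tab-separated fields without quoting; change `delimiter`
    if the distributed files use another layout.
    """
    reader = csv.reader(stream, delimiter=delimiter, quoting=csv.QUOTE_NONE)
    header = next(reader)  # the first row holds the column names
    for fields in reader:
        if not fields:  # skip blank lines
            continue
        yield {name: (None if value == MISSING else value)
               for name, value in zip(header, fields)}


def count_by(rows: Iterable[Dict[str, Optional[str]]], column: str) -> Counter:
    """Count rows per value of `column`, consuming the rows lazily."""
    return Counter(row[column] for row in rows)
```

A real file would be opened with `open(path, encoding="utf-8", newline="")`; the UTF-8 encoding is also an assumption. The resulting stream can then be passed to `count_by(read_prevdistro(f), "constype")` to obtain per-construction-type frequencies.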
PrevDistro 1.0.0 (deprecated) can be found at https://science-data.hu/dataset.xhtml?persistentId=doi:10.5072/FK2/TRSD50
In PrevDistro 2.0.0, several new columns were added and some errors in the existing data were fixed.