Published June 21, 2021 | Version 2.0.0
Dataset
Open
PrevDistro - Preverb Distributions in Hungarian
Description
PrevDistro (Preverb Distributions) is an open-source dataset containing 41.5 million corpus occurrences of 49 preverb-verb construction types. It consists of the following columns:
- 1 sid: ID
- 2 constype: construction type
- 3 subtype: construction subtype
- 4 prevpos: preverb position
- 5 prev: preverb
- 6 verb: verb lemma
- 7 intervening: intervening words (as lemmas)
- 8 actform: actual form (same content as column 10, kwic, but lowercased)
- 9 left: left context
- 10 kwic: keyword in context
- 11 right: right context
- 12 docid: document ID from the Hungarian Gigaword Corpus
- 13 title: document title
- 14 style: document style (e.g. official, press, ...)
- 15 region: document region (e.g. Transylvania, Subcarpathia, ...)
- 16 year: year of publication (sometimes several years can be found in one document)
The first row is the header. Unspecified cell values are marked with an underscore (_).
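The column layout described above can be read with a short streaming parser. This is only a sketch under stated assumptions: the record does not say which delimiter the file uses, so tab-separated text is assumed here, and the `read_rows` helper and the one-row sample are illustrative placeholders, not actual PrevDistro data. Underscore cells are mapped to `None`.

```python
import csv
import io
from typing import Dict, Iterable, Iterator, Optional

# Column names in the order listed in the description.
COLUMNS = [
    "sid", "constype", "subtype", "prevpos", "prev", "verb",
    "intervening", "actform", "left", "kwic", "right",
    "docid", "title", "style", "region", "year",
]


def read_rows(
    lines: Iterable[str], delimiter: str = "\t"
) -> Iterator[Dict[str, Optional[str]]]:
    """Yield one dict per data row, mapping '_' (unspecified) to None.

    ASSUMPTION: the file is delimiter-separated text without quoting.
    The delimiter is not stated in the description; tab is a guess.
    """
    reader = csv.reader(lines, delimiter=delimiter, quoting=csv.QUOTE_NONE)
    header = next(reader)  # the first row is the header
    for fields in reader:
        yield {
            name: (None if value == "_" else value)
            for name, value in zip(header, fields)
        }


# Invented sample row for illustration only (not taken from the dataset).
SAMPLE = (
    "\t".join(COLUMNS) + "\n"
    + "\t".join([
        "1", "_", "_", "_", "meg", "csinál", "_", "megcsinál",
        "_", "Megcsinál", "_", "_", "_", "_", "_", "2001",
    ]) + "\n"
)

rows = list(read_rows(io.StringIO(SAMPLE)))
```

Because the generator yields rows one at a time, the same function can be pointed at an open file handle, which matters for a file of this size; loading it into memory in one go is unlikely to be practical.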
Files
Total size: 13.2 GB
md5:686521c26f1fbbc473e210946a4ab0cb
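After downloading, the file can be checked against the MD5 checksum listed above. The following is a minimal sketch: the `file_md5` helper and the local path in the usage comment are hypothetical, since the file name is not shown on this page. The file is hashed in chunks so that the 13.2 GB download never has to fit in memory.

```python
import hashlib

# Checksum as published on this record.
EXPECTED_MD5 = "686521c26f1fbbc473e210946a4ab0cb"


def file_md5(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Usage (the path is a placeholder for wherever the download was saved):
# if file_md5("path/to/downloaded_file") != EXPECTED_MD5:
#     raise ValueError("checksum mismatch: download may be corrupted")
```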
Additional details
Related works
- Is new version of
- Thesis: 10.15774/PPKE.BTK.2021.019 (DOI)