Published October 7, 2023 | Version 1.0.0
Dataset
Open
Infinitive constructions in Hungarian
Description
This is an open-source dataset containing more than 9 million corpus occurrences of Hungarian infinitive constructions. It consists of the following columns:
- 1 inf_form: Lowercase form of the infinitive.
- 2 inf_lemma: Infinitive without inflectional suffixes. If the infinitive has a (separated) preverb, there is a + sign between the preverb and the verb stem.
- 3 inf_prev: Lowercase form of the preverb associated with the infinitive.
- 4 inf_prevpos: Position of the preverb relative to the infinitive, given in tokens.
- 5 inf_stem: Verb stem of the infinitive.
- 6 inf_persnum: The person and number marking on the infinitive.
- 7 fin_form: Lowercase finite form (finite verb, plain adjective or noun, complex verb phrase) co-occurring with an infinitive. This heterogeneous set of items is referred to hereafter as FIN.
- 8 fin_lemma: FIN without inflectional suffixes. If it is a verb having a (separated) preverb, there is a + sign between the preverb and the verb stem.
- 9 fin_prev: Lowercase form of the preverb associated with the FIN.
- 10 fin_prevpos: Position of the preverb relative to the FIN, given in tokens.
- 11 fin_stem: Stem of the FIN.
- 12 fin_wordclass: Word class of the FIN stem.
- 13 fin_persnum: The person and number marking on the FIN.
- 14 order: A schematic representation of how the infinitive, the FIN and their respective preverbs are ordered.
- 15 argframe_cases: Arguments of the infinitive, represented by case-endings.
- 16 argframe_long: Arguments of the infinitive, represented by lemma + case-ending combinations.
- 17 doc_year: The year of writing or the year of publication, 0 if unknown.
- 18 doc_style: Document style.
- 19 doc_id: Document identifier.
- 20 left_context: Text preceding the hit.
- 21 kwic: The hit (the whole infinitive construction).
- 22 right_context: Text following the hit.
The first row is the header. If a cell's value is unspecified, it is marked with an underscore (_).
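The conventions described above (header in the first row, underscore for unspecified cells, a + sign between preverb and stem in lemmas, 0 for an unknown year) can be handled with a few small helpers. The following is a minimal sketch, not an official loader: the description does not state the file's delimiter, so the tab-separated layout is an assumption, and the helper names `read_rows`, `split_lemma` and `doc_year` as well as the two sample rows are made up for illustration and are not taken from the dataset.

```python
import csv
import io
from typing import Iterator, Optional, TextIO, Tuple

MISSING = "_"  # per the description, unspecified cells hold an underscore

# Made-up sample with a subset of the columns; NOT real rows from the dataset.
# Tab-separated layout is an assumption (the description does not name the delimiter).
SAMPLE = (
    "inf_form\tinf_lemma\tinf_prev\tdoc_year\n"
    "csinálni\tmeg+csinál\tmeg\t0\n"
    "írni\tír\t_\t1999\n"
)


def read_rows(handle: TextIO, delimiter: str = "\t") -> Iterator[dict]:
    """Yield one dict per data row, keyed by the names in the header row.

    Cells equal to the underscore marker are converted to None.
    """
    reader = csv.DictReader(handle, delimiter=delimiter, quoting=csv.QUOTE_NONE)
    for row in reader:
        yield {key: (None if value == MISSING else value) for key, value in row.items()}


def split_lemma(lemma: Optional[str]) -> Tuple[Optional[str], Optional[str]]:
    """Split a 'preverb+stem' lemma into (preverb, stem).

    The preverb is None when the lemma contains no + sign.
    """
    if lemma is None:
        return (None, None)
    preverb, sep, stem = lemma.partition("+")
    return (preverb, stem) if sep else (None, lemma)


def doc_year(row: dict) -> Optional[int]:
    """Return doc_year as an int, or None when it is 0 (unknown) or unspecified."""
    value = row.get("doc_year")
    if value is None:
        return None
    year = int(value)
    return year or None
```

To read the actual file, one would pass an open text handle (for example `open(path, encoding="utf-8", newline="")`, where `path` is wherever the download was saved) to `read_rows` instead of `io.StringIO(SAMPLE)`. Iterating row by row avoids loading the whole 2.9 GB file into memory.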
Files
(2.9 GB)
Name | Size |
---|---|
md5:78af72e6d0f86a5eb73fb703aba4eff8 | 2.9 GB |