Published October 7, 2023 | Version 1.0.0
Dataset | Open access

Infinitive constructions in Hungarian

  • 1. HUN-REN Hungarian Research Centre for Linguistics

Description

This is an open-source dataset containing more than 9 million corpus occurrences of Hungarian infinitive constructions. It consists of the following columns:

  • 1 inf_form: Lowercase form of the infinitive.
  • 2 inf_lemma: Infinitive without inflectional suffixes. If the infinitive has a (separated) preverb, there is a + sign between the preverb and the verb stem.
  • 3 inf_prev: Lowercase form of the preverb associated with the infinitive.
  • 4 inf_prevpos: Position of the preverb relative to the infinitive, given in tokens.
  • 5 inf_stem: Verb stem of the infinitive.
  • 6 inf_persnum: The person and number marking on the infinitive.
  • 7 fin_form: Lowercase finite form (a finite verb, a plain adjective or noun, or a complex verb phrase) co-occurring with an infinitive. This heterogeneous set of items is referred to hereafter as FIN.
  • 8 fin_lemma: FIN without inflectional suffixes. If it is a verb with a (separated) preverb, there is a + sign between the preverb and the verb stem.
  • 9 fin_prev: Lowercase form of the preverb associated with the FIN.
  • 10 fin_prevpos: Position of the preverb relative to the FIN, given in tokens.
  • 11 fin_stem: Stem of the FIN.
  • 12 fin_wordclass: Word class of the FIN stem.
  • 13 fin_persnum: The person and number marking on the FIN.
  • 14 order: A schematic representation of how the infinitive, the FIN and their respective preverbs are ordered.
  • 15 argframe_cases: Arguments of the infinitive, represented by their case endings.
  • 16 argframe_long: Arguments of the infinitive, represented by lemma + case-ending combinations.
  • 17 doc_year: The year of writing or the year of publication, 0 if unknown.
  • 18 doc_style: Document style.
  • 19 doc_id: Document identifier.
  • 20 left_context: Text preceding the hit.
  • 21 kwic: The hit (the whole infinitive construction).
  • 22 right_context: Text following the hit.
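Because inf_lemma and fin_lemma mark a separated preverb with a + sign, the preverb and the stem can be recovered by splitting on that sign. The Python sketch below assumes at most one + per lemma (it splits on the first one only). The function name split_lemma and the example values are hypothetical and are not taken from the dataset.

```python
from typing import Optional, Tuple


def split_lemma(lemma: str) -> Tuple[Optional[str], str]:
    """Split an inf_lemma/fin_lemma value into (preverb, stem).

    Assumption: a '+' separates a (separated) preverb from the stem;
    a lemma without '+' is returned with preverb None.
    """
    # str.partition splits on the first '+' only
    preverb, sep, stem = lemma.partition("+")
    if not sep:
        return None, lemma
    return preverb, stem


# Hypothetical example values, not drawn from the dataset.
print(split_lemma("el+megy"))  # → ('el', 'megy')
print(split_lemma("fut"))      # → (None, 'fut')
```

The inf_prev and fin_prev columns already give the preverb directly, so this helper is mainly useful when working from the lemma column alone.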

The first row is the header. If a cell's value is unspecified, it is marked with an underscore (_).
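A minimal reader, offered as a sketch under stated assumptions: the description gives neither the delimiter nor the encoding, so a tab-separated UTF-8 text file is assumed here. The first row is used as the header, cells containing _ become None, and a doc_year of 0 (unknown) also becomes None. The function read_rows and the two-column sample are hypothetical.

```python
import csv
import io
from typing import Dict, Iterator, Optional, TextIO

MISSING = "_"  # marker for unspecified cells, per the dataset description


def read_rows(stream: TextIO, delimiter: str = "\t") -> Iterator[Dict[str, Optional[str]]]:
    """Yield one dict per data row, keyed by the header row.

    Assumption: delimiter-separated text (tab by default). Cells equal
    to '_' become None; a doc_year of '0' (unknown) also becomes None.
    """
    # QUOTE_NONE: treat quote characters in the context columns as
    # ordinary text (an assumption about how the file was written)
    reader = csv.DictReader(stream, delimiter=delimiter, quoting=csv.QUOTE_NONE)
    for row in reader:
        clean = {k: (None if v == MISSING else v) for k, v in row.items()}
        if clean.get("doc_year") == "0":
            clean["doc_year"] = None
        yield clean


# Hypothetical two-column sample standing in for the real 22-column file.
sample = "inf_form\tdoc_year\nfutni\t0\n_\t2001\n"
for r in read_rows(io.StringIO(sample)):
    print(r)
```

For the downloaded file, a call such as `read_rows(open(path, encoding="utf-8", newline=""))` would apply, with path pointing at the local copy. Yielding rows one at a time keeps memory use flat for a file of this size; if the delimiter turns out to be different, pass it explicitly.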

Files

The record contains one file (2.9 GB; md5: 78af72e6d0f86a5eb73fb703aba4eff8).
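The md5 checksum listed for the file can be used to confirm that the 2.9 GB download arrived intact. The sketch below uses Python's standard hashlib module and reads the file in 1 MiB chunks so it never has to fit in memory. The function name md5_of_file and the local path in the usage line are placeholders.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read chunk by chunk so a
    multi-gigabyte download need not fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops when read() returns b"" (EOF)
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


EXPECTED_MD5 = "78af72e6d0f86a5eb73fb703aba4eff8"  # checksum listed for the file

# Usage (replace the placeholder path with the downloaded file):
# assert md5_of_file("path/to/downloaded_file") == EXPECTED_MD5
```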