Published January 2, 2024 | Version v1
Dataset Open

Training data for the shared task Ideology and Power Identification in Parliamentary Debates

  • 1. University of Tübingen
  • 2. Charles University, Faculty of Mathematics and Physics
  • 3. Kaunas University of Technology
  • 4. Jožef Stefan Institute
  • 5. Znanstvenoraziskovalni center Slovenske akademije znanosti in umetnosti

Description

This dataset contains a selection of speeches from the ParlaMint corpora (version 4.0), provided as the training set for the shared task on "Ideology and Power Identification in Parliamentary Debates" at CLEF 2024.

All files are tab-separated text files with the following fields:

  • "id" is a unique (arbitrary) ID for each text.
  • "speaker" is a unique (arbitrary) ID for each speaker. There may be multiple speeches from the same speaker.
  • "sex" is the (binary/biological) sex of the speaker. This information is collected from varying sources (typically data published by the respective parliament), and in some cases it may be unspecified or unknown.
  • "text" is the transcribed text of the parliamentary speech. The text may contain line breaks and other special sequences, which are escaped or quoted.
  • "text_en" is an automatic English translation of the corresponding text. This field may be empty for speeches that are already in English, and translations are also missing for a small number of non-English speeches.
  • "label" is the binary/numeric label. For political orientation, 0 is left and 1 is right. For power identification, 0 indicates coalition (or governing party) and 1 indicates opposition.
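The field layout above can be read with Python's standard csv module. The following is a minimal sketch, not an official loader. It assumes that each file starts with a header row naming the six fields and that fields containing line breaks are wrapped in double quotes. The function name read_speeches and the two sample rows (including the "F"/"M" values) are invented for illustration and are not taken from the corpus.

```python
import csv
import io
import sys

# Speeches can be long, so raise the csv module's per-field size limit.
csv.field_size_limit(min(sys.maxsize, 2**31 - 1))


def read_speeches(handle):
    """Parse one tab-separated file into a list of dicts with an integer label."""
    # Assumption: a header row is present and multi-line fields are double-quoted.
    reader = csv.DictReader(handle, delimiter="\t", quotechar='"')
    rows = []
    for row in reader:
        row["label"] = int(row["label"])
        rows.append(row)
    return rows


# Synthetic two-row sample (not real data): a quoted multi-line speech with a
# translation, and an English speech whose "text_en" field is empty.
sample = (
    "id\tspeaker\tsex\ttext\ttext_en\tlabel\n"
    'x1\ts1\tF\t"Erste Zeile\nzweite Zeile"\t"First line\nsecond line"\t0\n'
    "x2\ts2\tM\tThank you.\t\t1\n"
)
rows = read_speeches(io.StringIO(sample, newline=""))
for row in rows:
    print(row["id"], row["label"], repr(row["text_en"]))
```

For a file on disk, the same function can be given a handle opened with open(path, encoding="utf-8", newline=""), where path is whichever extracted file is being read. If the files turn out to use backslash escapes rather than quoting, the reader options would need to be adjusted accordingly.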

File names indicate the task and the parliament. We provide data from the following national and regional parliaments.

  • Austria (at)
  • Bosnia and Herzegovina (ba)
  • Belgium (be)
  • Bulgaria (bg)
  • Czechia (cz)
  • Denmark (dk)
  • Estonia (ee) [only political orientation]
  • Spain (es)
  • Catalonia (es-ct)
  • Galicia (es-ga)
  • Basque Country (es-pv) [only power]
  • Finland (fi)
  • France (fr)
  • Great Britain (gb)
  • Greece (gr)
  • Croatia (hr)
  • Hungary (hu)
  • Iceland (is) [only political orientation]
  • Italy (it)
  • Latvia (lv)
  • The Netherlands (nl)
  • Norway (no) [only political orientation]
  • Poland (pl)
  • Portugal (pt)
  • Serbia (rs)
  • Sweden (se) [only political orientation]
  • Slovenia (si)
  • Turkey (tr)
  • Ukraine (ua)
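When iterating over parliaments, it can help to encode which tasks are available for each code. The sketch below transcribes the codes and the bracketed restrictions from the list above. The task names "orientation" and "power" and the helper tasks_for are illustrative choices and do not necessarily match the strings used in the actual file names.

```python
# Parliament codes transcribed from the list above.
ALL_CODES = {
    "at", "ba", "be", "bg", "cz", "dk", "ee", "es", "es-ct", "es-ga",
    "es-pv", "fi", "fr", "gb", "gr", "hr", "hu", "is", "it", "lv",
    "nl", "no", "pl", "pt", "rs", "se", "si", "tr", "ua",
}
# Codes marked "[only political orientation]" and "[only power]".
ORIENTATION_ONLY = {"ee", "is", "no", "se"}
POWER_ONLY = {"es-pv"}


def tasks_for(code):
    """Return the tasks with training data for a parliament code."""
    if code not in ALL_CODES:
        raise KeyError(f"unknown parliament code: {code!r}")
    if code in ORIENTATION_ONLY:
        return ("orientation",)
    if code in POWER_ONLY:
        return ("power",)
    return ("orientation", "power")


# Parliaments that have data for each task.
orientation_codes = sorted(c for c in ALL_CODES if "orientation" in tasks_for(c))
power_codes = sorted(c for c in ALL_CODES if "power" in tasks_for(c))
print(len(orientation_codes), len(power_codes))
```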

The number of training instances and the degree of class imbalance differ for each training set. We do not provide a fixed validation split. Please see the shared task website for a further description of the dataset and the sampling process.
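Because no validation split is provided, users need to create their own. One possible approach, sketched below, is to inspect the class counts and then hold out whole speakers, so that speeches by the same speaker never appear on both sides of the split. This is an assumption about what makes a sensible split, not the organizers' procedure. The helper names, the 20% fraction, and the sample rows are all hypothetical.

```python
import random
from collections import Counter, defaultdict


def class_counts(rows):
    """Count how many rows carry each label."""
    return Counter(row["label"] for row in rows)


def split_by_speaker(rows, val_fraction=0.2, seed=0):
    """Hold out a fraction of speakers (not speeches) for validation."""
    by_speaker = defaultdict(list)
    for row in rows:
        by_speaker[row["speaker"]].append(row)
    speakers = sorted(by_speaker)
    random.Random(seed).shuffle(speakers)
    n_val = max(1, round(len(speakers) * val_fraction))
    val_speakers = set(speakers[:n_val])
    train = [r for r in rows if r["speaker"] not in val_speakers]
    val = [r for r in rows if r["speaker"] in val_speakers]
    return train, val


# Synthetic rows (not real data): five speakers with two speeches each.
rows = [
    {"id": f"t{i}", "speaker": f"s{i % 5}", "label": i % 2}
    for i in range(10)
]
print(class_counts(rows))
train, val = split_by_speaker(rows)
print(len(train), len(val))
```

Note that this split is not stratified by label, so for heavily imbalanced sets it may be worth checking class_counts on both parts and trying another seed if one class is missing from the validation part.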

Files

trainingset-ideology-power.zip

Files (813.9 MB)

Name Size
trainingset-ideology-power.zip 813.9 MB
md5:8c30d53c9201dce33c6b200660d7aea7
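After downloading, the archive can be checked against the MD5 checksum listed above. The sketch below uses Python's hashlib and reads the file in chunks so that the whole archive does not need to fit in memory. The helper name md5_of and the assumption that the archive sits in the current directory under its original name are illustrative.

```python
import hashlib
import os

# Checksum as listed on this page for trainingset-ideology-power.zip.
EXPECTED_MD5 = "8c30d53c9201dce33c6b200660d7aea7"


def md5_of(path, chunk_size=1 << 20):
    """Compute the hex MD5 digest of a file, reading it chunk by chunk."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Assumed local path; adjust to wherever the archive was saved.
archive = "trainingset-ideology-power.zip"
if os.path.exists(archive):
    status = "OK" if md5_of(archive) == EXPECTED_MD5 else "MISMATCH"
    print(f"{archive}: {status}")
```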

Additional details

Related works

Is derived from
Dataset: 11356/1859 (Handle)
References
Journal: 10.1007/s10579-021-09574-0 (DOI)