Published March 21, 2025 | Version v2
Dataset Open

Training data for the shared task Ideology and Power Identification in Parliamentary Debates (2025)

  • 1. University of Tübingen
  • 2. Charles University, Faculty of Mathematics and Physics
  • 3. Kaunas University of Technology
  • 4. Jožef Stefan Institute
  • 5. University of Ljubljana
  • 6. Institute of Contemporary History
  • 7. Research Centre of the Slovenian Academy of Sciences and Arts

Description

This dataset contains a selection of speeches from the ParlaMint corpora (version 4.1), serving as the training set for the shared task on "Ideology and Power Identification in Parliamentary Debates" at CLEF 2025.

All files are tab-separated text files with the following fields:

  • "id" is a unique (arbitrary) ID for each text.
  • "speaker" is a unique (arbitrary) ID for each speaker. There may be multiple speeches from the same speaker.
  • "sex" is the (binary/biological) sex of the speaker. This information is collected from varying sources (typically data published by the respective parliament), and in some cases it may be unspecified or unknown.
  • "text" is the transcribed text of the parliamentary speech. Texts may contain line breaks and other special sequences, which are escaped or quoted.
  • "text_en" is an automatic English translation of the corresponding text. This field may be empty for speeches in English, and translations may also be missing for a small number of non-English speeches.
  • "orientation" is the binary numeric label for political orientation (0 is left, 1 is right). Orientation labels are based on Wikipedia.
  • "power" is the binary label for power role (0 is opposition, 1 is coalition). This label is based on the information provided by the ParlaMint contributors. The value is not always present, either because the parliamentary system has no defined coalition/opposition or because orientation information is unknown for some speakers (e.g., PMs with no party affiliation). Missing values are indicated as 'NA'.
  • "populism" is a populism index based on multiple expert surveys (combined to increase coverage). We focus on a particular dimension of populism in this task: the position of the speaker's party on the populist-pluralist spectrum. This is measured on a 4-point ordinal scale (1: Strongly Pluralist, 2: Moderately Pluralist, 3: Moderately Populist, 4: Strongly Populist). Not all values are present in all parliaments. Many parties/speakers are not covered by the data, and some values are missing because survey identifiers/names could not be matched to ParlaMint identifiers. Missing values are indicated as 'NA'.
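
The field list above can be read with Python's standard csv module. The following is a minimal sketch, not the organisers' loader: it assumes each file starts with a header row using exactly the field names listed above, that quoted fields follow csv's default double-quote convention, and that label values other than 'NA' are integers. The sample rows are made up for illustration and are not taken from the data.

```python
import csv
import io

# Speeches can be long; raise the per-field limit above csv's default.
csv.field_size_limit(10**9)

# Label columns named in the field description; 'NA' marks a missing value.
LABELS = ("orientation", "power", "populism")


def read_speeches(handle):
    """Yield one dict per speech, with label columns as int or None."""
    # Assumption: tab-separated with a header row; default csv quoting
    # lets a quoted field span several physical lines.
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        for name in LABELS:
            value = row.get(name)
            row[name] = None if value in (None, "", "NA") else int(value)
        yield row


# Made-up sample (not real data) showing the assumed layout.
sample = (
    "id\tspeaker\tsex\ttext\ttext_en\torientation\tpower\tpopulism\n"
    "x001\ts01\tF\tGuten Tag.\tGood day.\t0\t1\t2\n"
    "x002\ts02\tM\tHello.\t\t1\tNA\tNA\n"
)
rows = list(read_speeches(io.StringIO(sample)))
```

For a real file, pass a handle opened with open(path, encoding="utf-8", newline="") instead of the in-memory sample. If the files turn out to use backslash escapes rather than quoting, the reader options would need to be adjusted accordingly.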

Small samples of the data files are provided in the shared task GitHub repository at https://github.com/coltekin/ideology-power-st-baseline.

File names include a code for the parliament. We provide data from the following national and regional parliaments:

  • Austria (at)
  • Bosnia and Herzegovina (ba)
  • Belgium (be)
  • Bulgaria (bg)
  • Czechia (cz)
  • Denmark (dk)
  • Estonia (ee)
  • Spain (es)
  • Catalonia (es-ct)
  • Galicia (es-ga)
  • Basque Country (es-pv)
  • Finland (fi)
  • France (fr)
  • Great Britain (gb)
  • Greece (gr)
  • Croatia (hr)
  • Hungary (hu)
  • Iceland (is)
  • Italy (it)
  • Latvia (lv)
  • The Netherlands (nl)
  • Norway (no)
  • Poland (pl)
  • Portugal (pt)
  • Serbia (rs)
  • Sweden (se)
  • Slovenia (si)
  • Turkey (tr)
  • Ukraine (ua)

The number of training instances and the class imbalance differ for each training set. We do not provide a fixed validation split. Please see the shared task website and the GitHub repository for further description of the data set and the sampling process.
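
Since no fixed validation split is provided, participants need to create their own. The sketch below shows one possible approach, which is not prescribed by the task: because the same speaker may contribute several speeches, it holds out whole speakers so that no speaker appears in both parts. The function name `split_by_speaker` and its parameters are hypothetical, and rows are assumed to be dicts with a "speaker" key.

```python
import random
from collections import defaultdict


def split_by_speaker(rows, valid_fraction=0.2, seed=0):
    """Split rows into (train, valid) so that no speaker is in both."""
    by_speaker = defaultdict(list)
    for row in rows:
        by_speaker[row["speaker"]].append(row)

    # Shuffle speaker IDs reproducibly, then hold out a fraction of them.
    speakers = sorted(by_speaker)
    random.Random(seed).shuffle(speakers)
    n_valid = max(1, round(len(speakers) * valid_fraction))
    valid_speakers = set(speakers[:n_valid])

    train, valid = [], []
    for speaker, items in by_speaker.items():
        (valid if speaker in valid_speakers else train).extend(items)
    return train, valid
```

This split is made over speakers, not speeches, so the validation part may deviate from the requested fraction of instances and does not preserve class balance. Given the class imbalance noted above, it is worth checking the label distribution of both parts after splitting.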

Files (1.1 GB)

md5:aac7a1c4ac732fec181de07f3509eb08  59.2 MB
md5:80e0e19a6479c038c41955881e83566e  20.5 MB
md5:8c380351df744ede2fbe8c90c711b4cd  16.2 MB
md5:3f0969468888c82bb9a1f54a3875ec65  41.3 MB
md5:23c1fdef404725c3b597cd92489dc1b8  20.9 MB
md5:554cba269335a0f254fa92d4b845d3e9  10.2 MB
md5:c45e1aa415e0fac51f2c109556d5db3d  8.0 MB
md5:b0e9c0f26f33603dedf3cb769132bc18  54.5 MB
md5:5df782cfd63f3c43ffa0946e2877431a  30.4 MB
md5:1ea5ab809c55fdfb33993aad5e5fd85d  4.8 MB
md5:67e81c86c731d1a6a039be75d02edee6  50.2 MB
md5:37361ba5b6e727a49b036fd20452b782  12.2 MB
md5:9f2b695a71cae93d0ba519429a19ce51  30.2 MB
md5:065841ba85c149bb056b204661d587b4  54.2 MB
md5:8fd62968e492dbfb6b9fc6556aff3dc3  68.6 MB
md5:1792451d2ebdb996c2897f27d7a12602  58.8 MB
md5:85983041cfa5a42b1105864af6eed129  25.5 MB
md5:616b311fbca0eee693d6da9c657c4805  19.2 MB
md5:2167bf36609780606a8bcc60ffedf145  46.8 MB
md5:fb574b1e3128e45d07b78bf66ea281d6  15.5 MB
md5:1c737c433efef8b0e71e5d46bb3d441d  35.9 MB
md5:cacd8149555c90d303e3740aa61f3400  53.9 MB
md5:a189eb32f176408213d19bb7429404a7  38.7 MB
md5:a052ab70701731f76b1142e3bc92ff48  35.0 MB
md5:11e81136ab1da03355abe5558336f46e  80.7 MB
md5:9181e2176e4a307f5dfcb7267a5937d7  32.0 MB
md5:df9b895fe2b25d1e66b066a7fa50793a  36.0 MB
md5:43b87691233275c7f979ba247633f7c6  70.7 MB
md5:53474342893dcfec633ef91d7e2a78a3  48.8 MB
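
A downloaded file can be checked against the MD5 checksums listed above. The following is a minimal sketch using Python's standard hashlib module. The helper names `md5_of` and `matches_listing` are hypothetical, and the path passed in is whatever local file name the download was saved under.

```python
import hashlib


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed):
    """Compare a file's digest with a listed value such as 'md5:aac7...'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of(path) == expected
```

A mismatch usually indicates an incomplete or corrupted download, in which case fetching the file again is the simplest remedy.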

Additional details

Related works

Is derived from
Dataset: 11356/1912 (Handle)
References
Journal article: 10.1007/s10579-021-09574-0 (DOI)
Journal article: 10.1007/s10579-024-09798-w (DOI)