Published April 20, 2020 | Version 1.0
Dataset Open

War of Words: The Competitive Dynamics of Legislative Processes


Contact person:

  • 1. EPFL


Update: A newer version of this dataset is available (DOI: 10.5281/zenodo.4525015). It comes with features extracted from the MEPs, the edits, and the dossiers, such as the nationality of MEPs, the type of law being edited, and the text of the edits. Check it out!

This upload contains the dataset presented and used in the paper:

Kristof, V., Grossglauser, M., Thiran, P., War of Words: The Competitive Dynamics of Legislative Processes, The Web Conference, April 20-24, 2020, Taipei, Taiwan

Read Section 2.2 of the paper to learn more about the European legislative process. The code to process and use the dataset can be found on GitHub.

The dataset is split by legislature period of the European Parliament into two files: the 7th legislature (war-of-words-ep7.txt) and the 8th legislature (war-of-words-ep8.txt). Here is a snippet to load the dataset (for EP7 in this example) in Python:

import json

# Each line of the file is one JSON-encoded data point (a list of edits).
with open('path/to/war-of-words-ep7.txt') as f:
    dataset = [json.loads(line) for line in f]

In the two text files, each line is a data point representing a conflict between edits. It is encoded as a JSON list of dictionaries, where each dictionary is an edit. Each edit has the following structure:

{
  'edit_id': 163187,                     // Unique edit identifier.
  'accepted': True,                      // Label.
  'dossier_ref': 'ENVI-AD(2012)487738',  // Reference to dossier (see below).
  'authors': [                           // List of authors.
    {
      'id': 4550,                        // Unique MEP identifier (see below).
      'name': 'Jill EVANS',              // MEP name.
      'rapporteur': False                // Whether the MEP is rapporteur.
    }
  ]
}
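As an illustration of how these fields can be read once a line has been parsed, here is a minimal sketch. The helper names `describe_edit` and `has_rapporteur` are hypothetical, and the second edit in `data_point` is made up for the example (only the first edit uses the values shown in the structure).

```python
# A hypothetical data point: a list of edits in conflict.
# The first edit reuses the example values; the second one is invented.
data_point = [
    {
        "edit_id": 163187,
        "accepted": True,
        "dossier_ref": "ENVI-AD(2012)487738",
        "authors": [{"id": 4550, "name": "Jill EVANS", "rapporteur": False}],
    },
    {
        "edit_id": 1,  # made-up identifier
        "accepted": False,
        "dossier_ref": "ENVI-AD(2012)487738",
        "authors": [{"id": 1, "name": "Jane DOE", "rapporteur": True}],  # made-up MEP
    },
]


def describe_edit(edit):
    """Return a one-line summary of a single edit."""
    names = ", ".join(author["name"] for author in edit["authors"])
    status = "accepted" if edit["accepted"] else "not accepted"
    return f"edit {edit['edit_id']} by {names} ({edit['dossier_ref']}): {status}"


def has_rapporteur(edit):
    """Return True if at least one author of the edit is a rapporteur."""
    return any(author["rapporteur"] for author in edit["authors"])


summaries = [describe_edit(edit) for edit in data_point]
for summary in summaries:
    print(summary)
```

An edit can have several authors, so the sketch joins all their names rather than reading only the first one.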

You can assume that:

  • Each data point has at least one edit.
  • If there is only one edit, then it is in conflict with the status quo (see Section 4 of the paper).
  • If there are two or more edits in conflict, then they are all in conflict with one another and with the status quo (see Section 4 of the paper).
  • At most one edit is accepted in each data point.
  • In each legislature, each edit has a unique identifier.
  • There are no timestamps associated with edits (see Section 3 of the paper). 
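The assumptions above can be turned into a small sanity check. The following is only a sketch: the function name `outcome` is hypothetical, and it assumes that a data point in which no edit is accepted means the status quo was kept, which is my reading of the list above rather than something the dataset encodes explicitly.

```python
def outcome(data_point):
    """Return the edit_id of the accepted edit, or None if no edit is accepted.

    Assumption: None is interpreted as "the status quo was kept".
    """
    if not data_point:
        raise ValueError("a data point must contain at least one edit")
    accepted = [edit["edit_id"] for edit in data_point if edit["accepted"]]
    if len(accepted) > 1:
        raise ValueError(f"more than one accepted edit: {accepted}")
    return accepted[0] if accepted else None


# Toy data points with made-up identifiers (only the fields used here).
single_edit = [{"edit_id": 10, "accepted": False}]
two_edits = [
    {"edit_id": 11, "accepted": False},
    {"edit_id": 12, "accepted": True},
]

print(outcome(single_edit))
print(outcome(two_edits))
```

Raising an error instead of silently picking one edit makes it easy to notice if a file does not satisfy the stated assumptions.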

The dossier_ref can be used to get more information on the dossier. It is formatted as COMM-TYPE(YEAR)PENUMBER (this follows the notation of file names used by the Parliament Secretariat), where

  • COMM is the committee identifier (4 capital letters)
  • TYPE is either AD (opinion) or A{7,8} (report for EP7 or EP8, see Section 2.2 of the paper)
  • YEAR is the year in which the dossier was voted on
  • PENUMBER is the "PE number", a document identifier used by the European Parliament
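To split a dossier_ref into its components, a regular expression along these lines may help. This is a sketch under assumptions: the name `parse_dossier_ref` is hypothetical, and the pattern assumes a four-digit year and a PE number made only of digits, as in the example 'ENVI-AD(2012)487738'. References that do not match return None so that they can be inspected by hand.

```python
import re

# COMM-TYPE(YEAR)PENUMBER, e.g. 'ENVI-AD(2012)487738'.
# Assumption: YEAR has four digits and PENUMBER contains only digits.
DOSSIER_REF = re.compile(
    r"^(?P<committee>[A-Z]{4})"
    r"-(?P<type>AD|A[78])"
    r"\((?P<year>\d{4})\)"
    r"(?P<pe_number>\d+)$"
)


def parse_dossier_ref(ref):
    """Return the components of a dossier reference, or None if it does not match."""
    match = DOSSIER_REF.match(ref)
    if match is None:
        return None
    fields = match.groupdict()
    fields["year"] = int(fields["year"])
    # 'AD' denotes an opinion; 'A7' and 'A8' denote reports.
    fields["is_opinion"] = fields["type"] == "AD"
    return fields


parsed = parse_dossier_ref("ENVI-AD(2012)487738")
print(parsed)
```

The PE number is kept as a string so that any leading zeros are preserved when looking the document up.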

You can browse the Parliament documents to find details about the dossier for EP7 and EP8 (the PE number field should be enough).

The parliamentarians (MEPs, for Member of the European Parliament) have a unique identifier that you can use to get more details about them on the Parliament website: Go to, where MEP_ID is the id of the MEP of interest.

This dataset is meant to become richer over time: I will add more features as I am able to extract them.


Don't hesitate to reach out to me if you have any questions!


To cite this work:

  author = {Kristof, Victor and Grossglauser, Matthias and Thiran, Patrick},
  title = {War of Words: The Competitive Dynamics of Legislative Processes},
  year = {2020},
  booktitle = {Proceedings of The Web Conference 2020},
  pages = {2803–2809},
  numpages = {7},
  location = {Taipei, Taiwan},
  series = {WWW '20}




Files (97.9 MB)

Size
38.0 MB
59.9 MB

Additional details

Related works

Is continued by
Dataset: 10.5281/zenodo.4525015 (DOI)
Is documented by
Conference paper: 10.1145/3366423.3380041 (DOI)