Dataset Open Access

War of Words: The Competitive Dynamics of Legislative Processes

Kristof, Victor; Grossglauser, Matthias; Thiran, Patrick

Contact person(s)
Kristof, Victor

This upload contains the dataset presented and used in the paper:

Kristof, V., Grossglauser, M., Thiran, P., War of Words: The Competitive Dynamics of Legislative Processes, The Web Conference, April 20-24, 2020, Taipei, Taiwan

Read Section 2.2 of the paper to learn more about the European legislative process. The code to process and use the dataset can be found on GitHub.

The dataset is split into two legislative terms of the European Parliament: the 7th (EP7, war-of-words-ep7.txt) and the 8th (EP8, war-of-words-ep8.txt). Here is a snippet to load the dataset in Python (EP7 in this example):

import json

# Each line of the file is one data point, encoded as a JSON array of edits.
with open('path/to/war-of-words-ep7.txt', encoding='utf-8') as f:
    dataset = [json.loads(line) for line in f]
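The EP8 file is larger, so you may prefer to stream data points one at a time rather than hold the whole list in memory. The sketch below relies only on the one-JSON-array-per-line layout described here. The helper names `iter_conflicts` and `summarize` are hypothetical; they are not part of the released code.

```python
import json
from typing import Dict, Iterable, Iterator, List


def iter_conflicts(path: str) -> Iterator[List[dict]]:
    """Yield one data point (a list of edit dictionaries) per non-empty line."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # defensively skip blank lines
                yield json.loads(line)


def summarize(conflicts: Iterable[List[dict]]) -> Dict[str, int]:
    """Count data points, edits and accepted edits in a single pass."""
    n_points = n_edits = n_accepted = 0
    for conflict in conflicts:
        n_points += 1
        n_edits += len(conflict)
        n_accepted += sum(1 for edit in conflict if edit["accepted"])
    return {"data_points": n_points, "edits": n_edits, "accepted": n_accepted}
```

For example, `summarize(iter_conflicts('path/to/war-of-words-ep7.txt'))` reads the file once without building the full list.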

In both text files, each line is one data point representing a conflict between edits. It is encoded as a JSON array of objects, each object being an edit; once loaded in Python, a data point is a list of dictionaries. Each edit has the following structure (shown as a Python dictionary):

{
  'edit_id': 163187,                     # Unique edit identifier.
  'accepted': True,                      # Label: whether the edit was accepted.
  'dossier_ref': 'ENVI-AD(2012)487738',  # Reference to the dossier (see below).
  'authors': [                           # List of authors.
    {
      'id': 4550,                        # Unique MEP identifier (see below).
      'name': 'Jill EVANS',              # MEP name.
      'rapporteur': False                # Whether the MEP is a rapporteur.
    },
  ],
}
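With this structure, common queries reduce to short loops over the edits. This is a minimal sketch with hypothetical helper names. `accepted_edit` returns the accepted edit of a data point, or `None` when no edit was accepted, which we read as the status quo being kept. `has_rapporteur` tells whether any author of an edit is a rapporteur.

```python
from typing import List, Optional


def accepted_edit(conflict: List[dict]) -> Optional[dict]:
    """Return the accepted edit of a data point, or None if no edit was accepted."""
    for edit in conflict:
        if edit["accepted"]:
            return edit  # at most one edit is accepted per data point
    return None


def has_rapporteur(edit: dict) -> bool:
    """True if at least one author of the edit is a rapporteur."""
    return any(author["rapporteur"] for author in edit["authors"])


# Example data point with a single edit, built from the structure above.
example = [{
    "edit_id": 163187,
    "accepted": True,
    "dossier_ref": "ENVI-AD(2012)487738",
    "authors": [{"id": 4550, "name": "Jill EVANS", "rapporteur": False}],
}]

winner = accepted_edit(example)
print(winner["edit_id"] if winner else "status quo")  # prints 163187
print(has_rapporteur(example[0]))                      # prints False
```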

You can assume that:

  • Each data point has at least one edit.
  • If there is only one edit, then it is in conflict with the status quo (see Section 4 of the paper).
  • If there are two or more edits, then they are all in conflict with each other and with the status quo (see Section 4 of the paper).
  • At most one edit is accepted in each data point.
  • Within each legislature, each edit has a unique identifier.
  • There are no timestamps associated with edits (see Section 3 of the paper).
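If you preprocess or subsample the data, a quick sanity check against these assumptions can catch mistakes early. The functions below are hypothetical helpers, not part of the released code. They check the per-data-point properties: at least one edit, at most one accepted edit, and distinct edit identifiers within a data point. Each one returns the violations it finds rather than raising.

```python
from typing import Dict, List


def check_conflict(conflict: List[dict]) -> List[str]:
    """Return the list of documented assumptions violated by one data point."""
    problems = []
    if len(conflict) < 1:
        problems.append("data point has no edit")
    n_accepted = sum(1 for edit in conflict if edit["accepted"])
    if n_accepted > 1:
        problems.append(f"{n_accepted} edits accepted (expected at most 1)")
    ids = [edit["edit_id"] for edit in conflict]
    if len(ids) != len(set(ids)):
        problems.append("duplicate edit_id within the data point")
    return problems


def check_dataset(dataset: List[List[dict]]) -> Dict[int, List[str]]:
    """Map the index of each faulty data point to its violations."""
    report = {}
    for index, conflict in enumerate(dataset):
        problems = check_conflict(conflict)
        if problems:
            report[index] = problems
    return report
```

If the assumptions hold, `check_dataset(dataset)` returns an empty dictionary.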

The dossier_ref field can be used to get more information on the dossier. It is formatted as COMM-TYPE(YEAR)PENUMBER, following the file-naming notation used by the Parliament Secretariat, where

  • COMM is the committee identifier (4 capital letters)
  • TYPE is either AD (opinion) or A7/A8 (report, for EP7 or EP8 respectively; see Section 2.2 of the paper)
  • YEAR is the year in which the dossier was voted on
  • PENUMBER is the "PE number", a document identifier used by the European Parliament

For example, ENVI-AD(2012)487738 refers to an opinion (AD) of the ENVI committee, voted on in 2012, with PE number 487738.
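A dossier_ref can be split into these four fields with a regular expression. This sketch assumes that every reference follows the format exactly as described: four capital letters, AD/A7/A8, a four-digit year and a numeric PE number. References that do not match return `None` instead of raising. `DossierRef` and `parse_dossier_ref` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional

# Pattern derived from the documented format COMM-TYPE(YEAR)PENUMBER.
DOSSIER_REF_RE = re.compile(r"^([A-Z]{4})-(AD|A[78])\((\d{4})\)(\d+)$")


class DossierRef(NamedTuple):
    committee: str  # COMM, e.g. 'ENVI'
    doc_type: str   # 'AD' (opinion) or 'A7'/'A8' (report)
    year: int       # year in which the dossier was voted on
    pe_number: str  # PE number, kept as a string


def parse_dossier_ref(ref: str) -> Optional[DossierRef]:
    """Split a dossier_ref into its fields, or return None if it does not match."""
    match = DOSSIER_REF_RE.match(ref)
    if match is None:
        return None
    committee, doc_type, year, pe_number = match.groups()
    return DossierRef(committee, doc_type, int(year), pe_number)


def is_report(ref: DossierRef) -> bool:
    """Reports are typed A7/A8; opinions are typed AD."""
    return ref.doc_type != "AD"
```

For instance, `parse_dossier_ref('ENVI-AD(2012)487738')` returns `DossierRef(committee='ENVI', doc_type='AD', year=2012, pe_number='487738')`. The PE number stays a string so that it can be used as-is when searching the Parliament documents.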

You can browse the Parliament's documents for EP7 and EP8 to find details about a dossier; the PE number alone should be enough to locate it.

Each parliamentarian (MEP, Member of the European Parliament) has a unique identifier that you can use to get more details about them on the Parliament website: go to https://www.europarl.europa.eu/meps/en/MEP_ID, where MEP_ID is the id of the MEP of interest.
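The URL scheme above can be combined with the authors field to build a lookup of the MEPs appearing in the data. This is a minimal sketch. `mep_url` and `mep_directory` are hypothetical helpers, and keeping the first name seen for each id is our own choice.

```python
from typing import Dict, List

MEP_URL_TEMPLATE = "https://www.europarl.europa.eu/meps/en/{mep_id}"


def mep_url(mep_id: int) -> str:
    """Profile URL of an MEP on the Parliament website."""
    return MEP_URL_TEMPLATE.format(mep_id=mep_id)


def mep_directory(dataset: List[List[dict]]) -> Dict[int, str]:
    """Map each MEP id to the first name recorded for it in the dataset."""
    directory: Dict[int, str] = {}
    for conflict in dataset:
        for edit in conflict:
            for author in edit["authors"]:
                directory.setdefault(author["id"], author["name"])
    return directory
```

For example, `mep_url(4550)` returns `'https://www.europarl.europa.eu/meps/en/4550'`.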

This dataset will be extended over time: I will add more features as I am able to extract them.

Don't hesitate to reach out to me if you have any questions!

To cite this work:

@inproceedings{kristof2020war,
  author = {Kristof, Victor and Grossglauser, Matthias and Thiran, Patrick},
  title = {War of Words: The Competitive Dynamics of Legislative Processes},
  year = {2020},
  booktitle = {Proceedings of The Web Conference 2020},
  pages = {2803--2809},
  numpages = {7},
  location = {Taipei, Taiwan},
  series = {WWW '20}
}

Files (97.9 MB)

Name                   Size     MD5
war-of-words-ep7.txt   38.0 MB  6ed13142dbceab6795718501003fa4ff
war-of-words-ep8.txt   59.9 MB  840a5992d258cc586db517f97cb3445c
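After downloading, you can check each file against the MD5 checksums listed above. This is a minimal sketch using Python's standard hashlib module. It reads the file in chunks so the whole file never needs to be in memory. `md5sum`, `verify` and the 1 MiB default chunk size are our own choices.

```python
import hashlib

# Checksums as listed in the file table above.
EXPECTED_MD5 = {
    "war-of-words-ep7.txt": "6ed13142dbceab6795718501003fa4ff",
    "war-of-words-ep8.txt": "840a5992d258cc586db517f97cb3445c",
}


def md5sum(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex MD5 digest of a file, computed chunk by chunk."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, name: str) -> bool:
    """True if the file at `path` matches the listed checksum for `name`."""
    return md5sum(path) == EXPECTED_MD5[name]
```

For example, `verify('path/to/war-of-words-ep7.txt', 'war-of-words-ep7.txt')` should return True for an intact download.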