Kristof, Victor; Grossglauser, Matthias; Thiran, Patrick
Update: A newer version of this dataset is available here. It comes with features extracted from the MEPs, the edits, and the dossiers, such as the nationality of MEPs, the type of law being edited, and the text of the edits. Check it out!
This upload contains the dataset presented and used in the paper:
Kristof, V., Grossglauser, M., Thiran, P., War of Words: The Competitive Dynamics of Legislative Processes, The Web Conference, April 20-24, 2020, Taipei, Taiwan
Read Section 2.2 of the paper to learn more about the European legislative process. The code to process and use the dataset can be found on GitHub.
The dataset is split into two legislature periods of the European Parliament, the 7th (war-of-words-ep7.txt) and the 8th (war-of-words-ep8.txt) legislature. Here is a snippet to load the dataset (for EP7 in this example) in Python:
import json

# Each line of the file is one conflict, encoded as a JSON list of edits.
with open('path/to/war-of-words-ep7.txt') as f:
    dataset = [json.loads(line) for line in f]
In the two text files, each line is a data point representing a conflict between edits. It is encoded as a JSON list of dictionaries, where each dictionary is an edit. Each edit has the following structure:
{
    'edit_id': 163187,                      # Unique edit identifier.
    'accepted': True,                       # Label.
    'dossier_ref': 'ENVI-AD(2012)487738',   # Reference to dossier (see below).
    'authors': [                            # List of authors.
        {
            'id': 4550,                     # Unique MEP identifier (see below).
            'name': 'Jill EVANS',           # MEP name.
            'rapporteur': False             # Whether the MEP is rapporteur.
        },
    ],
}
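As a rough illustration of how this structure can be traversed, the sketch below counts conflicts, edits, accepted edits, and edits with at least one rapporteur among the authors. The function name `summarize_conflicts` and the second edit in `sample` are made up for the example; only the field names come from the structure above.

```python
def summarize_conflicts(dataset):
    """Count conflicts, edits, accepted edits, and rapporteur-authored edits."""
    stats = {"conflicts": 0, "edits": 0, "accepted": 0, "by_rapporteur": 0}
    for conflict in dataset:  # one conflict = one list of edit dictionaries
        stats["conflicts"] += 1
        for edit in conflict:
            stats["edits"] += 1
            if edit["accepted"]:
                stats["accepted"] += 1
            # Count the edit once if any of its authors is rapporteur.
            if any(author["rapporteur"] for author in edit["authors"]):
                stats["by_rapporteur"] += 1
    return stats


# Tiny in-memory sample. The first edit is the one shown above;
# the second edit is hypothetical and exists only for illustration.
sample = [[
    {"edit_id": 163187, "accepted": True, "dossier_ref": "ENVI-AD(2012)487738",
     "authors": [{"id": 4550, "name": "Jill EVANS", "rapporteur": False}]},
    {"edit_id": 1, "accepted": False, "dossier_ref": "ENVI-AD(2012)487738",
     "authors": [{"id": 2, "name": "Hypothetical MEP", "rapporteur": True}]},
]]
print(summarize_conflicts(sample))
```

The same function should work on the `dataset` list produced by the loading snippet, since each element is a list of edits.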
You can assume that:
The dossier_ref can be used to get more information on the dossier. It is formatted as COMM-TYPE(YEAR)PENUMBER (this follows the notation of file names used by the Parliament Secretariat), where
You can browse the Parliament documents to find details about the dossier for EP7 and EP8 (the PE number field should be enough).
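A minimal sketch of splitting a dossier_ref into its COMM, TYPE, YEAR, and PENUMBER parts with a regular expression. The exact character classes are an assumption based on the single example 'ENVI-AD(2012)487738' (uppercase letters for COMM and TYPE, a four-digit YEAR, digits for PENUMBER); references in the dataset that follow another shape will simply return None. The names `DOSSIER_RE` and `parse_dossier_ref` are hypothetical.

```python
import re

# Assumed shape: COMM-TYPE(YEAR)PENUMBER, e.g. ENVI-AD(2012)487738.
DOSSIER_RE = re.compile(
    r"^(?P<comm>[A-Z]+)-(?P<type>[A-Z]+)\((?P<year>\d{4})\)(?P<pe_number>\d+)$"
)


def parse_dossier_ref(ref):
    """Return the parts of a dossier reference as a dict, or None if it does not match."""
    match = DOSSIER_RE.match(ref)
    if match is None:
        return None
    parts = match.groupdict()
    parts["year"] = int(parts["year"])
    # The PE number stays a string so that leading zeros, if any, are preserved.
    return parts


print(parse_dossier_ref("ENVI-AD(2012)487738"))
```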
The parliamentarians (MEPs, for Members of the European Parliament) have a unique identifier that you can use to get more details about them on the Parliament website: go to https://www.europarl.europa.eu/meps/en/MEP_ID, where MEP_ID is the id of the MEP of interest.
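For convenience, the URL pattern above can be applied to every author of an edit. This is only a sketch: the helper name `mep_urls` is hypothetical, and it builds the URLs without checking that the pages exist.

```python
MEP_URL = "https://www.europarl.europa.eu/meps/en/{mep_id}"


def mep_urls(edit):
    """Map each author name of an edit to the corresponding Parliament profile URL."""
    return {
        author["name"]: MEP_URL.format(mep_id=author["id"])
        for author in edit["authors"]
    }


# The edit shown earlier in this description.
edit = {
    "edit_id": 163187,
    "accepted": True,
    "dossier_ref": "ENVI-AD(2012)487738",
    "authors": [{"id": 4550, "name": "Jill EVANS", "rapporteur": False}],
}
print(mep_urls(edit))
```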
This dataset is meant to grow richer over time: I will add more features as I am able to extract them.
Don't hesitate to reach out to me if you have any questions!
To cite this work:
@inproceedings{kristof2020war,
  author = {Kristof, Victor and Grossglauser, Matthias and Thiran, Patrick},
  title = {War of Words: The Competitive Dynamics of Legislative Processes},
  year = {2020},
  booktitle = {Proceedings of The Web Conference 2020},
  pages = {2803--2809},
  numpages = {7},
  location = {Taipei, Taiwan},
  series = {WWW '20}
}
Name | MD5 | Size |
---|---|---|
war-of-words-ep7.txt | 6ed13142dbceab6795718501003fa4ff | 38.0 MB |
war-of-words-ep8.txt | 840a5992d258cc586db517f97cb3445c | 59.9 MB |
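The MD5 checksums listed above can be used to check that a download is intact. The following is a minimal sketch using Python's standard hashlib module; the helper names `md5_of_file` and `verify_download` are hypothetical, and the local path is whatever location you saved the file to.

```python
import hashlib

# Checksums as listed for the two files of this upload.
EXPECTED_MD5 = {
    "war-of-words-ep7.txt": "6ed13142dbceab6795718501003fa4ff",
    "war-of-words-ep8.txt": "840a5992d258cc586db517f97cb3445c",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, filename):
    """Return True if the file at `path` matches the checksum listed for `filename`."""
    return md5_of_file(path) == EXPECTED_MD5[filename]
```

For example, `verify_download('path/to/war-of-words-ep7.txt', 'war-of-words-ep7.txt')` should return True for an uncorrupted copy of the EP7 file.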