There is a newer version of this record available.

Dataset Open Access

War of Words II: Enriched Models of Law-Making Processes

Kristof, Victor; Suresh, Aswin; Grossglauser, Matthias; Thiran, Patrick


Citation Style Language JSON Export

{
  "publisher": "Zenodo", 
  "DOI": "10.5281/zenodo.4525015", 
  "title": "War of Words II: Enriched Models of Law-Making Processes", 
  "issued": {
    "date-parts": [
      [
        2021, 
        4, 
        19
      ]
    ]
  }, 
  "abstract": "<pre><code>@inproceedings{kristof2021war,\n  author = {Kristof, Victor and Suresh, Aswin and Grossglauser, Matthias and Thiran, Patrick},\n  title = {War of Words II: Enriched Models for Law-Making Processes},\n  year = {2021},\n  booktitle = {Proceedings of The Web Conference 2021},\n  TODO: pages = {2803\u20132809},\n  numpages = {12},\n  location = {Ljubljana, Solvenia},\n  series = {WWW '21}\n}</code></pre>\n\n<p>This upload contains the dataset presented and used in the paper:</p>\n\n<blockquote>\n<p>Victor Kristof, Aswin Suresh, Matthias Grossglauser, Patrick Thiran, <a href=\"https://infoscience.epfl.ch/record/284828\">War of Words II: Enriched Models of Law-Making Processes</a>,&nbsp;The Web Conference 2021, April 19-23, 2021, Ljubljana, Slovenia</p>\n</blockquote>\n\n<p>The code to process and use the dataset can be found on&nbsp;<a href=\"https://github.com/indy-lab/war-of-words\">GitHub</a>.</p>\n\n<p>This is a follow-up work to&nbsp;<em><a href=\"https://zenodo.org/record/3757714\">War of Words: The Competitive Dynamics of Legislative Processes</a>.</em></p>\n\n<p>The dataset is split into two legislature periods of the European Parliament, the 7th (<strong>war-of-words-2-ep7.txt</strong>) and the 8th (<strong>war-of-words-2-ep8.txt</strong>) legislature.&nbsp;Here is a snippet to load the dataset (for EP8&nbsp;in this example) in Python:</p>\n\n<pre><code class=\"language-python\">import json\n\nwith open('path/to/war-of-words-2-ep8.txt') as f:\n    dataset = [json.loads(l) for l in f.readlines()]\n</code></pre>\n\n<p>In the two text files, each line is a data point representing a&nbsp;<em>conflict between edits</em>. 
It is encoded as a JSON list of dictionaries, where each dictionary is an edit.&nbsp;Each edit has the following structure:</p>\n\n<pre><code class=\"language-json\">{\n  'edit_id': 163187,                     // Unique edit identifier\n  'edit_type': 'insert',                 // One of 'insert', 'delete', or 'replace'\n  'accepted': True,                      // Label\n  'dossier_ref': 'ENVI-AD(2012)487738',  // Reference to dossier (see below)\n  'dossier_type': 'opinion',             // One of 'opinion' or 'report'\n  'date': '2017-03-02',                  // Date of vote of all amendments for this dossier\n  'legal_act': 'regulation',             // One of 'regulation', 'directive', or 'decision'\n  'committee': 'BUDG',                   // Committee in which this edit was proposed\n  'outsider': False,                     // Whether the above committee is the reporting committee\n  'article_type': 'recital',             // One of 7 article types\n  'source': 'BUDG-AM(2017)599742',       // Reference to original document of the amendment\n  'justification': None,                 // The text of the optional justification (or None)\n  'edit_indices': {...},                 // Indices of edit in the amendment (see below)\n  'text_original': [...],                // Original text reported in the source document (see below)\n  'text_amended': [...],                 // Amended text reported in the source document (see below)\n  'authors': [                           // List of authors\n    {\n      'id': 88882,                       // Unique MEP identifier (see below)\n      'name': 'Victor NEGRESCU',         // MEP full name\n      'gender': 'M',                     // Gender as reported on the Parliament database\n      'nationality': 'Romania',          // One of 28 nationalities\n      'group': 'Group of the Progressive Alliance of Socialists and Democrats in the European Parliament',                             // One of 9 political groups\n      'rapporteur': 
False                // Whether the MEP is rapporteur for this dossier\n    },\n  ],\n}</code></pre>\n\n<p>The <strong>text_original</strong>&nbsp;and <strong>text_amended</strong>&nbsp;keys contain the portion of text reported in the `source` document. The text is tokenized as a list of terms (words, numbers, punctuation, ...).&nbsp;<em>These two keys are not the actual edit.</em>&nbsp;This is because the amendments are reported as an edited paragraph (which also gives some context to the edit).&nbsp;An amendment contains one or more edits. To access the actual text of the edit, use the&nbsp;`edit_indices` key, which&nbsp;is a dictionary (such as `{&#39;i1&#39;: 80, &#39;i2&#39;: 80, &#39;j1&#39;: 80, &#39;j2&#39;: 101}`). The `i1` and `i2` keys are the start and end indices of the change in the original text, and the `j1` and `j2` keys are the start and end indices of the change in the amended text. As in Python slices, the end index is exclusive. Hence, you can access the text of the edit by doing:</p>\n\n<pre><code class=\"language-python\">idx = edit['edit_indices']\ni1, i2, j1, j2 = idx['i1'], idx['i2'], idx['j1'], idx['j2']\n# The slices are lists of tokens; join them to get readable text.\nold = ' '.join(edit['text_original'][i1:i2])\nnew = ' '.join(edit['text_amended'][j1:j2])\n\nprint(f'\"{old}\" is replaced by \"{new}\"')</code></pre>\n\n<p>If an edit is an insertion, then `i1 == i2`. If it is a deletion, then `j1 == j2`. 
Read the documentation of <a href=\"https://docs.python.org/3.6/library/difflib.html#difflib.SequenceMatcher.get_opcodes\">difflib</a> to learn more about how these indices&nbsp;are obtained.</p>\n\n<p>You can assume that:</p>\n\n<ul>\n\t<li>Each data point has at least one edit.</li>\n\t<li>If there is only one edit, then it is&nbsp;<em>in conflict with the status quo&nbsp;</em>(see Section 2&nbsp;of the paper).</li>\n\t<li>If there are two or more edits in conflict, then they are all in conflict against each other&nbsp;<em>and</em>&nbsp;they are in conflict with the status quo (see Section 2&nbsp;of the paper).</li>\n\t<li>At most one edit is accepted in&nbsp;each data point.</li>\n\t<li>In each legislature, each edit has a unique identifier.</li>\n</ul>\n\n<p>You&nbsp;can find the original documents where the amendments were proposed using the `source` key, which has the format `COMM-AM(YEAR)PENUMBER`.&nbsp;Use the following search tools for&nbsp;<a href=\"https://www.europarl.europa.eu/committees/en/archives/7/document-search\">EP7</a>&nbsp;and&nbsp;<a href=\"https://www.europarl.europa.eu/committees/en/archives/8/document-search\">EP8</a>&nbsp;(the PE number field should be enough, adding a &quot;.&quot; to fit the required format).</p>\n\n<p>The&nbsp;parliamentarians (MEPs, for Member of the European Parliament) have a unique identifier that you can use to get more details about them on the Parliament website: Go to&nbsp;<strong>https://www.europarl.europa.eu/meps/en/MEP_ID</strong>, where&nbsp;<strong>MEP_ID&nbsp;</strong>is the id of the MEP of interest.</p>\n\n<p><strong>Don&#39;t hesitate to&nbsp;<a href=\"mailto:victor.kristof@epfl.ch?subject=Question%20about%20the%20War%20of%20Words%20dataset\">reach out to me</a>&nbsp;if you have any questions!</strong></p>\n\n<p>&nbsp;</p>\n\n<p>To cite this work:</p>\n\n<pre><code>@inproceedings{kristof2021war,\n  author = {Kristof, Victor and Suresh, Aswin and Grossglauser, Matthias and Thiran, Patrick},\n  title = 
{War of Words II: Enriched Models of Law-Making Processes},\n  year = {2021},\n  booktitle = {Proceedings of The Web Conference 2021},\n  TODO: pages = {2803\u20132809},\n  numpages = {12},\n  location = {Ljubljana, Slovenia},\n  series = {WWW '21}\n}</code></pre>\n\n<p>&nbsp;</p>", 
  "author": [
    {
      "family": "Kristof, Victor"
    }, 
    {
      "family": "Suresh, Aswin"
    }, 
    {
      "family": "Grossglauser, Matthias"
    }, 
    {
      "family": "Thiran, Patrick"
    }
  ], 
  "version": "1.0", 
  "type": "dataset", 
  "id": "4525015"
}
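
The description in the abstract can be tried out without downloading the files. The following is a minimal sketch, not the authors' code: the sample conflict is made up for illustration and carries only the keys the sketch needs, and `edit_text` is a hypothetical helper name. It parses one line in the documented format, extracts the edit with the half-open `edit_indices` slices, checks the "at most one accepted edit" assumption, and shows that `difflib.SequenceMatcher.get_opcodes` produces indices of the same shape (whether the dataset was built with exactly these settings is an assumption).

```python
import difflib
import json


def edit_text(edit):
    """Return (old_tokens, new_tokens) for one edit (hypothetical helper)."""
    idx = edit["edit_indices"]
    old = edit["text_original"][idx["i1"]:idx["i2"]]
    new = edit["text_amended"][idx["j1"]:idx["j2"]]
    return old, new


# Made-up tokenized paragraph, before and after an amendment.
original = ["Member", "States", "shall", "report", "."]
amended = ["Member", "States", "shall", "report", "annually", "."]

# One line of the file is one conflict: a JSON list of edits.
# Only the keys used below are included in this illustrative sample.
sample_line = json.dumps([{
    "edit_id": 1,
    "edit_type": "insert",
    "accepted": True,
    "edit_indices": {"i1": 4, "i2": 4, "j1": 4, "j2": 5},
    "text_original": original,
    "text_amended": amended,
}])

conflict = json.loads(sample_line)

# Documented assumption: at most one edit is accepted per data point.
assert sum(edit["accepted"] for edit in conflict) <= 1

old, new = edit_text(conflict[0])
print(f"old tokens: {old}, new tokens: {new}")

# difflib opcodes are (tag, i1, i2, j1, j2) tuples; drop the unchanged spans.
matcher = difflib.SequenceMatcher(a=original, b=amended, autojunk=False)
opcodes = [op for op in matcher.get_opcodes() if op[0] != "equal"]
print(opcodes)
```

For an insertion the slice of the original text is empty (`i1 == i2`), which is why `old` is an empty list here.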