There is a newer version of this record available.

Dataset Open Access

War of Words II: Enriched Models of Law-Making Processes

Kristof, Victor; Suresh, Aswin; Grossglauser, Matthias; Thiran, Patrick


JSON Export

{
  "files": [
    {
      "links": {
        "self": "https://zenodo.org/api/files/7b3856a6-7d0f-45c1-83b2-15e49cecba44/helpers.zip"
      }, 
      "checksum": "md5:152fdcf7c29a141e8b2febc3f6db437a", 
      "bucket": "7b3856a6-7d0f-45c1-83b2-15e49cecba44", 
      "key": "helpers.zip", 
      "type": "zip", 
      "size": 92272
    }, 
    {
      "links": {
        "self": "https://zenodo.org/api/files/7b3856a6-7d0f-45c1-83b2-15e49cecba44/split-indices.zip"
      }, 
      "checksum": "md5:d7838210ab293c77d654d3c866e9e782", 
      "bucket": "7b3856a6-7d0f-45c1-83b2-15e49cecba44", 
      "key": "split-indices.zip", 
      "type": "zip", 
      "size": 753438
    }, 
    {
      "links": {
        "self": "https://zenodo.org/api/files/7b3856a6-7d0f-45c1-83b2-15e49cecba44/text-embeddings.zip"
      }, 
      "checksum": "md5:cf18ad0a99227af05f04ba6f3da2718c", 
      "bucket": "7b3856a6-7d0f-45c1-83b2-15e49cecba44", 
      "key": "text-embeddings.zip", 
      "type": "zip", 
      "size": 43354347
    }, 
    {
      "links": {
        "self": "https://zenodo.org/api/files/7b3856a6-7d0f-45c1-83b2-15e49cecba44/war-of-words-2-ep7.txt"
      }, 
      "checksum": "md5:2bcb1eb32600f292d6004720ab5971e7", 
      "bucket": "7b3856a6-7d0f-45c1-83b2-15e49cecba44", 
      "key": "war-of-words-2-ep7.txt", 
      "type": "txt", 
      "size": 448958857
    }, 
    {
      "links": {
        "self": "https://zenodo.org/api/files/7b3856a6-7d0f-45c1-83b2-15e49cecba44/war-of-words-2-ep8.txt"
      }, 
      "checksum": "md5:b607a18d0197fb76a985b3b96bf37fff", 
      "bucket": "7b3856a6-7d0f-45c1-83b2-15e49cecba44", 
      "key": "war-of-words-2-ep8.txt", 
      "type": "txt", 
      "size": 607401773
    }
  ], 
  "owners": [
    89424
  ], 
  "doi": "10.5281/zenodo.4525015", 
  "stats": {
    "version_unique_downloads": 31.0, 
    "unique_views": 54.0, 
    "views": 62.0, 
    "version_views": 138.0, 
    "unique_downloads": 15.0, 
    "version_unique_views": 108.0, 
    "volume": 6962231445.0, 
    "version_downloads": 58.0, 
    "downloads": 21.0, 
    "version_volume": 17574963009.0
  }, 
  "links": {
    "doi": "https://doi.org/10.5281/zenodo.4525015", 
    "conceptdoi": "https://doi.org/10.5281/zenodo.4525014", 
    "bucket": "https://zenodo.org/api/files/7b3856a6-7d0f-45c1-83b2-15e49cecba44", 
    "conceptbadge": "https://zenodo.org/badge/doi/10.5281/zenodo.4525014.svg", 
    "html": "https://zenodo.org/record/4525015", 
    "latest_html": "https://zenodo.org/record/4709248", 
    "badge": "https://zenodo.org/badge/doi/10.5281/zenodo.4525015.svg", 
    "latest": "https://zenodo.org/api/records/4709248"
  }, 
  "conceptdoi": "10.5281/zenodo.4525014", 
  "created": "2021-04-20T14:14:47.715895+00:00", 
  "updated": "2021-04-26T15:32:35.514196+00:00", 
  "conceptrecid": "4525014", 
  "revision": 4, 
  "id": 4525015, 
  "metadata": {
    "access_right_category": "success", 
    "doi": "10.5281/zenodo.4525015", 
    "description": "<pre><code>@inproceedings{kristof2021war,\n  author = {Kristof, Victor and Suresh, Aswin and Grossglauser, Matthias and Thiran, Patrick},\n  title = {War of Words II: Enriched Models for Law-Making Processes},\n  year = {2021},\n  booktitle = {Proceedings of The Web Conference 2021},\n  TODO: pages = {2803\u20132809},\n  numpages = {12},\n  location = {Ljubljana, Solvenia},\n  series = {WWW '21}\n}</code></pre>\n\n<p>This upload contains the dataset presented and used in the paper:</p>\n\n<blockquote>\n<p>Victor Kristof, Aswin Suresh, Matthias Grossglauser, Patrick Thiran, <a href=\"https://infoscience.epfl.ch/record/284828\">War of Words II: Enriched Models of Law-Making Processes</a>,&nbsp;The Web Conference 2021, April 19-23, 2021, Ljubljana, Slovenia</p>\n</blockquote>\n\n<p>The code to process and use the dataset can be found on&nbsp;<a href=\"https://github.com/indy-lab/war-of-words\">GitHub</a>.</p>\n\n<p>This is a follow-up work to&nbsp;<em><a href=\"https://zenodo.org/record/3757714\">War of Words: The Competitive Dynamics of Legislative Processes</a>.</em></p>\n\n<p>The dataset is split into two legislature periods of the European Parliament, the 7th (<strong>war-of-words-2-ep7.txt</strong>) and the 8th (<strong>war-of-words-2-ep8.txt</strong>) legislature.&nbsp;Here is a snippet to load the dataset (for EP8&nbsp;in this example) in Python:</p>\n\n<pre><code class=\"language-python\">import json\n\nwith open('path/to/war-of-words-2-ep8.txt') as f:\n    dataset = [json.loads(l) for l in f.readlines()]\n</code></pre>\n\n<p>In the two text files, each line is a data point representing a&nbsp;<em>conflict between edits</em>. 
It is encoded as a JSON list of dictionaries, where each dictionary is an edit.&nbsp;Each edit has the following structure:</p>\n\n<pre><code class=\"language-json\">{\n  'edit_id': 163187,                     // Unique edit identifier\n  'edit_type': 'insert',                 // One of 'insert', 'delete', or 'replace'\n  'accepted': True,                      // Label\n  'dossier_ref': 'ENVI-AD(2012)487738',  // Reference to dossier (see below)\n  'dossier_type': 'opinion',             // One of 'opinion' or 'report'\n  'date': '2017-03-02',                  // Date of vote of all amendments for this dossier\n  'legal_act': 'regulation',             // One of 'regulation', 'directive', or 'decision'\n  'committee': 'BUDG',                   // Committee in which this edit was proposed\n  'outsider': False,                     // Whether the above committee is the reporting committee\n  'article_type': 'recital',             // One of 7 article types\n  'source': 'BUDG-AM(2017)599742',       // Reference to original document of the amendment\n  'justification': None,                 // The text of the optional justification (or None)\n  'edit_indices': {...},                 // Indices of edit in the amendment (see below)\n  'text_original': [...],                // Original text reported in the source document (see below)\n  'text_amended': [...],                 // Amended text reported in the source document (see below)\n  'authors': [                           // List of authors\n    {\n      'id': 88882,                       // Unique MEP identifier (see below)\n      'name': 'Victor NEGRESCU',         // MEP full name\n      'gender': 'M',                     // Gender as reported on the Parliament database\n      'nationality': 'Romania',          // One of 28 nationalities\n      'group': 'Group of the Progressive Alliance of Socialists and Democrats in the European Parliament',                             // One of 9 political groups\n      'rapporteur': 
False                // Whether the MEP is rapporteur for this dossier\n    },\n  ],\n}</code></pre>\n\n<p>The <strong>text_original</strong>&nbsp;and <strong>text_amended</strong>&nbsp;keys contain the portion of text reported in the `source` document. The text is tokenized as a list of terms (words, numbers, punctuation, ...).&nbsp;<em>These two keys are not the actual edit.</em>&nbsp;This is because the amendments are reported as an edited paragraph (which also gives some context to the edit).&nbsp;An amendment contains one or more edits. To access the actual text of the edit, use the&nbsp;`edit_indices` key, which&nbsp;is a dictionary (such as `{&#39;i1&#39;: 80, &#39;i2&#39;: 80, &#39;j1&#39;: 80, &#39;j2&#39;: 101}`). The `i1` and `i2` keys are the start (inclusive) and end (exclusive) indices of the change in the original text, and the `j1` and `j2` keys are the corresponding start and end indices in the amended text. Hence, you can access the text of the edit by doing:</p>\n\n<pre><code class=\"language-python\">idx = edit['edit_indices']\ni1, i2, j1, j2 = idx['i1'], idx['i2'], idx['j1'], idx['j2']\n# Slices are half-open: the tokens at i2 and j2 are excluded.\nold = ' '.join(edit['text_original'][i1:i2])\nnew = ' '.join(edit['text_amended'][j1:j2])\n\nprint(f'\"{old}\" is replaced by \"{new}\"')</code></pre>\n\n<p>If an edit is an insertion, then `i1 == i2`. If it is a deletion, then `j1 == j2`. 
Read the documentation of <a href=\"https://docs.python.org/3.6/library/difflib.html#difflib.SequenceMatcher.get_opcodes\">difflib</a> to learn more about how these indices&nbsp;are obtained.</p>\n\n<p>You can assume that:</p>\n\n<ul>\n\t<li>Each data point has at least one edit.</li>\n\t<li>If there is only one edit, then it is&nbsp;<em>in conflict with the status quo&nbsp;</em>(see Section 2&nbsp;of the paper).</li>\n\t<li>If there are two or more edits in conflict, then they are all in conflict against each other&nbsp;<em>and</em>&nbsp;they are in conflict with the status quo (see Section 2&nbsp;of the paper).</li>\n\t<li>At most one edit is accepted in&nbsp;each data point.</li>\n\t<li>In each legislature, each edit has a unique identifier.</li>\n</ul>\n\n<p>You&nbsp;can find the original documents where the amendments were proposed using the `source` key, which has the format `COMM-AM(YEAR)PENUMBER`.&nbsp;Use the following search tools for&nbsp;<a href=\"https://www.europarl.europa.eu/committees/en/archives/7/document-search\">EP7</a>&nbsp;and&nbsp;<a href=\"https://www.europarl.europa.eu/committees/en/archives/8/document-search\">EP8</a>&nbsp;(the PE number field should be enough, adding a &quot;.&quot; to fit the required format).</p>\n\n<p>The&nbsp;parliamentarians (MEPs, for Member of the European Parliament) have a unique identifier that you can use to get more details about them on the Parliament website: Go to&nbsp;<strong>https://www.europarl.europa.eu/meps/en/MEP_ID</strong>, where&nbsp;<strong>MEP_ID&nbsp;</strong>is the id of the MEP of interest.</p>\n\n<p><strong>Don&#39;t hesitate to&nbsp;<a href=\"mailto:victor.kristof@epfl.ch?subject=Question%20about%20the%20War%20of%20Words%20dataset\">reach out to me</a>&nbsp;if you have any questions!</strong></p>\n\n<p>&nbsp;</p>\n\n<p>To cite this work:</p>\n\n<pre><code>@inproceedings{kristof2021war,\n  author = {Kristof, Victor and Suresh, Aswin and Grossglauser, Matthias and Thiran, Patrick},\n  title = 
{War of Words II: Enriched Models of Law-Making Processes},\n  year = {2021},\n  booktitle = {Proceedings of The Web Conference 2021},\n  TODO: pages = {2803\u20132809},\n  numpages = {12},\n  location = {Ljubljana, Slovenia},\n  series = {WWW '21}\n}</code></pre>\n\n<p>&nbsp;</p>", 
    "contributors": [
      {
        "affiliation": "EPFL", 
        "type": "ContactPerson", 
        "name": "Kristof, Victor"
      }, 
      {
        "affiliation": "EPFL", 
        "type": "ContactPerson", 
        "name": "Suresh, Aswin"
      }
    ], 
    "title": "War of Words II: Enriched Models of Law-Making Processes", 
    "license": {
      "id": "CC-BY-4.0"
    }, 
    "relations": {
      "version": [
        {
          "count": 2, 
          "index": 0, 
          "parent": {
            "pid_type": "recid", 
            "pid_value": "4525014"
          }, 
          "is_last": false, 
          "last_child": {
            "pid_type": "recid", 
            "pid_value": "4709248"
          }
        }
      ]
    }, 
    "version": "1.0", 
    "communities": [
      {
        "id": "epfl"
      }
    ], 
    "publication_date": "2021-04-19", 
    "creators": [
      {
        "affiliation": "EPFL", 
        "name": "Kristof, Victor"
      }, 
      {
        "affiliation": "EPFL", 
        "name": "Suresh, Aswin"
      }, 
      {
        "affiliation": "EPFL", 
        "name": "Grossglauser, Matthias"
      }, 
      {
        "affiliation": "EPFL", 
        "name": "Thiran, Patrick"
      }
    ], 
    "access_right": "open", 
    "resource_type": {
      "type": "dataset", 
      "title": "Dataset"
    }, 
    "related_identifiers": [
      {
        "scheme": "doi", 
        "identifier": "10.1145/3442381.3450131", 
        "relation": "isDocumentedBy", 
        "resource_type": "publication-conferencepaper"
      }, 
      {
        "scheme": "doi", 
        "identifier": "10.1145/3366423.3380041", 
        "relation": "continues", 
        "resource_type": "dataset"
      }, 
      {
        "scheme": "doi", 
        "identifier": "10.5281/zenodo.4525014", 
        "relation": "isVersionOf"
      }
    ]
  }
}
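
The `description` field above says that each line of the two text files is a JSON list of edits in conflict, that `edit_indices` holds difflib-style slice bounds, and that at most one edit per data point is accepted. The following is a minimal sketch of how those pieces might be used together, not the authors' own tooling: the helper names `iter_conflicts`, `edit_text`, and `accepted_edit` are hypothetical, and the sample data point is made up for illustration and carries only the keys the helpers read.

```python
import json
from typing import Iterator, Optional, Tuple


def iter_conflicts(path: str) -> Iterator[list]:
    """Yield one data point (a list of edit dicts) per non-empty line.

    Reading line by line avoids holding a whole file in memory; the two
    text files listed under "files" are several hundred megabytes each.
    """
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


def edit_text(edit: dict) -> Tuple[str, str]:
    """Return the (old, new) text of an edit as space-joined tokens.

    The bounds are half-open, so an insertion (i1 == i2) yields an empty
    old text and a deletion (j1 == j2) yields an empty new text.
    """
    idx = edit["edit_indices"]
    old = edit["text_original"][idx["i1"]:idx["i2"]]
    new = edit["text_amended"][idx["j1"]:idx["j2"]]
    return " ".join(old), " ".join(new)


def accepted_edit(conflict: list) -> Optional[dict]:
    """Return the accepted edit of a data point, or None if the status quo won."""
    accepted = [e for e in conflict if e["accepted"]]
    if len(accepted) > 1:
        # The description states that at most one edit is accepted.
        raise ValueError("more than one accepted edit in a data point")
    return accepted[0] if accepted else None


# Made-up data point with a single insertion (illustration only).
sample = [{
    "edit_id": 1,
    "edit_type": "insert",
    "accepted": True,
    "edit_indices": {"i1": 2, "i2": 2, "j1": 2, "j2": 4},
    "text_original": ["The", "Commission", "shall", "report", "."],
    "text_amended": ["The", "Commission", "without", "delay",
                     "shall", "report", "."],
}]

old, new = edit_text(sample[0])
print(repr(old), "->", repr(new))
```

With a downloaded file, `for conflict in iter_conflicts('path/to/war-of-words-2-ep8.txt'): ...` would walk the data points one at a time. Joining tokens with spaces is a display convenience only; it does not restore the original spacing around punctuation.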
                  All versions   This version
Views             138            62
Downloads         58             21
Data volume       17.6 GB        7.0 GB
Unique views      108            54
Unique downloads  31             15
