{
  "DOI": "10.5281/zenodo.4525015",
  "abstract": "@inproceedings{kristof2021war,\n  author = {Kristof, Victor and Suresh, Aswin and Grossglauser, Matthias and Thiran, Patrick},\n  title = {War of Words II: Enriched Models for Law-Making Processes},\n  year = {2021},\n  booktitle = {Proceedings of The Web Conference 2021},\n  TODO: pages = {2803\u20132809},\n  numpages = {12},\n  location = {Ljubljana, Solvenia},\n  series = {WWW '21}\n}\n\n\nThis upload contains the dataset presented and used in the paper:\n\n\n\n\nVictor Kristof, Aswin Suresh, Matthias Grossglauser, Patrick Thiran, War of Words II: Enriched Models of Law-Making Processes,\u00a0The Web Conference 2021, April 19-23, 2021, Ljubljana, Slovenia\n\n\n\nThe code to process and use the dataset can be found on\u00a0GitHub.\n\n\nThis is a follow-up work to\u00a0War of Words: The Competitive Dynamics of Legislative Processes.\n\n\nThe dataset is split into two legislature periods of the European Parliament, the 7th (war-of-words-2-ep7.txt) and the 8th (war-of-words-2-ep8.txt) legislature.\u00a0Here is a snippet to load the dataset (for EP8\u00a0in this example) in Python:\n\n\nimport json\n\nwith open('path/to/war-of-words-2-ep8.txt') as f:\n    dataset = [json.loads(l) for l in f.readlines()]\n\n\n\nIn the two text files, each line is a data point representing a\u00a0conflict between edits. 
It is encoded as a JSON list of dictionaries, where each dictionary is an edit. Each edit has the following structure (shown as the Python dictionary obtained after loading, with explanatory comments):\n\n{\n  'edit_id': 163187,                     // Unique edit identifier\n  'edit_type': 'insert',                 // One of 'insert', 'delete', or 'replace'\n  'accepted': True,                      // Label\n  'dossier_ref': 'ENVI-AD(2012)487738',  // Reference to dossier (see below)\n  'dossier_type': 'opinion',             // One of 'opinion' or 'report'\n  'date': '2017-03-02',                  // Date of vote of all amendments for this dossier\n  'legal_act': 'regulation',             // One of 'regulation', 'directive', or 'decision'\n  'committee': 'BUDG',                   // Committee in which this edit was proposed\n  'outsider': False,                     // Whether the above committee is the reporting committee\n  'article_type': 'recital',             // One of 7 article types\n  'source': 'BUDG-AM(2017)599742',       // Reference to original document of the amendment\n  'justification': None,                 // The text of the optional justification (or None)\n  'edit_indices': {...},                 // Indices of edit in the amendment (see below)\n  'text_original': [...],                // Original text reported in the source document (see below)\n  'text_amended': [...],                 // Amended text reported in the source document (see below)\n  'authors': [                           // List of authors\n    {\n      'id': 88882,                       // Unique MEP identifier (see below)\n      'name': 'Victor NEGRESCU',         // MEP full name\n      'gender': 'M',                     // Gender as reported on the Parliament database\n      'nationality': 'Romania',          // One of 28 nationalities\n      'group': 'Group of the Progressive Alliance of Socialists and Democrats in the European Parliament',  // One of 9 political groups\n      'rapporteur': False                // Whether the MEP 
is rapporteur for this dossier\n    },\n  ],\n}\n\nThe `text_original` and `text_amended` keys contain the portion of text reported in the `source` document, tokenized as a list of terms (words, numbers, punctuation, ...). These two keys are not the edit itself: amendments are reported as an edited paragraph, which also gives some context to the edit, and an amendment contains one or more edits. To access the actual text of the edit, use the `edit_indices` key, which is a dictionary (such as `{'i1': 80, 'i2': 80, 'j1': 80, 'j2': 101}`). The `i1` and `i2` keys are the start (inclusive) and end (exclusive) indices of the change in the original text, and the `j1` and `j2` keys are the start and end indices of the change in the amended text. Hence, you can access the text of the edit as follows:\n\nidx = edit['edit_indices']\ni1, i2, j1, j2 = idx['i1'], idx['i2'], idx['j1'], idx['j2']\nold = ' '.join(edit['text_original'][i1:i2])  # Tokens removed from the original text\nnew = ' '.join(edit['text_amended'][j1:j2])   # Tokens added in the amended text\n\nprint(f'\"{old}\" is replaced by \"{new}\"')\n\nIf an edit is an insertion, then `i1 == i2`. If it is a deletion, then `j1 == j2`. 
See the documentation of Python's difflib module to learn more about how these indices are obtained.\n\nYou can assume that:\n\n- Each data point has at least one edit.\n- If there is only one edit, then it is in conflict with the status quo (see Section 2 of the paper).\n- If there are two or more edits in conflict, then they are all in conflict with each other and with the status quo (see Section 2 of the paper).\n- At most one edit is accepted in each data point.\n- In each legislature, each edit has a unique identifier.\n\nYou can find the original documents where the amendments were proposed using the `source` key, which has the format `COMM-AM(YEAR)PENUMBER`. Use the following search tools for EP7 and EP8 (the PE number field should be enough, adding a \".\" to fit the required format).\n\nThe parliamentarians (MEPs, for Members of the European Parliament) have a unique identifier that you can use to get more details about them on the Parliament website: go to https://www.europarl.europa.eu/meps/en/MEP_ID, where MEP_ID is the `id` of the MEP of interest.\n\nDon't hesitate to reach out to me if you have any questions!\n\nTo cite this work:\n\n@inproceedings{kristof2021war,\n  author = {Kristof, Victor and Suresh, Aswin and Grossglauser, Matthias and Thiran, Patrick},\n  title = {War of Words II: Enriched Models for Law-Making Processes},\n  year = {2021},\n  booktitle = {Proceedings of The Web Conference 2021},\n  pages = {2803\u20132809},\n  numpages = {12},\n  location = {Ljubljana, Slovenia},\n  series = {WWW '21}\n}",
  "author": [
    {
      "family": "Kristof",
      "given": "Victor"
    },
    {
      "family": "Suresh",
      "given": "Aswin"
    },
    {
      "family": "Grossglauser",
      "given": "Matthias"
    },
    {
      "family": "Thiran",
      "given": "Patrick"
    }
  ],
  "id": "4525015",
  "issued": {
    "date-parts": [
      [
        "2021",
        "04",
        "19"
      ]
    ]
  },
  "publisher": "Zenodo",
  "title": "War of Words II: Enriched Models of Law-Making Processes",
  "type": "dataset",
  "version": "1.0"
}