War of Words II: Enriched Models of Law-Making Processes
Record: 4525015, doi 10.5281/zenodo.4525015, oai:zenodo.org:4525015 (community: user-epfl)
Authors: Kristof, Victor (EPFL); Suresh, Aswin (EPFL); Grossglauser, Matthias (EPFL); Thiran, Patrick (EPFL)
Related DOIs: doi:10.1145/3442381.3450131, doi:10.1145/3366423.3380041
Access: info:eu-repo/semantics/openAccess
License: Creative Commons Attribution 4.0 International (cc-by-4.0), https://creativecommons.org/licenses/by/4.0/legalcode
To cite this work: <pre><code>@inproceedings{kristof2021war,
  author    = {Kristof, Victor and Suresh, Aswin and Grossglauser, Matthias and Thiran, Patrick},
  title     = {War of Words II: Enriched Models of Law-Making Processes},
  year      = {2021},
  booktitle = {Proceedings of The Web Conference 2021},
  pages     = {2803--2814},
  numpages  = {12},
  location  = {Ljubljana, Slovenia},
  series    = {WWW '21}
}</code></pre> This upload contains the dataset presented and used in the paper: <blockquote> Victor Kristof, Aswin Suresh, Matthias Grossglauser, Patrick Thiran, <a href="https://infoscience.epfl.ch/record/284828">War of Words II: Enriched Models of Law-Making Processes</a>, The Web Conference 2021, April 19-23, 2021, Ljubljana, Slovenia </blockquote> The code to process and use the dataset is available on <a href="https://github.com/indy-lab/war-of-words">GitHub</a>. This work is a follow-up to <a href="https://zenodo.org/record/3757714">War of Words: The Competitive Dynamics of Legislative Processes</a>.
The dataset is split into two files, one per legislature of the European Parliament: the 7th (war-of-words-2-ep7.txt) and the 8th (war-of-words-2-ep8.txt). Here is a snippet to load the dataset in Python (for EP8 in this example): <pre><code class="language-python">import json

with open('path/to/war-of-words-2-ep8.txt') as f:
    # Each line of the file is one data point, encoded as JSON
    dataset = [json.loads(line) for line in f]
</code></pre> In the two text files, each line is a data point representing a conflict between edits.
Each data point is encoded as a JSON list of objects, one per edit. Each edit has the following structure (the comments are explanatory and do not appear in the files): <pre><code class="language-json">{
  "edit_id": 163187,                     // Unique edit identifier
  "edit_type": "insert",                 // One of "insert", "delete", or "replace"
  "accepted": true,                      // Label: whether the edit was accepted
  "dossier_ref": "ENVI-AD(2012)487738",  // Reference to the dossier
  "dossier_type": "opinion",             // One of "opinion" or "report"
  "date": "2017-03-02",                  // Date on which all amendments for this dossier were voted
  "legal_act": "regulation",             // One of "regulation", "directive", or "decision"
  "committee": "BUDG",                   // Committee in which this edit was proposed
  "outsider": false,                     // Whether the above committee is the reporting committee
  "article_type": "recital",             // One of 7 article types
  "source": "BUDG-AM(2017)599742",       // Reference to the original document of the amendment
  "justification": null,                 // Text of the optional justification (or null)
  "edit_indices": {...},                 // Indices of the edit in the amendment (see below)
  "text_original": [...],                // Original text reported in the source document (see below)
  "text_amended": [...],                 // Amended text reported in the source document (see below)
  "authors": [                           // List of authors
    {
      "id": 88882,                       // Unique MEP identifier (see below)
      "name": "Victor NEGRESCU",         // Full name of the MEP
      "gender": "M",                     // Gender as reported in the Parliament database
      "nationality": "Romania",          // One of 28 nationalities
      "group": "Group of the Progressive Alliance of Socialists and Democrats in the European Parliament", // One of 9 political groups
      "rapporteur": false                // Whether the MEP is rapporteur for this dossier
    }
  ]
}</code></pre> The `text_original` and `text_amended` keys contain the portion of text reported in the `source` document, tokenized as a list of terms (words, numbers, punctuation, ...). These two keys do not contain the edit itself: amendments are reported as an edited paragraph, which also gives some context around the edit.
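To make the structure above concrete, here is a minimal sketch that parses one line of a data file and prints a one-line summary per edit, including a link to each author's profile page built with the URL pattern https://www.europarl.europa.eu/meps/en/MEP_ID described in this record. The sample line is hypothetical and heavily trimmed (made-up values, only the keys the code uses), and `summarize_data_point` and `MEP_URL` are illustrative names, not part of the dataset or its helpers.

```python
import json

# Profile URL pattern described in this record; {} is replaced by the MEP id.
MEP_URL = 'https://www.europarl.europa.eu/meps/en/{}'


def summarize_data_point(line):
    """Parse one line of a war-of-words-2-ep*.txt file and describe each edit.

    Hypothetical helper: returns one human-readable string per edit.
    """
    edits = json.loads(line)  # JSON true/false/null become Python True/False/None
    summaries = []
    for edit in edits:
        authors = ', '.join(
            f"{a['name']} ({MEP_URL.format(a['id'])})"
            + (' [rapporteur]' if a['rapporteur'] else '')
            for a in edit['authors']
        )
        status = 'accepted' if edit['accepted'] else 'not accepted'
        summaries.append(
            f"edit {edit['edit_id']} ({edit['edit_type']}, {status}) by {authors}"
        )
    return summaries


# Hypothetical, heavily trimmed line: made-up values, only the keys used above.
sample = (
    '[{"edit_id": 1, "edit_type": "insert", "accepted": true, '
    '"authors": [{"id": 88882, "name": "Victor NEGRESCU", "rapporteur": false}]}, '
    '{"edit_id": 2, "edit_type": "delete", "accepted": false, '
    '"authors": [{"id": 12345, "name": "Jane DOE", "rapporteur": true}]}]'
)

for summary in summarize_data_point(sample):
    print(summary)
```

On the real files, the same function can be applied to each line read from `path/to/war-of-words-2-ep8.txt`.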
An amendment contains one or more edits. To access the actual text of an edit, use the `edit_indices` key, which is a dictionary such as `{'i1': 80, 'i2': 80, 'j1': 80, 'j2': 101}`. The `i1` and `i2` keys delimit the change in the original text, and the `j1` and `j2` keys delimit it in the amended text, following Python slice conventions (start inclusive, end exclusive). Hence, you can access the text of an edit as follows: <pre><code class="language-python"># `dataset` is the list loaded above; each data point is a list of edits
edit = dataset[0][0]  # first edit of the first data point
idx = edit['edit_indices']
i1, i2, j1, j2 = idx['i1'], idx['i2'], idx['j1'], idx['j2']
# Tokens are joined with spaces for display; both slices may be empty
old = ' '.join(edit['text_original'][i1:i2])  # empty for insertions
new = ' '.join(edit['text_amended'][j1:j2])   # empty for deletions
print(f'"{old}" is replaced by "{new}"')</code></pre> If an edit is an insertion, then `i1 == i2`; if it is a deletion, then `j1 == j2`. Read the documentation of <a href="https://docs.python.org/3.6/library/difflib.html#difflib.SequenceMatcher.get_opcodes">difflib</a> to learn more about how these indices are obtained. You can assume that: <ul> <li>Each data point has at least one edit.</li> <li>If there is only one edit, then it is in conflict with the status quo (see Section 2 of the paper).</li> <li>If there are two or more edits, then they all conflict with each other and with the status quo (see Section 2 of the paper).</li> <li>At most one edit is accepted in each data point.</li> <li>In each legislature, each edit has a unique identifier.</li> </ul> You can find the original documents in which the amendments were proposed using the `source` key, which has the format `COMM-AM(YEAR)PENUMBER`. Use the following search tools for <a href="https://www.europarl.europa.eu/committees/en/archives/7/document-search">EP7</a> and <a href="https://www.europarl.europa.eu/committees/en/archives/8/document-search">EP8</a>: filling in the PE number field should be enough, after adding a "." to match the required format (e.g., 599.742 for `BUDG-AM(2017)599742`).
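The indices in `edit_indices` follow the convention of difflib's `SequenceMatcher.get_opcodes()`, which returns `(tag, i1, i2, j1, j2)` tuples meaning that `a[i1:i2]` should be replaced by `b[j1:j2]`. The sketch below recomputes such indices from two token lists and converts a `source` reference into a dotted PE number for the search tools. It is an illustration under assumptions, not the authors' processing code (that lives in the GitHub repository): the `SequenceMatcher` settings used to build the dataset are unknown (`autojunk=False` is our choice), the dot is assumed to go before the last three digits of the PE number, and the token lists and the names `diff_edits` and `pe_number` are hypothetical.

```python
import difflib
import re


def diff_edits(original, amended):
    """Recompute edit_indices-like dicts from two token lists.

    Hypothetical helper: keeps every non-'equal' opcode returned by
    difflib.SequenceMatcher.get_opcodes(). autojunk=False is our own choice;
    the settings used to build the dataset may differ.
    """
    matcher = difflib.SequenceMatcher(None, original, amended, autojunk=False)
    return [
        {'tag': tag, 'i1': i1, 'i2': i2, 'j1': j1, 'j2': j2}
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != 'equal'
    ]


def pe_number(source):
    """Turn a `source` such as 'BUDG-AM(2017)599742' into '599.742'.

    Assumes the dot goes before the last three digits of the PE number.
    """
    match = re.fullmatch(r'[A-Z]+-AM\(\d{4}\)(\d{4,})', source)
    if match is None:
        raise ValueError(f'unexpected source format: {source!r}')
    digits = match.group(1)
    return f'{digits[:-3]}.{digits[-3:]}'


# Hypothetical token lists (not from the dataset): one inserted word.
original = ['the', 'Commission', 'shall', 'adopt', 'measures', '.']
amended = ['the', 'Commission', 'shall', 'swiftly', 'adopt', 'measures', '.']
for op in diff_edits(original, amended):
    # For an insertion the original-side slice is empty (i1 == i2)
    print(op['tag'], original[op['i1']:op['i2']], '->', amended[op['j1']:op['j2']])
print(pe_number('BUDG-AM(2017)599742'))
```

Here the only difference is the inserted token, so a single `insert` opcode is returned with `i1 == i2`, matching the rule stated above.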
The parliamentarians (MEPs, for Members of the European Parliament) have a unique identifier that you can use to get more details about them on the Parliament website: go to https://www.europarl.europa.eu/meps/en/MEP_ID, where MEP_ID is the `id` of the MEP of interest. Don't hesitate to <a href="mailto:victor.kristof@epfl.ch?subject=Question%20about%20the%20War%20of%20Words%20dataset">reach out to me</a> if you have any questions!
Publisher: Zenodo
Publication date: 2021-04-19
Community: user-epfl
Resource type: info:eu-repo/semantics/other
Record last updated: 2021-04-26 15:32:35
Access: open
Files (size in bytes, MD5 checksum, URL):
war-of-words-2-ep8.txt: 607401773 bytes, md5:b607a18d0197fb76a985b3b96bf37fff, https://zenodo.org/records/4525015/files/war-of-words-2-ep8.txt
text-embeddings.zip: 43354347 bytes, md5:cf18ad0a99227af05f04ba6f3da2718c, https://zenodo.org/records/4525015/files/text-embeddings.zip
split-indices.zip: 753438 bytes, md5:d7838210ab293c77d654d3c866e9e782, https://zenodo.org/records/4525015/files/split-indices.zip
helpers.zip: 92272 bytes, md5:152fdcf7c29a141e8b2febc3f6db437a, https://zenodo.org/records/4525015/files/helpers.zip
war-of-words-2-ep7.txt: 448958857 bytes, md5:2bcb1eb32600f292d6004720ab5971e7, https://zenodo.org/records/4525015/files/war-of-words-2-ep7.txt
Related identifiers:
Is documented by: doi 10.1145/3442381.3450131
Continues: doi 10.1145/3366423.3380041
Is version of: doi 10.5281/zenodo.4525014
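Each file listed above comes with an MD5 checksum. As a convenience, here is a minimal sketch using Python's standard `hashlib` to check a download against the listed value; `md5_of` and `verify` are illustrative names, not part of the dataset's helpers, and the path in the usage comment is a placeholder.

```python
import hashlib


def md5_of(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, read in 1 MiB chunks to bound memory."""
    digest = hashlib.md5()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected):
    """Compare a file against a checksum written as listed here, e.g. 'md5:...'."""
    if expected.startswith('md5:'):
        expected = expected[len('md5:'):]
    return md5_of(path) == expected.lower()


# Usage (the path is a placeholder for wherever the file was saved):
# verify('path/to/war-of-words-2-ep8.txt', 'md5:b607a18d0197fb76a985b3b96bf37fff')
```

Reading in chunks keeps memory use flat, which matters for the two data files of several hundred megabytes each.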