Published August 25, 2023 | Version 1.1.0
Dataset Open

POLIcy design ANNotAtions (POLIANNA): Towards understanding policy design through text-as-data approaches

  • 1. Energy and Technology Policy Group, Department of Humanities, Social and Political Sciences, ETH Zurich
  • 2. Hertie School Berlin
  • 3. Climate Physics Group, Department of Environmental Systems Science, ETH Zurich

Contributors

  • 1. Hertie School

Description

The POLIANNA dataset is a collection of legislative texts from the European Union (EU) that have been annotated based on theoretical concepts of policy design. The dataset consists of 20,577 annotated spans in 412 articles, drawn from 18 EU climate change mitigation and renewable energy laws, and can be used to develop supervised machine learning approaches for scaling policy analysis. The dataset includes a novel coding scheme for annotating text spans. The paper accompanying this dataset describes the annotated corpus, analyzes inter-annotator agreement, and discusses potential applications. The objective of this dataset is to build tools that assist with manual coding of policy texts by automatically identifying relevant paragraphs.

Detailed instructions and further guidance about the dataset as well as all the code used for this project can be found in the accompanying paper and on the GitHub project page. The repository also contains useful code to calculate various inter-annotator agreement measures and can be used to process text annotations generated by INCEpTION.

 

Dataset Description

We provide the dataset in 3 different formats:

JSON: Each article corresponds to a folder, where the Tokens and Spans are stored in a separate JSON file. Each article folder further contains the raw policy text as a text file and the metadata about the policy. This is the most human-readable format.
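A minimal sketch of reading the JSON format, assuming the archive has been extracted to a local directory. The function name load_article_folders and the path in the usage note are hypothetical. The sketch makes no assumption about the file names inside each article folder and simply parses every .json file it finds there.

```python
import json
from pathlib import Path


def load_article_folders(root):
    """Map each article folder name to {JSON file stem: parsed content}.

    `root` is wherever the JSON release was extracted (an assumption;
    check the archive for the actual layout).
    """
    articles = {}
    for folder in sorted(Path(root).iterdir()):
        if not folder.is_dir():
            continue  # skip stray files next to the article folders
        parsed = {}
        for json_file in sorted(folder.glob("*.json")):
            with json_file.open(encoding="utf-8") as handle:
                parsed[json_file.stem] = json.load(handle)
        articles[folder.name] = parsed
    return articles
```

For example, load_article_folders("path/to/extracted/json") returns one dictionary entry per article folder, keyed by folder name. The raw text and any non-JSON files are left untouched.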

JSONL: Same folder structure as the JSON format, but the Spans and Tokens are stored in a JSONL file, where each line is a valid JSON document.
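Because each line of a JSONL file is a valid JSON document, such a file can be parsed line by line with the standard library alone. The following is a minimal sketch. The helper name read_jsonl is hypothetical, and the actual file names inside the article folders should be taken from the archive.

```python
import json


def read_jsonl(path):
    """Return a list with one parsed JSON document per non-empty line."""
    records = []
    with open(path, encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines, e.g. a trailing newline
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError as err:
                # Report which line failed so a damaged file is easy to inspect.
                raise ValueError(f"{path}, line {line_number}: {err}") from err
    return records
```

Reading line by line keeps a single malformed record from hiding behind a generic parse error for the whole file.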

Pickle: We provide the dataset as a Python object. This is the recommended method when using our own Python framework that is provided on GitHub. For more information, check out the GitHub project page.


License

The POLIANNA dataset is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license. If you use the POLIANNA dataset in your research in any form, please cite the dataset.

 

Citation

Sewerin, S., Kaack, L.H., Küttel, J. et al. Towards understanding policy design through text-as-data approaches: The policy design annotations (POLIANNA) dataset. Sci Data 10, 896 (2023). https://doi.org/10.1038/s41597-023-02801-z

Notes

This work was also supported by ETH Career Seed Grant SEED-24 19-2, funded by the ETH Zurich Foundation.

Files

POLIANNA_v1_1.zip (23.1 MB)
md5:c64bc9cf4b3f238c57e72b068f56728e
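After downloading, the archive can be checked against the MD5 checksum listed for POLIANNA_v1_1.zip. This is a minimal sketch using Python's hashlib. The helper name md5_of_file is hypothetical, and the local path is an assumption about where the file was saved.

```python
import hashlib

# Checksum as given in the file listing for POLIANNA_v1_1.zip.
EXPECTED_MD5 = "c64bc9cf4b3f238c57e72b068f56728e"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

If md5_of_file("POLIANNA_v1_1.zip") does not equal EXPECTED_MD5, the download is incomplete or corrupted and should be repeated.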

Additional details

Related works

Is documented by
Journal article: 10.1038/s41597-023-02801-z (DOI)

Funding

Swiss National Science Foundation
Uncovering policy designs: A training dataset for future automated text analysis (grant CRSK-1_190936)