POLIcy design ANNotAtions (POLIANNA): Towards understanding policy design through text-as-data approaches
Creators
- 1. Energy and Technology Policy Group, Department of Humanities, Social and Political Sciences, ETH Zurich
- 2. Hertie School Berlin
- 3. Climate Physics Group, Department of Environmental Systems Science, ETH Zurich
Description
The POLIANNA dataset is a collection of legislative texts from the European Union (EU) that have been annotated based on theoretical concepts of policy design. The dataset consists of 20,577 annotated spans in 412 articles, drawn from 18 EU climate change mitigation and renewable energy laws, and can be used to develop supervised machine learning approaches for scaling policy analysis. The dataset includes a novel coding scheme for annotating text spans. The paper accompanying this dataset describes the annotated corpus, analyzes inter-annotator agreement, and discusses potential applications. The objective of this dataset is to build tools that assist with manual coding of policy texts by automatically identifying relevant paragraphs.
Detailed instructions and further guidance about the dataset as well as all the code used for this project can be found in the accompanying paper and on the GitHub project page. The repository also contains useful code to calculate various inter-annotator agreement measures and can be used to process text annotations generated by INCEpTION.
Dataset Description
We provide the dataset in three formats:
JSON: Each article corresponds to a folder, where the Tokens and Spans are stored in a separate JSON file. Each article folder also contains the raw policy text as a text file and the metadata about the policy. This is the most human-readable format.
JSONL: Same folder structure as the JSON format, but the Spans and Tokens are stored in a JSONL file, where each line is a valid JSON document.
Pickle: We provide the dataset as a Python object. This is the recommended method when using our own Python framework that is provided on GitHub. For more information, check out the GitHub project page.
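As an illustration of the difference between the JSON and JSONL formats, the following is a minimal sketch of how either kind of file could be read with the Python standard library. The helper names `read_json` and `read_jsonl` are hypothetical and are not part of the POLIANNA framework, and no specific file names or field names from the dataset are assumed here; consult the GitHub project page for the actual layout.

```python
import json
from pathlib import Path


def read_json(path):
    """Parse a file that holds a single JSON document."""
    with Path(path).open(encoding="utf-8") as handle:
        return json.load(handle)


def read_jsonl(path):
    """Parse a JSONL file: one JSON document per line, blank lines skipped."""
    records = []
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records
```

Both helpers take the path to one file inside an article folder. `read_jsonl` returns a list with one entry per line, which makes it straightforward to stream or filter spans without loading a nested structure first.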
License
The POLIANNA dataset is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license. If you use the POLIANNA dataset in your research in any form, please cite the dataset.
Citation
Sewerin, S., Kaack, L.H., Küttel, J. et al. Towards understanding policy design through text-as-data approaches: The policy design annotations (POLIANNA) dataset. Sci Data 10, 896 (2023). https://doi.org/10.1038/s41597-023-02801-z
Files
| Name | Size | Checksum |
|---|---|---|
| POLIANNA_v1_1.zip | 23.1 MB | md5:c64bc9cf4b3f238c57e72b068f56728e |
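After downloading the archive, its integrity can be checked against the MD5 checksum listed above. The following is a minimal sketch using Python's standard `hashlib` module; the function names and the local file path in the usage note are assumptions for illustration only.

```python
import hashlib

# MD5 checksum as listed for the archive in the Files section.
EXPECTED_MD5 = "c64bc9cf4b3f238c57e72b068f56728e"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest equals the expected value."""
    return md5_of_file(path) == expected.lower()
```

For example, `matches_expected("POLIANNA_v1_1.zip")` would return `True` for an intact download saved under that name in the current directory. MD5 is suitable here for detecting corrupted downloads, not for security purposes.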
Additional details
Related works
- Is documented by
- Journal article: 10.1038/s41597-023-02801-z (DOI)
Funding
- Swiss National Science Foundation
- Uncovering policy designs: A training dataset for future automated text analysis (grant CRSK-1_190936)