Published July 22, 2023 | Version 1.0
Dataset Open

EU legislation published between 1971 and 2022

  • 1. Netherlands eScience Center
  • 2. Radboud University
  • 3. Aarhus University

Description

EU Legislation Documents and Metadata from 1971 to 2022 (English language)

This is the set of full-text regulation, decision and directive documents in PDF and HTML format, in the English language, downloaded from EURLEX, together with metadata about these documents in CSV format. The documents were downloaded using this Python script, and the metadata was extracted from the CELLAR SPARQL endpoint using this Python script.

During the download process, the HTML version of each legislative document was retrieved if one was available. If no HTML version was available for a particular document, the PDF version was downloaded instead. HTML versions were preferred because the added structure of the format generally makes the text simpler to extract and process with software. If neither an HTML nor a PDF version was available, we recorded the unique identifier (CELEX number) of the document. The archive in this Zenodo repository that contains the full-text documents consists of three directories: "htmls/", "pdfs/" and "problems/". The "htmls/" and "pdfs/" directories contain the documents downloaded in the respective format. The "problems/" directory contains blank .txt files, each named after the CELEX number of a legislative document that was not available for download on EURLEX.
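The directory layout described above can be summarised with a short script. The following is a minimal sketch, not part of the project's own scripts: the function name `inventory` is hypothetical, and it assumes that the files in "htmls/" and "pdfs/" are named after the CELEX number plus a file extension, as is stated for the files in "problems/".

```python
from pathlib import Path


def inventory(root):
    """Map each CELEX number found under `root` to 'html', 'pdf' or 'missing'.

    Assumption: every file name is the CELEX number followed by an extension.
    """
    root = Path(root)
    status = {}
    # Directories are visited so that later labels overwrite earlier ones,
    # which means an HTML copy takes precedence over a PDF copy.
    for dirname, label in (("problems", "missing"), ("pdfs", "pdf"), ("htmls", "html")):
        folder = root / dirname
        if not folder.is_dir():
            continue  # tolerate a partially extracted archive
        for path in sorted(folder.iterdir()):
            if path.is_file():
                status[path.stem] = label
    return status
```

Calling `inventory()` on the directory into which the archive was extracted returns a dictionary that can be joined with the CSV metadata on the CELEX number, for example to exclude documents that have no full text.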

For more information about the scripts and a description of the metadata extracted, please see this GitHub repository.

The data was extracted as part of the Nature of EU Rules project, which seeks to analyse the "strictness" and density of EU regulations over time and by legal policy area.

 

Files

eu_regulations_fulltexts_html_pdf_1971_2022.zip

Files (5.7 GB)

Size       Checksum
5.6 GB     md5:5fe22b3fc57ff8d165f260130e2f70c2
60.4 MB    md5:1a6cdd90765671543730c06f247d2534
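After downloading, a file can be checked against the MD5 checksum listed for it. The following is a minimal sketch using only the Python standard library. The helper names `md5_of` and `matches_checksum` are hypothetical, and because the listing above does not show which file name belongs to which checksum, the local path has to be supplied by the user.

```python
import hashlib


def md5_of(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of the file at `path`."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so that a multi-gigabyte archive is never held in memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file against a checksum written as 'md5:<hex>' or plain '<hex>'."""
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5_of(path) == expected_hex
```

For example, `matches_checksum(local_path, "md5:5fe22b3fc57ff8d165f260130e2f70c2")` should return `True` for an intact copy of the 5.6 GB file, where `local_path` is wherever that file was saved.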