Published March 15, 2022 | Version v1
Dataset Open

UK-LEX Dataset - Part of Chalkidis and Søgaard (2022)

  • 1. University of Copenhagen

Description

The UK-LEX dataset was released as part of the following paper: Ilias Chalkidis and Anders Søgaard. 2022. Improved Multi-label Classification under Temporal Concept Drift: Rethinking Group-Robust Algorithms in a Label-Wise Setting. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics. Dublin, Ireland.

Details:

United Kingdom (UK) legislation is publicly available through the United Kingdom's National Archives (https://www.legislation.gov.uk). Most of the laws have been assigned thematic categories (e.g., health-care, finance, education, transportation, planning), which are listed in the document preamble and used for archival indexing.

We release a new dataset comprising 36.5k UK laws (documents). The dataset is split chronologically into training (20k, 1975–2002), development (8.5k, 2002–2008), and test (8.5k, 2008–2018) subsets. We manually extracted and clustered the topics to support two label granularities, comprising 18 and 69 topics (labels), respectively.

Data Files:

uk-lex18.jsonl: The dataset where documents are annotated with 18 different topics (labels).
uk-lex69.jsonl: The dataset where documents are annotated with 69 different topics (labels).
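
Both files use the JSON Lines format, so each line should hold one JSON-encoded record. The record schema is not documented here, so the sketch below makes no assumption about field names: it only assumes that every record is a JSON object, that the file is UTF-8 encoded, and that it sits in the working directory. The helper names `read_jsonl` and `count_keys` are illustrative, not part of the release.

```python
import json
from collections import Counter
from pathlib import Path


def read_jsonl(path):
    """Yield one parsed JSON value per non-empty line of a JSON Lines file."""
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines rather than failing on them
                yield json.loads(line)


def count_keys(records):
    """Count how often each top-level key occurs (assumes records are JSON objects)."""
    counts = Counter()
    for record in records:
        counts.update(record.keys())
    return counts


if __name__ == "__main__":
    path = Path("uk-lex18.jsonl")  # assumed location: current working directory
    if path.exists():
        # Prints the keys found in the file and how many records carry each one.
        print(count_keys(read_jsonl(path)))
```

Inspecting the keys first is a cautious way to find out which fields carry the text, the labels, and the split before writing any task-specific loading code.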

Files

Total size: 523.1 MB

Size      MD5
261.5 MB  adc67c56144a530f5e91b77ff29c933c
261.6 MB  557ff85ba6297259d0a2de107f2f7640
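
The MD5 checksums listed above can be used to check that a download completed without corruption. The following is a minimal sketch using Python's standard `hashlib`. Because the listing does not show which checksum belongs to which file name, the sketch only tests whether a file's digest matches either listed value. The function names and the local file paths are assumptions.

```python
import hashlib
from pathlib import Path

# Checksums copied from the file listing. The file-to-checksum mapping is not
# shown there, so a download is accepted if it matches either value.
LISTED_MD5 = {
    "adc67c56144a530f5e91b77ff29c933c",
    "557ff85ba6297259d0a2de107f2f7640",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path):
    """Return True if the file's MD5 digest equals one of the listed checksums."""
    return md5_of_file(path) in LISTED_MD5


if __name__ == "__main__":
    for name in ("uk-lex18.jsonl", "uk-lex69.jsonl"):  # assumed local paths
        if Path(name).exists():
            print(name, "OK" if matches_listing(name) else "MISMATCH")
```

Reading in chunks keeps memory use flat, which matters for files of roughly 260 MB each.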