LexGLUE: A Benchmark Dataset for Legal Language Understanding in English

Ilias Chalkidis; Abhik Jana; Dirk Hartung; Michael Bommarito; Ion Androutsopoulos; Daniel Martin Katz; Nikolaos Aletras

doi:10.5281/zenodo.5529774

Published September 27, 2021 | Version 1.0

Dataset Open

1. University of Copenhagen
2. Universität Hamburg
3. Bucerius Law School
4. CodeX, Stanford Law School
5. Athens University of Economics and Business
6. Illinois Tech – Chicago Kent College of Law
7. University of Sheffield

This benchmark dataset is published with the article:

Ilias Chalkidis, Abhik Jana, Dirk Hartung, Michael Bommarito, Ion Androutsopoulos, Daniel Martin Katz, and Nikolaos Aletras. 2021. LexGLUE: A Benchmark Dataset for Legal Language Understanding in English. ArXiv.

Short Description

Inspired by the recent widespread use of the GLUE multi-task NLP benchmark (Wang et al., 2018), the subsequent and more difficult SuperGLUE (Wang et al., 2019), other earlier multi-task NLP benchmarks (Conneau and Kiela, 2018; McCann et al., 2018), and similar initiatives in other domains (Peng et al., 2019), we introduce LexGLUE, a benchmark dataset for evaluating the performance of NLP methods on legal tasks. LexGLUE is based on seven existing legal NLP datasets:

ECtHR Task A (Chalkidis et al., 2019)
ECtHR Task B (Chalkidis et al., 2021a)
SCOTUS (Spaeth et al., 2020)
EUR-LEX (Chalkidis et al., 2021b)
LEDGAR (Tuggener et al., 2020)
UNFAIR-ToS (Lippi et al., 2019)
CaseHOLD (Zheng et al., 2021)

Files

Files (1.1 GB)

Name	MD5	Size
casehold.csv	24eb6586a3eef9d647c30fd73a398aad	105.6 MB
ecthr.jsonl	95d9babdc2d37036ac88ec3f3f873ad3	119.1 MB
eurlex.jsonl	e79fe824b162e9dbcec56b225b2f80ef	503.9 MB
ledgar.jsonl	1db4936e68a09b1af2735e6206750b42	63.4 MB
scotus.jsonl	a7ee7d759f368986929d24f5897b797d	335.7 MB
unfair_tos.jsonl	6b6a39d2a96a160b9819eb1c1c4e1091	2.6 MB
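
The MD5 checksums listed above can be used to confirm that a download completed intact. The following is a minimal sketch, not part of the dataset release: it assumes the six files were saved under their listed names in a single local directory, and the helper names `md5_of_file` and `verify_downloads` are hypothetical, introduced here only for illustration.

```python
import hashlib
from pathlib import Path

# Checksums copied from the file listing of this record.
EXPECTED_MD5 = {
    "casehold.csv": "24eb6586a3eef9d647c30fd73a398aad",
    "ecthr.jsonl": "95d9babdc2d37036ac88ec3f3f873ad3",
    "eurlex.jsonl": "e79fe824b162e9dbcec56b225b2f80ef",
    "ledgar.jsonl": "1db4936e68a09b1af2735e6206750b42",
    "scotus.jsonl": "a7ee7d759f368986929d24f5897b797d",
    "unfair_tos.jsonl": "6b6a39d2a96a160b9819eb1c1c4e1091",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory="."):
    """Map each expected file name to True only if it exists and its MD5 matches."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name  # assumed location: files kept under their listed names
        results[name] = path.is_file() and md5_of_file(path) == expected
    return results
```

Calling `verify_downloads("path/to/lexglue")` returns a dictionary keyed by file name; a `False` entry means the file is either missing or does not match its listed checksum. Reading in chunks matters here because the largest file is about 500 MB.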

	All versions	This version
Views	1,862	608
Downloads	11,987	436
Data volume	1.2 TB	85.2 GB
