TurkishLegalBench: A Comprehensive Multi-Task Benchmark Suite for Turkish Legal NLP

Erkan, Mehmet Ali; Yozgatligil, Ceylan

doi:10.5281/zenodo.19798795

Published February 9, 2026 | Version v2

1. Middle East Technical University

TurkishLegalBench is a large-scale, multi-task benchmark suite specifically designed for Natural Language Processing (NLP) in the Turkish legal domain. The dataset consists of 38,009 authentic legal documents curated from Turkish high courts. It covers seven distinct legal tasks categorized under three pillars: Classification (TurkVerdict, TurkVenue, TurkCanon), Extraction (TurkChronos, TurkCite), and Reasoning (TurkCoherence, TurkAudit). This resource aims to bridge the gap in low-resource legal NLP for Turkish and provides a standardized evaluation framework for researchers.

Methods

Under Review: This repository contains the official dataset and codebase for the paper "TurkLexBench: A Comprehensive Multi-Task Benchmark Suite for Turkish Legal NLP", currently under review for KDD 2026 (Datasets & Benchmarks Track). While the data is open for reproducibility, please cite the work if you use it.

The Tasks (The 3 Pillars)

We organize the benchmark into three pillars representing different levels of legal cognition:

I. The Gavel (High-Level Classification)

Task	Description	Metric	Size (Train/Dev/Test)
TurkVerdict	Predict the judgment outcome (e.g., Affirmation, Reversal) from the case rationale.	Macro-F1	18k / 2.2k / 2.2k
TurkVenue	Identify the competent court chamber (Daire) based on case facts (36 classes).	Macro-F1	19.2k / 2.7k / 5.5k
TurkCanon	Classify legislative documents into types (Law, Regulation, Decree, etc.).	Macro-F1	6.3k / 0.9k / 1.8k
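
All three classification tasks are scored with Macro-F1, which averages the per-class F1 with equal weight, so a rare class (for example, an infrequent chamber in TurkVenue) counts as much as a common one. The following is a minimal pure-Python sketch of that metric. The label strings are illustrative placeholders, not the dataset's actual label set, and the official evaluation code may differ in details such as how absent classes are handled.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every class seen in y_true or y_pred."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # gold class t was missed
    classes = set(y_true) | set(y_pred)
    scores = []
    for c in classes:
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labels, for illustration only.
gold = ["affirm", "affirm", "affirm", "reverse"]
pred = ["affirm", "affirm", "reverse", "reverse"]
print(round(macro_f1(gold, pred), 3))
```

In this toy example the majority class has F1 0.8 and the minority class has F1 of about 0.667, and both contribute equally to the average.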

II. The Quill (Information Extraction)

Task	Description	Metric	Size (Train/Dev/Test)
TurkChronos	Identify the decision year of a case amidst distractor dates.	Accuracy	19.4k / 2.7k / 5.5k
TurkCite	Extract citations (Law No. & Article No.) from unstructured text (NER).	Entity F1	10.9k / 1.5k / 3.1k
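
TurkCite is scored with Entity F1. One common way to compute such a score is to pool exact-match entities across documents and take the micro-averaged F1, as sketched below. This is an assumption about the scoring scheme: the benchmark frames the task as NER, so the official scorer may instead match token spans. The (law number, article number) pairs are made-up examples, not records from the dataset.

```python
def entity_f1(gold_docs, pred_docs):
    """Micro-averaged F1 over exact-match entities, pooled across documents."""
    tp = fp = fn = 0
    for gold, pred in zip(gold_docs, pred_docs):
        gold, pred = set(gold), set(pred)
        tp += len(gold & pred)   # entities found and correct
        fp += len(pred - gold)   # entities predicted but not in gold
        fn += len(gold - pred)   # gold entities that were missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical (law number, article number) pairs, one list per document.
gold_docs = [[("6100", "353"), ("6098", "49")], [("5271", "223")]]
pred_docs = [[("6100", "353")], [("5271", "223"), ("5237", "53")]]
print(round(entity_f1(gold_docs, pred_docs), 3))
```

Here a citation only counts as correct when both the law number and the article number match, which is stricter than crediting partial matches.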

III. The Scale (Legal Reasoning)

Task	Description	Metric	Size (Train/Dev/Test)
TurkCoherence	Natural Language Inference (NLI) to check if the reasoning supports the verdict.	Macro-F1	4.9k / 0.7k / 1.4k
TurkAudit	Detect "legal hallucinations" and anachronistic citations (e.g., a 2010 decision citing a law enacted in 2016).	Weighted-F1	7k / 1k / 2k
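
To make the notion of an anachronistic citation concrete, the sketch below flags any cited law whose enactment year is later than the decision year. It only illustrates the definition and is not the authors' labeling procedure or a baseline from the paper. The `Citation` class, its fields, and the law identifiers are hypothetical, and the sketch assumes the enactment year of each cited law is already known.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    law_no: str        # hypothetical identifier of the cited law
    enacted_year: int  # year the law was enacted (assumed to be known)


def anachronistic(decision_year, citations):
    """Return the citations whose law did not yet exist in the decision year."""
    return [c for c in citations if c.enacted_year > decision_year]


# Placeholder citations: one law enacted before the decision, one after.
cites = [Citation("LAW-A", 2004), Citation("LAW-B", 2016)]
flagged = anachronistic(2010, cites)
print([c.law_no for c in flagged])
```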

Benchmark Results

We evaluated baseline and domain-adapted models on all 7 tasks using the test set. The table below reports the primary metric for each task: Macro-F1 for Verdict, Venue, Canon, and Coherence; Accuracy for Chronos; Entity-F1 for Cite; and Weighted-F1 for Audit. A dash (-) indicates that no score is reported.

Model	Verdict (m-F1)	Venue (m-F1)	Canon (m-F1)	Chronos (Acc)	Cite (F1)	Coherence (m-F1)	Audit (W-F1)
TFIDF-SVM	68.4	41.7	35.8	56.0	76.4	-	87.6
BERTurk	81.9	79.4	92.6	94.1	95.5	56.8	96.9
Legal-BERT (Eng)	68.1	63.1	91.1	92.3	95.5	50.1	95.1
XLM-RoBERTa	79.2	64.4	70.4	70.2	95.4	33.7	96.6
Longformer	69.3	59.9	92.6	93.7	94.5	43.9	98.5
BERT-TR-128k	82.6	82.2	93.8	94.0	96.0	50.4	97.6

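The BERT-TR-128k model (with expanded vocabulary) achieves the best score among the evaluated models on 4 of 7 tasks (Verdict, Venue, Canon, and Cite). Its margins over standard BERTurk on these tasks range from 0.5 points (TurkCite) to 2.8 points (TurkVenue), while BERTurk remains marginally ahead on TurkChronos (94.1 vs. 94.0) and leads on TurkCoherence (56.8 vs. 50.4).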

Longformer achieves the highest TurkAudit score (98.5%), 0.9 points ahead of BERT-TR-128k (97.6%). This suggests that a large context window helps on "needle-in-a-haystack" detection tasks where the anomaly appears late in the document.

All models struggle with the TurkCoherence (NLI) task: the best score is 56.8 Macro-F1 (BERTurk). This indicates that the evaluated models are better at surface-level pattern matching than at deep legal logical entailment.
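
The per-task leaders discussed above can be read directly off the results table. The short script below transcribes the table and picks the top-scoring model for each task. The variable and function names are illustrative, and the missing TFIDF-SVM score for Coherence is represented as `None`.

```python
from collections import Counter

TASKS = ["Verdict", "Venue", "Canon", "Chronos", "Cite", "Coherence", "Audit"]

# Scores transcribed from the results table; None marks the "-" entry.
RESULTS = {
    "TFIDF-SVM": [68.4, 41.7, 35.8, 56.0, 76.4, None, 87.6],
    "BERTurk": [81.9, 79.4, 92.6, 94.1, 95.5, 56.8, 96.9],
    "Legal-BERT (Eng)": [68.1, 63.1, 91.1, 92.3, 95.5, 50.1, 95.1],
    "XLM-RoBERTa": [79.2, 64.4, 70.4, 70.2, 95.4, 33.7, 96.6],
    "Longformer": [69.3, 59.9, 92.6, 93.7, 94.5, 43.9, 98.5],
    "BERT-TR-128k": [82.6, 82.2, 93.8, 94.0, 96.0, 50.4, 97.6],
}


def best_per_task(results, tasks):
    """Map each task to the model with the highest reported score."""
    best = {}
    for i, task in enumerate(tasks):
        scored = {m: s[i] for m, s in results.items() if s[i] is not None}
        best[task] = max(scored, key=scored.get)
    return best


best = best_per_task(RESULTS, TASKS)
wins = Counter(best.values())
for task, model in best.items():
    print(f"{task}: {model} ({RESULTS[model][TASKS.index(task)]})")
print(dict(wins))
```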

License

This dataset and benchmark suite are distributed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license.

Under this license, you are free to:

Share — copy and redistribute the material in any medium or format.
Adapt — remix, transform, and build upon the material.

Under the following terms:

Attribution — You must give appropriate credit, provide a link to the license, and indicate if changes were made.
NonCommercial — You may not use the material for commercial purposes.
ShareAlike — If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.

Note: The underlying raw texts (court decisions and laws) are public records. This license applies to the curated benchmark, annotations, and structured dataset created by the authors.

Citation

If you use TurkLexBench (data, code, or models) in your research, please cite our paper:

@inproceedings{erkan2026turklexbench,
title={TurkLexBench: A Comprehensive Multi-Task Benchmark Suite for Turkish Legal NLP},
author={Erkan, Mehmet Ali and Yozgatlıgil, Ceylan},
booktitle={Proceedings of the 32nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD '26)},
year={2026},
publisher={ACM},
doi={10.5281/zenodo.18555735},
url={https://doi.org/10.5281/zenodo.18555735},
note={Under Review}
}

Files

Files (267.7 MB)

Name	MD5	Size
TurkAudit.zip	98e12a4de602e23c6c941626bee03aeb	17.1 MB
TurkCanon.zip	7db311ec7ba4367ad8b45159d5afbad0	62.6 MB
TurkChronos.zip	a58b448efd845fb0e7e32d5ca7c22d34	48.7 MB
TurkCite.zip	36d9a27ab6145b559db60a360d396eef	43.5 MB
TurkCoherence.zip	5f079ce006697c78ba896413512f01c8	17.0 MB
TurkVenue.zip	e536bcaaf577cc13cea8edd1a9bde9d3	41.3 MB
TurkVerdict.zip	3aeb3eca2944c302b5223055191a0e80	37.5 MB
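
The MD5 checksums listed for each archive can be used to confirm that a download is intact. The sketch below uses only the Python standard library. It assumes the archives were saved under their original names in a single directory; the helper names `md5_of` and `verify` are illustrative and not part of the dataset's codebase.

```python
import hashlib
from pathlib import Path

# Checksums copied from the file listing of this record.
EXPECTED_MD5 = {
    "TurkAudit.zip": "98e12a4de602e23c6c941626bee03aeb",
    "TurkCanon.zip": "7db311ec7ba4367ad8b45159d5afbad0",
    "TurkChronos.zip": "a58b448efd845fb0e7e32d5ca7c22d34",
    "TurkCite.zip": "36d9a27ab6145b559db60a360d396eef",
    "TurkCoherence.zip": "5f079ce006697c78ba896413512f01c8",
    "TurkVenue.zip": "e536bcaaf577cc13cea8edd1a9bde9d3",
    "TurkVerdict.zip": "3aeb3eca2944c302b5223055191a0e80",
}


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(directory="."):
    """Map each archive name to True/False, or None if the file is missing."""
    status = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        status[name] = (md5_of(path) == expected) if path.exists() else None
    return status


if __name__ == "__main__":
    labels = {True: "OK", False: "MISMATCH", None: "missing"}
    for name, ok in verify().items():
        print(f"{name}: {labels[ok]}")
```

Reading in chunks keeps memory use flat regardless of archive size, which matters little for files of this size but makes the helper reusable for larger downloads.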

Additional details

Repository URL: https://github.com/mrkn7/TurkishLegalBench
Programming language: Python
