Published February 9, 2026 | Version v2
Dataset Open

TurkishLegalBench: A Comprehensive Multi-Task Benchmark Suite for Turkish Legal NLP

  • 1. Middle East Technical University

Description

TurkishLegalBench is a large-scale, multi-task benchmark suite specifically designed for Natural Language Processing (NLP) in the Turkish legal domain. The dataset consists of 38,009 authentic legal documents curated from Turkish high courts. It covers seven distinct legal tasks categorized under three pillars: Classification (TurkVerdict, TurkVenue, TurkCanon), Extraction (TurkChronos, TurkCite), and Reasoning (TurkCoherence, TurkAudit). This resource aims to bridge the gap in low-resource legal NLP for Turkish and provides a standardized evaluation framework for researchers.

Methods

Under Review: This repository contains the official dataset and codebase for the paper "TurkLexBench: A Comprehensive Multi-Task Benchmark Suite for Turkish Legal NLP", currently under review for KDD 2026 (Datasets & Benchmarks Track). While the data is open for reproducibility, please cite the work if you use it.

The Tasks (The 3 Pillars)

We organize the benchmark into three pillars representing different levels of legal cognition:

I. The Gavel (High-Level Classification)

| Task | Description | Metric | Size (Train/Dev/Test) |
|---|---|---|---|
| TurkVerdict | Predict the judgment outcome (e.g., Affirmation, Reversal) from the case rationale. | Macro-F1 | 18k / 2.2k / 2.2k |
| TurkVenue | Identify the competent court chamber (Daire) based on case facts (36 classes). | Macro-F1 | 19.2k / 2.7k / 5.5k |
| TurkCanon | Classify legislative documents into types (Law, Regulation, Decree, etc.). | Macro-F1 | 6.3k / 0.9k / 1.8k |
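
The classification tasks are scored with Macro-F1, while TurkAudit uses Weighted-F1. The following is a minimal sketch of how the two averages differ, written in plain Python rather than taken from the benchmark codebase; the function names and the toy labels are assumptions made for illustration only.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1} for every label seen in y_true or y_pred."""
    labels = set(y_true) | set(y_pred)
    scores = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1: rare classes count as much as frequent ones."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def weighted_f1(y_true, y_pred):
    """Mean of per-class F1 weighted by how often each class occurs in y_true."""
    scores = per_class_f1(y_true, y_pred)
    support = Counter(y_true)
    return sum(scores[label] * n for label, n in support.items()) / len(y_true)


# Toy example (made-up labels): a model that always predicts the majority class.
y_true = ["Affirmation"] * 8 + ["Reversal"] * 2
y_pred = ["Affirmation"] * 10
print(round(macro_f1(y_true, y_pred), 4))     # 0.4444
print(round(weighted_f1(y_true, y_pred), 4))  # 0.7111
```

In this toy case the majority-class predictor looks much better under Weighted-F1 than under Macro-F1, which is why Macro-F1 is the stricter choice for tasks with many rare classes such as TurkVenue.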

II. The Quill (Information Extraction)

| Task | Description | Metric | Size (Train/Dev/Test) |
|---|---|---|---|
| TurkChronos | Identify the decision year of a case amidst distractor dates. | Accuracy | 19.4k / 2.7k / 5.5k |
| TurkCite | Extract citations (Law No. & Article No.) from unstructured text (NER). | Entity F1 | 10.9k / 1.5k / 3.1k |
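
To illustrate why TurkChronos is harder than it sounds, here is a naive sketch that collects every four-digit year in a text and guesses the latest one. This is not a baseline from the paper: the regular expression, the "latest year" heuristic, and the sample sentence are all assumptions made for illustration.

```python
import re

# Four-digit years from 1900 to 2099 that are not part of a longer digit run.
YEAR_RE = re.compile(r"(?<!\d)(19\d{2}|20\d{2})(?!\d)")


def candidate_years(text):
    """Return the distinct years mentioned in text, in order of first appearance."""
    seen = []
    for match in YEAR_RE.finditer(text):
        year = int(match.group(1))
        if year not in seen:
            seen.append(year)
    return seen


def latest_year_guess(text):
    """Naive heuristic: guess the most recent year mentioned, or None if there is none."""
    years = candidate_years(text)
    return max(years) if years else None


# Made-up sample text with several distractor dates.
sample = ("The claim was filed on 12.03.2014 and the lower court ruled "
          "on 05.11.2016; file no. 2017/1234.")
print(candidate_years(sample))    # [2014, 2016, 2017]
print(latest_year_guess(sample))  # 2017
```

Every year the heuristic finds is a plausible answer, so a model has to use context (filing date, lower-court date, file number) to pick the actual decision year.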

III. The Scale (Legal Reasoning)

| Task | Description | Metric | Size (Train/Dev/Test) |
|---|---|---|---|
| TurkCoherence | Natural Language Inference (NLI) to check if the reasoning supports the verdict. | Macro-F1 | 4.9k / 0.7k / 1.4k |
| TurkAudit | Detect "legal hallucinations" and anachronistic citations (e.g., citing a 2016 law in 2010). | Weighted-F1 | 7k / 1k / 2k |
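
The anachronism example for TurkAudit (a 2016 law cited in a 2010 decision) can be sketched as a simple rule-based check. This is only an illustration of the idea, not how the dataset was built or how the models solve the task; the small lookup table of enactment years and the function name are assumptions introduced here.

```python
# Illustrative lookup of Turkish law numbers to enactment years (not part of the dataset).
ENACTMENT_YEAR = {
    "5237": 2004,  # Turkish Penal Code
    "6100": 2011,  # Code of Civil Procedure
    "6698": 2016,  # Personal Data Protection Law
}


def anachronistic_citations(decision_year, cited_laws, enactment_year=ENACTMENT_YEAR):
    """Return the cited law numbers that were enacted after the decision year."""
    flagged = []
    for law_no in cited_laws:
        year = enactment_year.get(law_no)
        # Laws missing from the lookup are skipped rather than flagged.
        if year is not None and year > decision_year:
            flagged.append(law_no)
    return flagged


print(anachronistic_citations(2010, ["5237", "6698"]))  # ['6698']
print(anachronistic_citations(2017, ["5237", "6698"]))  # []
```

A rule like this depends on a complete law-to-year table and on correctly extracted citations, which is part of what makes the task a useful test for learned models.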

Benchmark Results

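We evaluated baseline and domain-adapted models across all 7 tasks using the Test Set. The table below reports the primary metric for each task: Macro-F1 for TurkVerdict, TurkVenue, TurkCanon, and TurkCoherence; Accuracy for TurkChronos; Entity F1 for TurkCite; and Weighted-F1 for TurkAudit. A dash means that no score is reported.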

| Model | Verdict (m-F1) | Venue (m-F1) | Canon (m-F1) | Chronos (Acc) | Cite (F1) | Coherence (m-F1) | Audit (W-F1) |
|---|---|---|---|---|---|---|---|
| TFIDF-SVM | 68.4 | 41.7 | 35.8 | 56.0 | 76.4 | - | 87.6 |
| BERTurk | 81.9 | 79.4 | 92.6 | 94.1 | 95.5 | 56.8 | 96.9 |
| Legal-BERT (Eng) | 68.1 | 63.1 | 91.1 | 92.3 | 95.5 | 50.1 | 95.1 |
| XLM-RoBERTa | 79.2 | 64.4 | 70.4 | 70.2 | 95.4 | 33.7 | 96.6 |
| Longformer | 69.3 | 59.9 | 92.6 | 93.7 | 94.5 | 43.9 | 98.5 |
| BERT-TR-128k | 82.6 | 82.2 | 93.8 | 94.0 | 96.0 | 50.4 | 97.6 |
  • The BERT-TR-128k model (with expanded vocabulary) achieves the best score on 4 of the 7 tasks (TurkVerdict, TurkVenue, TurkCanon, TurkCite). Its largest margin over standard BERTurk is on the 36-class TurkVenue task (82.2 vs. 79.4); the gain on TurkCite is smaller (96.0 vs. 95.5), and BERTurk remains slightly ahead on TurkChronos (94.1 vs. 94.0).
  • Longformer achieves the highest TurkAudit score (98.5 Weighted-F1), which suggests that large context windows help on "needle-in-a-haystack" tasks where the anomaly appears late in the document.
  • All models struggle with the TurkCoherence (NLI) task: the best result is 56.8 Macro-F1 (BERTurk). This indicates that current pretrained language models are better at surface-level pattern matching than at deep legal logical entailment.
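
The per-task winners can be recomputed directly from the results table. The sketch below copies the reported scores into a dictionary and picks the top model for each task; the variable and function names are chosen here for illustration, and `None` stands for the one score that is not reported.

```python
TASKS = ["Verdict", "Venue", "Canon", "Chronos", "Cite", "Coherence", "Audit"]

# Scores copied from the results table, in the order given by TASKS.
RESULTS = {
    "TFIDF-SVM":        [68.4, 41.7, 35.8, 56.0, 76.4, None, 87.6],
    "BERTurk":          [81.9, 79.4, 92.6, 94.1, 95.5, 56.8, 96.9],
    "Legal-BERT (Eng)": [68.1, 63.1, 91.1, 92.3, 95.5, 50.1, 95.1],
    "XLM-RoBERTa":      [79.2, 64.4, 70.4, 70.2, 95.4, 33.7, 96.6],
    "Longformer":       [69.3, 59.9, 92.6, 93.7, 94.5, 43.9, 98.5],
    "BERT-TR-128k":     [82.6, 82.2, 93.8, 94.0, 96.0, 50.4, 97.6],
}


def best_per_task(results=RESULTS, tasks=TASKS):
    """Return {task: model with the highest reported score}."""
    best = {}
    for i, task in enumerate(tasks):
        scored = [(scores[i], model) for model, scores in results.items()
                  if scores[i] is not None]
        best[task] = max(scored)[1]
    return best


winners = best_per_task()
for task, model in winners.items():
    print(f"{task}: {model}")

# Number of tasks on which BERT-TR-128k has the top score.
print(sum(model == "BERT-TR-128k" for model in winners.values()))  # 4
```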

License

This dataset and benchmark suite are distributed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license.

Under this license, you are free to:

  • Share — copy and redistribute the material in any medium or format.
  • Adapt — remix, transform, and build upon the material.

Under the following terms:

  • Attribution — You must give appropriate credit, provide a link to the license, and indicate if changes were made.
  • NonCommercial — You may not use the material for commercial purposes.
  • ShareAlike — If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.

Note: The underlying raw texts (court decisions and laws) are public records. This license applies to the curated benchmark, annotations, and structured dataset created by the authors.

Citation

If you use TurkLexBench (data, code, or models) in your research, please cite our paper:

@inproceedings{erkan2026turklexbench,
  title={TurkLexBench: A Comprehensive Multi-Task Benchmark Suite for Turkish Legal NLP},
  author={Erkan, Mehmet Ali and Yozgatlıgil, Ceylan},
  booktitle={Proceedings of the 32nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD '26)},
  year={2026},
  publisher={ACM},
  doi={10.5281/zenodo.18555735},
  url={https://doi.org/10.5281/zenodo.18555735},
  note={Under Review}
}

Files

TurkAudit.zip

Files (267.7 MB)

| MD5 checksum | Size |
|---|---|
| md5:98e12a4de602e23c6c941626bee03aeb | 17.1 MB |
| md5:7db311ec7ba4367ad8b45159d5afbad0 | 62.6 MB |
| md5:a58b448efd845fb0e7e32d5ca7c22d34 | 48.7 MB |
| md5:36d9a27ab6145b559db60a360d396eef | 43.5 MB |
| md5:5f079ce006697c78ba896413512f01c8 | 17.0 MB |
| md5:e536bcaaf577cc13cea8edd1a9bde9d3 | 41.3 MB |
| md5:3aeb3eca2944c302b5223055191a0e80 | 37.5 MB |
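
After downloading an archive, its integrity can be checked against the MD5 value listed for it. The following is a minimal sketch using Python's standard `hashlib` module; the helper names are assumptions, and the checksum to pass in is whichever value the record page shows for the file you downloaded.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_md5):
    """Return True if the file's MD5 matches expected_md5 (with or without an 'md5:' prefix)."""
    expected = expected_md5.lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

For example, `verify("TurkAudit.zip", expected_md5)` returns `True` only if the download is byte-identical to the published file, where `expected_md5` is the checksum listed for that archive.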

Additional details

Software

Repository URL
https://github.com/mrkn7/TurkishLegalBench
Programming language
Python