UzLegalNER v3_fixed: Uzbek Legal Contracts Named Entity Recognition Dataset (PER/ORG/LOC/POSITION/DATE/MONEY/DOCNO)
Description
UzLegalNER v3_fixed is a named entity recognition (NER) dataset for Uzbek legal contracts and related official documents. The dataset uses a seven-label schema: PER, ORG, LOC, POSITION, DATE, MONEY, DOCNO. We release: (i) a master spreadsheet (XLSX) with sentence-level metadata and character-level entity spans, (ii) a JSONL version with span annotations, and (iii) CoNLL BIO splits (train/dev/test) for standard NER training and benchmarking.
Key fields: sent_id (unique per sentence), doc_id (document/group identifier used for document-level splitting), doc_type, script (Latin), split, text, and entities (each with start, end, label, and text). For CoNLL compatibility, overlapping and nested spans are resolved by retaining only the longest span.
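The longest-span rule and the conversion from character spans to BIO tags can be sketched as follows. This is a minimal illustration, not the script used to build the release: it assumes end offsets are exclusive, uses plain whitespace tokenization, and breaks length ties by earliest start. The example record is hypothetical and only mirrors the field names listed above.

```python
import re

def keep_longest(entities):
    """Resolve overlapping/nested spans by keeping the longest one.

    Assumption: ties in length are broken by the earliest start offset.
    """
    kept = []
    by_length = sorted(entities, key=lambda e: (-(e["end"] - e["start"]), e["start"]))
    for ent in by_length:
        # Keep the span only if it is disjoint from every span kept so far.
        if all(ent["end"] <= k["start"] or ent["start"] >= k["end"] for k in kept):
            kept.append(ent)
    return sorted(kept, key=lambda e: e["start"])

def to_bio(text, entities):
    """Tag whitespace tokens with BIO labels (end offsets assumed exclusive)."""
    spans = keep_longest(entities)
    rows, prev = [], None
    for m in re.finditer(r"\S+", text):
        # First span that overlaps this token, if any.
        hit = next((e for e in spans if m.start() < e["end"] and m.end() > e["start"]), None)
        if hit is None:
            tag = "O"
        else:
            tag = ("I-" if hit is prev else "B-") + hit["label"]
        rows.append((m.group(), tag))
        prev = hit
    return rows

# Hypothetical record for illustration only (not taken from the dataset).
record = {
    "sent_id": "s1",
    "doc_id": "d1",
    "text": "Toshkent shahar hokimligi 2023 yil 5 mayda shartnoma tuzdi.",
    "entities": [
        {"start": 0, "end": 25, "label": "ORG", "text": "Toshkent shahar hokimligi"},
        {"start": 0, "end": 8, "label": "LOC", "text": "Toshkent"},  # nested, dropped
        {"start": 26, "end": 42, "label": "DATE", "text": "2023 yil 5 mayda"},
    ],
}

rows = to_bio(record["text"], record["entities"])
for token, tag in rows:
    print(token, tag)
```

Here the nested LOC span is discarded because the enclosing ORG span is longer, so every token receives exactly one tag, as the CoNLL format requires.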
Intended use: training and evaluating Transformer-based NER models and gazetteer-enhanced methods, with a particular focus on robustness to unseen entity surface forms in legal text.
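One way to probe robustness to unseen surface forms is to separate test entities whose (label, text) pair also occurs in the training split from those that do not, and then score the two groups separately. The sketch below is an assumption about how such an analysis might be set up, not a procedure defined by the dataset: matching is case-insensitive on the entity text, and the records are hypothetical examples that follow the JSONL field names described above.

```python
def surface_forms(records):
    """Collect the (label, case-folded text) pairs of all entities in the records."""
    return {
        (ent["label"], ent["text"].casefold())
        for rec in records
        for ent in rec["entities"]
    }

def split_seen_unseen(train_records, test_records):
    """Partition test entities by whether their surface form occurs in train."""
    known = surface_forms(train_records)
    seen, unseen = [], []
    for rec in test_records:
        for ent in rec["entities"]:
            key = (ent["label"], ent["text"].casefold())
            (seen if key in known else unseen).append(ent)
    return seen, unseen

# Hypothetical records for illustration only (not taken from the dataset).
train = [{"entities": [{"label": "ORG", "text": "Adliya vazirligi"}]}]
test = [{"entities": [
    {"label": "ORG", "text": "adliya vazirligi"},  # seen (case-insensitive match)
    {"label": "ORG", "text": "Markaziy bank"},     # unseen
]}]

seen, unseen = split_seen_unseen(train, test)
print(len(seen), "seen;", len(unseen), "unseen")
```

Reporting precision, recall, and F1 on the unseen group alongside the overall score makes it easier to tell whether a model or gazetteer generalizes or mostly memorizes training entities.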
Files
changelog.md
Total size: 473.2 kB

| Checksum | Size |
|---|---|
| md5:145de68609e49ec881b087e67a97547b | 416 Bytes |
| md5:b2ce03e50accc4f4f0e6b7552b36847e | 463 Bytes |
| md5:3609fb9574c55dddbb0d076aa5300559 | 2.0 kB |
| md5:f5a1dd0ec6dc3844c2df36d15e76032f | 453 Bytes |
| md5:281fb88dc31452580f9cf3108341cebe | 2.3 kB |
| md5:373ec995c0ec5689c046278f194d24c1 | 467.6 kB |