Sindhi Open Lexicon Dataset - SindhiLanguage.org
Authors/Creators
Contributors
Producer (2):
Description
Sindhi Open Lexicon Dataset (223K+ Entries) for AI, NLP & Computational Linguistics

This project is a large-scale structured lexical dataset for the Sindhi language containing over 223,000 entries including definitions, linguistic metadata, and normalized forms. Sindhi is a historically rich but low-resource language in AI. This dataset aims to support NLP, AI systems, and computational linguistics.

Objectives
- Provide AI-ready Sindhi dataset
- Support NLP research
- Enable search engines, chatbots, OCR, and language tools
- Preserve linguistic heritage digitally

Dataset Features
- 223,000+ entries
- Definitions in Sindhi
- Variants with/without diacritics
- Normalized text
- Domain classification
- Formats: CSV, JSONL, SQLite
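Because the dataset lists variants with and without diacritics alongside a JSONL format, a diacritic-insensitive lookup is one natural use. The following is a minimal sketch under stated assumptions: the field name `word` is hypothetical (the record does not document the schema), and the set of marks treated as diacritics is an illustrative choice, not the dataset's own normalization rule.

```python
import json
from collections import defaultdict

# Arabic-script vowel marks (U+064B..U+0652) plus superscript alef (U+0670).
# ASSUMPTION: this set is illustrative; the dataset's own normalization
# rules are not documented in this record.
DIACRITICS = {chr(cp) for cp in range(0x064B, 0x0653)} | {"\u0670"}


def strip_diacritics(text: str) -> str:
    """Return text with the marks in DIACRITICS removed."""
    return "".join(ch for ch in text if ch not in DIACRITICS)


def build_index(jsonl_lines, key="word"):
    """Group JSONL records by the diacritic-free form of record[key].

    ASSUMPTION: `key` ("word" by default) is a hypothetical field name.
    """
    index = defaultdict(list)
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        index[strip_diacritics(record[key])].append(record)
    return index
```

Passing an open file object as `jsonl_lines` works because the function only iterates over lines. Stripping a fixed set of marks, rather than decomposing the text and dropping all combining characters, avoids altering letters such as alef with madda.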
Files (341.2 MB total, including LICENSE.txt)
| Name | MD5 | Size |
|---|---|---|
| | 5805d2a4d7905ed324028aa4797cf987 | 733 Bytes |
| | ff6727ed521acaeaafa42409250b8f67 | 2.6 kB |
| | fee6fcb21880a6a28ebf38d11bd0f13d | 283.7 kB |
| | 5904e41a047139258423d0a779f533a2 | 84.5 MB |
| | 8282ca3a656275d282bd98470de2a8fc | 149.4 MB |
| | e17533fd16c705dff77824ea285532de | 107.0 MB |
| | 958201824dcf21321a63f9446c92987f | 765 Bytes |
| | 28281547a0809e48a1155ae6e55723ac | 876 Bytes |
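
The MD5 values listed for each file can be used to check that a download arrived intact. The sketch below uses only Python's standard `hashlib`; the function names and the idea of passing a local path are illustrative assumptions, since the file names are not given in this listing.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Read fixed-size chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_md5(path, listed):
    """Compare a file's MD5 with a listed value such as 'md5:5805d2a4...'."""
    expected = listed.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

For example, `matches_listed_md5(local_path, "md5:8282ca3a656275d282bd98470de2a8fc")` would be the check for the 149.4 MB file, where `local_path` is wherever that file was saved. MD5 is adequate for detecting accidental corruption but is not a safeguard against deliberate tampering.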