Dzongkha Handwritten Digit Dataset
- 1. National Institute of Technology Silchar
Description
Dzongkha, the national language of Bhutan, is relatively understudied, so few resources exist for Natural Language Processing (NLP) tasks in the language. In particular, no benchmark dataset for handwritten Dzongkha digit recognition has been publicly available. This dataset contains 1000 images of handwritten Dzongkha digits, captured using Google Jamboard and stored in JPG format. The images were collected from 100 contributors, both indigenous and non-indigenous people of Bhutan, without regard to age, gender, educational background, or similar factors. The dataset has 10 classes, one for each Dzongkha digit from 0 to 9. The class labels are: 0 (༠), 1 (༡), 2 (༢), 3 (༣), 4 (༤), 5 (༥), 6 (༦), 7 (༧), 8 (༨), 9 (༩).
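The label-to-glyph correspondence listed above can be sketched in code. The Dzongkha digits ༠ to ༩ are encoded in Unicode as the consecutive code points U+0F20 to U+0F29, so the mapping reduces to an offset. The function names below are hypothetical helpers chosen for illustration. They are not part of the dataset, and nothing here assumes how the images are organized inside the archive.

```python
# Minimal sketch: map the integer class labels 0-9 to Dzongkha digit glyphs.
# The helper names are illustrative assumptions, not part of the dataset.

# Dzongkha digits are encoded as Tibetan-script digits, U+0F20 (༠) to U+0F29 (༩).
TIBETAN_DIGIT_ZERO = 0x0F20


def label_to_glyph(label: int) -> str:
    """Return the Dzongkha digit glyph for an integer class label 0-9."""
    if not 0 <= label <= 9:
        raise ValueError(f"label must be in 0..9, got {label}")
    return chr(TIBETAN_DIGIT_ZERO + label)


def glyph_to_label(glyph: str) -> int:
    """Return the integer class label for a single Dzongkha digit glyph."""
    if len(glyph) != 1:
        raise ValueError("expected exactly one character")
    offset = ord(glyph) - TIBETAN_DIGIT_ZERO
    if not 0 <= offset <= 9:
        raise ValueError(f"{glyph!r} is not a Dzongkha digit")
    return offset


# Lookup table covering all ten classes.
CLASS_GLYPHS = {label: label_to_glyph(label) for label in range(10)}
```

A table like `CLASS_GLYPHS` may be convenient for labelling plots or confusion matrices with the native glyphs rather than Western digits.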
Files
Name | Size | Checksum |
---|---|---|
Dataset.zip | 73.1 MB | md5:6f6d82413a6de6bbdefa99c33502b3df |
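After downloading, the archive can be checked against the MD5 checksum listed for Dataset.zip. The following is a minimal sketch using only the Python standard library. The function names and the local file path are assumptions made for illustration; only the checksum value comes from the listing above.

```python
# Minimal sketch: verify a downloaded archive against the published MD5 checksum.
# Function names and the local path are illustrative assumptions.
import hashlib

# Checksum published for Dataset.zip in the file listing.
EXPECTED_MD5 = "6f6d82413a6de6bbdefa99c33502b3df"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Read fixed-size chunks so the whole archive is never held in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path, expected=EXPECTED_MD5):
    """Return True if the file at `path` matches the expected MD5 digest."""
    return md5_of_file(path) == expected.lower()
```

For example, `verify_archive("Dataset.zip")` would return `True` for an intact download saved under that name in the working directory. MD5 is suitable here for detecting corrupted or truncated downloads, not for security guarantees.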