Published February 25, 2022 | Version 1.0.0
Dataset | Open Access

Dzongkha Handwritten Digit Dataset

  • 1. National Institute of Technology Silchar

Description

Dzongkha, the national language of Bhutan, has limited resources for Natural Language Processing (NLP) tasks because the language is relatively understudied. In particular, no publicly available benchmark dataset exists for recognizing handwritten Dzongkha digits. This dataset contains 1000 images of handwritten Dzongkha digits, captured with Google Jamboard and stored in JPG format. The images were collected from 100 indigenous and non-indigenous people of Bhutan, irrespective of age, gender, educational background, and similar factors. The dataset has 10 classes, one for each Dzongkha digit from 0 to 9, labeled as follows: 0 (༠), 1 (༡), 2 (༢), 3 (༣), 4 (༤), 5 (༥), 6 (༦), 7 (༧), 8 (༨), 9 (༩).
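The ten class labels are the Tibetan-script digits that Dzongkha uses, which Unicode encodes contiguously from U+0F20 (༠) to U+0F29 (༩). The sketch below maps between integer class indices 0 to 9 and the written numerals, assuming a classifier outputs such indices. The names `DZONGKHA_DIGITS`, `index_to_numeral`, and `numeral_to_index` are hypothetical helpers, not part of the dataset.

```python
import unicodedata

# Hypothetical lookup table: position i holds the Dzongkha numeral for class i.
# The Tibetan-script digits occupy the contiguous range U+0F20..U+0F29.
DZONGKHA_DIGITS = [chr(0x0F20 + i) for i in range(10)]


def index_to_numeral(index: int) -> str:
    """Return the Dzongkha numeral for a class index in 0..9."""
    if not 0 <= index <= 9:
        raise ValueError(f"class index out of range: {index}")
    return DZONGKHA_DIGITS[index]


def numeral_to_index(numeral: str) -> int:
    """Return the class index (0..9) for a single Dzongkha numeral."""
    if numeral not in DZONGKHA_DIGITS:
        raise ValueError(f"not a Dzongkha digit: {numeral!r}")
    # The Unicode database records each of these characters' decimal value.
    return unicodedata.digit(numeral)
```

For example, `index_to_numeral(3)` gives `"༣"`, and `numeral_to_index("༣")` gives `3`. Relying on the Unicode database, rather than a hand-typed table, keeps the mapping consistent with the code points listed in the class labels above.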

Files

Dataset.zip (73.1 MB)
md5: 6f6d82413a6de6bbdefa99c33502b3df
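The published MD5 checksum lets a downloader confirm that `Dataset.zip` arrived intact. The sketch below uses Python's standard `hashlib` module and assumes the archive has been saved locally. The function names `md5_of_file` and `verify` are illustrative and are not provided by the record.

```python
import hashlib

# Checksum listed on this record for Dataset.zip.
EXPECTED_MD5 = "6f6d82413a6de6bbdefa99c33502b3df"


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read 1 MiB at a time so the 73.1 MB archive is never fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected: str = EXPECTED_MD5) -> bool:
    """Return True if the file's MD5 digest matches the expected checksum."""
    return md5_of_file(path) == expected.lower()
```

With the archive saved as `Dataset.zip`, calling `verify("Dataset.zip")` should return `True` for an uncorrupted download. MD5 is suitable here only for detecting accidental corruption, not for guarding against deliberate tampering.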