Published August 21, 2021 | Version 4
Dataset
Open
CWID-hi: A Dataset for Complex Word Identification in Hindi Text
Creators
- 1. Symbiosis Institute of Computer Studies and Research, Symbiosis International (Deemed University)
- 2. Symbiosis Centre for Information Technology, Symbiosis International (Deemed University)
Description
This dataset was created through a human intelligence test in which native and non-native Hindi speakers annotated the words they could not understand in Hindi text. The annotators were then asked to rank the complexity of these words along with their synonyms. A word with an average rank of <=3 (out of 5) is labeled 1 (complex), and a word with an average rank of >3 is labeled 0 (simple).
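The labeling rule described above can be illustrated with a minimal Python sketch. It assumes each word comes with a list of annotator ranks on the 1 to 5 scale. The function name `label_word` and the sample ranks are hypothetical and not taken from the dataset, and the actual column layout of dataset.csv is not assumed here.

```python
from statistics import mean

# Cut-off stated in the description: average rank <= 3 means "complex".
COMPLEXITY_THRESHOLD = 3


def label_word(ranks):
    """Return 1 (complex) if the average rank is <= 3, else 0 (simple).

    `ranks` is a hypothetical list of per-annotator ranks (1 to 5).
    """
    if not ranks:
        raise ValueError("at least one rank is required")
    return 1 if mean(ranks) <= COMPLEXITY_THRESHOLD else 0


# Illustrative ranks only; these values are not from the dataset.
print(label_word([1, 2, 3]))  # average 2 -> 1 (complex)
print(label_word([3, 3, 3]))  # average 3, on the boundary -> 1 (complex)
print(label_word([4, 5, 3]))  # average 4 -> 0 (simple)
```

Note that the boundary case (an average rank of exactly 3) falls on the complex side, as the description specifies `<=3`.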
Files
Name | Size | Checksum |
---|---|---|
dataset.csv | 863.4 kB | md5:e848d395304103a1ab510edd6227e1ae |
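A downloaded copy of dataset.csv can be checked against the MD5 checksum listed for it. The sketch below uses only Python's standard `hashlib` module. The local path `dataset.csv` is an assumption about where the file was saved, and the helper names `md5_of_file` and `matches_listing` are illustrative.

```python
import hashlib
from pathlib import Path

# MD5 value copied from the file listing on this page.
EXPECTED_MD5 = "e848d395304103a1ab510edd6227e1ae"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest equals the expected value."""
    return md5_of_file(path) == expected.lower()


# Assumed download location; adjust to wherever the file was saved.
local_copy = Path("dataset.csv")
if local_copy.exists():
    print("checksum OK" if matches_listing(local_copy) else "checksum mismatch")
```

A mismatch usually points to an incomplete download or to a different version of the record, since this page describes Version 4.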