Published April 29, 2022 | Version 5
Other
Open
CWID-hi: A Dataset for Complex Word Identification in Hindi Text
- 1. Symbiosis Institute of Computer Studies and Research, Symbiosis International (Deemed University)
- 2. Symbiosis Centre for Information Technology, Symbiosis International (Deemed University)
- 3. Cognitive Science Research Group, Queen Mary University of London
Description
This dataset was created through a human intelligence test in which native and non-native Hindi speakers annotated the words they could not understand in Hindi text. The annotators were then asked to rank the complexity of these words along with their synonyms. A word that received an average rank of <=3 (out of 5) is labeled 1 (complex), and a word that received an average rank of >3 is labeled 0 (simple).
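The labeling rule above can be sketched in a few lines of Python. This is only an illustration of the stated threshold, not the authors' actual processing code; the function name `label_from_ranks` and the idea of passing one word's ranks as a list are assumptions made for the example.

```python
from statistics import mean

# Threshold taken from the description: average rank <= 3 (out of 5) means complex.
COMPLEX_THRESHOLD = 3.0


def label_from_ranks(ranks, threshold=COMPLEX_THRESHOLD):
    """Return 1 (complex) if the average rank is <= threshold, else 0 (simple).

    `ranks` is a hypothetical list of the ranks one word received
    from its annotators, each on the 1-5 scale.
    """
    if not ranks:
        raise ValueError("at least one rank is required")
    return 1 if mean(ranks) <= threshold else 0


print(label_from_ranks([1, 2, 3]))  # average 2.0, labeled 1 (complex)
print(label_from_ranks([4, 5, 3]))  # average 4.0, labeled 0 (simple)
```

Note that a word whose average rank is exactly 3 falls on the complex side, since the description uses <=3 for label 1.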
Files
Name | Size |
---|---|
Ranked_dataset.csv (md5:d1f305cbb16f4ca528304af6dbb5cd7d) | 906.5 kB |