Published October 30, 2014 | Version v1
Dataset
Open
Classification of word levels with usage frequency, expert opinions and machine learning
Description
This dataset classifies English words according to CEFR language levels. It can be used in various educational applications, including determining whether the level of a text is appropriate for students learning English.
For each word, the part of speech, the word lemma, and the usage frequency are provided. For words that have no survey results, a machine-learning-based methodology is used to predict levels. These predictions are included as a separate file. This data is released as part of the submission process to the British Journal of Educational Technology Special Issue on Open Data.
The included readme.pdf file contains a detailed description of the data.
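As a rough illustration of the text-level use case mentioned above, the sketch below scores a short text against a word-to-level lookup. The lookup is a tiny hand-made dictionary, not the contents of this dataset; the actual file layout is described in readme.pdf and is not assumed here. The function name `estimate_text_level` and the 90% coverage threshold are likewise illustrative assumptions, and the sketch ignores part of speech and lemmatization.

```python
import re
from collections import Counter

# CEFR levels from easiest to hardest.
CEFR_ORDER = ["A1", "A2", "B1", "B2", "C1", "C2"]

# Hypothetical sample entries for illustration only, NOT taken from the dataset.
SAMPLE_LEVELS = {
    "the": "A1", "cat": "A1", "sat": "A1", "on": "A1", "a": "A1",
    "comfortable": "A2", "despite": "B1", "ambiguous": "C1",
}


def estimate_text_level(text, word_levels, coverage=0.9):
    """Return the lowest CEFR level that covers `coverage` of the known words.

    Words missing from `word_levels` are skipped. Returns None when no word
    in the text is known.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(word_levels[t] for t in tokens if t in word_levels)
    total = sum(counts.values())
    if total == 0:
        return None
    running = 0
    for level in CEFR_ORDER:
        running += counts[level]
        if running / total >= coverage:
            return level
    return CEFR_ORDER[-1]
```

With the sample lookup, `estimate_text_level("The cat sat on a comfortable mat", SAMPLE_LEVELS)` skips the unknown word "mat" and needs level A2 to reach the coverage threshold. A real application would load the levels from the dataset files and would probably match on lemma and part of speech rather than on surface forms.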
Files
File (MD5 checksum) | Size |
---|---|
md5:5971216225ddddc0ac18e757e24daa1f | 2.1 MB |
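The MD5 checksum in the file listing can be used to confirm that a download is intact. The following is a minimal sketch using Python's standard `hashlib` module; the helper names `md5_of_file` and `matches_listing` are illustrative, and the path passed in is whatever name the downloaded file has locally, since the listing above does not show it.

```python
import hashlib

# Checksum copied from the file listing on this record.
EXPECTED_MD5 = "5971216225ddddc0ac18e757e24daa1f"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, expected=EXPECTED_MD5):
    """Return True when the file's MD5 digest equals the listed checksum."""
    return md5_of_file(path) == expected
```

A result of `False` from `matches_listing` suggests an incomplete or altered download, in which case fetching the file again is the usual remedy.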