Published October 30, 2014 | Version v1
Dataset Open

Classification of word levels with usage frequency, expert opinions and machine learning

  • 1. Istanbul Sehir University

Description

This dataset classifies English words by CEFR (Common European Framework of Reference for Languages) level. It can be used in educational applications such as determining which texts are at an appropriate level for students learning English.

For each word, the part of speech, lemma, and usage frequency are provided. For words that have no survey results, a machine-learning-based method is used to predict levels; these predictions are included as a separate file. This data is released as part of the submission process to the British Journal of Educational Technology Special Issue on Open Data.

The included readme.pdf file contains a detailed description of the data.
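
As a rough illustration of the text-level use case described above, the following Python sketch estimates the CEFR level of a tokenized text from a word-to-level mapping. Everything in it is an assumption made for illustration: the `TOY_LEXICON` entries are invented and are not taken from the dataset, the function names `level_profile` and `estimate_text_level` are hypothetical, the 90% coverage threshold is an arbitrary choice, and the six-level A1 to C2 scale is assumed rather than confirmed against the data files. The actual file layout is documented in readme.pdf.

```python
from collections import Counter

# CEFR levels in ascending order of difficulty (assumed six-level scale).
CEFR_ORDER = ["A1", "A2", "B1", "B2", "C1", "C2"]

# Hypothetical toy lexicon (lemma -> CEFR level); NOT taken from the dataset.
TOY_LEXICON = {
    "the": "A1", "cat": "A1", "sit": "A1", "on": "A1",
    "mat": "A2", "existence": "B2", "contemplate": "C1",
}


def level_profile(tokens, lexicon):
    """Count how many of the known tokens fall into each CEFR level."""
    counts = Counter(lexicon[t] for t in tokens if t in lexicon)
    return {level: counts.get(level, 0) for level in CEFR_ORDER}


def estimate_text_level(tokens, lexicon, coverage=0.9):
    """Return the lowest level that covers `coverage` of the known tokens.

    Tokens missing from the lexicon are ignored. Returns None when no
    token is found in the lexicon.
    """
    profile = level_profile(tokens, lexicon)
    total = sum(profile.values())
    if total == 0:
        return None
    running = 0
    for level in CEFR_ORDER:
        running += profile[level]
        if running / total >= coverage:
            return level
    return CEFR_ORDER[-1]


easy = estimate_text_level("the cat sit on the mat".split(), TOY_LEXICON)
hard = estimate_text_level("the cat contemplate existence".split(), TOY_LEXICON)
```

In practice the lexicon would be built from the dataset's word and level fields, and the input tokens would be lemmatized first so that they match the lemmas provided for each word.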

Files

Files (2.1 MB)

md5:5971216225ddddc0ac18e757e24daa1f