Classification of word levels with usage frequency, expert opinions and machine learning

Guzey, Onur; Sohsah, Gihad; Unal, Muhammed

doi:10.5281/zenodo.12501

Published October 30, 2014 | Version v1

Dataset Open

1. Istanbul Sehir University

This dataset classifies English words according to CEFR (Common European Framework of Reference for Languages) levels. It can be used in various educational applications, including determining the level of texts appropriate for students learning English.

For each word, the part of speech, the lemma, and the usage frequency are provided. For words that have no survey results, a machine-learning-based methodology is used to predict levels; these predictions are included as a separate file. This data is released as part of the submission process to the British Journal of Educational Technology Special Issue on Open Data.

The included readme.pdf file contains a detailed description of the data.

Files

Files (2.1 MB)

Name	MD5	Size
word-level-survey.tar.gz	5971216225ddddc0ac18e757e24daa1f	2.1 MB
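
Because the record lists an MD5 checksum for the archive, a downloaded copy can be checked for corruption before use. The following is a minimal sketch, assuming the archive has been saved locally under its original file name; the local path and the helper names `md5_of_file` and `matches_checksum` are illustrative and not part of the dataset.

```python
import hashlib
from pathlib import Path

# Checksum listed on the record page for word-level-survey.tar.gz.
EXPECTED_MD5 = "5971216225ddddc0ac18e757e24daa1f"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected=EXPECTED_MD5):
    """Return True when the file's MD5 equals the expected hex digest."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    # Hypothetical local path: wherever the archive was saved.
    archive = Path("word-level-survey.tar.gz")
    if archive.exists():
        print("checksum OK" if matches_checksum(archive) else "checksum mismatch")
    else:
        print(f"{archive} not found; download it first")
```

MD5 is used here only because it is the digest the record publishes; it detects accidental corruption and is not a guarantee against deliberate tampering.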


Resource type: Dataset

Publisher: Zenodo

License: Creative Commons Attribution 4.0 International

The Creative Commons Attribution license allows re-distribution and re-use of a licensed work on the condition that the creator is appropriately credited.

Technical metadata

Created: November 10, 2014
Modified: January 24, 2020
