Using large language models to develop readability formulas for educational settings

Scott Crossley; Joon Suh Choi; Yanisa Scherber; Mathis Lucka

doi:10.1007/978-3-031-36336-8_66

Published June 30, 2023 | Version v1

Journal article Open

1. Vanderbilt University
2. Georgia State University
3. deepset

Readability formulas can be used to better match readers and texts. Current state-of-the-art readability formulas rely on large language models, such as transformer models (e.g., BERT), that model language semantics. However, the size and runtimes of these models make them impractical in educational settings. This study examines the effectiveness of new readability formulas developed on the CommonLit Ease of Readability (CLEAR) corpus using more efficient sentence-embedding models, including doc2vec, Universal Sentence Encoder, and Sentence BERT. The study compares these sentence-embedding models to traditional readability formulas, newer NLP-informed linguistic feature formulas, and newer BERT-based models. The results indicate that sentence-embedding readability formulas perform well and are practical for use in various educational settings. The study also introduces an open-source NLP website for assessing the readability of texts, along with an application programming interface (API) that can be integrated into online educational learning systems to better match texts to readers.
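
For readers unfamiliar with the "traditional readability formulas" used as a baseline, the following is a minimal sketch of one well-known example, Flesch Reading Ease. The abstract does not say which traditional formulas were compared, so the choice of this formula is an assumption made purely for illustration. The syllable counter is a rough vowel-group heuristic, and the function names are hypothetical rather than part of the study's website or API.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic: one syllable per run of consecutive vowels, minimum one."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: higher scores indicate easier text.

    Uses the standard constants (206.835, 1.015, 84.6) with naive
    sentence and word splitting, so scores are approximate.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


# Illustrative inputs only; these are not drawn from the CLEAR corpus.
easy = "The cat sat on the mat. The dog ran to the park."
hard = (
    "Contemporary readability assessment methodologies increasingly "
    "incorporate sophisticated computational representations."
)

if __name__ == "__main__":
    print(f"easy: {flesch_reading_ease(easy):.1f}")
    print(f"hard: {flesch_reading_ease(hard):.1f}")
```

A formula of this kind relies only on surface counts (sentence length and syllables per word), in contrast to the embedding-based and BERT-based models described above, which model language semantics.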

Files

Name	MD5	Size
clear_aied_efficiency_revision_final.pdf	6765fbe7ba1a0652e7293cd44e144147	294.0 kB

	All versions	This version
Views	184	184
Downloads	78	78
Data volume	27.0 MB	27.0 MB
