BenchLS: A Reliable Dataset for Lexical Simplification

Paetzold; Specia

doi:10.5281/zenodo.2552393

Published May 23, 2016 | Version v1

Dataset Open

1. Gustavo Henrique Paetzold
2. Lucia Specia

To create our dataset we combined two resources: the LexMTurk (Horn et al., 2014) and LSeval (De Belder and Moens, 2012) datasets. The instances in both datasets, 929 in total, contain a sentence, a target complex word, and several candidate substitutions ranked according to their simplicity. The candidates in both datasets were suggested and ranked by English speakers from the U.S. To increase the reliability of the combined dataset, we applied the following corrections to each instance:

Spelling Filtering: We discarded any misspelled candidates using Norvig’s algorithm. We trained our spelling model on the News Crawl corpus.
Inflection Correction: We inflected all candidates to the tense of the target word using the Text Adorning module of LEXenstein (Paetzold and Specia, 2015; Burns, 2013).
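
The spelling filter can be illustrated with a minimal sketch in the style of Norvig's corrector. This is not the authors' implementation: the toy `CORPUS` string stands in for News Crawl, only edit distance 1 is searched (Norvig's original also searches distance 2), and the function names `is_misspelled` and `filter_candidates` are hypothetical. A word counts as misspelled here when the corrector would change it; unknown words with no close known neighbour are kept, which is an assumption of this sketch.

```python
import re
from collections import Counter

# Toy stand-in for a large corpus such as News Crawl (assumption for illustration).
CORPUS = "the quick brown fox jumps over the lazy dog and the simple easy word"
COUNTS = Counter(re.findall(r"[a-z]+", CORPUS.lower()))
LETTERS = "abcdefghijklmnopqrstuvwxyz"

def edits1(word):
    """All strings one edit (delete, transpose, replace, insert) away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in LETTERS]
    inserts = [l + c + r for l, r in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)

def known(candidates):
    """Subset of candidates that occur in the corpus."""
    return {w for w in candidates if w in COUNTS}

def correction(word):
    """Most frequent known word at edit distance 0 or 1, else the word itself."""
    options = known([word]) or known(edits1(word)) or [word]
    return max(options, key=lambda w: COUNTS[w])

def is_misspelled(word):
    """True when the corrector would replace the word with a different one."""
    w = word.lower()
    return correction(w) != w

def filter_candidates(candidates):
    """Drop candidates that the corrector flags as misspelled."""
    return [c for c in candidates if not is_misspelled(c)]

print(filter_candidates(["simple", "simpel", "easy"]))  # ['simple', 'easy']
```

With a realistic corpus the same structure applies; only `COUNTS` would be built from the training text instead of the toy string.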

The resulting dataset – BenchLS – contains 929 instances, with an average of 7.37 candidate substitutions per complex word.
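
As a rough illustration of the instance structure described above (a sentence, a target complex word, and candidates ranked by simplicity), the following sketch models an instance in memory and computes the average number of candidates per instance. The class name `Instance`, its field names, and the toy data are hypothetical; they do not describe the file layout inside BenchLS.zip.

```python
from dataclasses import dataclass, field

@dataclass
class Instance:
    """One lexical simplification instance (hypothetical in-memory layout)."""
    sentence: str
    target: str
    # Candidate substitutions, ordered from simplest to least simple.
    candidates: list = field(default_factory=list)

def average_candidates(instances):
    """Mean number of candidate substitutions per instance."""
    if not instances:
        return 0.0
    return sum(len(inst.candidates) for inst in instances) / len(instances)

# Toy data, invented purely for illustration.
toy = [
    Instance("The task was arduous.", "arduous", ["hard", "difficult", "tough"]),
    Instance("He will commence soon.", "commence", ["start", "begin"]),
]
print(average_candidates(toy))  # 2.5
```

Applied to the full dataset, the same statistic is the 7.37 average reported above.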

Files

Files (93.9 kB)

Name	Size
BenchLS.zip (md5:54566820e042694f53f37c4da7c91e86)	93.9 kB
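
The MD5 checksum listed above can be used to check a downloaded copy of the archive. The sketch below uses Python's standard `hashlib`; the helper names `md5_of_file` and `verify` and the local path `BenchLS.zip` are assumptions for illustration.

```python
import hashlib

# Checksum as listed for BenchLS.zip in the file table.
EXPECTED_MD5 = "54566820e042694f53f37c4da7c91e86"

def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify(path, expected=EXPECTED_MD5):
    """True when the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected
```

Calling `verify("BenchLS.zip")` on a local download should return `True` if the file is intact; a mismatch suggests a corrupted or altered file.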

Additional details

European Commission
SIMPATICO - SIMplifying the interaction with Public Administration Through Information technology for Citizens and cOmpanies 692819

DOI

10.5281/zenodo.2552393

Resource type

Dataset

Publisher

Zenodo

Conference

Tenth International Conference on Language Resources and Evaluation (LREC 2016), Portorož, Slovenia, 23-28 May 2016

Languages

English

License: Creative Commons Attribution 4.0 International

The Creative Commons Attribution license allows re-distribution and re-use of a licensed work on the condition that the creator is appropriately credited.

Technical metadata

Created: January 29, 2019
Modified: January 24, 2020
