Published January 23, 2019 | Version v1
Dataset | Open access

Italian Lexical Simplification Benchmark

  • FBK
  • University of Trento

Description

The corpus is a manually created benchmark for evaluating Italian lexical simplification systems. It contains 901 pairs, each made of a complex sentence and its version simplified at the lexical level (i.e. a difficult term or phrase is replaced with a simpler synonym). The dataset, and a system evaluated on the benchmark, are described in the paper "The impact of phrases on Italian lexical simplification": https://zenodo.org/record/1048874
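Because each pair differs only at the lexical level, the replaced term or phrase can be recovered by aligning the two sentences token by token. The sketch below is an illustration only: it assumes plain whitespace tokenization and uses Python's standard `difflib` module. The function name `extract_substitutions` and the example sentences are hypothetical and are not taken from the corpus, whose file format is not described on this page.

```python
from difflib import SequenceMatcher


def extract_substitutions(complex_sentence, simple_sentence):
    """Return (complex span, simple span) pairs where the two sentences differ.

    Hypothetical helper: tokenization is whitespace splitting, an assumption
    made for this sketch; a real pipeline would use an Italian tokenizer.
    """
    complex_tokens = complex_sentence.split()
    simple_tokens = simple_sentence.split()
    matcher = SequenceMatcher(a=complex_tokens, b=simple_tokens, autojunk=False)
    substitutions = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # 'replace', 'delete' or 'insert': keep both sides (either may be empty)
            substitutions.append(
                (" ".join(complex_tokens[i1:i2]), " ".join(simple_tokens[j1:j2]))
            )
    return substitutions


if __name__ == "__main__":
    # Invented example pairs, not taken from the benchmark
    print(extract_substitutions(
        "Per ottenere il rimborso occorre allegare la ricevuta",
        "Per ottenere il rimborso bisogna allegare la ricevuta",
    ))  # [('occorre', 'bisogna')]
    print(extract_substitutions(
        "Il servizio è fornito a titolo gratuito",
        "Il servizio è fornito gratis",
    ))  # [('a titolo gratuito', 'gratis')]
```

The second example shows why token alignment rather than word-by-word comparison is useful here: a multi-word phrase on the complex side can map to a single simpler word.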

Files

lexsim-italiano-Comune.txt

Files (311.1 kB)

  • 83.6 kB, md5:028f3446b777ac7c268823ae9c6c5148
  • 53.0 kB, md5:c4c83cf80c5188b8ae85038e3892742f
  • 174.5 kB, md5:2c502668f71c4545e2951f22a92a993b
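The listed MD5 checksums can be used to check that a download is intact. The sketch below is a minimal example using Python's standard `hashlib`; the helper names are hypothetical, and since the record does not say which file name goes with which checksum, the code simply checks a local file against all three listed values.

```python
import hashlib

# Checksums as listed on the record page (shown with an "md5:" prefix).
# The mapping from file name to checksum is not given on the page.
EXPECTED_MD5 = [
    "md5:028f3446b777ac7c268823ae9c6c5148",  # 83.6 kB
    "md5:c4c83cf80c5188b8ae85038e3892742f",  # 53.0 kB
    "md5:2c502668f71c4545e2951f22a92a993b",  # 174.5 kB
]


def _strip_prefix(checksum):
    """Lower-case a checksum and drop an optional 'md5:' prefix."""
    checksum = checksum.lower()
    return checksum[4:] if checksum.startswith("md5:") else checksum


def md5_of(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksum(path, expected):
    """True if the file's MD5 equals `expected` (with or without 'md5:')."""
    return md5_of(path) == _strip_prefix(expected)


def match_listed_checksum(path):
    """Return the listed checksum the file matches, or None if none match."""
    actual = md5_of(path)
    for listed in EXPECTED_MD5:
        if _strip_prefix(listed) == actual:
            return listed
    return None
```

For example, `match_listed_checksum("lexsim-italiano-Comune.txt")` returns one of the listed checksums if the downloaded copy is intact, and `None` otherwise.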

Additional details

Funding

SIMPATICO – SIMplifying the interaction with Public Administration Through Information technology for Citizens and cOmpanies (grant 692819)
European Commission

References

  • Sara Tonelli, Alessio Palmero Aprosio, and Marco Mazzon. "The Impact of Phrases on Italian Lexical Simplification." In Proceedings of CLiC-it 2017, Rome.