Published January 8, 2019 | Version 2.
Conference paper | Open access

SIMPITIKI: A Simplification Corpus for Italian

  • 1. Fondazione Bruno Kessler

Description

In this work, we analyse whether Wikipedia can be used to obtain simplification pairs instead of Simple Wikipedia, which has proved unreliable for assessing automatic simplification systems and is available only in English. We focus on sentence pairs in which the target sentence is the outcome of a Wikipedia edit marked as ‘simplified’, and manually annotate simplification phenomena following an existing scheme proposed for previous simplification corpora in Italian. The outcome of this work is the SIMPITIKI corpus, which we make freely available, with pairs of sentences extracted from Wikipedia edits and annotated with simplification types. The resource also contains another corpus with roughly the same number of simplifications, which was manually created by simplifying documents in the administrative domain.

Files

clic2016-SIMPITIKI.pdf

Files (351.8 kB)

clic2016-SIMPITIKI.pdf (351.8 kB)
md5:cd5e7c28497a46824e563e377a18dc0d

Additional details

Funding

SIMPATICO – SIMplifying the interaction with Public Administration Through Information technology for Citizens and cOmpanies 692819
European Commission