Published January 8, 2019 | Version v2
Dataset Open

SIMPITIKI corpus for simplification in Italian

  • 1. Fondazione Bruno Kessler

Description

SIMPITIKI is a simplification corpus for Italian. It consists of two sets of simplified pairs: the first was harvested semi-automatically from the Italian Wikipedia, and the second was manually annotated, sentence by sentence, from documents in the administrative domain.

For more details, see https://github.com/dhfbk/simpitiki
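Since the corpus is distributed as a single XML file, a first step before writing a parser is to see which elements it actually contains. The sketch below tallies element tags with Python's standard library and makes no assumption about the SIMPITIKI schema. The function name `tally_tags` and the small demo document are illustrative only; the demo tag names are placeholders, not the corpus format.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from io import BytesIO


def tally_tags(source):
    """Count how often each element tag occurs in an XML file or file-like object."""
    counts = Counter()
    # "end" events fire once an element has been fully read, children first.
    for _event, elem in ET.iterparse(source, events=("end",)):
        counts[elem.tag] += 1
        elem.clear()  # drop the element's text and children to keep memory low
    return counts


# Tiny made-up document so the sketch runs without the corpus file.
# These tag names are placeholders, NOT the SIMPITIKI schema.
demo = BytesIO(
    b"<root><pair><a>x</a><b>y</b></pair><pair><a>z</a><b>w</b></pair></root>"
)
demo_counts = tally_tags(demo)
print(demo_counts.most_common())
```

Once the file has been downloaded, calling `tally_tags("simpitiki-v2.xml")` should list the element names to look for; the repository linked above remains the place to check what each element means.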

Files

simpitiki-v2.xml (911.4 kB)
md5:c2c00a432221250ee4fbaf1eaa7b6a6d
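The MD5 checksum listed for the file can be used to confirm that a download is intact. A minimal sketch follows, assuming the file has been saved locally as `simpitiki-v2.xml`; the helper names `file_md5` and `matches_listed_md5` are illustrative and not part of any SIMPITIKI tooling.

```python
import hashlib

# Checksum as listed on this record for simpitiki-v2.xml.
EXPECTED_MD5 = "c2c00a432221250ee4fbaf1eaa7b6a6d"


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes (end of file).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_md5(path, expected=EXPECTED_MD5):
    """True if the file's MD5 digest equals the expected hex string."""
    return file_md5(path) == expected.lower()
```

For example, `matches_listed_md5("simpitiki-v2.xml")` returns `False` if the download was truncated or altered. MD5 is adequate for detecting accidental corruption, not for guarding against deliberate tampering.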

Additional details

Funding

SIMPATICO – SIMplifying the interaction with Public Administration Through Information technology for Citizens and cOmpanies (grant no. 692819)
European Commission

References

  • Sara Tonelli, Alessio Palmero Aprosio, Francesca Saltori. SIMPITIKI: a Simplification corpus for Italian extracted from Wikipedia. In Proceedings of the Third Italian Conference on Computational Linguistics (CLiC-it 2016), Naples, Italy, 2016.