Published January 8, 2019 | Version v2
Dataset
Open
SIMPITIKI corpus for simplification in Italian
Description
SIMPITIKI is a simplification corpus for Italian. It consists of two sets of simplified pairs: the first was harvested semi-automatically from the Italian Wikipedia; the second was manually annotated, sentence by sentence, from documents in the administrative domain.
For more details, see https://github.com/dhfbk/simpitiki
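The corpus is distributed as a single XML file, and its schema is not described on this page. As a first look, the sketch below counts how often each element tag occurs, without assuming any particular tag names. The local file name `simpitiki-v2.xml` and the helper `count_tags` are assumptions made for illustration; the repository linked above remains the reference for the actual format.

```python
import os
import xml.etree.ElementTree as ET
from collections import Counter


def count_tags(source):
    """Count element tags in an XML file (path or binary file object).

    Uses incremental parsing, so it makes no assumption about the schema.
    """
    counts = Counter()
    for _event, element in ET.iterparse(source, events=("end",)):
        counts[element.tag] += 1
        element.clear()  # drop the element's content once it has been counted
    return counts


if __name__ == "__main__":
    path = "simpitiki-v2.xml"  # assumed location of the downloaded file
    if os.path.exists(path):
        for tag, n in count_tags(path).most_common():
            print(f"{tag}: {n}")
    else:
        print(f"{path} not found; download it first")
```

The tag inventory printed this way is only a starting point for writing a real loader.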
Files
Name | Size | Checksum |
---|---|---|
simpitiki-v2.xml | 911.4 kB | md5:c2c00a432221250ee4fbaf1eaa7b6a6d |
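The MD5 checksum listed for simpitiki-v2.xml can be used to confirm that a download is intact. The following is a minimal sketch using Python's standard `hashlib`; the local path and the helper names `md5_of_file` and `verify_download` are assumptions for illustration, not part of the dataset.

```python
import hashlib
from pathlib import Path

# Checksum as listed for simpitiki-v2.xml on this page.
EXPECTED_MD5 = "c2c00a432221250ee4fbaf1eaa7b6a6d"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """True if the file's MD5 matches the expected hex digest."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    local_copy = Path("simpitiki-v2.xml")  # assumed download location
    if local_copy.exists():
        print("checksum OK" if verify_download(local_copy) else "checksum MISMATCH")
    else:
        print(f"{local_copy} not found; download it first")
```

A mismatch usually indicates an incomplete or altered download, in which case fetching the file again is the simplest fix.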
Additional details
References
- Sara Tonelli, Alessio Palmero Aprosio, Francesca Saltori. SIMPITIKI: a Simplification corpus for Italian extracted from Wikipedia. In Proceedings of the Third Italian Conference on Computational Linguistics, Naples, Italy.