Conference paper Open Access

SIMPITIKI: A Simplification Corpus for Italian

Tonelli, Sara; Palmero Aprosio, Alessio; Saltori, Francesca


Citation Style Language JSON Export

{
  "publisher": "Zenodo", 
  "DOI": "10.5281/zenodo.2534132", 
  "title": "SIMPITIKI: A Simplification Corpus for Italian", 
  "issued": {
    "date-parts": [
      [
        2019, 
        1, 
        8
      ]
    ]
  }, 
  "abstract": "In this work, we analyse whether Wikipedia can be used to leverage simplification pairs instead of Simple Wikipedia, which has proved unreliable for assessing automatic simplification systems and is available only in English. We focus on sentence pairs in which the target sentence is the outcome of a Wikipedia edit marked as ‘simplified’, and manually annotate simplification phenomena following an existing scheme proposed for previous simplification corpora in Italian. The outcome of this work is the SIMPITIKI corpus, which we make freely available, with pairs of sentences extracted from Wikipedia edits and annotated with simplification types. The resource also contains another corpus with roughly the same number of simplifications, which was manually created by simplifying documents in the administrative domain.", 
  "author": [
    {
      "family": "Tonelli, Sara"
    }, 
    {
      "family": "Palmero Aprosio, Alessio"
    }, 
    {
      "family": "Saltori, Francesca"
    }
  ], 
  "id": "2534132", 
  "event-place": "Naples, Italy", 
  "version": "2.", 
  "type": "paper-conference", 
  "event": "Third Italian Conference on Computational Linguistics (CLIC-it)"
}