Conference paper Open Access

# Learning How to Simplify From Explicit Labeling of Complex-Simplified Text Pairs

Fernando Alva-Manchego; Joachim Bingel; Gustavo Henrique Paetzold; Carolina Scarton; Lucia Specia

### Citation Style Language JSON Export

{
"publisher": "Zenodo",
"DOI": "10.5281/zenodo.1042505",
"language": "eng",
"title": "Learning How to Simplify From Explicit Labeling of Complex-Simplified Text Pairs",
"issued": {
"date-parts": [
[
2017,
11,
27
]
]
},
"abstract": "Current research in text simplification has been hampered by two central problems: (i) the small amount of high-quality parallel simplification data available, and (ii) the lack of explicit annotations of simplification operations, such as deletions or substitutions, on existing data. While the recently introduced Newsela corpus has alleviated the first problem, simplifications still need to be learned directly from parallel text using black-box, end-to-end approaches rather than from explicit annotations. These complex-simple parallel sentence pairs often differ to such a high degree that generalization becomes difficult. End-to-end models also make it hard to interpret what is actually learned from data. We propose a method that decomposes the task of TS into its sub-problems. We devise a way to automatically identify operations in a parallel corpus and introduce a sequence-labeling approach based on these annotations. Finally, we provide insights on the types of transformations that different approaches can model.",
"author": [
{
"family": "Alva-Manchego",
"given": "Fernando"
},
{
"family": "Bingel",
"given": "Joachim"
},
{
"family": "Paetzold",
"given": "Gustavo Henrique"
},
{
"family": "Scarton",
"given": "Carolina"
},
{
"family": "Specia",
"given": "Lucia"
}
],
"type": "paper-conference",
"id": "1042505"
}
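
The abstract mentions automatically identifying simplification operations in a parallel corpus and using them as labels for sequence labeling. This record does not describe how the authors do that, so the following is only a minimal sketch of the general idea, assuming a plain token-level diff from Python's standard `difflib` as a stand-in for the alignment step. The function name `label_operations` and the label names `KEEP`, `DELETE`, and `REPLACE` are hypothetical and are not taken from the paper.

```python
from difflib import SequenceMatcher


def label_operations(complex_tokens, simple_tokens):
    """Tag each token of the complex sentence with a hypothetical operation label.

    Assumption: a token-level diff approximates the complex-to-simple alignment.
    The paper's own annotation procedure is not described in this record.
    """
    labels = ["KEEP"] * len(complex_tokens)
    matcher = SequenceMatcher(a=complex_tokens, b=simple_tokens, autojunk=False)
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag == "delete":
            # Tokens present only in the complex sentence.
            labels[i1:i2] = ["DELETE"] * (i2 - i1)
        elif tag == "replace":
            # Tokens rewritten in the simplified sentence.
            labels[i1:i2] = ["REPLACE"] * (i2 - i1)
        # "equal" spans stay KEEP; "insert" spans have no complex-side token to tag.
    return list(zip(complex_tokens, labels))


# Illustrative sentence pair (invented for this sketch, not from any corpus).
complex_sentence = "the feline sat quietly on the mat".split()
simple_sentence = "the cat sat on the mat".split()
labeled = label_operations(complex_sentence, simple_sentence)
```

The resulting (token, label) pairs have the shape a sequence labeler would be trained on. Tokens that appear only in the simplified sentence receive no label here, so a fuller treatment would need a separate way to represent insertions.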