Conference paper Open Access

Learning How to Simplify From Explicit Labeling of Complex-Simplified Text Pairs

Fernando Alva-Manchego; Joachim Bingel; Gustavo Henrique Paetzold; Carolina Scarton; Lucia Specia

Citation Style Language JSON Export

  "publisher": "Zenodo", 
  "DOI": "10.5281/zenodo.1042505", 
  "language": "eng", 
  "title": "Learning How to Simplify From Explicit Labeling of Complex-Simplified Text Pairs", 
  "issued": {
    "date-parts": [
  "abstract": "<p>Current research in text simplification has been hampered by two central problems: (i) the small amount of high-quality parallel simplification data available, and (ii) the lack of explicit annotations of simplification operations, such as deletions or substitutions, on existing data. While the recently introduced Newsela corpus&nbsp;has alleviated the first problem, simplifications still need to be learned directly from parallel text using black-box, end-to-end approaches rather than from explicit annotations. These complex-simple parallel sentence pairs often differ to such a high degree that generalization becomes difficult. &nbsp;End-to-end models also make it hard to interpret what is actually learned from data. &nbsp;We propose a method that decomposes the task of TS into its sub-problems. We devise a way to automatically identify operations in a parallel corpus and introduce a sequence-labeling approach based on these annotations. Finally, we provide insights on the types of transformations that different approaches can model.</p>", 
  "author": [
      "family": "Fernando Alva-Manchego"
      "family": "Joachim Bingel"
      "family": "Gustavo Henrique Paetzold"
      "family": "Carolina Scarton"
      "family": "Lucia Specia"
  "type": "paper-conference", 
  "id": "1042505"

