Learning How to Simplify From Explicit Labeling of Complex-Simplified Text Pairs

Fernando Alva-Manchego; Joachim Bingel; Gustavo Henrique Paetzold; Carolina Scarton; Lucia Specia

doi:10.5281/zenodo.1042505

Published November 27, 2017 | Version v1

Conference paper Open

1. University of Sheffield
2. University of Copenhagen
3. University
4. Univer

Current research in text simplification (TS) has been hampered by two central problems: (i) the small amount of high-quality parallel simplification data available, and (ii) the lack of explicit annotations of simplification operations, such as deletions or substitutions, on existing data. While the recently introduced Newsela corpus has alleviated the first problem, simplifications still need to be learned directly from parallel text using black-box, end-to-end approaches rather than from explicit annotations. These complex-simple parallel sentence pairs often differ to such a high degree that generalization becomes difficult. End-to-end models also make it hard to interpret what is actually learned from the data. We propose a method that decomposes the task of TS into its sub-problems. We devise a way to automatically identify operations in a parallel corpus and introduce a sequence-labeling approach based on these annotations. Finally, we provide insights on the types of transformations that different approaches can model.
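
To make the idea of explicit operation labels concrete, the sketch below tags each token of a complex sentence with the operation that relates it to the simplified sentence. It is only an illustration under stated assumptions, not the authors' annotation procedure: the function name `label_operations`, the label set KEEP/DELETE/REPLACE, and the use of Python's standard `difflib` for token alignment are all chosen here for the example.

```python
from difflib import SequenceMatcher


def label_operations(complex_tokens, simple_tokens):
    """Assign one operation label to each token of the complex sentence.

    Hypothetical label set (an assumption for this sketch):
    KEEP, DELETE, REPLACE.
    """
    labels = []
    # autojunk=False disables difflib's popularity heuristic, which could
    # otherwise ignore frequent tokens in long sequences.
    matcher = SequenceMatcher(a=complex_tokens, b=simple_tokens, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            labels.extend(["KEEP"] * (i2 - i1))
        elif tag == "delete":
            labels.extend(["DELETE"] * (i2 - i1))
        elif tag == "replace":
            labels.extend(["REPLACE"] * (i2 - i1))
        # "insert" opcodes span no complex-side tokens, so no label is added.
    return labels


complex_sentence = "the feline sat on the mat yesterday".split()
simple_sentence = "the cat sat on the mat".split()
for token, label in zip(complex_sentence,
                        label_operations(complex_sentence, simple_sentence)):
    print(f"{token}\t{label}")
```

Label sequences of this kind turn a parallel sentence pair into per-token training targets, which is the general shape of input a sequence-labeling model needs. A plain edit-distance alignment like this one cannot represent reordering or insertions on the complex side, so it should be read as a simplification of the problem.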

Files (179.3 kB)

Name	Size	Checksum
ijcnlp-2017-learning-3.pdf	179.3 kB	md5:0ac00bf3c96b14162ecaf1c858441237

Additional details

Funding: European Commission, grant 692819, SIMPATICO – SIMplifying the interaction with Public Administration Through Information technology for Citizens and cOmpanies

	All versions	This version
Views	116	115
Downloads	83	83
Data volume	14.9 MB	14.9 MB