Published March 13, 2019 | Version v1
Presentation Open

Tutorial: Data-driven text simplification

  • 1. Dept. Information and Communication Technologies, Universitat Pompeu Fabra
  • 2. Symanto Research

Description

Updated version of the tutorial:

Sanja Štajner, Horacio Saggion. Data-Driven Text Simplification. Proceedings of the 27th International Conference on Computational Linguistics.

Presented on March 13th, 2019, at the Department of Information and Communication Technologies, Universitat Pompeu Fabra, in the context of the María de Maeztu data-driven knowledge extraction strategic research program (MDM-2015-0502). https://www.upf.edu/web/mdm-dtic/tutorial-data-driven-text-simplification

In this tutorial, we aim to provide an extensive overview of the automatic text simplification (ATS) systems proposed so far and the methods they use, and to discuss the strengths and shortcomings of each, with a direct comparison of their outputs. We aim to dispel some common misconceptions about what text simplification (TS) is and is not, and about how much it has in common with text summarisation and machine translation. We believe that a deeper understanding of the initial motivations, together with an in-depth analysis of existing TS methods, will help researchers new to ATS propose even better systems, bringing fresh ideas from related NLP areas. We will describe and explain all of the most influential methods used for automatic simplification of texts so far, with emphasis on the strengths and weaknesses observed in a direct comparison of system outputs. We will present all existing resources for TS in various languages, including parallel manually produced TS corpora, comparable automatically aligned TS corpora, paraphrase and synonym resources, TS-specific sentence-alignment tools, and several TS evaluation resources. Finally, we will discuss the existing evaluation methodologies for TS and the conditions necessary for using each of them.

Notes

Tutorial presentation partially funded under the María de Maeztu Strategic Research Program on data-driven knowledge extraction at the Department of Information and Communication Technologies, UPF (MDM-2015-0502).

Files

ATS-Tutorial-UPF-Saggion_Stajner.pdf (1.9 MB)
md5:fc4816a5bb1b92d9510ad5a5311feaa0

Additional details

Related works