Conference paper Open Access

Learning Simplifications for Specific Target Audiences

Carolina Scarton; Lucia Specia

MARC21 XML Export

<?xml version='1.0' encoding='UTF-8'?>
<record xmlns="">
  <controlfield tag="005">20200120150631.0</controlfield>
  <controlfield tag="001">1410314</controlfield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">University of Sheffield</subfield>
    <subfield code="a">Lucia Specia</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="s">176715</subfield>
    <subfield code="z">md5:0112e8264562b63b88b86f612a47b55a</subfield>
    <subfield code="u"></subfield>
  </datafield>
  <datafield tag="542" ind1=" " ind2=" ">
    <subfield code="l">open</subfield>
  </datafield>
  <datafield tag="260" ind1=" " ind2=" ">
    <subfield code="c">2018-07-15</subfield>
  </datafield>
  <datafield tag="909" ind1="C" ind2="O">
    <subfield code="p">openaire</subfield>
    <subfield code="p">user-h2020-simpatico-692819</subfield>
    <subfield code="o"></subfield>
  </datafield>
  <datafield tag="100" ind1=" " ind2=" ">
    <subfield code="u">University of Sheffield</subfield>
    <subfield code="a">Carolina Scarton</subfield>
  </datafield>
  <datafield tag="245" ind1=" " ind2=" ">
    <subfield code="a">Learning Simplifications for Specific Target Audiences</subfield>
  </datafield>
  <datafield tag="980" ind1=" " ind2=" ">
    <subfield code="a">user-h2020-simpatico-692819</subfield>
  </datafield>
  <datafield tag="536" ind1=" " ind2=" ">
    <subfield code="c">692819</subfield>
    <subfield code="a">SIMplifying the interaction with Public Administration Through Information technology for Citizens and cOmpanies</subfield>
  </datafield>
  <datafield tag="540" ind1=" " ind2=" ">
    <subfield code="u"></subfield>
    <subfield code="a">Creative Commons Attribution 4.0 International</subfield>
  </datafield>
  <datafield tag="650" ind1="1" ind2="7">
    <subfield code="a">cc-by</subfield>
    <subfield code="2"></subfield>
  </datafield>
  <datafield tag="520" ind1=" " ind2=" ">
    <subfield code="a">&lt;p&gt;Text simplification (TS) is a monolingual text-to-text transformation task where an original (complex) text is transformed into a target (simpler) text. Most recent work is based on sequence-to-sequence neural models similar to those used for machine translation (MT). Different from MT, TS data comprises more elaborate transformations, such as sentence splitting. It can also contain multiple simplifications of the same original text targeting different audiences, such as school grade levels. We explore these two features of TS to build models tailored for specific grade levels. Our approach uses a standard sequence-to-sequence architecture where the original sequence is annotated with information about the target audience and/or the (predicted) type of simplification operation. We show that it outperforms state-of-the-art TS approaches (up to 3 and 12 BLEU and SARI points, respectively), including when training data for the specific complex-simple combination of grade levels is not available, i.e. zero-shot learning.&lt;/p&gt;</subfield>
  </datafield>
  <datafield tag="773" ind1=" " ind2=" ">
    <subfield code="n">doi</subfield>
    <subfield code="i">isVersionOf</subfield>
    <subfield code="a">10.5281/zenodo.1410313</subfield>
  </datafield>
  <datafield tag="024" ind1=" " ind2=" ">
    <subfield code="a">10.5281/zenodo.1410314</subfield>
    <subfield code="2">doi</subfield>
  </datafield>
  <datafield tag="980" ind1=" " ind2=" ">
    <subfield code="a">publication</subfield>
    <subfield code="b">conferencepaper</subfield>
  </datafield>
</record>