Published November 5, 2020 | Version v1
Journal article Open

Compressive approaches for cross-language multi-document summarization

  • 1. Laboratoire Informatique d'Avignon, Avignon Université, 339 Chemin des Meinajariès, Avignon, 84140, France
  • 2. Curso de Engenharia da Computação, Universidade Federal do Ceará, Rua Coronel Estanislau Frota, 563, Sobral-Ceará, CEP 62.010-560, Brazil

Description

The popularization of social networks and digital documents has rapidly increased the amount of multilingual information available on the Internet. However, this huge volume of data cannot be analyzed manually. This paper addresses Cross-Language Text Summarization (CLTS), which produces a summary in a language different from that of the source documents. We describe three compressive CLTS approaches that analyze the text in both the source and target languages to compute the relevance of sentences. Our systems compress sentences at two levels: clusters of similar sentences are compressed using a multi-sentence compression (MSC) method, and single sentences are compressed using a neural network model. The version of our approach using multi-sentence compression generated more informative French-to-English cross-lingual summaries than extractive state-of-the-art systems. Moreover, these cross-lingual summaries have a grammatical quality similar to that of extractive approaches.

Files

Linhares_2020.pdf (862.0 kB)
md5:484211c9dbb852b69bbe27b7b1dc8a00

Additional details

Funding

European Commission
EMBEDDIA – Cross-Lingual Embeddings for Less-Represented Languages in European News Media (grant 825153)