Published April 30, 2021 | Version v1
Conference paper Open

Unsupervised Approach to Cross-Lingual User Comments Summarization

  • 1. University of Ljubljana, Ljubljana, Slovenia

Description

User commenting is a valuable feature of many news outlets: it gives outlets contact with their readers and lets readers express opinions, offer different viewpoints, and even contribute complementary information. Yet large volumes of user comments are hard to filter, let alone to read and mine for relevant information. Research on the summarization of user comments is still in its infancy, and human-created summarization datasets are scarce, especially for less-resourced languages. To address this issue, we propose an unsupervised approach to user comments summarization that combines a modern multilingual representation of sentences with standard extractive summarization techniques. Our comparison of different sentence representation approaches coupled with different summarization approaches shows that the most successful combinations are the same for news and comment summarization. The empirical results and the presented visualization show the usefulness of the proposed methodology for several languages.
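
The general idea of pairing a sentence representation with an extractive selection step can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes centroid-based selection as one example of a standard extractive technique, and the function `embed` is a hypothetical bag-of-words placeholder standing in for a multilingual sentence encoder; the names `cosine` and `summarize` are likewise illustrative.

```python
import math
import re
from collections import Counter


def embed(sentence):
    """Hypothetical placeholder encoder: sparse bag-of-words counts.

    Assumption: a real system would return a dense vector from a
    multilingual sentence encoder here instead.
    """
    return Counter(re.findall(r"\w+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def summarize(sentences, k=2):
    """Return the k sentences closest to the centroid, in original order."""
    vectors = [embed(s) for s in sentences]
    # The summed vector points in the same direction as the mean vector,
    # so it can serve as the centroid for cosine similarity.
    centroid = Counter()
    for v in vectors:
        centroid.update(v)
    scores = [cosine(v, centroid) for v in vectors]
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


comments = [
    "The new bike lanes make the city safer.",
    "Bike lanes make cycling in the city safer for kids.",
    "I had pizza for lunch.",
]
summary = summarize(comments, k=2)
```

With these toy comments, the two on-topic sentences are selected and the off-topic one is dropped. Replacing `embed` with a multilingual encoder would, under the same assumptions, let the same selection step run unchanged on comments in other languages.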

Files

2021.hackashop-1.13.pdf (274.6 kB)
md5:789dc67e216ebbba1ab52c10a4d451e5

Additional details

Funding

EMBEDDIA – Cross-Lingual Embeddings for Less-Represented Languages in European News Media 825153
European Commission