Published October 4, 2021 | Version v1
Conference paper Open

Benchmarks for Unsupervised Discourse Change Detection

  • 1. University of Helsinki, Helsinki, Finland

Description

This work is motivated by the need to track discourse dynamics in historical corpora.
In many real use cases, however, ground truth is not available, and annotating discourses at the
corpus level is hardly feasible. We propose a novel procedure for generating synthetic datasets for
this task, a novel evaluation framework, and a set of benchmarking models. Finally, we run large-scale
experiments on these synthetic datasets and demonstrate that a model trained on such a dataset can
obtain meaningful results when applied to a real dataset without any adjustment to the model.

Files

paper5.pdf (720.9 kB)
md5:54cc7d9f6ebb4a3a73ee62841bd42663

Additional details

Funding

European Commission
EMBEDDIA – Cross-Lingual Embeddings for Less-Represented Languages in European News Media (grant agreement No. 825153)