
Published May 9, 2021 | Version v1
Conference paper Open

Unsupervised Text Segmentation via Deep Sentence Encoders: a first step towards a common framework for text-based segmentation, summarization and indexing of media content.

  • 1. Queen Mary University of London

Description

In this paper we present a new algorithm for text segmentation based on deep sentence encoders and the TextTiling algorithm. We describe how text segmentation is an essential first step in re-purposing media content such as TV newscasts, and how the features extracted for segmentation can add value to subsequent tasks involving such media products. We present experiments on Wikipedia and on transcripts from the CNN 10 news show, and compare the results of the proposed algorithm to other approaches. Our method improves over other unsupervised methods and gives results that are competitive with supervised approaches, without the need for any training data. Finally, we give examples of how to re-purpose the encoded sentences, so as to highlight the re-usability of the extracted sentence embeddings for tasks like automatic summarization, while showing how these tasks depend on the segmentation process.
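The general idea of combining sentence embeddings with a TextTiling-style boundary search can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes sentence embeddings are already available as plain lists of floats (from any sentence encoder), and the function names, the `window` size, and the `mean + k * std` cutoff are all hypothetical choices made here for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def gap_similarities(embeddings, window=2):
    """Similarity between the sentence windows on either side of each gap."""
    sims = []
    for gap in range(1, len(embeddings)):
        left = embeddings[max(0, gap - window):gap]
        right = embeddings[gap:gap + window]
        sims.append(cosine(mean_vector(left), mean_vector(right)))
    return sims


def depth_scores(sims):
    """TextTiling-style depth: how far each valley lies below its nearest peaks."""
    depths = []
    for i, s in enumerate(sims):
        left = i
        while left > 0 and sims[left - 1] >= sims[left]:
            left -= 1  # climb to the peak on the left
        right = i
        while right < len(sims) - 1 and sims[right + 1] >= sims[right]:
            right += 1  # climb to the peak on the right
        depths.append((sims[left] - s) + (sims[right] - s))
    return depths


def segment(embeddings, window=2, k=0.5):
    """Return indices of sentences that start a new segment.

    The cutoff mean + k * std is an assumption of this sketch.
    """
    depths = depth_scores(gap_similarities(embeddings, window))
    if not depths:
        return []
    mean = sum(depths) / len(depths)
    std = math.sqrt(sum((d - mean) ** 2 for d in depths) / len(depths))
    threshold = mean + k * std
    # Gap i sits between sentence i and sentence i + 1.
    return [i + 1 for i, d in enumerate(depths) if d > threshold]


# Toy 2-dimensional "embeddings": three sentences on one topic, then three on another.
toy = [[1.0, 0.0], [0.9, 0.1], [1.0, 0.1], [0.0, 1.0], [0.1, 0.9], [0.1, 1.0]]
boundaries = segment(toy)
```

In this toy input the similarity between neighbouring windows drops sharply between the third and fourth sentence, so the sketch places a single boundary there. In practice the embeddings would come from a pretrained sentence encoder, and the same vectors could be reused for downstream tasks such as summarization.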

Notes

Accompanying code available at https://github.com/Ighina/DeepTiling

Files

UnsupervisedTextSegmentationFinalVersion.pdf (1.0 MB)
md5:33d90dbe2758a63016f8f5efc47df306