Unsupervised Text Segmentation via Deep Sentence Encoders: a first step towards a common framework for text-based segmentation, summarization and indexing of media content.
Description
In this paper we present a new algorithm for text segmentation based on deep sentence encoders and the TextTiling algorithm. We will describe how text segmentation is an essential first step in the re-purposing of media content like TV newscasts and how the proposed methodology can add value to other subsequent tasks involving such media products thanks to the features extracted for segmentation. We present experiments on Wikipedia and transcripts from CNN 10 news show and the results of the proposed algorithm will be compared to other approaches. Our method shows improvement over other unsupervised methods and it gives results that are competitive with supervised approaches without the need for any training data. Finally, we will give examples of how to re-purpose the encoded sentences, so to highlight the re-usability of the extracted sentence embeddings for tasks like automatic summarization, while showing how these tasks depend on the segmentation process.
Notes
Files
UnsupervisedTextSegmentationFinalVersion.pdf
Files
(1.0 MB)
Name | Size | Download all |
---|---|---|
md5:33d90dbe2758a63016f8f5efc47df306
|
1.0 MB | Preview Download |