Published May 3, 2018 | Version v1
Journal article Open

DOCUMENT SUMMARIZATION USING SENTENCE BASED TOPIC MODELING AND CLUSTERING

  • 1. Research Scholar, Bharathiyar University.
  • 2. Professor, Bangalore University.

Description

In recent years, automatic document summarization has found wide practical application, and numerous papers have been published on the topic. There are many approaches to identifying the significant portions of a document. Topic representation and modelling provide an intermediate representation of the text that captures the topics discussed in the input and aids automatic summarization. The significance of each sentence is decided based on the representation of topics in the input document. This article attempts to provide a comprehensive summarization approach that includes sentence extraction and tokenization of the extracted sentences. Sentence-based Structural Topic Modeling (STM) is used to determine the important content for each domain in the integrated document, and sentences are grouped under each topic using k-means clustering. The sentences under each topic are then summarized using the Term Frequency of each sentence. Finally, the sentences are arranged in the summarized text according to their Lexical Ranking scores.
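As a rough illustration of the sentence extraction, tokenization, and term-frequency steps named in the description, the following minimal Python sketch scores each sentence by the average corpus frequency of its terms and keeps the top-scoring ones. It is not the authors' implementation: the STM topic modeling, k-means clustering, and lexical ranking stages are omitted, and the function names, the stopword list, and the averaging formula are all assumptions made for the example.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not taken from the article).
STOPWORDS = {"the", "a", "an", "of", "and", "is", "in", "to", "are", "on", "for"}

def split_sentences(text):
    """Sentence extraction: split after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def tokenize(sentence):
    """Tokenization: lowercase alphabetic words with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS]

def summarize(text, n=1):
    """Return the n highest-scoring sentences, kept in their original order."""
    sentences = split_sentences(text)
    tokens = [tokenize(s) for s in sentences]
    # Term frequency counted over the whole input.
    tf = Counter(w for toks in tokens for w in toks)

    def score(toks):
        # Average term frequency of the sentence (hypothetical scoring choice).
        return sum(tf[w] for w in toks) / len(toks) if toks else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(tokens[i]), reverse=True)[:n]
    return [sentences[i] for i in sorted(ranked)]

if __name__ == "__main__":
    sample = "Cats chase mice. Cats sleep often. Dogs bark."
    print(summarize(sample, n=2))
```

In a fuller pipeline along the lines the description outlines, the same scoring would be applied separately to the sentences grouped under each topic rather than to the whole document at once.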

Files

29.pdf (341.0 kB)
md5:fb926621d4e8673ba1cf96701569be79