Marathi Text Analysis using Unsupervised Learning and Word Cloud

doi:10.35940/ijeat.C4727.029320

Published February 29, 2020 | Version v1

Journal article Open

1. Symbiosis Institute of Computer Studies and Research, Symbiosis International (Deemed) University, India, Maharashtra, Pune. Jat

Sponsor:

Blue Eyes Intelligence Engineering & Sciences Publication (BEIESP)¹

1. Publisher

Managing a large number of text documents is a critical task that supports many applications, ranging from information retrieval to clustering search engine results. Marathi is one of the oldest regional languages in the Indo-Aryan language family, dating from about AD 1000. The abundance of Marathi literature has produced a large corpus and a need to summarize the information it contains. The objective of this study is to overcome the scalability problem in managing these documents and to summarize the Marathi corpus by extracting tokens. The proposed approach scales better and maintains consistent cluster quality on incremental datasets. Most past and contemporary research has targeted document management for English corpora. Research on Marathi corpora has mostly explored stemming, single-document summarization, and classifier design. Applying unsupervised learning to a Marathi corpus to summarize multiple documents through a Word Cloud remains unexplored. Technically, the current work applies TF-IDF, cosine-based document similarity measures, and cluster dendrograms, in addition to various other Natural Language Processing (NLP) steps. Entropy and precision are used to evaluate experiments carried out on different datasets, and the results demonstrate the robustness of the proposed approach for Marathi corpora.
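The pipeline named in the abstract (TF-IDF weighting, cosine similarity between documents, and hierarchical clustering of the kind a dendrogram visualizes) can be illustrated with a minimal sketch. This is not the authors' implementation: the four sample sentences, the whitespace tokenizer, the `tf * log(N / df)` weighting, and the average-linkage merge rule are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy corpus: two sentences about cricket, two about farming.
docs = [
    "क्रिकेट सामना भारत जिंकला",
    "भारत क्रिकेट संघ सामना खेळला",
    "पाऊस शेती पीक चांगले",
    "शेती पीक पाऊस उशीरा",
]

def tokenize(text):
    # Whitespace tokenization only (assumption). A fuller Marathi pipeline
    # would also remove punctuation and stop words and apply stemming.
    return text.split()

def tfidf_vectors(documents):
    """Return one {term: weight} dict per document, weight = tf * log(N / df)."""
    tokenized = [tokenize(d) for d in documents]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        total = len(tokens)
        vectors.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in tf.items()}
        )
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def agglomerate(vectors, k):
    """Average-linkage agglomerative clustering down to k clusters.

    Returns the final clusters (lists of document indices) and the merge
    history, which is the information a dendrogram would display.
    """
    clusters = [[i] for i in range(len(vectors))]
    merges = []
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                sims = [
                    cosine(vectors[i], vectors[j])
                    for i in clusters[a]
                    for j in clusters[b]
                ]
                score = sum(sims) / len(sims)
                if best is None or score > best[0]:
                    best = (score, a, b)
        score, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), score))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters, merges

def top_terms(documents, cluster, n=3):
    # Raw term counts per cluster; these could feed a word cloud renderer.
    counts = Counter(t for i in cluster for t in tokenize(documents[i]))
    return counts.most_common(n)

vectors = tfidf_vectors(docs)
clusters, merges = agglomerate(vectors, k=2)
for cluster in clusters:
    print(sorted(cluster), top_terms(docs, cluster))
```

In this sketch the two cricket sentences and the two farming sentences share no terms across topics, so their cross-topic cosine similarity is zero and they end up in separate clusters. The per-cluster term counts are the kind of input a word cloud would size its words by.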

Files

Files (458.5 kB)

Name	Size
C4727029320 (1).pdf md5:7a363391aecd009df1b6f678b18eb43a	458.5 kB

Additional details

Is cited by: Journal article: 2249-8958 (ISSN)

ISSN: 2249-8958
