Analyzing semantic similarity amongst textual documents to suggest near duplicates

Devarajan, Viji; Subramanian, Revathy

doi:10.11591/ijeecs.v25.i3.pp1703-1711

Published March 1, 2022 | Version v1

Journal article Open

1. Department of Computer Science and Engineering, Faculty of Engineering and Technology, Sathyabama Institute of Science and Technology, Chennai, India
2. Department of Information Technology, Faculty of Engineering and Technology, Sathyabama Institute of Science and Technology, Chennai, India

Data deduplication techniques remove repeated or redundant data from storage. In recent years, ever more data has been generated and stored. A large amount of redundant and semantically similar content therefore occupies the storage environment, which reduces storage efficiency and raises storage cost. To overcome this problem, we propose the hybrid bidirectional encoder representations from transformers (BERT) model for text semantics using graph convolutional network (HBTSG), a word embedding-based deep learning model that identifies near duplicates based on the semantic relationship between text documents. In this paper we hybridize the concepts of chunking and semantic analysis. First, the chunking process splits the documents into blocks. Next, we identify the semantic relationship between documents using word embedding techniques. The method combines the advantages of chunking, feature extraction, and semantic relations to provide better results.
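The two stages named in the abstract (chunking, then block-level similarity) can be illustrated with a minimal sketch. This is not the HBTSG model: the abstract does not specify the chunking scheme, the similarity measure, or the decision threshold, so the fixed-size word blocks, the cosine measure, the 0.8 threshold, and every function name below are assumptions made for illustration. In particular, a plain bag-of-words count vector stands in for the BERT embeddings, so this version captures lexical overlap only, not semantic similarity.

```python
import math
import re
from collections import Counter


def chunk(text, size=8):
    """Split text into blocks of at most `size` lowercase word tokens.

    Fixed-size chunking is an assumption; the abstract does not say
    how the paper chunks documents.
    """
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [words[i:i + size] for i in range(0, len(words), size)]


def embed(block):
    """Stand-in embedding: a bag-of-words count vector.

    The paper uses BERT-based word embeddings; a real pipeline would
    replace this function with a learned encoder.
    """
    return Counter(block)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def doc_similarity(doc_a, doc_b, size=8):
    """Average, over the blocks of doc_a, of the best cosine match in doc_b."""
    blocks_a = [embed(b) for b in chunk(doc_a, size)]
    blocks_b = [embed(b) for b in chunk(doc_b, size)]
    if not blocks_a or not blocks_b:
        return 0.0
    best = [max(cosine(a, b) for b in blocks_b) for a in blocks_a]
    return sum(best) / len(best)


def is_near_duplicate(doc_a, doc_b, threshold=0.8, size=8):
    """Flag a pair as near duplicates; the 0.8 threshold is hypothetical."""
    return doc_similarity(doc_a, doc_b, size) >= threshold
```

Calling `is_near_duplicate(doc_a, doc_b)` on two strings returns a boolean. Note that `doc_similarity` as written is asymmetric (it scores how well `doc_a` is covered by `doc_b`), which is a design choice of this sketch rather than something taken from the paper.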

Files

Name	Size
51 27058 v25i3 Mar22.pdf (md5:a442856f31c26273e4b3ba3d8db4fa01)	492.3 kB
