Text pre-processing of multilingual for sentiment analysis based on social network data

Neha Garg; Kamlesh Sharma

doi:10.11591/ijece.v12i1.pp776-784

Published February 1, 2022 | Version v1

Journal article Open

1. Manav Rachna International Institute of Research and Studies

Sentiment analysis (SA) is a long-standing area of research, especially in the field of text analysis, and text pre-processing is essential to performing SA accurately. This paper presents a text processing model for SA that applies natural language processing techniques to Twitter data. The basic phases of the machine learning pipeline are text collection, text cleaning, pre-processing, feature extraction, and categorization of the data according to SA techniques. With the focus on Twitter data, the data is extracted in a domain-specific manner. The data cleaning phase considers noisy data, missing data, punctuation, tags, and emoticons. For pre-processing, tokenization is performed, followed by stop word removal (SWR). The article provides insight into the techniques used for text pre-processing and their impact on the dataset. Applying text pre-processing improved the accuracy of the classification techniques and reduced dimensionality. The proposed corpus can be used in market analysis, customer behaviour, polling analysis, and brand monitoring. The text pre-processing process can serve as a baseline for applying predictive analysis, machine learning, and deep learning algorithms, and can be extended according to the problem definition.
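The cleaning, tokenization, and stop word removal phases named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the regular expressions, and the tiny stop-word list are all assumptions made for illustration, and only the Python standard library is used. As written it suits Latin-script text only; the punctuation pattern would likely also strip combining marks in scripts such as Devanagari, so multilingual data would need a different rule.

```python
import re

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch);
# a real pipeline would use a fuller, language-specific list.
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "to", "of",
              "and", "in", "on", "for", "this", "it", "i"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")   # links
MENTION_RE = re.compile(r"@\w+")                # @user mentions
HTML_TAG_RE = re.compile(r"<[^>]+>")            # markup tags such as <b>
# Anything that is neither a word character nor whitespace:
# punctuation, '#', emoticon symbols, emoji.
NON_WORD_RE = re.compile(r"[^\w\s]")


def clean_tweet(text):
    """Cleaning phase: handle missing text, then strip links, mentions, tags, punctuation."""
    if not text:                      # missing data (None or empty string)
        return ""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = HTML_TAG_RE.sub(" ", text)
    text = NON_WORD_RE.sub(" ", text)
    # Collapse runs of whitespace and normalize case.
    return re.sub(r"\s+", " ", text).strip().lower()


def tokenize(text):
    """Pre-processing phase, step 1: whitespace tokenization."""
    return text.split()


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Pre-processing phase, step 2: stop word removal (SWR)."""
    return [t for t in tokens if t not in stop_words]


def preprocess(text):
    """Run cleaning, tokenization, and SWR in sequence."""
    return remove_stop_words(tokenize(clean_tweet(text)))


if __name__ == "__main__":
    print(preprocess("@user I love this phone!!! :) http://t.co/abc #happy"))
```

Dropping stop words and noise tokens shrinks the vocabulary, which is one plausible route to the dimensionality reduction the abstract mentions; the effect on classification accuracy would have to be measured on the actual corpus.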

Files

Name	Size	Checksum
81 25456 EMr 15Jul 25Mar NK.pdf	514.9 kB	md5:eff3d52241481d780b38f7dc77e781a6

