Published February 29, 2020 | Version v1
Journal article Open

SENTIMENT ANALYSIS FOR ARABIC TWEETS DATASETS: LEXICON-BASED AND MACHINE LEARNING APPROACHES

  • 1. Prince Al Hussein Bin Abdullah II faculty for Information Technology, Hashemite University, P.O. Box 150459, Zarqa 13115, Jordan
  • 2. King Abdullah II School of Information Technology, The University of Jordan, P.O Box 11942, Amman, Jordan
  • 3. Deanship of preparatory year and supporting studies, Imam Abdulrahman Bin Faisal University, P.O Box 1982, Dammam, Saudi Arabia

Description

Recently, sentiment analysis applied to social media data has become a significant research interest
in the data mining domain, owing to the large volume of data available on social media
networks. Sentiment analysis is concerned with analyzing text to identify opinions or emotions and
categorizing them as positive, negative or neutral. Applying sentiment analysis to short texts such as Twitter
messages is challenging because tweets may combine formal and informal language,
special characters, emojis and symbols. It is therefore often difficult to understand the semantics of the text
and to extract the emotions that users express.
In this paper, two sentiment analysis approaches, lexicon-based and machine learning, are
applied and evaluated on a dataset of Arabic tweets (short texts) about the Syrian civil war and crises. The
experimental results revealed that the machine learning approaches outperformed the lexicon-based approach in
predicting the subjectivity of tweets. For the machine learning approach, five popular
algorithms were applied and evaluated. According to the experimental results, the Logistic Model Trees
(LMT) algorithm achieved the highest performance, followed by the Simple Logistic and the SVM
algorithms, respectively. The results also showed that applying feature selection
improved performance. Across all performance evaluation measures, the LMT algorithm reported
the best results (Acc = 85.55, F1 = 0.92 and AUC = 0.86).
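
As a rough illustration of the lexicon-based approach named above, the following minimal Python sketch sums word polarities and maps the total to a positive, negative or neutral label. The lexicon entries, their weights, the whitespace tokenization and the function names (lexicon_label, accuracy) are all assumptions made for illustration; they are not taken from the paper, which does not specify its lexicon or preprocessing here.

```python
# Toy polarity lexicon. The words and weights are illustrative assumptions,
# not the lexicon used in the paper.
TOY_LEXICON = {
    "سلام": 1,    # peace
    "أمل": 1,     # hope
    "جميل": 1,    # beautiful
    "حرب": -1,    # war
    "دمار": -1,   # destruction
    "حزين": -1,   # sad
}


def lexicon_label(tweet, lexicon=TOY_LEXICON):
    """Sum the polarity of known tokens and map the total to a label."""
    # Whitespace tokenization is a simplification; real Arabic text would
    # usually need normalization and stemming first.
    score = sum(lexicon.get(token, 0) for token in tweet.split())
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def accuracy(predicted, expected):
    """Fraction of predictions that match the expected labels."""
    matches = sum(1 for p, e in zip(predicted, expected) if p == e)
    return matches / len(expected)


if __name__ == "__main__":
    tweets = ["أمل جميل", "حرب دمار", "اليوم الاثنين"]
    gold = ["positive", "negative", "neutral"]
    predictions = [lexicon_label(t) for t in tweets]
    print(predictions, accuracy(predictions, gold))
```

A scorer of this kind needs no training data, but it ignores negation, dialect and context, which is one plausible reason a trained classifier could do better on short, informal tweets.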

Files

7Vol98No4.pdf (404.7 kB)
md5:73a5a870cad9671cfbbc21c851759495