Published March 17, 2020 | Version v1
Journal article | Open access

Gender and Authorship Categorisation of Arabic Text from Twitter Using PPM

Authors/Creators

  • 1. University of Hail

Description

In this paper we present gender and authorship categorisation of Arabic text from Twitter using the Prediction by Partial Matching (PPM) compression scheme. The PPMD variant of the scheme was used with different model orders to perform the categorisation. For comparison, we also applied several machine learning algorithms, namely Multinomial Naïve Bayes (MNB), K-Nearest Neighbours (KNN), and LIBSVM (an implementation of Support Vector Machines), using the same processing steps for all of them. PPMD achieves significantly better accuracy than all the other algorithms, with order 11 PPMD performing best and reaching 90% accuracy for gender categorisation and 96% for authorship categorisation.
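The abstract does not describe how a compression scheme is turned into a classifier. A common approach is to train one PPM model per category and assign a test text to the category whose model encodes it in the fewest bits. The following is a minimal sketch under that assumption, not the authors' implementation. `PPMDModel`, `classify`, the byte-level (UTF-8) alphabet, the static (non-updating) models and the use of full exclusion are all hypothetical choices made for illustration. It uses PPMD's escape method D: in a context seen n times with d distinct following symbols, a symbol with count c gets probability (2c - 1)/(2n), and the escape to the next shorter context gets d/(2n).

```python
import math
from collections import defaultdict


class PPMDModel:
    """Hypothetical static PPMD model over UTF-8 bytes, with full exclusion."""

    ALPHABET_SIZE = 256  # assumption: byte-level model, so order -1 is uniform over 256 values

    def __init__(self, order):
        self.order = order  # maximum context length, in bytes
        # counts[context][symbol] -> how often symbol followed context in training
        self.counts = defaultdict(lambda: defaultdict(int))

    def train(self, text):
        data = text.encode("utf-8")
        for i, sym in enumerate(data):
            # Update every context of length 0..order that precedes position i
            for k in range(min(self.order, i) + 1):
                self.counts[data[i - k:i]][sym] += 1

    def symbol_prob(self, data, i, sym):
        """Probability of byte sym at position i of data, blending orders via escapes."""
        p = 1.0           # product of escape probabilities taken so far
        excluded = set()  # symbols already offered by a longer context
        for k in range(min(self.order, i), -1, -1):
            table = self.counts.get(data[i - k:i])
            if not table:
                continue  # context never seen in training: no escape needs coding
            live = {s: c for s, c in table.items() if s not in excluded}
            if not live:
                continue  # every candidate was excluded, so escaping is certain
            n, d = sum(live.values()), len(live)
            if sym in live:
                return p * (2 * live[sym] - 1) / (2 * n)  # PPMD symbol estimate
            p *= d / (2 * n)                             # PPMD escape estimate
            excluded.update(live)
        # Order -1: uniform over the byte values not excluded at longer contexts
        return p / (self.ALPHABET_SIZE - len(excluded))

    def code_length(self, text):
        """Bits needed to encode text with this (non-updating) model."""
        data = text.encode("utf-8")
        return sum(-math.log2(self.symbol_prob(data, i, s)) for i, s in enumerate(data))


def classify(text, models):
    """Return the label whose model compresses text into the fewest bits."""
    return min(models, key=lambda label: models[label].code_length(text))


# Toy usage: two made-up "categories" standing in for genders or authors
training = {
    "A": "the cat sat on the mat " * 20,
    "B": "zzyzx quux wibble wobble " * 20,
}
models = {}
for label, text in training.items():
    models[label] = PPMDModel(order=3)
    models[label].train(text)

print(classify("the cat sat", models))  # prints A
```

At orders as high as the order 11 reported above, the number of stored contexts grows quickly, so a practical implementation would typically keep the counts in a trie rather than in a dictionary keyed by context bytes.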

Files

9217ijcsit12.pdf (459.2 kB)
md5:51a898f5822b8bc0702adc0d79888d63