An adaptive clustering and classification algorithm for Twitter data streaming in Apache Spark

Raed A. Hasan; Royida A. Ibrahem Alhayali; Nashwan Dheyaa Zaki; Ahmed Hussien Ali

doi:10.12928/TELKOMNIKA.v17i6.11711

Published December 1, 2019 | Version v1

Journal article (Open Access)


1. Northern Technical University
2. University of Diyala, Diyala
3. University of Information Technology and Communications
4. AL Salam University

Continuous big data streams from social networking sites such as Twitter and Facebook have become an attractive research topic in recent decades because of their timeliness, accessibility, and popularity, although this may come at some cost in accuracy. Clustering of Twitter data in particular has attracted researchers' attention, and an algorithm that can cluster data in less computational time, especially streaming data, is needed. To address these problems, the presented adaptive clustering and classification algorithm for data streaming in Apache Spark operates in two phases. In the first phase, the pre-processed input Twitter data is clustered using an Improved Fuzzy C-means (FCM) algorithm, and the resulting clustering is further refined by an Adaptive Particle Swarm Optimization (PSO) algorithm; the clustered data stream is then evaluated using the Spark engine. In the second phase, the pre-processed input Higgs data is classified using a modified support vector machine (MSVM) classifier with grid search optimization. Finally, the optimized output is evaluated in the Spark engine, and the evaluated values are used to compute the resulting confusion matrix. The proposed work uses the Twitter and Higgs datasets for data streaming in Apache Spark. Computational experiments demonstrate the superiority of the presented approach over existing methods in terms of precision, recall, F-score, convergence, ROC curve, and accuracy.
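The first phase builds on fuzzy c-means, in which every point belongs to every cluster with a membership degree between 0 and 1. This record does not describe what the authors' "Improved" FCM changes, nor how the adaptive PSO step refines it, so the following is only a minimal sketch of standard FCM in plain Python, assuming numeric feature vectors for the pre-processed tweets. The function name `fuzzy_c_means` and its parameters (fuzzifier `m`, `tol`, `seed`) are hypothetical illustrations, not the authors' code, and the sketch runs on a single machine rather than in Spark.

```python
import math
import random
from typing import List, Tuple

Point = List[float]


def _dist(a: Point, b: Point) -> float:
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fuzzy_c_means(data: List[Point], c: int, m: float = 2.0,
                  max_iter: int = 100, tol: float = 1e-5,
                  seed: int = 0) -> Tuple[List[Point], List[List[float]]]:
    """Standard (not the paper's improved) fuzzy c-means.

    Returns (centers, memberships); memberships[i][j] is the degree to
    which point i belongs to cluster j, and each row sums to 1.
    """
    if m <= 1.0:
        raise ValueError("fuzzifier m must be > 1")
    rng = random.Random(seed)
    n, dim = len(data), len(data[0])

    # Random initial memberships, normalized so each row sums to 1.
    u = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        total = sum(row)
        u.append([x / total for x in row])

    centers = [[0.0] * dim for _ in range(c)]
    power = 2.0 / (m - 1.0)
    for _ in range(max_iter):
        # Center update: mean of all points weighted by u_ij^m.
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            w_sum = sum(w)
            centers[j] = [sum(w[i] * data[i][d] for i in range(n)) / w_sum
                          for d in range(dim)]

        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1)).
        max_change = 0.0
        for i in range(n):
            dists = [_dist(data[i], centers[j]) for j in range(c)]
            zeros = [j for j, d in enumerate(dists) if d == 0.0]
            if zeros:
                # Point sits exactly on one or more centers: split membership among them.
                new_row = [1.0 / len(zeros) if j in zeros else 0.0 for j in range(c)]
            else:
                new_row = [1.0 / sum((dists[j] / dists[k]) ** power for k in range(c))
                           for j in range(c)]
            max_change = max(max_change,
                             max(abs(a - b) for a, b in zip(new_row, u[i])))
            u[i] = new_row

        # Stop once no membership moved by more than tol.
        if max_change < tol:
            break
    return centers, u
```

A hard label can be read off by taking, for each point, the cluster with the highest membership, e.g. `labels = [max(range(c), key=lambda j: row[j]) for row in u]`. In the paper's pipeline, a PSO stage would then search for better cluster centers than this local iteration finds; that refinement is not shown here.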

Files

46 11711.pdf (910.9 kB, md5:1c730ba52747a07b21a2dfeb7c7a6520)

Statistic	All versions	This version
Views	59	59
Downloads	85	85
Data volume	77.4 MB	77.4 MB
