Comparative Study of Spark MLlib vs. TensorFlow on Distributed Big Data Sets

Priyambada Swain

doi:10.5281/zenodo.16437469

Published July 26, 2025 | Version v1

Journal article, open access

ABSTRACT

Big data analytics typically involves analyses in which the volume of data exceeds the computational resources of a single machine. Machine learning (ML) often sits at the core of the analytical pipeline, but many existing ML techniques do not scale well with dataset size, and many ML implementations are not compatible with analytics systems such as Hadoop or Spark. AOps and C-Systems offer big data analytics solutions built on Apache Spark to compute task relatedness and perform classification; nevertheless, the Spark core engine does not itself implement ML algorithms, which are instead provided by libraries such as MLlib. Several big data analytics and management solutions also rely on rule-based or domain-knowledge-based heuristics rather than statistical or ML methods.

Keywords: Spark MLlib, TensorFlow, big data, machine learning (ML)

Publisher

RSPUBLICATION

Published in

International Journal of Computer Application, 15(4), 145-167, ISSN: 2250-1797, 2025.

Languages

English

License: Creative Commons Attribution 4.0 International

The Creative Commons Attribution license allows re-distribution and re-use of a licensed work on the condition that the creator is appropriately credited.
Copyright: RS PUBLICATION

Technical metadata

Created: July 26, 2025
Modified: July 26, 2025
