Published July 26, 2025 | Version v1
Journal article Open

Comparative Study of Spark MLlib vs. TensorFlow on Distributed Big Data Sets

Description

ABSTRACT

Big data analytics typically involves datasets whose volume exceeds the computational resources of a single machine. Machine learning (ML) often sits at the core of the analytical pipeline, but many existing ML techniques do not scale well with dataset size, and many ML implementations are not compatible with analytics systems such as Hadoop or Spark. AOps and C-Systems offer big data analytics solutions built on Apache Spark to compute task relatedness and perform classification; nevertheless, Spark's core engine does not itself provide ML algorithms for analytics, which are instead supplied by libraries such as MLlib. Several big data analytics and management solutions also rely on rule-based or domain-knowledge heuristics rather than statistical or ML methods.

 

Keywords: Spark MLlib, TensorFlow, big data, machine learning (ML)

Files

13.pdf (301.0 kB)
md5:9f82bae7f80b654068e54cf2594b3fb7