A Benchmark for Suitability of Alluxio over Spark

Kanchana Rajaram; Kavietha Haridass

doi:10.35940/ijitee.A8190.1110120

Published November 30, 2020 | Version v1

Journal article | Open access

1. Computer Science and Engineering, Sri Sivasubramaniya Nadar College of Engineering, Chennai, India. E

Contributors

Sponsor:

Blue Eyes Intelligence Engineering and Sciences Publication(BEIESP)¹

1. Publisher

Big data applications play an important role in real-time data processing. Apache Spark is a data processing framework with an in-memory data engine that quickly processes large data sets. It can also distribute data processing tasks across multiple computers, either on its own or in tandem with other distributed computing tools. However, Spark's in-memory processing cannot share data between applications, and RAM alone is insufficient for storing petabytes of data. Alluxio is a virtual distributed storage system that leverages memory for data storage and provides faster access to data held in different storage systems. Alluxio helps to speed up data-intensive Spark applications across various storage systems. In this work, the performance of applications on Spark alone and on Spark running over Alluxio is studied with respect to four storage formats (Parquet, ORC, CSV, and JSON) and four types of queries from the Star Schema Benchmark (SSB). A benchmark is developed to indicate the suitability of the Spark and Alluxio combination for big data applications. Alluxio is found to be suitable for applications that use databases larger than 2.6 GB storing data in JSON and CSV formats. Spark alone is found to be suitable for applications that use storage formats such as Parquet and ORC with database sizes below 2.6 GB.
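
The suitability finding stated in the abstract can be read as a simple decision rule. The following is a minimal sketch of that rule, not the authors' benchmark: the function name `suggest_setup` and the constant names are hypothetical, and only the two cases the abstract actually reports are encoded. Any other combination (for example, Parquet above 2.6 GB, or a database of exactly 2.6 GB) returns `None`, because the abstract gives no recommendation for it.

```python
from typing import Optional

# Database size boundary reported in the abstract (gigabytes).
THRESHOLD_GB = 2.6

# Format groupings taken from the abstract; the set names are hypothetical.
ALLUXIO_FORMATS = {"json", "csv"}
SPARK_FORMATS = {"parquet", "orc"}


def suggest_setup(size_gb: float, storage_format: str) -> Optional[str]:
    """Return the setup the abstract reports as suitable, or None if the
    abstract does not cover this combination of size and format."""
    fmt = storage_format.strip().lower()
    if fmt in ALLUXIO_FORMATS and size_gb > THRESHOLD_GB:
        return "Spark over Alluxio"
    if fmt in SPARK_FORMATS and size_gb < THRESHOLD_GB:
        return "Spark"
    # Not stated in the abstract: make no recommendation.
    return None
```

For instance, `suggest_setup(5.0, "JSON")` falls into the first reported case and `suggest_setup(1.0, "parquet")` into the second. The full paper may refine these boundaries per query type, which this sketch does not attempt to capture.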

Files

Name	Size
A81901110120.pdf md5:6b933f41b859941b4a2165b3cf8b6cdc	446.2 kB

Additional details

Is cited by: Journal article: 2278-3075 (ISSN)

ISSN: 2278-3075
Retrieval Number: 100.1/ijitee.A81901110120

