Published November 22, 2019 | Version v1
Report Open

PERFORMANCE STUDY OF PARQUET CODECs

Description

Apache Parquet is a columnar storage format for the Hadoop ecosystem. The technology has become 
almost a de-factor standard due to important benefits and adventages such its seameless integraction 
with multiple choices of data processing framework, data model or programming language. 
Another important aspect of this technology is the capacity to redure drastically the amount of storage 
required to persist data. That reduction is achieved based on the columnar format used but also to the 
compression codecs supported and applied to the data once it is transformed into parquet format. 
This project studies the implications in terms of performance on data compression and access of the 
differents codecs available.

Files

Report_Javier_Garcia_Rubio.pdf

Files (624.5 kB)

Name Size Download all
md5:52f4034c49c608ec56ba7888960c7ac2
624.5 kB Preview Download