Report Open Access

PERFORMANCE STUDY OF PARQUET CODECs

Javier García Rubio

Apache Parquet is a columnar storage format for the Hadoop ecosystem. The technology has become 
almost a de facto standard thanks to important benefits such as its seamless integration 
with multiple choices of data processing framework, data model, or programming language. 
Another important aspect of this technology is its capacity to drastically reduce the amount of storage 
required to persist data. That reduction is achieved through the columnar format itself and also through the 
compression codecs supported and applied to the data once it is transformed into Parquet format. 
This project studies the performance implications of the different available codecs for data 
compression and data access.
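
As a rough illustration of the kind of comparison described above, the following minimal sketch serializes the same records in a row-oriented and a column-oriented layout, then compresses each with several codecs and records the compressed size and timings. It is not the report's methodology and does not use Parquet itself: the sample records, the JSON serialization, the helper names (`row_layout`, `column_layout`, `measure`), and the choice of Python standard-library codecs (`zlib`, `bz2`, `lzma`) as stand-ins for Parquet's codecs are all assumptions made for illustration.

```python
import bz2
import json
import lzma
import time
import zlib

# Hypothetical sample records; this is not data from the report.
records = [
    {"id": i, "country": ["ES", "FR", "DE", "CH"][i % 4], "value": (i * 7) % 100}
    for i in range(10_000)
]


def row_layout(rows):
    """Serialize record by record, as a row-oriented file would."""
    return "\n".join(json.dumps(r) for r in rows).encode("utf-8")


def column_layout(rows):
    """Serialize field by field, so values of one column sit together."""
    columns = {key: [r[key] for r in rows] for key in rows[0]}
    return json.dumps(columns).encode("utf-8")


# Standard-library codecs used as stand-ins (an assumption of this sketch).
# zlib implements DEFLATE, the algorithm that gzip is built on.
CODECS = {
    "zlib": (zlib.compress, zlib.decompress),
    "bz2": (bz2.compress, bz2.decompress),
    "lzma": (lzma.compress, lzma.decompress),
}


def measure(payload):
    """Return compressed size and compress/decompress times per codec."""
    results = {}
    for name, (compress, decompress) in CODECS.items():
        start = time.perf_counter()
        packed = compress(payload)
        compress_s = time.perf_counter() - start

        start = time.perf_counter()
        restored = decompress(packed)
        decompress_s = time.perf_counter() - start

        # Every codec here is lossless, so the round trip must be exact.
        if restored != payload:
            raise ValueError(f"{name} round trip failed")
        results[name] = {
            "bytes": len(packed),
            "compress_s": compress_s,
            "decompress_s": decompress_s,
        }
    return results


if __name__ == "__main__":
    for label, payload in (("row", row_layout(records)),
                           ("column", column_layout(records))):
        print(f"{label} layout, raw size: {len(payload)} bytes")
        for codec, stats in measure(payload).items():
            print(f"  {codec}: {stats['bytes']} bytes, "
                  f"compress {stats['compress_s']:.4f}s, "
                  f"decompress {stats['decompress_s']:.4f}s")
```

Sizes and timings depend on the data and the machine, so the printed numbers should be read as indicative only. A measurement on real Parquet files would need a Parquet library and the codecs it actually supports.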

Files (624.5 kB)
Report_Javier_Garcia_Rubio.pdf (624.5 kB)
md5:52f4034c49c608ec56ba7888960c7ac2