PROCESSING ANALYSIS ON HIVE USING USER DEFINED FUNCTIONS (UDF) AND HIVE QUERY LANGUAGE (HQL)

N.RAMAMOORTHY

doi:10.5281/zenodo.3174717

Published May 22, 2019 | Version v1

Journal article Open

N.RAMAMOORTHY¹

1. Guest Lecturer, Department of Computer Science, Govt. Arts and Science College – Thiruvadanai

Apache Hadoop is an open-source framework for distributed computing over large datasets across clusters of computers, using low-cost hardware and simple programming models. Hadoop uses HDFS (Hadoop Distributed File System) for data storage. Hive is a popular data warehouse package built on top of Hadoop. It provides an SQL-like language called Hive Query Language (HQL) for querying and processing large datasets. The problem with HQL is that queries take more time when they convert data from rows to columns, i.e., from horizontal to vertical. This limitation can be mitigated by using User Defined Functions (UDFs) in Hive. With UDFs, many calculations that fall outside the scope of Hive's built-in HQL operations and functions, such as querying many columns, combining several column values into one, and transformations that take more time in HQL, can be handled easily. In the proposed research work, five different datasets are taken from the web for the experiments. Simple queries using UDFs and HQL are applied to these datasets. The results obtained on these datasets show that UDFs perform better than HQL.
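
As an illustration of the kind of operation the abstract mentions (combining several column values into one), the following is a minimal sketch in plain Java. It is not taken from the article: the class name `CombineColumnsSketch`, the separator parameter, and the choice to skip NULL values are all assumptions made for the example. A real Hive UDF would additionally extend a Hive base class such as `org.apache.hadoop.hive.ql.exec.UDF` and be registered with `CREATE TEMPORARY FUNCTION`; that wiring is omitted here so the logic can be compiled without Hive on the classpath.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not the author's code.
final class CombineColumnsSketch {

    // Mirrors the conventional UDF entry point name "evaluate".
    // Joins the non-null column values with the given separator.
    // Returns null when the separator itself is null (an assumed policy).
    String evaluate(String separator, String... columns) {
        if (separator == null || columns == null) {
            return null;
        }
        List<String> kept = new ArrayList<>();
        for (String value : columns) {
            if (value != null) {   // skip SQL NULLs instead of printing "null"
                kept.add(value);
            }
        }
        return String.join(separator, kept);
    }

    public static void main(String[] args) {
        CombineColumnsSketch udf = new CombineColumnsSketch();
        // Three column values from one row, the middle one missing.
        System.out.println(udf.evaluate(" ", "Tamil", null, "Nadu"));
    }
}
```

Once packaged and registered under a name of one's choosing, a function like this could be called from a query in place of a longer expression built from built-in functions. Whether it runs faster than the equivalent HQL depends on the data and the cluster, which is the question the article's experiments address.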
