PROCESSING ANALYSIS ON HIVE USING USER DEFINED FUNCTIONS (UDF) AND HIVE QUERY LANGUAGE (HQL)
Creators
- 1. Guest Lecturer, Department of Computer Science, Govt. Arts and Science College – Thiruvadanai
Description
Apache Hadoop is an open-source framework for the distributed processing of large datasets across clusters of low-cost computers using simple programming models. Hadoop stores data in HDFS (Hadoop Distributed File System), which lets large datasets be processed in parallel across the cluster. Hive is a widely used data warehouse package built on top of Hadoop. It provides an SQL dialect called Hive Query Language (HQL) for querying and processing large datasets. However, HQL queries take longer when they must convert data from rows to columns, i.e., from a horizontal to a vertical layout. This limitation can be addressed with User Defined Functions (UDFs) in Hive. UDFs make it easy to perform computations that fall outside the scope of Hive's built-in HQL operators and functions, or that take longer in plain HQL. Examples include querying many columns at once and combining several column values into one. In the proposed work, five different datasets collected from the web are used for the experiments. Simple queries written with UDFs and with plain HQL are run on these datasets, and the results show that the UDF-based queries perform better than the plain HQL queries.
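The abstract mentions two transformations that plain HQL handles poorly: combining several column values into one, and turning one wide (horizontal) row into several vertical rows. The sketch below illustrates both. It is an assumption-laden illustration, not the authors' method. Native Hive UDFs are Java classes (for example, subclasses of `org.apache.hadoop.hive.ql.exec.UDF`) registered with `CREATE TEMPORARY FUNCTION`. This sketch instead uses a different mechanism, Hive's streaming `TRANSFORM` clause with a Python script. Hive sends each row to such a script as one tab-separated line on stdin and, by default, writes SQL NULL as `\N`. The function names, the `mode` argument and the file name are hypothetical.

```python
"""Hypothetical Hive streaming script (used with SELECT TRANSFORM ... USING).

Each input row arrives as one line of tab-separated fields; Hive's default
text format writes SQL NULL as the two characters \\N.
"""
import sys
from typing import Iterable, Iterator, List

HIVE_NULL = "\\N"  # default NULL marker in Hive's streaming text format


def combine_columns(fields: List[str], sep: str = ",") -> str:
    """Join several column values into one string, skipping NULLs."""
    return sep.join(f for f in fields if f != HIVE_NULL)


def unpivot(fields: List[str]) -> Iterator[str]:
    """Turn one wide row (key, v1, v2, ...) into one output row per value.

    Each output row is: key <TAB> column position <TAB> value.
    """
    key, values = fields[0], fields[1:]
    for position, value in enumerate(values, start=1):
        if value != HIVE_NULL:
            yield f"{key}\t{position}\t{value}"


def transform(lines: Iterable[str], mode: str = "combine") -> Iterator[str]:
    """Apply the chosen transformation to every tab-separated input line."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if mode == "combine":
            yield combine_columns(fields)
        else:
            yield from unpivot(fields)


def main() -> None:
    """Entry point for Hive: read rows from stdin, write results to stdout."""
    mode = sys.argv[1] if len(sys.argv) > 1 else "combine"
    for out in transform(sys.stdin, mode):
        sys.stdout.write(out + "\n")
```

To deploy it, the script would end with a call to `main()`, be shipped with `ADD FILE hive_transform.py;`, and be invoked as, for example, `SELECT TRANSFORM(col_a, col_b, col_c) USING 'python hive_transform.py combine' AS (combined) FROM some_table;`. The file, column and table names here are placeholders. Streaming scripts are convenient for prototyping, but a Java UDF avoids the per-row serialization to and from an external process.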
Files
(109.8 kB)
Checksum | Size
---|---
md5:383479d8fe028add8f9a6b1ca3b733a8 | 109.8 kB