Hive Big Data Table Performance Tuning Techniques
Authors/Creators
Description
A Hive table is a big data table that holds structured data. By default, Hive stores table data in the Hive warehouse directory; to store it elsewhere, the developer can specify a path with the LOCATION clause when creating the table. Hive follows familiar SQL concepts such as rows, columns, and schemas. Developers working on big data applications commonly run into a problem when reading data from the Hadoop file system or from Hive tables. The data is typically written to the Hadoop cluster by Spark Streaming jobs, NiFi streaming jobs, or other streaming or ingestion applications. When these applications write to the Hadoop Distributed File System (HDFS) or to Hive tables, they produce a large number of small data files. These part files are spread across different data nodes, and as the number of files in a directory grows, reading the data becomes slow and turns into a performance bottleneck for any other application or user. One reason is that the data is distributed across nodes: the more scattered it is, the longer a job takes to read it, roughly on the order of “N * (number of files)”, where N is the number of nodes the data is spread across. For example, if there are 1 million files, a MapReduce job has to run mappers for 1 million files across the data nodes, which can consume the full cluster and cause performance issues. This article explains how to improve the performance of Hive big data tables.
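The effect of file count on mapper count can be illustrated with a rough back-of-the-envelope sketch. It assumes a plain file-based input format that creates at least one split per file and does not combine small files, and it assumes a 128 MiB split size; the function names and the 1 MiB file size are hypothetical, chosen only for illustration. Real schedulers and Hive input formats may behave differently.

```python
import math

# Assumption: split size equals an HDFS block size of 128 MiB.
BLOCK_SIZE = 128 * 1024 * 1024


def estimate_map_tasks(file_sizes, split_size=BLOCK_SIZE):
    """Estimate map tasks when every file yields at least one split
    and larger files are cut into chunks of split_size bytes."""
    return sum(max(1, math.ceil(size / split_size)) for size in file_sizes)


def compact(file_sizes, target_size=BLOCK_SIZE):
    """Model merging all files into as few target_size files as possible."""
    total = sum(file_sizes)
    full_files, remainder = divmod(total, target_size)
    merged = [target_size] * full_files
    if remainder:
        merged.append(remainder)
    return merged


# Hypothetical workload: 1 million part files of 1 MiB each.
small_files = [1024 * 1024] * 1_000_000
merged_files = compact(small_files)

print("map tasks before compaction:", estimate_map_tasks(small_files))
print("map tasks after compaction: ", estimate_map_tasks(merged_files))
```

Under these assumptions the same volume of data needs about one mapper per small file before merging, and far fewer once the files are consolidated to roughly the split size, which is the intuition behind reducing the number of files in a table directory.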
Files
| Name | Size |
|---|---|
| md5:0b792ff2707f6d57ab00b56bc7d112ef | 202.7 kB |
Additional details
References
- Apache. (n.d.). Apache Hive. https://hive.apache.org/
- Gauthier, G. L. (2019, July 25). Running Apache Hive 3, new features and tips and tricks. Adaltas. https://www.adaltas.com/en/2019/07/25/hive-3-features-tips-tricks/
- Koloth, K. S. (2020, October 15). Importance of Big Data on Artificial Intelligence. London Daily Post. https://londondailypost.com/sudhish-koloth-the-importance-of-big-data-on-artificial-intelligence/