Sunday, February 25, 2024

sample

 



What is Hive?

--It is an open-source data warehouse tool for querying and processing data stored on HDFS

-->Hive reads from HDFS whenever a query needs data

-->HDFS is the storage layer and MapReduce is the processing layer for Hive



Why Hive?

--Data on HDFS has to be processed with MapReduce jobs to get a result

--But MapReduce code is difficult to write and very repetitive

--Hive was created to remove that burden

--Hive queries are written in HQL (Hive Query Language). Hive converts them into MapReduce jobs, which read the data from HDFS and return the result in a structured format (tables).
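
A rough Python sketch of that conversion, for intuition only. It is not the code Hive actually generates. The HQL query in the comment, the employees rows and all function names are made up for illustration. It shows the three steps (map, shuffle, reduce) that a simple GROUP BY query is broken into.

```python
from collections import defaultdict

# Hypothetical HQL query this sketch imitates:
#   SELECT dept, COUNT(*) FROM employees GROUP BY dept;

# Made-up rows standing in for a file on HDFS: (name, dept)
employees = [
    ("asha", "sales"),
    ("ravi", "hr"),
    ("meena", "sales"),
    ("john", "it"),
    ("kiran", "hr"),
    ("divya", "sales"),
]

def map_phase(rows):
    """Emit a (dept, 1) pair for every input row."""
    return [(dept, 1) for _name, dept in rows]

def shuffle_phase(pairs):
    """Group all values that share the same key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Sum the values of each key to get the row count per dept."""
    return {key: sum(values) for key, values in grouped.items()}

def run_query(rows):
    """Chain the three phases, as a single MapReduce job would."""
    return reduce_phase(shuffle_phase(map_phase(rows)))

result = run_query(employees)
print(result)
```

With Hive, only the one-line query has to be written. The map, shuffle and reduce steps are produced for us.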



What data is stored in Hive?

--Structured table data (the files themselves live in HDFS)

--Table metadata (schema), which is kept in an RDBMS
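
The two pieces can be pictured with a small Python sketch. It is only an illustration under assumptions, not how Hive is implemented: raw_lines stands in for a comma-delimited file on HDFS, table_schema stands in for the schema held in the RDBMS, and read_table is a hypothetical helper. The raw data becomes a table only when the schema is applied at read time.

```python
# Made-up raw lines, standing in for a comma-delimited file on HDFS
raw_lines = [
    "1,asha,sales",
    "2,ravi,hr",
]

# Made-up schema, standing in for the table metadata kept in the RDBMS
table_schema = [("id", int), ("name", str), ("dept", str)]

def read_table(lines, schema):
    """Apply the schema to each raw line to produce structured rows."""
    rows = []
    for line in lines:
        fields = line.split(",")
        # Pair each column (name, type) with its raw field and cast it
        row = {col: cast(value) for (col, cast), value in zip(schema, fields)}
        rows.append(row)
    return rows

rows = read_table(raw_lines, table_schema)
print(rows)
```

Changing table_schema changes how the same raw lines are interpreted, without touching the data itself.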



Why is metadata not stored in HDFS?

--Files in HDFS can't be edited in place, so metadata that needs frequent updates can't be kept there

--HDFS is slow for quick lookups (high latency), while metadata has to be fetched with low latency



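Why Hive when we have an RDBMS?

--Hive runs on a distributed system, and its queries are converted into MapReduce jobs that run in parallel across the cluster

--A traditional RDBMS typically runs on a single system, so it can't spread the work across many machines in the same way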



Do we need to write MapReduce code to retrieve data from HDFS?

--No. We write queries in HQL, and Hive converts them internally into MapReduce tasks, which process the data and return the result set to Hive.

