sample

What is Hive?

--It is an open-source data warehouse tool used to process and query data stored on HDFS.

-->Hive does not keep table data itself; it reads from HDFS whenever a query needs data.

-->HDFS is the storage layer and MapReduce is the processing engine for Hive.

Why Hive?

--Processing data on HDFS normally requires writing MapReduce jobs to produce the result data.

--But MapReduce code is difficult to write and very repetitive.

--For this reason, Hive was introduced.

--Hive queries are written in HQL (Hive Query Language), a SQL-like language. Hive converts them into MapReduce jobs, which read the data from HDFS and return the results in a structured format (tables).
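
To make the HQL-to-MapReduce translation concrete, here is a minimal Python sketch (not the code Hive actually generates) of the three phases that a query like `SELECT word, COUNT(*) FROM words GROUP BY word` roughly breaks down into. The table `words`, its column `word` and the sample rows are hypothetical, and everything runs in a single process instead of on a cluster.

```python
from collections import defaultdict

# Hypothetical rows of a table `words` with one column `word`.
rows = ["hive", "hdfs", "hive", "mapreduce", "hdfs"]

def map_phase(records):
    # Map: emit a (key, 1) pair for every row, keyed by the GROUP BY column.
    for word in records:
        yield (word, 1)

def shuffle_phase(pairs):
    # Shuffle: collect all values that share the same key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: COUNT(*) becomes a sum of the 1s in each group.
    return {key: sum(values) for key, values in groups.items()}

result = reduce_phase(shuffle_phase(map_phase(rows)))
print(sorted(result.items()))
```

Running it prints `[('hdfs', 2), ('hive', 2), ('mapreduce', 1)]`. With Hive you write only the one-line query; the map, shuffle and reduce work is produced for you.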

What data is stored in Hive?

--Structured data (the table data itself, which lives in HDFS)

--Table metadata (schema), which is kept in an RDBMS known as the Hive metastore
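
A rough Python sketch of this split between data and metadata: an in-memory SQLite database plays the role of the metastore RDBMS, and a plain dict plays the role of files in HDFS. The table `employees`, the metadata tables `table_columns` / `table_locations` and the `read_table` helper are all hypothetical; the real metastore schema is different and far more detailed.

```python
import sqlite3

# In-memory SQLite stands in for the metastore RDBMS (hypothetical schema).
meta = sqlite3.connect(":memory:")
meta.execute("CREATE TABLE table_columns (table_name TEXT, position INTEGER, "
             "column_name TEXT, column_type TEXT)")
meta.execute("CREATE TABLE table_locations (table_name TEXT PRIMARY KEY, "
             "location TEXT, delimiter TEXT)")

# Metadata for a hypothetical table 'employees': its columns and where its files live.
meta.executemany(
    "INSERT INTO table_columns VALUES (?, ?, ?, ?)",
    [("employees", 0, "name", "string"), ("employees", 1, "age", "int")],
)
meta.execute("INSERT INTO table_locations VALUES (?, ?, ?)",
             ("employees", "/user/hive/warehouse/employees", ","))

# The table data itself: raw delimited lines, as they might sit in an HDFS file.
hdfs_files = {"/user/hive/warehouse/employees": ["alice,34", "bob,29"]}

def read_table(name):
    """Use metadata from the RDBMS to interpret raw 'HDFS' lines as typed rows."""
    location, delimiter = meta.execute(
        "SELECT location, delimiter FROM table_locations WHERE table_name = ?",
        (name,),
    ).fetchone()
    columns = meta.execute(
        "SELECT column_name, column_type FROM table_columns "
        "WHERE table_name = ? ORDER BY position",
        (name,),
    ).fetchall()
    casts = {"int": int, "string": str}
    rows = []
    for line in hdfs_files[location]:
        values = line.split(delimiter)
        rows.append({col: casts[typ](val) for (col, typ), val in zip(columns, values)})
    return rows

print(read_table("employees"))
```

The raw files carry no schema of their own; the column names and types come entirely from the RDBMS side.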

Why is metadata not stored in HDFS?

--Files in HDFS cannot be edited in place (HDFS follows a write-once, read-many model), so metadata that needs frequent updates cannot be kept there

--HDFS is built for high-throughput scans of large files, not low-latency lookups, so small pieces of metadata cannot be retrieved quickly from it
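
The first point can be illustrated with a toy model (an assumption for illustration, not the real HDFS API): the hypothetical `WriteOnceFile` class only allows appends, while an in-memory SQLite database stands in for the metastore RDBMS. The latency point is not modeled here.

```python
import sqlite3

class WriteOnceFile:
    """Toy model of an HDFS file: records can be appended, never rewritten in place."""

    def __init__(self):
        self._records = []

    def append(self, record):
        self._records.append(record)

    def overwrite(self, index, record):
        # No random writes: changing existing content would mean rewriting the whole file.
        raise PermissionError("in-place edits are not supported")

    def read_all(self):
        return list(self._records)

# Keep a table's schema in the write-once file.
schema_file = WriteOnceFile()
schema_file.append("employees: name string, age int")

# Adding a column needs an in-place edit, which the file refuses.
try:
    schema_file.overwrite(0, "employees: name string, age int, dept string")
    edited = True
except PermissionError:
    edited = False

# In an RDBMS the same metadata change is a single UPDATE.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE schemas (table_name TEXT PRIMARY KEY, columns TEXT)")
db.execute("INSERT INTO schemas VALUES ('employees', 'name string, age int')")
db.execute("UPDATE schemas SET columns = 'name string, age int, dept string' "
           "WHERE table_name = 'employees'")
print(edited, db.execute("SELECT columns FROM schemas").fetchone()[0])
```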

Why Hive when we have an RDBMS?

--Hive runs on a distributed cluster, and its queries are converted into MapReduce jobs that run in parallel across many machines

--A traditional RDBMS typically runs on a single machine, so it cannot spread one query's work across a cluster the way Hive does
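
A small Python sketch of the scale-out idea: the input is cut into splits, each split is processed independently, and the partial results are combined. The split contents and the `count_split` helper are hypothetical, and the threads here all run on one machine (sharing Python's GIL), so they only model how work gets handed to separate cluster nodes.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical input already divided into splits, as a large HDFS file is divided into blocks.
splits = [
    ["hive", "hdfs"],
    ["hive", "mapreduce"],
    ["hdfs", "hive"],
]

def count_split(split):
    """Work done independently on one split (one 'node' in this toy model)."""
    return Counter(split)

# Each split goes to its own worker; threads stand in for cluster nodes.
with ThreadPoolExecutor(max_workers=len(splits)) as pool:
    partial_counts = list(pool.map(count_split, splits))

# Combine the partial results, like a final reduce step.
total = sum(partial_counts, Counter())
print(total.most_common())
```

Because no split depends on another, adding more workers (or, in Hive's case, more machines) lets more splits be processed at the same time.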

Do we need to write MapReduce code to retrieve data from HDFS?

--No. We write queries in HQL, and Hive converts them internally into MapReduce tasks, which process the data and return the result set to Hive.

Sunday, February 25, 2024
