2022-03-30

Hive MetaStore

1 Description

Hive MetaStore - A central repository that stores the structure information of the tables and partitions in the warehouse. It also holds column and column-type metadata, the serializers and deserializers used to read and write data, and the corresponding HDFS files where the data is stored.
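As a rough illustration of the kinds of information listed above, the following sketch models one table's metadata as a plain Python record. The class and field names (`Column`, `TableMetadata`, `partition_keys`, and so on) and the `page_views` table are hypothetical and do not mirror the real metastore schema; the SerDe shown is Hive's `LazySimpleSerDe` class name.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Column:
    name: str
    type: str  # Hive type name, e.g. "bigint" or "string"


@dataclass
class TableMetadata:
    """Hypothetical record of what the metastore tracks for one table."""
    database: str
    name: str
    columns: List[Column]
    partition_keys: List[Column] = field(default_factory=list)
    # Serializer/deserializer used to read and write rows.
    serde: str = "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"
    # HDFS directory that holds the table's data files.
    location: str = ""


# Illustrative table only; none of these names come from a real warehouse.
page_views = TableMetadata(
    database="default",
    name="page_views",
    columns=[Column("user_id", "bigint"), Column("url", "string")],
    partition_keys=[Column("dt", "string")],
    location="hdfs:///user/hive/warehouse/page_views",
)

print(page_views.name, [c.name for c in page_views.columns], page_views.location)
```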

2 Hive metadata storage (three Metastore configuration modes)

Embedded, Local, Remote

https://blog.csdn.net/epitomizelu/article/details/117091656

https://zhuanlan.zhihu.com/p/473378621

https://blog.csdn.net/qq_40990732/article/details/80914873
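One way to tell the three modes apart is to look at two `hive-site.xml` properties: `hive.metastore.uris` and `javax.jdo.option.ConnectionURL`. The sketch below is a simplified heuristic, not Hive's own logic. It assumes that a non-empty `hive.metastore.uris` means Remote, that an in-process Derby JDBC URL (or no URL at all) means Embedded, and that anything else means Local. The function name `metastore_mode` is hypothetical.

```python
def metastore_mode(props: dict) -> str:
    """Guess the metastore mode from hive-site.xml style properties.

    Simplified heuristic for illustration only.
    """
    uris = props.get("hive.metastore.uris", "").strip()
    if uris:
        # Clients connect to a standalone metastore service over Thrift.
        return "remote"
    jdbc_url = props.get("javax.jdo.option.ConnectionURL", "").strip()
    in_process_derby = (
        jdbc_url.startswith("jdbc:derby:")
        and not jdbc_url.startswith("jdbc:derby://")
    )
    if not jdbc_url or in_process_derby:
        # Metastore service and Derby database both run inside the Hive process.
        return "embedded"
    # Metastore service runs inside the Hive process, database is external.
    return "local"


print(metastore_mode({"hive.metastore.uris": "thrift://metastore-host:9083"}))
print(metastore_mode({"javax.jdo.option.ConnectionURL": "jdbc:mysql://db-host:3306/hive"}))
print(metastore_mode({}))
```

The host names in the usage lines are placeholders.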

3 Introduction to the Hive metastore database

https://blog.csdn.net/victorzzzz/article/details/81874674


2022-03-01

Hive compared with traditional databases

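Aspect	Hive	Traditional database
Query language	HQL	SQL
Data storage	HDFS	Raw device or local FS
Data format	User-defined	Determined by the system
Data updates	Not supported	Supported
Execution	MapReduce	Executor
Execution latency	High	Low
Data scale handled	Large	Small
Indexes	Bitmap indexes added in version 0.8	Rich, complex indexes
Scalability	High	Low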

https://cloud.tencent.com/developer/article/1785857


2022-02-09

Hive architecture

https://cwiki.apache.org/confluence/display/hive/design#Design-HiveArchitecture

https://zhuanlan.zhihu.com/p/87545980

https://blog.csdn.net/oTengYue/article/details/91129850

https://jiamaoxiang.top/2020/06/27/Hive%E7%9A%84%E6%9E%B6%E6%9E%84%E5%89%96%E6%9E%90/

https://www.javatpoint.com/hive-architecture

Hive Client

Hive allows applications to be written in various languages, including Java, Python, and C++. It supports different types of clients:

Thrift Server - A cross-language service platform that serves requests from any programming language that supports Thrift.
JDBC Driver - Establishes a connection between Hive and Java applications. The driver class is org.apache.hadoop.hive.jdbc.HiveDriver (for HiveServer2, org.apache.hive.jdbc.HiveDriver).
ODBC Driver - Allows applications that support the ODBC protocol to connect to Hive.

Hive Services

Hive provides the following services:

Hive CLI - The Hive CLI (Command Line Interface) is a shell where we can execute Hive queries and commands.
Hive Web User Interface - The Hive Web UI is an alternative to the Hive CLI. It provides a web-based GUI for executing Hive queries and commands.
Hive MetaStore - A central repository that stores the structure information of the tables and partitions in the warehouse. It also holds column and column-type metadata, the serializers and deserializers used to read and write data, and the corresponding HDFS files where the data is stored.
Hive Server - Also referred to as the Apache Thrift Server. It accepts requests from different clients and passes them to the Hive Driver.
Hive Driver - Receives queries from different sources such as the web UI, CLI, Thrift, and the JDBC/ODBC drivers. It passes the queries to the compiler.
Hive Compiler - Parses the query and performs semantic analysis on the different query blocks and expressions. It converts HiveQL statements into MapReduce jobs.
Hive Execution Engine - The optimizer generates the logical plan as a DAG of map-reduce tasks and HDFS tasks. The execution engine then executes the tasks in the order of their dependencies.
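Executing tasks "in the order of their dependencies" amounts to a topological ordering of the plan DAG. The sketch below shows that idea with Python's standard `graphlib` module (Python 3.9+). It is not Hive's scheduler, and the task names and the `execution_order` helper are hypothetical.

```python
from graphlib import TopologicalSorter


def execution_order(dependencies):
    """Return task names so that each task comes after everything it depends on.

    `dependencies` maps a task name to the set of tasks it must wait for.
    """
    return list(TopologicalSorter(dependencies).static_order())


# Hypothetical plan: two scan stages feed a join stage, then an HDFS move task.
plan = {
    "scan_orders": set(),
    "scan_users": set(),
    "join": {"scan_orders", "scan_users"},
    "move_to_hdfs": {"join"},
}

order = execution_order(plan)
print(order)
```

The two scan tasks have no dependency on each other, so either may come first; only the relative order of dependent tasks is guaranteed.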

Execution engines

Hive supports MapReduce, Tez, and Spark.

https://cloud.tencent.com/developer/article/1893808

https://blog.csdn.net/kwu_ganymede/article/details/52223133
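The engine is typically selected per session through the `hive.execution.engine` property, whose values are `mr`, `tez`, and `spark`. This property is not mentioned in the notes above and is added here for illustration. The helper below only builds the `SET` statement as a string and does not talk to Hive; the function name `set_engine_statement` is hypothetical.

```python
# Values accepted by hive.execution.engine.
VALID_ENGINES = ("mr", "tez", "spark")


def set_engine_statement(engine: str) -> str:
    """Build the HiveQL statement that selects an execution engine."""
    name = engine.strip().lower()
    if name not in VALID_ENGINES:
        raise ValueError(f"unknown engine {engine!r}; expected one of {VALID_ENGINES}")
    return f"SET hive.execution.engine={name};"


print(set_engine_statement("tez"))
```

Whether a given engine actually works depends on what is installed and configured on the cluster.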

Data storage

https://cloud.tencent.com/developer/article/1411821

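Hive is built on HDFS: its data is stored in the Hadoop Distributed File System. Hive has no dedicated storage format of its own and does not build indexes on the data. You only need to tell Hive the column delimiter and row delimiter when creating a table, and Hive can then parse the data.

Tables in the default database are stored under /user/hive/warehouse
Tables in other databases are stored under a location you specify yourself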
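The delimiter-based parsing described above can be sketched in a few lines of Python. This is an illustration of the idea, not Hive's reader. The defaults assume Hive's usual text-format delimiters (Ctrl-A, `\x01`, between columns and a newline between rows); the function name `parse_rows` and the sample data are hypothetical.

```python
def parse_rows(raw: str, field_delim: str = "\x01", row_delim: str = "\n"):
    """Split raw text into rows, then each row into column values."""
    rows = []
    for line in raw.split(row_delim):
        if line:  # skip the empty piece left by a trailing row delimiter
            rows.append(line.split(field_delim))
    return rows


sample = "1\x01alice\n2\x01bob\n"
print(parse_rows(sample))                        # [['1', 'alice'], ['2', 'bob']]
print(parse_rows("1,alice;2,bob", ",", ";"))     # [['1', 'alice'], ['2', 'bob']]
```

All values come back as strings; in Hive the column types declared on the table decide how each value is interpreted.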
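A small sketch of how a table's directory could be derived from these rules. It assumes the warehouse root is /user/hive/warehouse and, as an addition not stated in the notes above, that a database created without an explicit location falls back to a `<database>.db` directory under the warehouse root. The function name `table_location` and the example names are hypothetical.

```python
import posixpath

WAREHOUSE_DIR = "/user/hive/warehouse"


def table_location(database, table, database_location=None):
    """Return the HDFS directory where a managed table's files would live."""
    if database == "default":
        # Tables of the default database sit directly under the warehouse root.
        return posixpath.join(WAREHOUSE_DIR, table)
    # Assumption: without an explicit location the database uses <name>.db.
    base = database_location or posixpath.join(WAREHOUSE_DIR, database + ".db")
    return posixpath.join(base, table)


print(table_location("default", "page_views"))
print(table_location("sales", "orders"))
print(table_location("sales", "orders", database_location="/data/sales"))
```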


2022-01-31

hive

Common problems

1. FAILED: SemanticException Failed to get a spark session: org.apache.hadoop.hive.ql.metadata.HiveException: Failed to create Spark client for Spark session

https://blog.csdn.net/qq_41504585/article/details/108064512