Hive Quick Installation Guide

Hive is a data warehouse tool built on top of Hadoop for processing structured data. As part of the big data stack, it makes querying and analysis convenient: it provides a simple SQL-like query language (HiveQL) and translates those statements into MapReduce jobs for execution.
Without Hive: user -> MapReduce -> data in Hadoop (the user has to write complex MapReduce code)
With Hive: user -> HQL (SQL) -> Hive -> MapReduce -> data in Hadoop (the user only needs SQL)

The installation and configuration steps are described below.

Hive requires the JDK and Hadoop to be installed first. If you want MySQL to store the metastore (as in this guide), MySQL must be installed as well.
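A quick sanity check that the prerequisites are in place (assuming java, hadoop, and mysql are already on your PATH):

java -version      # prints the JDK version
hadoop version     # prints the Hadoop version
mysql --version    # only needed if MySQL will store the metastore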

Start Hadoop before continuing; for reference, see my article "Hadoop Getting Started: Setting Up a Single-Node Cluster".
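If Hadoop is installed but not yet running, a minimal sketch using the standard scripts shipped in the Hadoop sbin directory:

$HADOOP_HOME/sbin/start-dfs.sh    # start HDFS (NameNode, DataNode)
$HADOOP_HOME/sbin/start-yarn.sh   # start YARN (ResourceManager, NodeManager)
jps                               # list running Java daemons to verify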

1. Download the release tarball. I recommend the free Tsinghua University mirror; download speed is blazing fast:

https://mirrors.tuna.tsinghua.edu.cn/apache/hive/


I downloaded hive-3.1.3, the latest release at the time. On Linux, it is easiest to fetch it from the command line:

wget https://mirrors.tuna.tsinghua.edu.cn/apache/hive/hive-3.1.3/apache-hive-3.1.3-bin.tar.gz

2. Extract the tarball

tar zxvf apache-hive-3.1.3-bin.tar.gz

3. Edit the global environment file /etc/profile, or set per-user environment variables. Since I don't have permission to change the global file, the steps below configure per-user variables.

Step 1: open the user environment file: vim ~/.bashrc

Append the following at the end (adjust HIVE_HOME to wherever you extracted the tarball):

export HIVE_HOME=/home/apache-hive-3.1.3-bin

export PATH=$PATH:$HIVE_HOME/bin

Step 2: run source ~/.bashrc to make the new variables take effect

Step 3: run the export command to check the newly added variables, for example:
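The path shown assumes the tarball was extracted to /home as above:

export | grep HIVE_HOME
echo $HIVE_HOME    # should print /home/apache-hive-3.1.3-bin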

4. Edit the Hive configuration files

Go into the conf directory, then create the core Hive configuration file, named hive-site.xml:

cd $HIVE_HOME/conf
vim hive-site.xml

The main settings are the MySQL connection details used to store the metastore:

<configuration>
        <property>
                <name>javax.jdo.option.ConnectionURL</name>
                <value>jdbc:mysql://localhost:3306/test?createDatabaseIfNotExist=true</value>
                <description>JDBC connect string for a JDBC metastore</description>
                <!-- If MySQL and Hive are not on the same node, change localhost to the corresponding IP -->
        </property>
        <property>
                <name>javax.jdo.option.ConnectionDriverName</name>
                <value>com.mysql.jdbc.Driver</value>
                <description>Driver class name for a JDBC metastore</description>
        </property>
        <property>
                <name>javax.jdo.option.ConnectionUserName</name>
                <value>root</value>
                <description>username to use against metastore database</description>
        </property>
        <property>
                <name>javax.jdo.option.ConnectionPassword</name>
                <value>root</value>
                <description>password to use against metastore database</description>
        </property>
</configuration>

The following optional property sets the HDFS directory where Hive warehouse data is stored:

        <property>
                <name>hive.metastore.warehouse.dir</name>
                <value>/hive/warehouse</value>
                <description>hive default warehouse, if necessary, change it</description>
        </property>
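If you set a custom warehouse directory, it needs to exist on HDFS and be writable; a sketch for the path above:

hdfs dfs -mkdir -p /hive/warehouse
hdfs dfs -chmod g+w /hive/warehouse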

Next, configure hive-env.sh:

cp hive-env.sh.template hive-env.sh

vim hive-env.sh

# Hadoop installation directory

HADOOP_HOME=/home/hadoop-3.3.4

5. Add the MySQL driver JAR to the lib directory under the Hive root

1. Download the MySQL driver package

wget --no-check-certificate https://mirrors.tuna.tsinghua.edu.cn/mysql/downloads/Connector-J/mysql-connector-java-5.1.49.tar.gz

2. Extract it

tar zxvf mysql-connector-java-5.1.49.tar.gz

3. Copy the MySQL driver JAR into Hive's lib directory

cp ~/mysql-connector-java-5.1.49/mysql-connector-java-5.1.49-bin.jar /home/apache-hive-3.1.3-bin/lib

6. Initialize the metastore schema (note: before this, the metastore database must exist in MySQL)
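For example, using the root account from hive-site.xml (the name test matches the database in the JDBC URL above; since the URL sets createDatabaseIfNotExist=true, MySQL may also create it automatically):

mysql -uroot -p -e "CREATE DATABASE IF NOT EXISTS test;"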

schematool -dbType mysql -initSchema
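If initialization succeeds, schematool can report the connection details and schema version as a check:

schematool -dbType mysql -info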


7. Run Hive

Run the following in the bin directory (since $HIVE_HOME/bin is on the PATH, plain hive also works from anywhere):

./hive

Basic usage

1. Create a database named dw

hive> create database dw;

OK

Time taken: 0.131 seconds

After it is created, you can see it in the Hadoop web UI:
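You can also check from the command line (assuming the warehouse path configured earlier):

hdfs dfs -ls /hive/warehouse    # a dw.db directory should appear here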


2. Switch to the new database dw

hive> use dw;

OK

Time taken: 0.03 seconds

3. Check which database is currently in use

hive> select current_database();

OK

dw

Time taken: 0.184 seconds, Fetched: 1 row(s)

4. Create a table

hive> create table `erp_user`(id int, login_name string);

OK

Time taken: 0.409 seconds

5. Insert data

hive> insert into erp_user values(1, 'test');

Query ID = tomtop2149_20230201110137_eff737fe-12d8-4de8-a273-6212aaa3210d

Total jobs = 3

Launching Job 1 out of 3

Number of reduce tasks determined at compile time: 1

In order to change the average load for a reducer (in bytes):

set hive.exec.reducers.bytes.per.reducer=<number>

In order to limit the maximum number of reducers:

set hive.exec.reducers.max=<number>

In order to set a constant number of reducers:

set mapreduce.job.reduces=<number>

Starting Job = job_1675147712868_0001, Tracking URL = http://localhost:8088/proxy/application_1675147712868_0001/

Kill Command = /home/tomtop2149/hadoop-3.3.4/bin/mapred job -kill job_1675147712868_0001

Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1

2023-02-01 11:01:47,161 Stage-1 map = 0%, reduce = 0%

2023-02-01 11:01:52,339 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.87 sec

2023-02-01 11:01:56,473 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 5.11 sec

MapReduce Total cumulative CPU time: 5 seconds 110 msec

Ended Job = job_1675147712868_0001

Stage-4 is selected by condition resolver.

Stage-3 is filtered out by condition resolver.

Stage-5 is filtered out by condition resolver.

Moving data to directory hdfs://ip:9000/hive/warehouse/dw.db/erp_user/.hive-staging_hive_2023-02-01_11-01-37_552_7185416133968794842-1/-ext-10000

Loading data to table dw.erp_user

MapReduce Jobs Launched:

Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 5.11 sec HDFS Read: 15482 HDFS Write: 237 SUCCESS

Total MapReduce CPU Time Spent: 5 seconds 110 msec

OK

Time taken: 21.67 seconds


6. Query the data

hive> select * from erp_user;

OK

1 test

Time taken: 0.149 seconds, Fetched: 1 row(s)

That's it for the basic installation; I'm still exploring its usage and will write more later...
