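Big Data Technology and Applications: Lab Report 6
Setting Up the Hive Data Warehouse and Basic Usage
Understanding HiveQL Statements and Implementing WordCount
Installing and Configuring Hive
1. Make sure the Hadoop environment is already configured, then download Hive and extract the archive.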
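The extraction step can be sketched as a small shell helper. This is only an illustration: the function name unpack_hive, the tarball name, and the use of a HIVE_HOME variable are assumptions and do not appear in the report, which only shows /eric/apache-hive-2.1.1-bin as the final location.

```shell
# Sketch: unpack a Hive release tarball and expose it through HIVE_HOME.
# unpack_hive is a hypothetical helper; tarball name and target directory are assumed.
unpack_hive() {
    tarball="$1"   # e.g. apache-hive-2.1.1-bin.tar.gz (assumed file name)
    dest="$2"      # e.g. /eric, the parent directory seen later in this report
    mkdir -p "$dest" || return 1
    tar -xzf "$tarball" -C "$dest" || return 1
    # Assumes the archive unpacks into one directory named after the tarball.
    HIVE_HOME="$dest/$(basename "$tarball" .tar.gz)"
    export HIVE_HOME
    export PATH="$HIVE_HOME/bin:$PATH"
}

# Usage (not executed here):
#   unpack_hive apache-hive-2.1.1-bin.tar.gz /eric
```

Exporting HIVE_HOME is a common convention rather than something this report requires; the later steps work equally well with explicit paths.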
2. Install MySQL:
wget -c http://dev.mysql.com/get/mysql57-community-release-el7-10.noarch.rpm
yum -y install mysql57-community-release-el7-10.noarch.rpm
yum -y install mysql-community-server
Start MySQL: systemctl start mysqld.service
Check whether it started successfully: systemctl status mysqld.service
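For scripted setups, the start-and-check pair above can be folded into one helper. This is a minimal sketch; the function name ensure_mysql_running is hypothetical, and it relies only on the systemctl subcommands is-active and start.

```shell
# Sketch: start the MySQL unit only if systemd does not already report it active.
# ensure_mysql_running is a hypothetical helper name.
ensure_mysql_running() {
    unit="${1:-mysqld.service}"   # default unit name taken from the commands above
    # is-active --quiet exits 0 when the unit is running and prints nothing.
    systemctl is-active --quiet "$unit" || systemctl start "$unit"
}

# Usage (not executed here; needs root, like the commands above):
#   ensure_mysql_running
```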
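Hive can be installed in two modes. Embedded mode uses the built-in Derby database as the metastore; the other mode uses an external database as the metastore. I used MySQL.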
1. Create a Hive user in MySQL
CREATE USER 'hive'@'hostName' IDENTIFIED BY '123456';
GRANT ALL PRIVILEGES ON *.* TO hive IDENTIFIED BY '123456' WITH GRANT OPTION;
2. Log in to MySQL as the newly created user and create the hive database
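The report does not show the command for this step. One possible form, written as a shell wrapper around the mysql client, is sketched below. The helper name create_hive_db is hypothetical, and the database name hive is taken from the JDBC URL used in hive-site.xml.

```shell
# Sketch: create the metastore database as the given MySQL user.
# create_hive_db is a hypothetical helper name.
create_hive_db() {
    user="$1"   # MySQL account, e.g. hive
    db="$2"     # database name, e.g. hive
    # -p makes the client prompt for the password instead of taking it on the command line.
    mysql -u "$user" -p -e "CREATE DATABASE IF NOT EXISTS \`$db\`;"
}

# Usage (not executed here):
#   create_hive_db hive hive
```

IF NOT EXISTS keeps the command safe to rerun if the database is already there.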
3. Download the MySQL JDBC driver package and place it in the lib directory under the Hive directory
root@iZwz9ctwk4oedy81aflpxpZ:/eric/apache-hive-2.1.1-bin/lib# ll | grep mysql
-rw-r--r-- 1 root root 2036609 May 11 23:40 mysql-connector-java-8.0.11.jar
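The manual ll | grep check above can also be scripted. The sketch below assumes the driver jar name starts with mysql-connector-, as it does in the listing; has_mysql_driver is a hypothetical helper name.

```shell
# Sketch: succeed if a MySQL connector jar is present in the given lib directory.
# has_mysql_driver is a hypothetical helper name.
has_mysql_driver() {
    libdir="$1"
    for jar in "$libdir"/mysql-connector-*.jar; do
        # If nothing matches, the pattern stays literal and the -f test fails.
        [ -f "$jar" ] && return 0
    done
    return 1
}

# Usage (not executed here):
#   has_mysql_driver /eric/apache-hive-2.1.1-bin/lib || echo "JDBC driver missing"
```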
4. In ./conf, copy the template to use as the configuration file
cp hive-default.xml.template hive-site.xml
5. Configure hive-site.xml
(1) Modify the javax.jdo.option.ConnectionURL property.
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true</value>
  <description>JDBC connect string for a JDBC metastore</description>
</property>
(2) Modify the javax.jdo.option.ConnectionDriverName property.
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.cj.jdbc.Driver</value>
  <description>Driver class name for a JDBC metastore</description>
</property>
(3) Modify the javax.jdo.option.ConnectionUserName property, i.e. the database user name.
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive</value>
  <description>Username to use against metastore database</description>
</property>
(4) Modify the javax.jdo.option.ConnectionPassword property, i.e. the database password.
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>123456</value>
  <description>password to use against metastore database</description>
</property>
(5) Add the following property, hive.metastore.local:
<property>
  <name>hive.metastore.local</name>
  <value>true</value>
  <description>controls whether to connect to remote metastore server or open a new metastore server in Hive Client JVM</description>
</property>
(6) Modify the hive.server2.logging.operation.log.location property, because the default configuration does not specify a concrete path.
<property>
  <name>hive.server2.logging.operation.log.location</name>
  <value>/tmp/hive/operation_logs</value>
  <description>Top level directory where operation logs are stored if logging functionality is enabled</description>
</property>
(7) Modify the hive.exec.local.scratchdir property.
<property>
  <name>hive.exec.local.scratchdir</name>
  <value>/tmp/hive</value>
  <description>Local scratch space for Hive jobs</description>
</property>
(8) Modify the hive.downloaded.resources.dir property.
<property>
  <name>hive.downloaded.resources.dir</name>
  <value>/tmp/hive/resources</value>
  <description>Temporary local directory for added resources in the remote file system.</description>
</property>
(9) Modify the hive.querylog.location property.
<property>
  <name>hive.querylog.location</name>
  <value>/tmp/hive/querylog</value>
  <description>Location of Hive run time structured log file</description>
</property>
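Because hive-site.xml is copied from a very large template, it is easy to edit the wrong occurrence of a property. The sketch below prints the value that follows a given name so the edits can be double-checked. It is a rough text match rather than a real XML parser, and it assumes the value element directly follows the name element; hive_prop is a hypothetical helper name.

```shell
# Sketch: print the <value> that directly follows <name>KEY</name> in a Hive config file.
# hive_prop is a hypothetical helper; this is a text match, not an XML parser.
hive_prop() {
    file="$1"
    # Escape dots so they match literally in the sed pattern.
    key=$(printf '%s' "$2" | sed 's/\./\\./g')
    # Join all lines so a name and its value may sit on separate lines.
    tr '\n' ' ' < "$file" |
        sed -n "s|.*<name>$key</name>[[:space:]]*<value>\([^<]*\)</value>.*|\1|p"
}

# Usage (not executed here; the path is an assumption based on this report):
#   hive_prop /eric/apache-hive-2.1.1-bin/conf/hive-site.xml javax.jdo.option.ConnectionURL
```

If a property name appears more than once, this prints the last occurrence; an unknown name prints nothing.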
6. Set up Hive's log4j configuration file
cp hive-log4j2.properties.template hive-log4j2.properties
7. Run schematool from ./bin to initialize the Hive metadata
schematool -dbType mysql -initSchema
Once configuration is complete, you can inspect the Hive databases by running hive from ./bin to enter the Hive command-line mode:
hive> show databases;
OK
default
Time taken: 1.49 seconds, Fetched: 1 row(s)
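The same check can be run non-interactively, which is convenient in a setup script. The sketch below uses the Hive CLI options -e (run a quoted statement and exit) and -S (silent mode); hive_lists_default is a hypothetical helper name, and it assumes hive is on the PATH.

```shell
# Sketch: succeed if the Hive CLI reports a database named "default".
# hive_lists_default is a hypothetical helper name.
hive_lists_default() {
    # grep -x matches whole lines only; -q suppresses output and just sets the exit code.
    hive -S -e 'SHOW DATABASES;' | grep -qx 'default'
}

# Usage (not executed here):
#   hive_lists_default && echo "metastore reachable"
```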