1.创建database和table,并把数据放到db2.sogou中
hive> show databases; OK db1 db2 default Time taken: 0.057 seconds, Fetched: 3 row(s) hive> drop database db2 cascade; //删除某个以存在的database;加cascade是强制删除,hive不允许删除含有内容的database; OK Time taken: 3.491 seconds hive> create database db2; OK Time taken: 0.162 seconds hive> create table if not exists db2.sogou( //创建 > time string, > uid string, > keyword string, > rank int, > seq int > ) > row format delimited fields terminated by '\t'; OK Time taken: 0.39 seconds hive> load data local inpath '/root/sogou.500w.utf8' into table db2.sogou;2.启动mysql
[root@master ~]# mysql -u hive -p Enter password: Welcome to the MySQL monitor. Commands end with ; or \g. Your MySQL connection id is 31 Server version: 5.7.26 MySQL Community Server (GPL) Copyright (c) 2000, 2019, Oracle and/or its affiliates. All rights reserved. Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners. Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. mysql> shows tables; ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'shows tables' at line 1 mysql> show tables; ERROR 1046 (3D000): No database selected mysql> use hive; Reading table information for completion of table and column names You can turn off this feature to get a quicker startup with -A Database changed mysql> show tables; +---------------------------+ | Tables_in_hive | +---------------------------+ | AUX_TABLE | | BUCKETING_COLS | | CDS | | COLUMNS_V2 | | COMPACTION_QUEUE | | COMPLETED_COMPACTIONS | | COMPLETED_TXN_COMPONENTS | | DATABASE_PARAMS | | DBS | | DB_PRIVS | | DELEGATION_TOKENS | | FUNCS | | FUNC_RU | | GLOBAL_PRIVS | | HIVE_LOCKS | | IDXS | | INDEX_PARAMS | | KEY_CONSTRAINTS | | MASTER_KEYS | | NEXT_COMPACTION_QUEUE_ID | | NEXT_LOCK_ID | | NEXT_TXN_ID | | NOTIFICATION_LOG | | NOTIFICATION_SEQUENCE | | NUCLEUS_TABLES | | PARTITIONS | | PARTITION_EVENTS | | PARTITION_KEYS | | PARTITION_KEY_VALS | | PARTITION_PARAMS | | PART_COL_PRIVS | | PART_COL_STATS | | PART_PRIVS | | ROLES | | ROLE_MAP | | SDS | | SD_PARAMS | | SEQUENCE_TABLE | | SERDES | | SERDE_PARAMS | | SKEWED_COL_NAMES | | SKEWED_COL_VALUE_LOC_MAP | | SKEWED_STRING_LIST | | SKEWED_STRING_LIST_VALUES | | SKEWED_VALUES | | SORT_COLS | | TABLE_PARAMS | | TAB_COL_STATS | | TBLS | | TBL_COL_PRIVS | | TBL_PRIVS | | TXNS | | TXN_COMPONENTS | | TYPES | | TYPE_FIELDS | | VERSION | | WRITE_SET | +---------------------------+ 57 rows in set (0.00 sec) mysql> select * from COLUMNS_V2; +-------+---------+-------------+-----------+-------------+ | CD_ID | COMMENT | COLUMN_NAME | TYPE_NAME | INTEGER_IDX | +-------+---------+-------------+-----------+-------------+ | 1 | NULL | ip | string | 0 | | 1 | NULL | size | string | 3 | | 1 | NULL | time | string | 1 | | 1 | NULL | url | string | 2 | | 2 | NULL | ip | string | 0 | | 2 | NULL | size | string | 3 | | 2 | NULL | time | string | 1 | | 2 | NULL | url | string | 2 | | 6 | NULL | ip | string | 0 | | 6 | NULL | time | string | 1 | | 6 | NULL | url | string | 2 | | 7 | NULL | ip | string | 0 | | 7 | NULL | time | string | 1 | | 7 | NULL | url | string | 2 | | 12 | NULL | keyword | string | 2 | | 12 | NULL | rank | int | 3 | | 12 | NULL | seq | int | 4 | | 12 | NULL | time | string | 0 | | 12 | NULL | uid | string | 1 | +-------+---------+-------------+-----------+-------------+ 19 rows in set (0.00 sec) mysql>3.hive查看table
hive> select * from db2.sogou limit 10;(hive模拟sql语句,文件数量太大,只输出前十行) OK 20111230000005 57375476989eea12893c0c3811607bcf 奇艺高清 1 1 20111230000005 66c5bb7774e31d0a22278249b26bc83a 凡人修仙传 3 1 20111230000007 b97920521c78de70ac38e3713f524b50 本本联盟 1 1 20111230000008 6961d0c97fe93701fc9c0d861d096cd9 华南师范大学图书馆 1 1 20111230000008 f2f5a21c764aebde1e8afcc2871e086f 在线代理 2 1 20111230000009 96994a0480e7e1edcaef67b20d8816b7 伟大导演 1 1 20111230000009 698956eb07815439fe5f46e9a4503997 youku 1 1 20111230000009 599cd26984f72ee68b2b6ebefccf6aed 安徽合肥365房产网 1 1 20111230000010 f577230df7b6c532837cd16ab731f874 哈萨克网址大全 1 1 20111230000010 285f88780dd0659f5fc8acc7cc4949f2 IQ数码 1 1 Time taken: 18.74 seconds, Fetched: 10 row(s)
4.创建keyword_count用于存放词频统计的结果
hive> create table if not exists db2.keyword_count( > keyword string, > count int > ) > row format delimited fields terminated by '\t'; OK Time taken: 1.294 seconds5.进行统计
hive> insert overwrite table db2.keyword_count > select keyword,count(time) from db2.sogou group by keyword; Kill Command = /usr/hadoop/bin/hadoop job -kill job_1558881978093_0001 Hadoop job information for Stage-1: number of mappers: 3; number of reducers: 3 2019-05-27 00:42:46,578 Stage-1 map = 0%, reduce = 0% 2019-05-27 00:43:29,284 Stage-1 map = 11%, reduce = 0%, Cumulative CPU 17.5 sec 2019-05-27 00:43:36,215 Stage-1 map = 22%, reduce = 0%, Cumulative CPU 38.47 sec ······· Total MapReduce CPU Time Spent: 1 minutes 47 seconds 690 msec OK6.输出结果的前几行
hive> select * from db2.keyword_count limit 20; OK ! ( NEW ) ! gay japan - [coat] Male Body Undress 2 [CTO340] (2 029 125 632).avi 1935M "http://10.1.5.211/" 1 "http://17tmp3.com/" 1 "http://192.168.1.1/" 34 "http://192.168.1.101/" 1 "http://210.38.198.40/" 1 "http://221.136.78.219:8080/sionline" 1 "http://222.90.111.150/invoicing" 1 "http://222.90.111.150/invoicing//" 2 Time taken: 0.51 seconds, Fetched: 20 row(s)7.按次数降序排列,修改select语句,count(time) as c:time是唯一的,把time当做c,group by:按什么统计
order by c:出现个数按降序排列,desc:降序简写
hive> insert overwrite table db2.keyword_count > select keyword,count(time) as c from db2.sogou group by keyword order by c desc; Hadoop job information for Stage-1: number of mappers: 3; number of reducers: 3 2019-05-27 00:51:46,489 Stage-1 map = 0%, reduce = 0% 2019-05-27 00:52:17,217 Stage-1 map = 22%, reduce = 0%, Cumulative CPU 17.39 sec 2019-05-27 00:53:26,775 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 90.19 sec MapReduce Total cumulative CPU time: 1 minutes 30 seconds 190 msec Selecting local mode for task: Stage-2 Job running in-process (local Hadoop) 2019-05-27 00:53:31,291 Stage-2 map = 0%, reduce = 0% 2019-05-27 00:53:53,484 Stage-2 map = 100%, reduce = 100% Stage-Stage-1: Map: 3 Reduce: 3 Cumulative CPU: 90.19 sec HDFS Read: 573709281 HDFS Write: 62953072 SUCCESS Stage-Stage-2: HDFS Read: 125931322 HDFS Write: 176545197 SUCCESS Total MapReduce CPU Time Spent: 1 minutes 30 seconds 190 msec OK Time taken: 150.705 seconds hive>处理过程有两个stage,1个MapReduce完成不了,要分两个阶段
8.查看统计结果,只看前几行
hive> select * from db2.keyword_count limit 10; OK 百度 38441 baidu 18312 人体艺术 14475 4399小游戏 11438 qq空间 10317 优酷 10158 新亮剑 9654 馆陶县县长闫宁的父亲 9127 公安卖萌 8192 Time taken: 0.529 seconds, Fetched: 10 row(s) hive>数据地址
链接:https://pan.baidu.com/s/1iEjTtCOEP4pPjYzlA6LoKw 提取码:uhlo
1.创建hiveTestTable:
hive> CREATE TABLE IF NOT EXISTS db1.hiveTestTable( > ip string, > time string, > url string, > size string > ) > ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'; OK Time taken: 2.858 seconds2.删除database,和table
Drop Table语句
语法:
DROP TABLE [IF EXISTS] table_name;一下查询删除一个名为hello的表
hive> DROP TABLE IF EXISTS hello;删除刚才创建的表
hive> DROP TABLE hiveTestTable; OK Time taken: 0.094 seconds hive>删除一个名为db2的database
hive> show databases; OK db1 db2 default Time taken: 0.057 seconds, Fetched: 3 row(s) hive> drop database db2 cascade; OK Time taken: 3.491 seconds hive> create database if not exists db1; ovide truststore for server certificate verification. OK Time taken: 78.89 seconds3.载入数据到hiveTestTable1表中
数据
95096 20111230000009 698956eb07815439fe5f46e9a4503997 http://www.youku.com/ 95144 20111230000009 599cd26984f72ee68b2b6ebefccf6aed http://hf.house365.com/ 95192 20111230000010 f577230df7b6c532837cd16ab731f874 http://www.kz321.com/ 95192 20111230000010 285f88780dd0659f5fc8acc7cc4949f2 http://www.iqshuma.com/ 95048 20111230000010 f4ba3f337efb1cc469fcd0b34feff9fb http://mobile.zol.com.cn/148/1487938.html 95144 20111230000010 3d1acc7235374d531de1ca885df5e711 http://baike.baidu.com/view/6500.htm 95144 20111230000010 dbce4101683913365648eba6a85b6273 http://zhidao.baidu.com/question/38626533 95096 20111230000011 58e7d0caec23bcb4daa7bbcc4d37f008 http://tv.sogou.com/vertical/2xc3t6wbuk24jnphzlj35zy.html?p=40230600 95096 20111230000011 a3b83dc38b2bbc35660dffcab4ed9da8 http://www.7183.info/ 95192 20111230000011 b89952902d7821db37e8999776b32427 http://wenwen.soso.com/z/q131927207.htm 95048 20111230000011 7c54c43f3a8a0af0951c26d94a57d6c8 http://www.baidu.com/ 95192 20111230000011 2d6c22c084a501c0b8f7f0a845aefd9f http://www.dy241.com/ 95048 20111230000011 11097724dae8b9fdcc60bd6fa4ce4df2 http://118123.net/ 95144 20111230000012 1d374b57fbbc81aa0cc38e6f4efb88ec http://tui.qihoo.com/28302631/article_2893190.html 95144 20111230000012 76029a8965e815b413cba0b50d2ec2b0 http://baike.baidu.com/view/1941330.htm 95192 20111230000013 22201bdc15845bfb33384efc3a283ef4 http://cf.qq.com/ 95144 20111230000013 e0d255845fc9e66b2a25c43a70de4a9a http://hanyu.iciba.com/wiki/1230433.shtml 95048 20111230000013 b89952902d7821db37e8999776b32427 http://zhidao.baidu.com/question/224925866 95096 20111230000013 072fa3643c91b29bd586aff29b402161 http://download.csdn.net/detail/think1919/3935722操作
①载入数据到hiveTestTable1中
hive> LOAD DATA local inpath '/root/hiveTestData.txt' into table db1.hiveTestTable1; Loading data to table db1.hivetesttable1 OK Time taken: 20.104 seconds hive> SELECT * from db1.hiveTestTable1;//查看hiveTableTest1表中的数据 OK 95096 20111230000009 698956eb07815439fe5f46e9a4503997 http://www.youku.com/ 95144 20111230000009 599cd26984f72ee68b2b6ebefccf6aed http://hf.house365.com/ 95192 20111230000010 f577230df7b6c532837cd16ab731f874 http://www.kz321.com/ 95192 20111230000010 285f88780dd0659f5fc8acc7cc4949f2 http://www.iqshuma.com/ 95048 20111230000010 f4ba3f337efb1cc469fcd0b34feff9fb http://mobile.zol.com.cn/148/1487938.html 95144 20111230000010 3d1acc7235374d531de1ca885df5e711 http://baike.baidu.com/view/6500.htm 95144 20111230000010 dbce4101683913365648eba6a85b6273 http://zhidao.baidu.com/question/38626533 Time taken: 4.585 seconds, Fetched: 19 row(s)②从表hiveTestTable1中查询部分数据,并将其作为依据创建表hiveTestTable2
hive> CREATE TABLE db1.hiveTestTable2 AS SELECT ip,time,url from db1.hiveTestTable1; Automatically selecting local only mode for query WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases. Query ID = root_20190526225930_78b79294-34da-4369-86a0-224189628dee Total jobs = 3 Launching Job 1 out of 3 Number of reduce tasks is set to 0 since there's no reduce operator Job running in-process (local Hadoop) 2019-05-26 22:59:35,283 Stage-1 map = 100%, reduce = 0% Ended Job = job_local1707359481_0001 Stage-4 is selected by condition resolver. Stage-3 is filtered out by condition resolver. Stage-5 is filtered out by condition resolver. Moving data to directory hdfs://master:9000/hive/warehouse/db1.db/.hive-staging_hive_2019-05-26_22-59-30_592_3294678209563217364-1/-ext-10002 Moving data to directory hdfs://master:9000/hive/warehouse/db1.db/hivetesttable2 MapReduce Jobs Launched: Stage-Stage-1: HDFS Read: 3388 HDFS Write: 2796 SUCCESS Total MapReduce CPU Time Spent: 0 msec OK Time taken: 7.071 seconds hive> SELECT * from db1.hiveTestTable2; OK 95096 20111230000009 698956eb07815439fe5f46e9a4503997 95144 20111230000009 599cd26984f72ee68b2b6ebefccf6aed 95192 20111230000010 f577230df7b6c532837cd16ab731f874 95192 20111230000010 285f88780dd0659f5fc8acc7cc4949f2 95048 20111230000010 f4ba3f337efb1cc469fcd0b34feff9fb Time taken: 0.61 seconds, Fetched: 19 row(s)③利用表hiveTestTable2的表结构创建hiveTestTable3,并查看表的结构
hive> CREATE TABLE IF NOT EXISTS db1.hiveTestTable3 LIKE db1.hiveTestTable2; OK Time taken: 0.397 seconds hive> DESC db1.hiveTestTable3; OK ip string time string url string Time taken: 0.19 seconds, Fetched: 3 row(s) hive>