HiveQI语句实现：word count

xiaoxiao2023-11-24 180

Hive实现wordCount程序

a. 创建一个数据库，如 create database word;

b. 建表 create external table word_data(line string) row format delimited fields terminated by ‘\n’ stored as textfile location ‘/home/hadoop/worddata’; 这里假设我们的数据存放在hadoop下，路径为：/home/hadoop/worddata，里面主要是一些单词文件，内容大概为： hello man what are you doing now my running hello kevin 执行了上述hql就会创建一张表src_data，内容是这些文件的每行数据，每行数据存在字段line中，select * from word_data;就可以看到这些数据 c. 根据MapReduce的规则，我们需要进行拆分，把每行数据拆分成单词，这里需要用到一个hive的内置表生成函数（UDTF）：explode(array)，参数是array，其实就是行变多列： create table words(word string); insert into table words select explode(split(line, " ")) as word from word_data; 查看words表内容 OK hello man what are you doing now my running hello kevin hi man split是拆分函数，跟java的split功能一样，这里是按照空格拆分，所以执行完hql语句，words表里面就全部保存的单个单词 d. 这样基本实现了，因为hql可以group by，所以最后统计语句为： select word, count(*) from word.words group by word; 注释：word.words 库名称.表名称，group by word这个word是create table words(word string) 命令创建的word string 结果： are 1 doing 1 hello 2 hi 1 kevin 1 man 2 my 1 now 1 running 1 what 1 you 1

总结：对比写MR和使用hive，还是hive比较简便，对于比较复杂的统计操作可以建一些中间表，或者一些视图之类的.

最新回复(0)