http://blog.csdn.net/jthink_/article/details/38853573
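1. Suppose we have two tables in Hive: one stores basic user information, and the other stores user addresses and similar details. Assume the data is as follows: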
user_basic_info:
id  name
1   a
2   b
3   c
4   d

user_address:
name  address
a     add1
a     add2
b     add3
c     add4
d     add5
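Note that a single user can have more than one address (an assumption made for this example). We want to transform the data into the following format: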
id  name  address
1   a     add1,add2
2   b     add3
3   c     add4
4   d     add5
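This is a rows-to-columns transformation in Hive, and it relies on two built-in functions: collect_set, an aggregate function (UDAF) that gathers the distinct values of a group into an array, and concat_ws, which joins the elements of that array into a single string using the given separator.
For a detailed explanation of both functions, see: http://www.cnblogs.com/end/archive/2012/06/18/2553682.html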
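To make the two functions concrete, here is a minimal plain-Python model of what they do. It illustrates the semantics only and is not Hive's implementation; the helper names mirror the Hive functions but are hypothetical Python functions. Hive does not guarantee the order of the array returned by collect_set, so this sketch simply keeps first-seen order to stay deterministic.

```python
from typing import Iterable, List


def collect_set(values: Iterable[str]) -> List[str]:
    """Model of Hive's collect_set aggregate: each distinct value once.

    Hive promises no particular order; this sketch keeps first-seen order.
    """
    return list(dict.fromkeys(values))


def concat_ws(sep: str, items: List[str]) -> str:
    """Model of Hive's concat_ws(sep, array<string>): join with a separator."""
    return sep.join(items)


# Addresses of user 'a', with a duplicate added to show de-duplication.
print(concat_ws(",", collect_set(["add1", "add2", "add1"])))  # add1,add2
```

If duplicates should be kept instead, Hive also provides collect_list (available since Hive 0.13).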
Create the tables:
create table user_basic_info(id string, name string);
create table user_address(name string, address string);

Load the data:
load data local inpath '/home/jthink/work/workspace/hive/row_col_tran/data1' into table user_basic_info;
load data local inpath '/home/jthink/work/workspace/hive/row_col_tran/data2' into table user_address;

Run the merge:
select max(ubi.id), ubi.name, concat_ws(',', collect_set(ua.address)) as address
from user_basic_info ubi
join user_address ua on ubi.name = ua.name
group by ubi.name;

Result:
1   a   add1,add2
2   b   add3
3   c   add4
4   d   add5
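The merge query above can be sketched in plain Python to show what each clause contributes: the inner join pairs each user with their address rows, the group by collects them per name, collect_set removes duplicates, and concat_ws joins the result. This is a hypothetical model of the query's semantics using the sample rows from the tables above, not Hive code; the function name merge_addresses is made up for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Sample rows copied from user_basic_info (id, name) and user_address (name, address).
user_basic_info = [("1", "a"), ("2", "b"), ("3", "c"), ("4", "d")]
user_address = [("a", "add1"), ("a", "add2"), ("b", "add3"), ("c", "add4"), ("d", "add5")]


def merge_addresses(
    basic: List[Tuple[str, str]], addresses: List[Tuple[str, str]]
) -> List[Tuple[str, str, str]]:
    """Model of the merge query: inner join on name, then group by name,
    taking max(id) and concat_ws(',', collect_set(address))."""
    # Index addresses by name so each join probe is a dictionary lookup.
    addresses_by_name: Dict[str, List[str]] = defaultdict(list)
    for name, address in addresses:
        addresses_by_name[name].append(address)

    # Inner join + group by name: accumulate ids and addresses per name.
    ids: Dict[str, List[str]] = defaultdict(list)
    addrs: Dict[str, List[str]] = defaultdict(list)
    for uid, name in basic:
        for address in addresses_by_name.get(name, []):
            ids[name].append(uid)
            addrs[name].append(address)

    rows = []
    for name in ids:
        distinct = list(dict.fromkeys(addrs[name]))  # collect_set (first-seen order)
        # max() on strings compares lexicographically, like max on a string column.
        rows.append((max(ids[name]), name, ",".join(distinct)))  # concat_ws
    return rows


for row in merge_addresses(user_basic_info, user_address):
    print("\t".join(row))
```

Because this is an inner join, a user with no rows in user_address would not appear in the output at all.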
2. Now suppose we have this table:
user_info:
id  name  address
1   a     add1,add2
2   b     add3
3   c     add4
4   d     add5
We need to split address so that each address gets its own row:
id  name  address
1   a     add1
1   a     add2
2   b     add3
3   c     add4
4   d     add5
The obvious choice is the UDTF explode():
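-- assumes address is a comma-separated string; explode() needs an array, so split it first
select explode(split(address, ',')) as address from user_info;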
This returns only the address column, but we need the complete rows, so we might try:
select id, name, explode(address) as address from user_info;
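This does not work. Hive rejects it with the error "UDTF's are not supported outside the SELECT clause, nor nested in expressions": when a UDTF such as explode() is used directly in SELECT, it must be the only expression there.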
So we have to use LATERAL VIEW instead:
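-- assumes address is a comma-separated string; split() turns it into an array for explode()
select id, name, add
from user_info ui
lateral view explode(split(ui.address, ',')) adtable as add;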
The result is:
1   a   add1
1   a   add2
2   b   add3
3   c   add4
4   d   add5
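The LATERAL VIEW query can be modeled in plain Python as a nested loop: for every input row, run the UDTF on that row's address and pair each generated value with the row's other columns. This is a hypothetical sketch of the semantics, not Hive code. It assumes address is a comma-separated string, and it uses Python's str.split(",") in place of Hive's split(), which takes a regular expression (equivalent here for a plain comma).

```python
from typing import Iterator, List, Tuple

# Sample rows copied from user_info (id, name, address).
user_info = [("1", "a", "add1,add2"), ("2", "b", "add3"), ("3", "c", "add4"), ("4", "d", "add5")]


def explode(items: List[str]) -> Iterator[str]:
    """Model of the UDTF explode(): yields one output row per array element."""
    yield from items


def lateral_view_explode(rows: List[Tuple[str, str, str]]) -> List[Tuple[str, str, str]]:
    """Model of
        select id, name, add from user_info ui
        lateral view explode(split(ui.address, ',')) adtable as add;
    Each input row is paired with every row its explode() call produces."""
    out = []
    for uid, name, address in rows:
        # str.split(",") stands in for Hive's split(address, ',').
        for add in explode(address.split(",")):
            out.append((uid, name, add))
    return out


for row in lateral_view_explode(user_info):
    print("\t".join(row))
```

As in the model, an input row whose explode() call produces no rows drops out of a LATERAL VIEW result; Hive 0.12 and later offer LATERAL VIEW OUTER to keep such rows with NULL in the generated column.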
