Two pitfalls of changing a Hive table's schema

xiaoxiao  2025-10-02

Pitfall 1: data doesn't update correctly after changing a column's type

     

Here's a small pitfall with inserting data in Hive. Today I was inserting into a table, writing a constant such as 0.01, and the target column is DECIMAL(5,2).

You'd expect 0.01 to end up in the table, but querying after the insert returned 0. Why?!

     

So I dug in. The statement itself looked fine, so I searched online and checked the Hive wiki, and there the cause was:

      https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types#LanguageManualTypes-Decimals

     

It turns out this happens when inserting into a partitioned table: the partitions created before the type change still carry the old column metadata and have to be fixed up as well:

     

So let's test it, following what the wiki describes.

First, create a table:

CREATE TABLE `tb_dw_cu_mimsg_fbi_servdt_day1`(
  `mimsg_serv` int COMMENT 'WeChat service volume')
PARTITIONED BY (
  `statis_date` varchar(8))
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '|' ;

     

Then insert a row:

    insert overwrite table tb_dw_cu_mimsg_fbi_servdt_day1 partition (statis_date=20160501) values(1.02);

     

Then query:

hive> select * from tb_dw_cu_mimsg_fbi_servdt_day1;
OK
1                         20160501

     

The result is what you'd expect (the column is still int, so the value is truncated to 1).

Now change the column's type:

    ALTER TABLE tb_dw_cu_mimsg_fbi_servdt_day1 REPLACE COLUMNS (mimsg_serv DECIMAL(5,2))

     

Then insert the same row again:

    insert overwrite table tb_dw_cu_mimsg_fbi_servdt_day1 partition (statis_date=20160501) values(1.02);

     

Query:

hive> select * from tb_dw_cu_mimsg_fbi_servdt_day1;
OK
1           20160501

     

And there's the problem: the value comes back as 1, not 1.02!
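A quick way to see what is going on is to compare the table-level and partition-level column metadata; if the explanation above is right, the old partition should still report the int definition (a sketch, not something shown in the run above):

-- Table-level metadata: should already show mimsg_serv as decimal(5,2)
DESCRIBE tb_dw_cu_mimsg_fbi_servdt_day1;

-- Partition-level metadata for the existing partition: likely still shows the old int column
DESCRIBE tb_dw_cu_mimsg_fbi_servdt_day1 PARTITION (statis_date='20160501');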

So, follow the procedure from the wiki:

1. Determine what precision/scale you would like to set for the decimal column in the table.

2. For each decimal column in the table, update the column definition to the desired precision/scale using the ALTER TABLE command:

   ALTER TABLE foo CHANGE COLUMN dec_column_name dec_column_name DECIMAL(38,18);

   If the table is not a partitioned table, then you are done. If the table has partitions, then go on to step 3.

3. If the table is a partitioned table, then find the list of partitions for the table:

   SHOW PARTITIONS foo;

   ds=2008-04-08/hr=11
   ds=2008-04-08/hr=12
   ...

4. Each existing partition in the table must also have its DECIMAL column changed to add the desired precision/scale. This can be done with a single ALTER TABLE CHANGE COLUMN by using dynamic partitioning (available for ALTER TABLE CHANGE COLUMN in Hive 0.14 or later, with HIVE-8411):

   SET hive.exec.dynamic.partition = true;

   -- hive.exec.dynamic.partition needs to be set to true to enable dynamic partitioning with ALTER PARTITION
   -- This will alter all existing partitions of the table - be sure you know what you are doing!
   ALTER TABLE foo PARTITION (ds, hr) CHANGE COLUMN dec_column_name dec_column_name DECIMAL(38,18);

     

Applied to the table in this post, that means:

    ALTER TABLE tb_dw_cu_mimsg_fbi_servdt_day1 PARTITION (statis_date) CHANGE COLUMN mimsg_serv mimsg_serv DECIMAL(5,2);
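Per the wiki excerpt above, this dynamic-partition form of ALTER TABLE ... CHANGE COLUMN relies on hive.exec.dynamic.partition being enabled; if Hive complains, setting it first should do the trick (a sketch reusing this post's table):

-- Enable dynamic partitioning so the ALTER can touch all existing partitions at once
SET hive.exec.dynamic.partition = true;
ALTER TABLE tb_dw_cu_mimsg_fbi_servdt_day1 PARTITION (statis_date) CHANGE COLUMN mimsg_serv mimsg_serv DECIMAL(5,2);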

     

     

Insert the data once more:

    insert overwrite table tb_dw_cu_mimsg_fbi_servdt_day1 partition (statis_date=20160501) values(1.02);

     

Then query:

hive> select * from tb_dw_cu_mimsg_fbi_servdt_day1;
OK
1.02                20160501
Time taken: 0.066 seconds, Fetched: 1 row(s)

     

Now the result is what we wanted. That's the first pitfall out of the way.

     

Pitfall 2: data doesn't update correctly after adding a column

Still using the table from the example above, add another column:

    alter table tb_dw_cu_mimsg_fbi_servdt_day1 add COLUMNS (prov_code varchar(5))

     

Then query:

hive> select * from tb_dw_cu_mimsg_fbi_servdt_day1;
OK
1.02    NULL    20160501
Time taken: 0.082 seconds, Fetched: 1 row(s)

     

The new column defaults to NULL for the existing partition. Now let me overwrite the data and give the new column a value:

insert overwrite table tb_dw_cu_mimsg_fbi_servdt_day1 partition(statis_date=20160501) values (2.01,'0371');

     

Then query:

hive> select * from tb_dw_cu_mimsg_fbi_servdt_day1;
OK
2.01    NULL    20160501

Still NULL.

Not what we wanted. A look at the official documentation reveals the cause:

    https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Add/ReplaceColumns

    ALTER TABLE ADD or REPLACE COLUMNS CASCADE will override the table partition's column metadata regardless of the table or partition's protection mode. Use with discretion.

     

In other words, without CASCADE the ADD COLUMNS only changed the table-level metadata; the existing partition kept its old column list, so the value written for prov_code never showed up when reading it back. So the fix is:

    alter table tb_dw_cu_mimsg_fbi_servdt_day1 replace COLUMNS (mimsg_serv  decimal(5,2),prov_code varchar(5)) CASCADE;

     

Query again, and the data now shows up the way we wanted:

hive> select * from tb_dw_cu_mimsg_fbi_servdt_day1;
OK
2.01    0371    20160501
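Incidentally, the documentation quoted above covers ADD as well as REPLACE, so adding the column with CASCADE in the first place (instead of the plain add COLUMNS at the top of this section) should avoid the whole detour; something like this, untested here:

-- Hypothetical alternative to the two-step fix above: add the new column with CASCADE,
-- so every existing partition picks up the new column metadata immediately.
alter table tb_dw_cu_mimsg_fbi_servdt_day1 add COLUMNS (prov_code varchar(5)) CASCADE;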

     

To sum up: we had been reasoning about Hive SQL as if it followed Oracle's rules, but reading the official documentation shows the two differ in important ways. If nothing else, these two pitfalls show how important it is to read the official docs carefully!
