Greenplum , HAWQ outer join与motion问题讲解

    xiaoxiao2026-06-18  8

    Greenplum , HAWQ outer join与motion问题讲解

    作者

    digoal

    日期

    2016-09-05

    标签

    PostgreSQL , HAWQ , Greenplum , OUTER JOIN , Motion


    背景

    Greenplum,HAWQ是分布式的数据库,在建表时,我们可以选择分布列,或者选择随机分布。

    多个表做等值JOIN时,如果JOIN列为分布列,则不需要进行数据的重分布。

    但是,如果使用的是OUTER JOIN,情况就不一样了,你可能会发现多个表进行outer join时,如果JOIN列都是HASH分布列,某些写法就可能导致需要重分布。

    下面给大家分析一下原因。

    创建几张测试表

    其中tab1,tab2,tab3的bucketnum一致,分布列一致。

    tab4,tab5,tab6的分布列与前面几张表一致,但是bucketnum不一致(所以显然HASH取模后的值也不一致)。

    bucketnum默认是实体segments的6倍(意思是每台实体segment上跑6个虚拟segment),用户可以根据表的大小来调试,例如要发挥最大的计算能力,实际设置时可以设置为主机CPU核数的0.8,小表则建议设置为较小的值。

    tab7与tab8为随机分布,对于随机分布的表,bucketnum设置会忽略,在实体segments之间随机分布(查看hdfs对应的文件也能看出端倪)。

    postgres=# create table tab1(c1 int, c2 int, c3 text, c4 timestamp) with (bucketnum=6) distributed by(c1,c2); postgres=# create table tab2(c1 int, c2 int, c3 text, c4 timestamp) with (bucketnum=6) distributed by(c1,c2); postgres=# create table tab3(c1 int, c2 int, c3 text, c4 timestamp) with (bucketnum=6) distributed by(c1,c2); postgres=# create table tab4(c1 int, c2 int, c3 text, c4 timestamp) with (bucketnum=10) distributed by(c1,c2); postgres=# create table tab5(c1 int, c2 int, c3 text, c4 timestamp) with (bucketnum=12) distributed by(c1,c2); postgres=# create table tab6(c1 int, c2 int, c3 text, c4 timestamp) with (bucketnum=3) distributed by(c1,c2); postgres=# create table tab7(c1 int, c2 int, c3 text, c4 timestamp) with (bucketnum=3) distributed randomly; postgres=# create table tab8(c1 int, c2 int, c3 text, c4 timestamp) with (bucketnum=6) distributed randomly; postgres=# insert into tab7 select generate_series(1,1000000),generate_series(1,1000000),'test',now(); INSERT 0 1000000 postgres=# insert into tab8 select generate_series(1,1000000),generate_series(1,1000000),'test',now(); INSERT 0 1000000 postgres=# analyze tab7; ANALYZE postgres=# analyze tab8; ANALYZE

    下面开始几组测试,通过执行计划的motion node,观察query是否需要重分布,以及重分布那张表,重分布的键值是哪些。

    测试1

    HAWQ的hash分布取模值取决于bucket num,如果bucket num不一致,则JOIN时有一张表需要重分布

    postgres=# explain select * from tab1 join tab6 on (tab1.c1=tab6.c1 and tab1.c2=tab6.c2); QUERY PLAN ------------------------------------------------------------------------------------------------------- Gather Motion 6:1 (slice2; segments: 6) (cost=0.00..862.00 rows=1 width=48) -> Hash Join (cost=0.00..862.00 rows=1 width=48) Hash Cond: tab1.c1 = tab6.c1 AND tab1.c2 = tab6.c2 -> Table Scan on tab1 (cost=0.00..431.00 rows=1 width=24) -> Hash (cost=431.00..431.00 rows=1 width=24) -> Redistribute Motion 6:6 (slice1; segments: 6) (cost=0.00..431.00 rows=1 width=24) 重分布tab6 Hash Key: tab6.c1, tab6.c2 -> Table Scan on tab6 (cost=0.00..431.00 rows=1 width=24) Settings: default_hash_table_bucket_number=6 Optimizer status: PQO version 1.638 (10 rows)

    测试2

    如果JOIN的其中一个表为随机分布式,随机分布的表需要复制或按JOIN列进行重分布

    postgres=# explain select * from tab6 join tab7 on (tab6.c1=tab7.c1 and tab6.c2=tab7.c2); QUERY PLAN ------------------------------------------------------------------------------------------------------- Gather Motion 3:1 (slice2; segments: 3) (cost=0.00..862.00 rows=1 width=48) -> Hash Join (cost=0.00..862.00 rows=1 width=48) Hash Cond: tab6.c1 = tab7.c1 AND tab6.c2 = tab7.c2 -> Table Scan on tab6 (cost=0.00..431.00 rows=1 width=24) -> Hash (cost=431.00..431.00 rows=1 width=24) -> Redistribute Motion 3:3 (slice1; segments: 3) (cost=0.00..431.00 rows=1 width=24) 重分布tab7 Hash Key: tab7.c1, tab7.c2 -> Table Scan on tab7 (cost=0.00..431.00 rows=1 width=24) Settings: default_hash_table_bucket_number=6 Optimizer status: PQO version 1.638 (10 rows)

    测试3

    随机分布策略的表,在每个segment上只有一个bucket

    postgres=# explain select count(*) from tab7; QUERY PLAN ------------------------------------------------------------------------------------ Aggregate (cost=0.00..452.11 rows=1 width=8) -> Gather Motion 1:1 (slice1; segments: 1) (cost=0.00..452.11 rows=1 width=8) -> Aggregate (cost=0.00..452.11 rows=1 width=8) -> Table Scan on tab7 (cost=0.00..450.25 rows=1000000 width=1) Settings: default_hash_table_bucket_number=6 Optimizer status: PQO version 1.638 (6 rows)

    测试4

    如果两个随机分布的表JOIN,数据需要在所有的实体segments(每个datanode对应一个实体segment)上按JOIN列重分布

    postgres=# explain select * from tab8 join tab7 on (tab8.c1=tab7.c1 and tab8.c2=tab7.c2); QUERY PLAN ------------------------------------------------------------------------------------------------------------- Gather Motion 1:1 (slice3; segments: 1) (cost=0.00..2151.49 rows=10000 width=42) -> Hash Join (cost=0.00..2148.64 rows=10000 width=42) Hash Cond: tab8.c1 = tab7.c1 AND tab8.c2 = tab7.c2 -> Redistribute Motion 1:1 (slice1; segments: 1) (cost=0.00..555.04 rows=1000000 width=21) 重分布tab8 Hash Key: tab8.c1, tab8.c2 -> Table Scan on tab8 (cost=0.00..450.25 rows=1000000 width=21) -> Hash (cost=555.04..555.04 rows=1000000 width=21) -> Redistribute Motion 1:1 (slice2; segments: 1) (cost=0.00..555.04 rows=1000000 width=21) 重分布tab7 Hash Key: tab7.c1, tab7.c2 -> Table Scan on tab7 (cost=0.00..450.25 rows=1000000 width=21) Settings: default_hash_table_bucket_number=6 Optimizer status: PQO version 1.638 (12 rows)

    测试5

    outer join,2张表的outer join,如果JOIN列就是分布列,不需要重分布

    postgres=# explain select * from tab1 left join tab2 on (tab1.c1=tab2.c1 and tab1.c2=tab2.c2); QUERY PLAN ------------------------------------------------------------------------------- Gather Motion 6:1 (slice1; segments: 6) (cost=0.00..862.00 rows=2 width=48) -> Hash Left Join (cost=0.00..862.00 rows=1 width=48) Hash Cond: tab1.c1 = tab2.c1 AND tab1.c2 = tab2.c2 -> Table Scan on tab1 (cost=0.00..431.00 rows=1 width=24) -> Hash (cost=431.00..431.00 rows=1 width=24) -> Table Scan on tab2 (cost=0.00..431.00 rows=1 width=24) Settings: default_hash_table_bucket_number=6 Optimizer status: PQO version 1.638 (8 rows)

    测试6

    outer join,3张表的outer join,需要注意JOIN的条件

    1. tab1与tab2 left join后,关联不上时tab2可能会返回一些NULL值

    因此再次与tab3 join时,如果JOIN条件是tab2与tab3,则不能圈定在虚拟segment内完成tab2与tab3的JOIN,必须要对tab1与tab2的outer JOIN中间结果进行重分布,对齐tab3的分布策略,再进行JOIN。

    但是实际上,并不需要重分布,因为null和null比较返回的还是null,所以null实际上不需要重分布到一起去JOIN,本条SQL,对于HAWQ的优化器来说,是有优化余地的。

    postgres=# explain select * from tab1 left join tab2 on (tab1.c1=tab2.c1 and tab1.c2=tab2.c2) left join tab3 on (tab2.c1=tab3.c1 and tab2.c2=tab3.c2); QUERY PLAN ------------------------------------------------------------------------------------------------- Gather Motion 6:1 (slice2; segments: 6) (cost=0.00..1293.00 rows=3 width=72) -> Hash Left Join (cost=0.00..1293.00 rows=1 width=72) Hash Cond: tab2.c1 = tab3.c1 AND tab2.c2 = tab3.c2 -> Redistribute Motion 6:6 (slice1; segments: 6) (cost=0.00..862.00 rows=1 width=48) Hash Key: tab2.c1, tab2.c2 -> Hash Left Join (cost=0.00..862.00 rows=1 width=48) Hash Cond: tab1.c1 = tab2.c1 AND tab1.c2 = tab2.c2 -> Table Scan on tab1 (cost=0.00..431.00 rows=1 width=24) -> Hash (cost=431.00..431.00 rows=1 width=24) -> Table Scan on tab2 (cost=0.00..431.00 rows=1 width=24) -> Hash (cost=431.00..431.00 rows=1 width=24) -> Table Scan on tab3 (cost=0.00..431.00 rows=1 width=24) Settings: default_hash_table_bucket_number=6 Optimizer status: PQO version 1.638 (14 rows)

    2. 这样的条件下,则不需要重分布,因为第一次LEFT JOIN后,TAB1不会产生空值,使用tab1再与tab3进行join也不需要重分布。

    postgres=# explain select * from tab1 left join tab2 on (tab1.c1=tab2.c1 and tab1.c2=tab2.c2) left join tab3 on (tab1.c1=tab3.c1 and tab1.c2=tab3.c2); QUERY PLAN --------------------------------------------------------------------------------- Gather Motion 6:1 (slice1; segments: 6) (cost=0.00..1293.00 rows=3 width=72) -> Hash Left Join (cost=0.00..1293.00 rows=1 width=72) Hash Cond: tab1.c1 = tab3.c1 AND tab1.c2 = tab3.c2 -> Hash Left Join (cost=0.00..862.00 rows=1 width=48) Hash Cond: tab1.c1 = tab2.c1 AND tab1.c2 = tab2.c2 -> Table Scan on tab1 (cost=0.00..431.00 rows=1 width=24) -> Hash (cost=431.00..431.00 rows=1 width=24) -> Table Scan on tab2 (cost=0.00..431.00 rows=1 width=24) -> Hash (cost=431.00..431.00 rows=1 width=24) -> Table Scan on tab3 (cost=0.00..431.00 rows=1 width=24) Settings: default_hash_table_bucket_number=6 Optimizer status: PQO version 1.638 (12 rows)

    3. 如果第三张关联表是JOIN条件,而非OUTER JOIN,同样不需要重分布。

    postgres=# explain select * from tab1 left join tab2 on (tab1.c1=tab2.c1 and tab1.c2=tab2.c2) join tab3 on (tab2.c1=tab3.c1 and tab2.c2=tab3.c2); QUERY PLAN -------------------------------------------------------------------------------------------------------- Gather Motion 6:1 (slice1; segments: 6) (cost=0.00..1293.00 rows=1 width=72) -> Hash Join (cost=0.00..1293.00 rows=1 width=72) Hash Cond: tab2.c1 = tab3.c1 AND tab2.c2 = tab3.c2 AND tab1.c1 = tab3.c1 AND tab1.c2 = tab3.c2 -> Hash Join (cost=0.00..862.00 rows=1 width=48) Hash Cond: tab1.c1 = tab2.c1 AND tab1.c2 = tab2.c2 -> Table Scan on tab1 (cost=0.00..431.00 rows=1 width=24) -> Hash (cost=431.00..431.00 rows=1 width=24) -> Table Scan on tab2 (cost=0.00..431.00 rows=1 width=24) -> Hash (cost=431.00..431.00 rows=1 width=24) -> Table Scan on tab3 (cost=0.00..431.00 rows=1 width=24) Settings: default_hash_table_bucket_number=6 Optimizer status: PQO version 1.638 (12 rows)

    4. 只有JOIN时,也不需要考虑重分布。

    postgres=# explain select * from tab1 join tab2 on (tab1.c1=tab2.c1 and tab1.c2=tab2.c2) join tab3 on (tab2.c1=tab3.c1 and tab2.c2=tab3.c2); QUERY PLAN -------------------------------------------------------------------------------------------------------- Gather Motion 6:1 (slice1; segments: 6) (cost=0.00..1293.00 rows=1 width=72) -> Hash Join (cost=0.00..1293.00 rows=1 width=72) Hash Cond: tab2.c1 = tab3.c1 AND tab2.c2 = tab3.c2 AND tab1.c1 = tab3.c1 AND tab1.c2 = tab3.c2 -> Hash Join (cost=0.00..862.00 rows=1 width=48) Hash Cond: tab1.c1 = tab2.c1 AND tab1.c2 = tab2.c2 -> Table Scan on tab1 (cost=0.00..431.00 rows=1 width=24) -> Hash (cost=431.00..431.00 rows=1 width=24) -> Table Scan on tab2 (cost=0.00..431.00 rows=1 width=24) -> Hash (cost=431.00..431.00 rows=1 width=24) -> Table Scan on tab3 (cost=0.00..431.00 rows=1 width=24) Settings: default_hash_table_bucket_number=6 Optimizer status: PQO version 1.638 (12 rows)

    小结

    1. 随机分布的表与随机分布的表进行JOIN时,可能无法充分利用计算资源,因为每个物理节点只能用到一个核。 2. 随机分布的表与哈希分布的表JOIN时,会根据实际情况,重分布,并行计算。(如果哈希分布的表bucketnum较多,这种QUERY也能用上多核JOIN)。 3. outer join时,如果多次进行,请注意实际的场景逻辑,建议在JOIN时过滤,而不是JOIN完后过滤NULL,以避免重分布。

    Count

    最新回复(0)