PostgreSQL 10.0 preview 性能增强 - (多维分析)更快,更省内存hashed aggregation with grouping sets...

    xiaoxiao2025-11-04  25

    标签

    PostgreSQL , 10.0 , hashed aggregation with grouping sets


    背景

    grouping sets 是多维分析语法,PostgreSQL 从9.5开始支持这种语法,常被用于OLAP系统,数据透视等应用场景。

    《PostgreSQL 9.5 new feature - Support GROUPING SETS, CUBE and ROLLUP.》

    由于多维分析的一个QUERY涉及多个GROUP,所以如果使用hash agg的话,需要多个HASH table,并行计算. 9.5, 9.6的时候,还不支持一个QUERY使用多个HASH TABLE并行计算。

    10.0 扩展了聚合NODE,支持hashAggregate并行开多个hashtable,以及MixedAggregate策略用于sort grouping时哈希表的数据倒腾。

    使用时对用户完全透明,同时优化器在使用hash agg, multi hashtable,时,会尽量的减少重复SORT。

    总而言之,grouping set多维分析会更快(即使包含排序),更省内存。

    Support hashed aggregation with grouping sets. This extends the Aggregate node with two new features: HashAggregate can now run multiple hashtables concurrently, and a new strategy MixedAggregate populates hashtables while doing sorted grouping. The planner will now attempt to save as many sorts as possible when planning grouping sets queries, while not exceeding work_mem for the estimated combined sizes of all hashtables used. No SQL-level changes are required. There should be no user-visible impact other than the new EXPLAIN output and possible changes to result ordering when ORDER BY was not used (which affected a few regression tests). The enable_hashagg option is respected. Author: Andrew Gierth Reviewers: Mark Dilger, Andres Freund Discussion: https://postgr.es/m/87vatszyhj.fsf@news-spur.riddles.org.uk

    例子

    +explain (costs off) select a, b, grouping(a,b), sum(v), count(*), max(v) + from gstest1 group by grouping sets ((a),(b)) order by 3,1,2; + QUERY PLAN +-------------------------------------------------------------------------------------------------------- + Sort + Sort Key: (GROUPING("*VALUES*".column1, "*VALUES*".column2)), "*VALUES*".column1, "*VALUES*".column2 + -> HashAggregate + Hash Key: "*VALUES*".column1 + Hash Key: "*VALUES*".column2 + -> Values Scan on "*VALUES*" +(6 rows)

    这个patch的讨论,详见邮件组,本文末尾URL。

    PostgreSQL社区的作风非常严谨,一个patch可能在邮件组中讨论几个月甚至几年,根据大家的意见反复的修正,patch合并到master已经非常成熟,所以PostgreSQL的稳定性也是远近闻名的。

    参考

    https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=b5635948ab165b6070e7d05d111f966e07570d81

    最新回复(0)