震精 - PostgreSQL decimal64 decimal128 高效率数值 类型扩展

    xiaoxiao2025-10-08  5

    标签

    PostgreSQL , decimal64 , decimal128 , float4 , float8 , numeric


    背景

    PostgreSQL内置的数值类型包括

    整型、浮点、整型序列、"无限"精度数值

    NameStorage SizeDescriptionRangesmallint2 bytessmall-range integer-32768 to +32767integer4 bytestypical choice for integer-2147483648 to +2147483647bigint8 byteslarge-range integer-9223372036854775808 to +9223372036854775807decimalvariableuser-specified precision, exactup to 131072 digits before the decimal point; up to 16383 digits after the decimal pointnumericvariableuser-specified precision, exactup to 131072 digits before the decimal point; up to 16383 digits after the decimal pointreal4 bytesvariable-precision, inexact6 decimal digits precisiondouble precision8 bytesvariable-precision, inexact15 decimal digits precisionsmallserial2 bytessmall autoincrementing integer1 to 32767serial4 bytesautoincrementing integer1 to 2147483647bigserial8 byteslarge autoincrementing integer1 to 9223372036854775807

    其中除了 "无限"精度数值类型。他类型都是定长存储,使用时不需要调用palloc,效率较高。

    如果你要使用超过双精能表示的有效范围的数值,目前只能选择decimal\numeric类型,而这个类型前面说了,由于是变长设计,需要调用palloc,效率一般。

    那么在数据分析领域,或者需要处理非常多的数据记录时,numeric类型的开销是较大的。

    PostgreSQL社区有一些扩展,可以解决这个问题,

    1. 比如2nd的fixeddecimal插件,使用INT8来表示NUMERIC,精度可调。

    《PostgreSQL fixeddecimal - 用CPU "硬解码" 提升1倍 数值运算能力 助力金融大数据量计算》

    2. 比如社区的pgdecimal插件,支持decimal32和decimal64两种类型。

    https://pgxn.org/dist/pgdecimal/1.0.0/

    3. 比如vitesse的pgdecimal插件,也就是本文将提到的插件,支持decimal64与decimal128类型,精度基本上足够使用。

    推荐使用vitesse提供的pgdecimal插件,因为它效率够高,精度够大。

    pgdecimal插件介绍

    有两个常见的decimal库,decNumber以及Intel提供的Intel ADX库。

    pgdecimal插件选择了decNumber库,因为GCC也在用它(法律风险更小?)

    https://github.com/gcc-mirror/gcc/tree/master/libdecnumber

    decimal库的性能对比

    http://speleotrove.com/decimal/dpintro.html

    decNumber与Inter ADX性能接近,但是Inter ADX提供了decimal64/128, int32/64, float/double类型的相互转换,这个很给力。(也许将来vitesse会支持intel adx库吧)

    pgdecimal 依赖的decNumber,因此我们必须先安装decNumber

    decNumber安装

    1. 下载 decNumber package

    http://speleotrove.com/decimal/

    wget http://speleotrove.com/decimal/decNumber-icu-368.zip unzip decNumber-icu-368.zip

    或者从本站链接下载

    2. 安装decNumber到postgresql软件目录中(假设postgresql安装在/home/digoal/pgsql9.6)

    首先要在postgresql软件的include目录中,创建一个空目录,

    mkdir -p /home/digoal/pgsql9.6/include/decnumber

    在decNumber src目录中创建Makefile,install -D 修改为对应要安装的目录。

    cd decNumber vi Makefile OBJS = decSingle.o decDouble.o decQuad.o decNumber.o decContext.o CFLAGS = -Wall -g -O2 -fPIC libdecnumber.a: $(OBJS) ar -rcs libdecnumber.a $(OBJS) clean: rm -f libdecnumber.a $(OBJS) install: install -D *.h /home/digoal/pgsql9.6/include/decnumber install -D libdecnumber.a /home/digoal/pgsql9.6/lib

    3. 编译安装decNumber

    cd decNumber make make install

    4. decNumber的C库reference如下, pgdecimal插件中用到的decnumber库,需要了解细节的话请参考:

    The decNumber C library

    pgdecimal安装

    git clone https://github.com/vitesse-ftian/pgdecimal

    或者从本站链接下载

    cd pgdecimal

    有一个小BUG,.control的版本号没有与sql文件的版本号对齐

    mv decimal--2.0.sql decimal--1.0.sql

    另外,需要修改一下Makefile,指定版本,以及decnumber的include和lib目录

    vi Makefile PG_CPPFLAGS = -I/home/digoal/pgsql9.6/include/decnumber SHLIB_LINK = -L/home/digoal/pgsql9.6/lib -ldecnumber DATA = decimal--1.0.sql

    安装

    export PATH=/home/digoal/pgsql9.6/bin:$PATH USE_PGXS=1 make clean USE_PGXS=1 make USE_PGXS=1 make install /bin/mkdir -p '/home/digoal/pgsql9.6/lib' /bin/mkdir -p '/home/digoal/pgsql9.6/share/extension' /bin/mkdir -p '/home/digoal/pgsql9.6/share/extension' /usr/bin/install -c -m 755 decimal.so '/home/digoal/pgsql9.6/lib/decimal.so' /usr/bin/install -c -m 644 .//decimal.control '/home/digoal/pgsql9.6/share/extension/' /usr/bin/install -c -m 644 .//decimal--1.0.sql '/home/digoal/pgsql9.6/share/extension/'

    使用

    psql postgres=# create extension decimal; CREATE EXTENSION

    pgdecimal性能对比

    使用int8, float8, decimal64, decimal128, numeric(15,3) 几种类型,分别比较这几种类型的性能。

    create table tt(ii bigint, d double precision, d64 decimal64, d128 decimal128, n numeric(15, 3)); postgres=# \timing Timing is on. 生成测试数据 postgres=# insert into tt select i, i + 0.123, i + 0.123::decimal64, i + 0.123::decimal128, i + 0.123 from generate_series(1, 1000000) i; INSERT 0 1000000 Time: 2125.723 ms postgres=# select * from tt limit 2; ii | d | d64 | d128 | n ----+-------+-------+-------+------- 1 | 1.123 | 1.123 | 1.123 | 1.123 2 | 2.123 | 2.123 | 2.123 | 2.123 (2 rows)

    普通查询性能对比

    postgres=# select count(*) from tt where (d + d*d + d*d*d + d*d*d*d) > 10000000; count -------- 999945 (1 row) Time: 411.418 ms postgres=# select count(*) from tt where (n + n*n + n*n*n + n*n*n*n) > 10000000; count -------- 999945 (1 row) Time: 1949.367 ms postgres=# select count(*) from tt where (d64 + d64*d64 + d64*d64*d64 + d64*d64*d64*d64) > 10000000; count -------- 999945 (1 row) Time: 1165.304 ms postgres=# select count(*) from tt where (d128 + d128*d128 + d128*d128*d128 + d128*d128*d128*d128) > 10000000; count -------- 999945 (1 row) Time: 1517.179 ms

    排序性能对比

    postgres=# select * from tt order by d limit 2 offset 999000; ii | d | d64 | d128 | n --------+------------+------------+------------+------------ 999001 | 999001.123 | 999001.123 | 999001.123 | 999001.123 999002 | 999002.123 | 999002.123 | 999002.123 | 999002.123 (2 rows) Time: 804.645 ms postgres=# select * from tt order by n limit 2 offset 999000; ii | d | d64 | d128 | n --------+------------+------------+------------+------------ 999001 | 999001.123 | 999001.123 | 999001.123 | 999001.123 999002 | 999002.123 | 999002.123 | 999002.123 | 999002.123 (2 rows) Time: 2828.066 ms postgres=# select * from tt order by d64 limit 2 offset 999000; ii | d | d64 | d128 | n --------+------------+------------+------------+------------ 999001 | 999001.123 | 999001.123 | 999001.123 | 999001.123 999002 | 999002.123 | 999002.123 | 999002.123 | 999002.123 (2 rows) Time: 1826.044 ms postgres=# select * from tt order by d128 limit 2 offset 999000; ii | d | d64 | d128 | n --------+------------+------------+------------+------------ 999001 | 999001.123 | 999001.123 | 999001.123 | 999001.123 999002 | 999002.123 | 999002.123 | 999002.123 | 999002.123 (2 rows) Time: 2118.647 ms

    哈希JOIN性能对比

    postgres=# explain select count(*) from tt t1 join tt t2 on t1.d64 * t1.d64 + t1.d64 = t2.d64 + t2.d64 * t2.d64; QUERY PLAN ---------------------------------------------------------------------------------- Aggregate (cost=6875071228.00..6875071228.01 rows=1 width=8) -> Hash Join (cost=36707.00..5625071228.00 rows=500000000000 width=0) Hash Cond: (((t1.d64 * t1.d64) + t1.d64) = (t2.d64 + (t2.d64 * t2.d64))) -> Seq Scan on tt t1 (cost=0.00..20300.00 rows=1000000 width=8) -> Hash (cost=20300.00..20300.00 rows=1000000 width=8) -> Seq Scan on tt t2 (cost=0.00..20300.00 rows=1000000 width=8) (6 rows) Time: 0.508 ms postgres=# select count(*) from tt t1 join tt t2 on t1.d64 * t1.d64 + t1.d64 = t2.d64 + t2.d64 * t2.d64; count --------- 1000000 (1 row) Time: 1681.451 ms postgres=# select count(*) from tt t1 join tt t2 on t1.n * t1.n + t1.n = t2.n + t2.n * t2.n; count --------- 1000000 (1 row) Time: 2395.894 ms

    嵌套循环性能对比

    postgres=# explain select count(*) from tt t1, tt t2 where t1.ii < 10000 and t2.ii < 10000 and t1.d * t1.d + t1.d > t2.d + t2.d * t2.d; QUERY PLAN ------------------------------------------------------------------------------- Aggregate (cost=2699703.15..2699703.16 rows=1 width=8) -> Nested Loop (cost=0.00..2614087.74 rows=34246165 width=0) Join Filter: (((t1.d * t1.d) + t1.d) > (t2.d + (t2.d * t2.d))) -> Seq Scan on tt t1 (cost=0.00..22800.00 rows=10136 width=8) Filter: (ii < 10000) -> Materialize (cost=0.00..22850.68 rows=10136 width=8) -> Seq Scan on tt t2 (cost=0.00..22800.00 rows=10136 width=8) Filter: (ii < 10000) (8 rows) Time: 0.561 ms postgres=# select count(*) from tt t1, tt t2 where t1.ii < 10000 and t2.ii < 10000 and t1.d * t1.d + t1.d > t2.d + t2.d * t2.d; count ---------- 49985001 (1 row) Time: 19706.890 ms postgres=# select count(*) from tt t1, tt t2 where t1.ii < 10000 and t2.ii < 10000 and t1.n * t1.n + t1.n > t2.n + t2.n * t2.n; count ---------- 49985001 (1 row) Time: 70787.289 ms postgres=# select count(*) from tt t1, tt t2 where t1.ii < 10000 and t2.ii < 10000 and t1.d64 * t1.d64 + t1.d64 > t2.d64 + t2.d64 * t2.d64; count ---------- 49985001 (1 row) Time: 49861.689 ms postgres=# select count(*) from tt t1, tt t2 where t1.ii < 10000 and t2.ii < 10000 and t1.d128 * t1.d128 + t1.d128 > t2.d128 + t2.d128 * t2.d128; count ---------- 49985001 (1 row) Time: 65779.153 ms

    小结

    PostgreSQL内置的numeric类型属于"无限"精度数值类型,其他类型都是定长存储,使用时不需要调用palloc,效率较高。

    如果你要使用超过双精能表示的有效范围的数值,目前只能选择decimal\numeric类型,而这个类型前面说了,由于是变长设计,需要调用palloc,效率一般。

    那么在数据分析领域,或者需要处理非常多的数据记录时,numeric类型的开销是较大的。

    从前面的测试数据,可以观察到性能最好的是float8,其次是decimal64, decimal64不需要使用palloc,性能比numeric好1.5倍左右,而decimal128也比numeric性能好不少。

    期待将来PostgreSQL内置decimal64, decimal128。

    参考

    《PostgreSQL fixeddecimal - 用CPU "硬解码" 提升1倍 数值运算能力 助力金融大数据量计算》

    https://www.postgresql.org/message-id/flat/CAFWGqnsuyOKdOwsNLVtDU1LLjS=66xmxxxS8Chnng_zSB5_uCg@mail.gmail.com#CAFWGqnsuyOKdOwsNLVtDU1LLjS=66xmxxxS8Chnng_zSB5_uCg@mail.gmail.com

    https://github.com/vitesse-ftian/pgdecimal

    https://pgxn.org/dist/pgdecimal/1.0.0/

    https://github.com/2ndQuadrant/fixeddecimal

    最新回复(0)