高并发写入存储线性相关性优化

xiaoxiao2024-03-29 12

背景

BRIN索引的原理很简单，每个数据块（或者每一段连续的数据块）记录了其中的元数据（最大值，最小值，平均值，COUNT，AVG，NULL值个数等），当每个数据块存储的数据范围错开（边界清晰）时，块级索引的效率就非常高。

例如测试表

create table test(id int, info text); -- 对ID创建BRIN索引 create index idx_test_id on test using brin(id);

ID边界清晰的例子：

数据块1：ID值范围 1-100 数据块2：ID值范围 101-200 数据块3：ID值范围 301-300 ......

ID边界模糊的例子：

数据块1：ID值范围 1-30 , 60-100 数据块2：ID值范围 31-49, 101-130 数据块3：ID值范围 49-59, 130-159, 160-200 ......

边界清晰，指不同的数据块没有交集，或者少量交集，而边界模糊，指数据块的交集很多，数据存储散乱。

为什么边界模糊性能不好呢？

例如

select * from test where id between 1 and 100;

这条SQL，在边界清晰时，搜索1号数据块即可。

而在边界模糊时，需要搜索1,2,3个数据块。

在实际业务中，我们可能会使用自增序列，作为时序字段，当串行（单进程）写入时，序列值和行号（堆表物理存储）线性相关，而当并行写入时，情况可能会变得糟糕（仅仅当使用了大步调时）。

怎么优化呢？

自增序列，并行写入的问题

1、串行写入

create table test(id serial8, info text); insert into test (info) select 'test' from generate_series(1,1000000);

线性相关性 = 1

postgres=# analyze test; ANALYZE postgres=# select correlation from pg_stats where tablename='test' and attname='id'; correlation ------------- 1 (1 row)

BRIN性能

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from test where id between 1000 and 10000; QUERY PLAN --------------------------------------------------------------------------------------------------------------------------- Bitmap Heap Scan on public.test (cost=6.85..5761.69 rows=9264 width=13) (actual time=0.134..2.703 rows=9001 loops=1) Output: id, info Recheck Cond: ((test.id >= 1000) AND (test.id <= 10000)) Rows Removed by Index Recheck: 14679 Heap Blocks: lossy=128 Buffers: shared hit=130 -> Bitmap Index Scan on idx_test_id (cost=0.00..4.53 rows=23256 width=0) (actual time=0.026..0.026 rows=1280 loops=1) Index Cond: ((test.id >= 1000) AND (test.id <= 10000)) Buffers: shared hit=2 Planning time: 0.085 ms Execution time: 3.188 ms (11 rows)

2、并行写入

postgres=# drop table test; DROP TABLE postgres=# create table test(id serial8, info text); CREATE TABLE vi test.sql insert into test (info) values ('test'); pgbench -M prepared -n -r -P 1 -f ./test.sql -c 400 -j 400 -t 2500

线性相关性，略差，但依旧很好

BRIN性能

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from test where id between 1000 and 10000; QUERY PLAN --------------------------------------------------------------------------------------------------------------------------- Bitmap Heap Scan on public.test (cost=6.83..5952.29 rows=9204 width=13) (actual time=0.073..2.687 rows=9001 loops=1) Output: id, info Recheck Cond: ((test.id >= 1000) AND (test.id <= 10000)) Rows Removed by Index Recheck: 14679 Heap Blocks: lossy=128 Buffers: shared hit=130 -> Bitmap Index Scan on idx_test_id (cost=0.00..4.53 rows=22764 width=0) (actual time=0.033..0.033 rows=1280 loops=1) Index Cond: ((test.id >= 1000) AND (test.id <= 10000)) Buffers: shared hit=2 Planning time: 0.086 ms Execution time: 3.185 ms (11 rows)

3、并行写入，放大序列步调，散列问题出现

postgres=# drop table test; DROP TABLE postgres=# create table test(id serial8, info text); CREATE TABLE postgres=# alter sequence test_id_seq cache 10000; ALTER SEQUENCE vi test.sql insert into test (info) values ('test'); pgbench -M prepared -n -r -P 1 -f ./test.sql -c 400 -j 400 -t 2500

线性相关性，直线下降

postgres=# create index idx_test_id on test using brin(id); CREATE INDEX postgres=# analyze test; ANALYZE postgres=# select correlation from pg_stats where tablename='test' and attname='id'; correlation ------------- 0.571033 (1 row) -- 偏差非常明显 postgres=# select *,id-rn from (select ctid,id,row_number() over(order by ctid) rn from test) t where id<>rn; ctid | id | rn | ?column? ------------+---------+---------+---------- (0,6) | 10001 | 6 | 9995 (0,7) | 6 | 7 | -1 (0,8) | 7 | 8 | -1 (0,9) | 10002 | 9 | 9993 (0,10) | 20001 | 10 | 19991 (0,11) | 8 | 11 | -3 (0,12) | 10003 | 12 | 9991 (0,13) | 9 | 13 | -4 (0,14) | 20002 | 14 | 19988 (0,15) | 10004 | 15 | 9989 (0,16) | 10 | 16 | -6 (0,17) | 20003 | 17 | 19986 (0,18) | 10005 | 18 | 9987 (0,19) | 11 | 19 | -8 (0,20) | 20004 | 20 | 19984 (0,21) | 10006 | 21 | 9985 (0,22) | 12 | 22 | -10 (0,23) | 20005 | 23 | 19982 (0,24) | 10007 | 24 | 9983 (0,25) | 13 | 25 | -12 (0,26) | 20006 | 26 | 19980 (0,27) | 14 | 27 | -13 (0,28) | 20007 | 28 | 19979 (0,29) | 10008 | 29 | 9979 (0,30) | 15 | 30 | -15 (0,31) | 20008 | 31 | 19977 (0,32) | 10009 | 32 | 9977 (0,33) | 20009 | 33 | 19976 (0,34) | 16 | 34 | -18 (0,35) | 30001 | 35 | 29966 (0,36) | 20010 | 36 | 19974 (0,37) | 10010 | 37 | 9973 (0,38) | 17 | 38 | -21 (0,39) | 20011 | 39 | 19972 (0,40) | 20012 | 40 | 19972 (0,41) | 10011 | 41 | 9970 (0,42) | 18 | 42 | -24 (0,43) | 30002 | 43 | 29959 (0,44) | 20013 | 44 | 19969 (0,45) | 19 | 45 | -26 (0,46) | 10012 | 46 | 9966 (0,47) | 30003 | 47 | 29956 (0,48) | 20014 | 48 | 19966 (0,49) | 20 | 49 | -29 (0,50) | 30004 | 50 | 29954 (0,51) | 10013 | 51 | 9962 (0,52) | 20015 | 52 | 19963 (0,53) | 21 | 53 | -32 (0,54) | 30005 | 54 | 29951 (0,55) | 10014 | 55 | 9959 (0,56) | 20016 | 56 | 19960 (0,57) | 22 | 57 | -35 (0,58) | 20017 | 58 | 19959

BRIN性能

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from test where id between 1000 and 10000; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------- Bitmap Heap Scan on public.test (cost=5.11..6244.85 rows=2222 width=13) (actual time=7.104..43.755 rows=1501 loops=1) Output: id, info Recheck Cond: ((test.id >= 1000) AND (test.id <= 10000)) Rows Removed by Index Recheck: 448419 Heap Blocks: lossy=2432 Buffers: shared hit=2440 -- 散列，BRIN索引混淆了，导致扫描了大量的数据块。 -> Bitmap Index Scan on idx_test_id (cost=0.00..4.56 rows=38916 width=0) (actual time=0.071..0.071 rows=24320 loops=1) Index Cond: ((test.id >= 1000) AND (test.id <= 10000)) Buffers: shared hit=8 Planning time: 0.128 ms Execution time: 43.867 ms (11 rows)

并发时序线性优化

1、使用单步序列。单步序列不会因为并发增加而导致离散度增加。

2、使用系统时间字段，系统时间与单步序列效果一样，即使并发写入，也不会导致离散度增加。

使用连续自增值的字段，创建BRIN索引就是很靠谱的。

高并发写入存储线性相关性优化

标签

背景

自增序列，并行写入的问题

并发时序线性优化