PostgreSQL 中生成随机汉字

    xiaoxiao2025-10-15  2

    标签

    PostgreSQL , 字符集 , 汉字编码 , 随机汉字 , chr , ascii , unicode


    背景

    汉字unicode编码范围

    php中utf-8编码下用正则表达式匹配汉字的最终正确表达式 —— /[\x{4e00}-\x{9fa5}]/u \u4e00-\u9fa5 (中文) \x3130-\x318F (韩文) \xAC00-\xD7A3 (韩文) \u0800-\u4e00 (日文)

    4e00 对应 19968

    9fa5 对应 40869

    一共 20901 个汉字

    例子1

    注意,为了保证能输出所有的汉字,确保数据库的编码为UTF8,否则会报类似错误。 db=# select '\u9fa5'::text; ERROR: Unicode escape values cannot be used for code point values above 007F when the server encoding is not UTF8 at or near "'\u9fa5" LINE 1: select '\u9fa5'::text; ^

    例子

    set standard_conforming_strings =off; set escape_string_warning=off; postgres=# select '\u9fa5'::text; text ------ 龥 (1 row)

    例子2

    除了unicode的写法,PostgreSQL还提供了两个函数,支持数值的写法。

    FunctionReturn TypeDescriptionExampleResultascii(string)intASCII code of the first character of the argument. For UTF8 returns the Unicode code point of the character. For other multibyte encodings, the argument must be an ASCII character.ascii('x')120chr(int)textCharacter with the given code. For UTF8 the argument is treated as a Unicode code point. For other multibyte encodings the argument must designate an ASCII character. The NULL (0) character is not allowed because text data types cannot store such bytes.chr(65)A postgres=# select chr(19968); chr ----- 一 (1 row)

    输出所有汉字

    例子

    do language plpgsql $$ declare res text := ''; begin for i in 19968..40869 loop res := res||chr(i); end loop; raise notice '%', res; end; $$; NOTICE: 一丁丂七丄丅丆万丈三上下丌不与丏丐丑丒专且丕世丗丘丙业丛东丝.............................

    输出随机汉字

    例子

    创建函数

    create or replace function gen_hanzi(int) returns text as $$ declare res text; begin if $1 >=1 then select string_agg(chr(19968+(random()*20901)::int), '') into res from generate_series(1,$1); return res; end if; return null; end; $$ language plpgsql strict;

    使用函数,生成随机汉字

    postgres=# select gen_hanzi(10); gen_hanzi ---------------------- 析埲錀噝穎灯嬪閸醿厧 (1 row) postgres=# select gen_hanzi(10); gen_hanzi ---------------------- 仫哸擡襖批梹繜嚪隶別 (1 row)

    参考

    http://baike.baidu.com/item/Unicode

    http://unicode.org/charts/

    https://en.wikipedia.org/wiki/Unicode

    《PostgreSQL 转义、UNICODE、与SQL注入》

    http://stackoverflow.com/questions/3970795/how-do-you-create-a-random-string-thats-suitable-for-a-session-id-in-postgresql

    https://www.postgresql.org/docs/9.6/static/functions-string.html

    最新回复(0)