Redis Production Operations in Practice: A Long-Form Summary

    xiaoxiao  2025-03-18

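    Check for time-consuming queries (slow queries)

    Why it matters: Redis executes commands on a single thread, so one slow query blocks every other client. Slow queries must never occur in production; they can make the service unavailable outright.

    In practice: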

    redis-cli> slowlog get
    1) 1) (integer) 5
       2) (integer) 1558780175
       3) (integer) 785231
       4) 1) "keys"
          2) "search_number:20190525*"
       5) "*.*.*.*:50598"
       6) ""
    2) 1) (integer) 4
       2) (integer) 1558780130
       3) (integer) 784617
       4) 1) "keys"
          2) "search_number:20190525*"
       5) "*.*.*.*:50598"
       6) ""

    The fields of each entry mean, in order:

    A unique progressive identifier for every slow log entry.

    The unix timestamp at which the logged command was processed.

    The amount of time needed for its execution, in microseconds (note: microseconds, i.e. millionths of a second, not milliseconds).

    The array composing the arguments of the command.

    Fields 5 and 6, present since Redis 4.0, are the client address (IP and port) and the client name.

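    Conclusion: there are slow queries, and they are caused by keys (each call took roughly 785 ms). As always: never use keys in production.

    Once the slow entries are known and the offending calls have been optimized, there is little point in keeping the entries around. How do you clear the log?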
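    Reading raw slowlog fields is error-prone because the timestamp is in seconds while the duration is in microseconds. The following is a minimal sketch, not part of the original workflow, that turns the two entries shown above into readable lines. The names SlowlogEntry and describe are hypothetical helpers introduced only for illustration; the values are copied from the slowlog output.

```python
from datetime import datetime, timezone
from typing import List, NamedTuple


class SlowlogEntry(NamedTuple):
    """First four fields of one slowlog entry (hypothetical helper type)."""
    entry_id: int       # unique progressive identifier
    timestamp: int      # Unix time in seconds
    duration_us: int    # execution time in microseconds
    args: List[str]     # command name followed by its arguments


def describe(entry: SlowlogEntry) -> str:
    """Format an entry with a UTC time and the duration in milliseconds."""
    when = datetime.fromtimestamp(entry.timestamp, tz=timezone.utc)
    millis = entry.duration_us / 1000  # microseconds -> milliseconds
    return f"#{entry.entry_id} {when.isoformat()} {millis:.3f} ms: {' '.join(entry.args)}"


# The two entries from the slowlog output shown earlier.
ENTRIES = [
    SlowlogEntry(5, 1558780175, 785231, ["keys", "search_number:20190525*"]),
    SlowlogEntry(4, 1558780130, 784617, ["keys", "search_number:20190525*"]),
]

for entry in ENTRIES:
    print(describe(entry))
```

    Converting to milliseconds makes the scale obvious: a single keys call held the server for about three quarters of a second.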

    redis-cli > slowlog reset
    OK
    redis-cli > slowlog get
    (empty list or set)

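    slowlog reset clears the log; querying again returns nothing.

    Check whether memory is running short (when no console is available)

    Why it matters: the data held in Redis is usually hot data, so running out of memory can make the service unavailable outright.

    In practice: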

    redis-cli > info
    # Server
    redis_version:4.0.12
    redis_git_sha1:00000000
    redis_git_dirty:0
    redis_build_id:ac5352c436e709cf
    redis_mode:cluster
    os:Linux 3.10.0-957.1.3.el7.x86_64 x86_64
    arch_bits:64
    multiplexing_api:epoll
    atomicvar_api:atomic-builtin
    gcc_version:4.8.5
    process_id:1018
    run_id:f169872d9278e94d7bfcdfe306b343581ba1187e
    tcp_port:6379
    uptime_in_seconds:10965191
    uptime_in_days:126
    hz:10
    lru_clock:15355937
    executable:/root/redis-4.0.12/src/./redis-server
    config_file:/root/redis-cluster/6379/redis.conf

    # Clients
    connected_clients:60
    client_longest_output_list:0
    client_biggest_input_buf:0
    blocked_clients:0

    # Memory
    used_memory:27849008
    used_memory_human:26.56M
    used_memory_rss:36982784
    used_memory_rss_human:35.27M
    used_memory_peak:30520120
    used_memory_peak_human:29.11M
    used_memory_peak_perc:91.25%
    used_memory_overhead:4921286
    used_memory_startup:1444896
    used_memory_dataset:22927722
    used_memory_dataset_perc:86.83%
    total_system_memory:1927249920
    total_system_memory_human:1.79G
    used_memory_lua:37888
    used_memory_lua_human:37.00K
    maxmemory:0
    maxmemory_human:0B
    maxmemory_policy:noeviction
    mem_fragmentation_ratio:1.33
    mem_allocator:jemalloc-4.0.3
    active_defrag_running:0
    lazyfree_pending_objects:0

    # Persistence
    loading:0
    rdb_changes_since_last_save:0
    rdb_bgsave_in_progress:0
    rdb_last_save_time:1558859457
    rdb_last_bgsave_status:ok
    rdb_last_bgsave_time_sec:1
    rdb_current_bgsave_time_sec:-1
    rdb_last_cow_size:2949120
    aof_enabled:0
    aof_rewrite_in_progress:0
    aof_rewrite_scheduled:0
    aof_last_rewrite_time_sec:-1
    aof_current_rewrite_time_sec:-1
    aof_last_bgrewrite_status:ok
    aof_last_write_status:ok
    aof_last_cow_size:0

    # Stats
    total_connections_received:3075423
    total_commands_processed:619824635
    instantaneous_ops_per_sec:266
    total_net_input_bytes:27175639111
    total_net_output_bytes:3207129431
    instantaneous_input_kbps:11.88
    instantaneous_output_kbps:1.31
    rejected_connections:0
    sync_full:1
    sync_partial_ok:0
    sync_partial_err:1
    expired_keys:205
    expired_stale_perc:0.00
    expired_time_cap_reached_count:0
    evicted_keys:0
    keyspace_hits:151284
    keyspace_misses:34997
    pubsub_channels:0
    pubsub_patterns:0
    latest_fork_usec:1213
    migrate_cached_sockets:0
    slave_expires_tracked_keys:0
    active_defrag_hits:0
    active_defrag_misses:0
    active_defrag_key_hits:0
    active_defrag_key_misses:0

    # Replication
    role:master
    connected_slaves:1
    slave0:ip=10.10.41.111,port=6383,state=online,offset=44049007,lag=0
    master_replid:f91265662fbb961f1fead0e8cc27257a5bd61f97
    master_replid2:0000000000000000000000000000000000000000
    master_repl_offset:44049007
    second_repl_offset:-1
    repl_backlog_active:1
    repl_backlog_size:1048576
    repl_backlog_first_byte_offset:43000432
    repl_backlog_histlen:1048576

    # CPU
    used_cpu_sys:11224.78
    used_cpu_user:6662.42
    used_cpu_sys_children:11.83
    used_cpu_user_children:68.38

    # Cluster
    cluster_enabled:1

    # Keyspace
    db0:keys=18588,expires=0,avg_ttl=0

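    This command is extremely powerful. I covered it in detail in another article (see Redis Deepen Series, Move 1: "Advanced Commands: Client Connections for DevOps"), so I will not explain every field here and will go straight to the point.

    One of the entries in the output above is:

    evicted_keys:0  (the number of keys evicted because of the maximum memory limit)

    It is 0 here, which suggests memory usage is still healthy; if it ever rises above 0, pay close attention. Note that this instance reports maxmemory:0, meaning no limit is enforced and nothing can be evicted, so the counter only becomes meaningful once maxmemory is set.

    info default and info all also work.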
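    When no console or dashboard is available, the info text can be checked by a small script instead of by eye. The sketch below is one possible way to do that, assuming the text has already been captured (for example from redis-cli info). The function names parse_info and memory_warnings and the 90% threshold are assumptions made for this example, not something Redis defines.

```python
from typing import Dict, List


def parse_info(text: str) -> Dict[str, Dict[str, str]]:
    """Parse INFO text into {section: {field: value}}; values stay strings."""
    sections: Dict[str, Dict[str, str]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Section header such as "# Memory".
            current = line.lstrip("#").strip()
            sections[current] = {}
        elif ":" in line and current is not None:
            # Split on the first colon only; values may contain colons.
            key, _, value = line.partition(":")
            sections[current][key] = value
    return sections


def memory_warnings(info: Dict[str, Dict[str, str]], ratio: float = 0.9) -> List[str]:
    """Return human-readable warnings; `ratio` is a hypothetical threshold."""
    warnings = []
    memory = info.get("Memory", {})
    stats = info.get("Stats", {})
    if int(stats.get("evicted_keys", "0")) > 0:
        warnings.append("keys have been evicted")
    maxmemory = int(memory.get("maxmemory", "0"))
    if maxmemory == 0:
        warnings.append("maxmemory is 0: no limit is enforced")
    elif int(memory.get("used_memory", "0")) / maxmemory > ratio:
        warnings.append("used_memory is close to maxmemory")
    return warnings


# A few fields copied from the info output shown earlier.
SAMPLE = """\
# Memory
used_memory:27849008
maxmemory:0
maxmemory_policy:noeviction
# Stats
evicted_keys:0
"""

print(memory_warnings(parse_info(SAMPLE)))
```

    Keeping the parsed values as strings is deliberate: some fields, such as used_memory_peak_perc or the db0 keyspace line, are not plain numbers, so conversion is left to the code that knows which field it is reading.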

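    Check how each command is executing and its statistics

    Why it matters: very important, especially right after a release, when you want to see which commands production is actually running and how long each one takes.

    In practice: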

    redis-cli > info commandstats
    # Commandstats
    cmdstat_incrby:calls=397,usec=2338,usec_per_call=5.89
    cmdstat_zrevrangebyscore:calls=2054,usec=37191,usec_per_call=18.11
    cmdstat_setnx:calls=115,usec=822,usec_per_call=7.15
    cmdstat_zrevrange:calls=24083,usec=100549,usec_per_call=4.18
    cmdstat_llen:calls=5,usec=10,usec_per_call=2.00
    cmdstat_zcard:calls=84,usec=242,usec_per_call=2.88
    cmdstat_get:calls=103981,usec=198376,usec_per_call=1.91
    cmdstat_replconf:calls=10948976,usec=14304689,usec_per_call=1.31
    cmdstat_hgetall:calls=27518,usec=281067,usec_per_call=10.21
    cmdstat_sadd:calls=1756,usec=4538,usec_per_call=2.58
    cmdstat_cluster:calls=4837,usec=550300,usec_per_call=113.77
    cmdstat_hincrby:calls=8793,usec=63431,usec_per_call=7.21
    cmdstat_hget:calls=21794,usec=40000,usec_per_call=1.84
    cmdstat_hmget:calls=858,usec=5201,usec_per_call=6.06
    cmdstat_info:calls=3,usec=177,usec_per_call=59.00
    cmdstat_command:calls=84,usec=27778,usec_per_call=330.69
    cmdstat_zrem:calls=141,usec=1652,usec_per_call=11.72
    cmdstat_hdel:calls=6,usec=124,usec_per_call=20.67
    cmdstat_incr:calls=1389,usec=5961,usec_per_call=4.29
    cmdstat_zincrby:calls=1577,usec=39627,usec_per_call=25.13
    cmdstat_mget:calls=1,usec=4,usec_per_call=4.00
    cmdstat_zscore:calls=936,usec=5018,usec_per_call=5.36
    cmdstat_hexists:calls=171,usec=1023,usec_per_call=5.98
    cmdstat_hmset:calls=24631,usec=505707,usec_per_call=20.53
    cmdstat_zrangebyscore:calls=3003,usec=62727,usec_per_call=20.89
    cmdstat_scan:calls=8,usec=365,usec_per_call=45.62
    cmdstat_sismember:calls=1772,usec=6990,usec_per_call=3.94
    cmdstat_rpop:calls=600540888,usec=384324857,usec_per_call=0.64
    cmdstat_set:calls=6250,usec=27941,usec_per_call=4.47
    cmdstat_lrange:calls=2,usec=17,usec_per_call=8.50
    cmdstat_lpush:calls=5494,usec=71400,usec_per_call=13.00
    cmdstat_lpop:calls=2927495,usec=6608253,usec_per_call=2.26
    cmdstat_srandmember:calls=15,usec=49,usec_per_call=3.27
    cmdstat_ping:calls=5113973,usec=2539706,usec_per_call=0.50
    cmdstat_del:calls=5056,usec=16581,usec_per_call=3.28
    cmdstat_spop:calls=121,usec=409,usec_per_call=3.38
    cmdstat_hset:calls=5940,usec=26639,usec_per_call=4.48
    cmdstat_zrange:calls=7,usec=74,usec_per_call=10.57
    cmdstat_setex:calls=93,usec=770,usec_per_call=8.28
    cmdstat_expire:calls=381,usec=1087,usec_per_call=2.85
    cmdstat_keys:calls=13,usec=24403,usec_per_call=1877.15
    cmdstat_psync:calls=1,usec=2119,usec_per_call=2119.00
    cmdstat_ttl:calls=1,usec=2,usec_per_call=2.00
    cmdstat_auth:calls=4,usec=7,usec_per_call=1.75
    cmdstat_zadd:calls=2142,usec=35541,usec_per_call=16.59
    cmdstat_srem:calls=16,usec=142,usec_per_call=8.88

    Each line has the form:

    cmdstat_XXX: calls=XXX,usec=XXX,usec_per_call=XXX

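    Meaning: calls is the number of times the command was executed, usec is the total CPU time it consumed (in microseconds, not milliseconds), and usec_per_call is the average CPU time per call (also in microseconds).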
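    With dozens of commands in the list, the expensive ones are easy to miss. As an illustration, the sketch below parses commandstats lines and ranks commands by a chosen field. The helper names parse_commandstats and top are hypothetical, and the sample is a four-line excerpt of the output above.

```python
from typing import Dict, List


def parse_commandstats(text: str) -> Dict[str, Dict[str, float]]:
    """Parse 'cmdstat_<name>:calls=..,usec=..,usec_per_call=..' lines."""
    stats: Dict[str, Dict[str, float]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line.startswith("cmdstat_"):
            continue  # skip the "# Commandstats" header and blank lines
        name, _, fields = line[len("cmdstat_"):].partition(":")
        values = dict(pair.split("=", 1) for pair in fields.split(","))
        stats[name] = {
            "calls": int(values["calls"]),
            "usec": int(values["usec"]),                    # total time, microseconds
            "usec_per_call": float(values["usec_per_call"]),  # average, microseconds
        }
    return stats


def top(stats: Dict[str, Dict[str, float]], field: str, n: int = 3) -> List[str]:
    """Return the n command names with the largest value of `field`."""
    return sorted(stats, key=lambda name: stats[name][field], reverse=True)[:n]


# Excerpt copied from the commandstats output shown earlier.
SAMPLE = """\
# Commandstats
cmdstat_get:calls=103981,usec=198376,usec_per_call=1.91
cmdstat_rpop:calls=600540888,usec=384324857,usec_per_call=0.64
cmdstat_keys:calls=13,usec=24403,usec_per_call=1877.15
cmdstat_hgetall:calls=27518,usec=281067,usec_per_call=10.21
"""

stats = parse_commandstats(SAMPLE)
print("slowest per call:", top(stats, "usec_per_call", 2))
print("most total time:", top(stats, "usec", 1))
```

    The two rankings answer different questions: usec_per_call surfaces commands that block the server each time they run (keys in this excerpt), while usec surfaces commands whose sheer volume dominates total CPU time (rpop in this excerpt).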

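    Check whether the production memory settings are reasonable

    Why it matters: very important; after all, Redis keeps its data primarily in memory.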

    redis-cli > config get maxmemory
    1) "maxmemory"
    2) "0"
    redis-cli > config get maxmemory-policy
    1) "maxmemory-policy"
    2) "noeviction"

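    The commands above check whether a maximum memory limit and an eviction policy for low-memory conditions have been configured. Unfortunately, neither has: maxmemory is 0 (no limit) and the policy is still noeviction.

    When memory approaches its limit, a good eviction policy matters a great deal, so it is worth reviewing the available policies.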

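    Redis provides the following six eviction policies (Redis 4.0 and later additionally offer volatile-lfu and allkeys-lfu):

    volatile-lru: evicts the least recently used keys among those that have an expiry set.
    volatile-ttl: evicts the keys closest to expiring among those that have an expiry set.
    volatile-random: evicts random keys among those that have an expiry set.
    allkeys-lru: evicts the least recently used keys across the whole dataset.
    allkeys-random: evicts random keys across the whole dataset.
    noeviction: never evicts data. This is the default policy; once memory reaches maxmemory, commands that would need more memory fail with an error.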

    redis-cli > config set maxmemory-policy volatile-lru
    OK
    redis-cli > config get maxmemory-policy
    1) "maxmemory-policy"
    2) "volatile-lru"
    redis-cli > config set maxmemory 24242358
    OK
    redis-cli > config get maxmemory
    1) "maxmemory"
    2) "24242358"
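    The six policies differ along two axes: which keys are eligible (only keys with an expiry, or all keys) and how the victim is chosen among them. The toy model below is only meant to make that distinction concrete. It is not how Redis implements eviction (Redis approximates LRU by sampling a few keys rather than scanning all of them), and the Key class and pick_victim function are hypothetical names invented for this sketch.

```python
import random
from dataclasses import dataclass
from typing import List, Optional

POLICIES = {
    "volatile-lru", "volatile-ttl", "volatile-random",
    "allkeys-lru", "allkeys-random", "noeviction",
}


@dataclass
class Key:
    name: str
    idle_seconds: float            # time since the key was last accessed
    ttl_seconds: Optional[float]   # None means no expiry is set


def pick_victim(keys: List[Key], policy: str, rng=random) -> Optional[Key]:
    """Return the key this toy model would evict, or None if none is eligible."""
    if policy not in POLICIES:
        raise ValueError(f"unknown policy: {policy}")
    if policy == "noeviction":
        return None
    scope, _, rule = policy.partition("-")
    # volatile-* policies only consider keys that have an expiry set.
    pool = [k for k in keys if k.ttl_seconds is not None] if scope == "volatile" else list(keys)
    if not pool:
        return None
    if rule == "lru":
        return max(pool, key=lambda k: k.idle_seconds)   # longest idle
    if rule == "ttl":
        return min(pool, key=lambda k: k.ttl_seconds)    # closest to expiring
    return rng.choice(pool)                              # rule == "random"


keys = [
    Key("session:a", idle_seconds=100, ttl_seconds=None),
    Key("cache:b", idle_seconds=50, ttl_seconds=30),
    Key("cache:c", idle_seconds=10, ttl_seconds=5),
]
for policy in ("allkeys-lru", "volatile-lru", "volatile-ttl", "noeviction"):
    victim = pick_victim(keys, policy)
    print(policy, "->", victim.name if victim else None)
```

    One consequence worth checking before choosing a volatile-* policy: if no key has an expiry, the eligible pool is empty and nothing can be evicted, so the instance behaves much like noeviction. The Keyspace line in the info output above reports expires=0 at the time of that snapshot, so this case may apply there.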

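    Production has recently started showing slower queries and rising memory usage. What should you do?

    Why it matters: everything in production is urgent, so this is naturally important.

    In practice: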

    redis-cli > monitor
    OK
    1558862420.921783 [0 10.10.10.96:39663] "RPOP" "video:test:0"
    1558862420.921922 [0 10.10.10.96:39651] "RPOP" "video:test:8"
    1558862420.921929 [0 10.10.10.96:39657] "RPOP" "video:test:3"
    1558862420.933309 [0 10.10.10.96:39651] "RPOP" "video:test:4"
    1558862420.933359 [0 10.10.10.96:39657] "RPOP" "video:test:5"

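    This is a real heavy weapon. Because monitor streams every command the server processes, it adds overhead of its own, so run it only briefly. The analysis usually goes as follows:

    If QPS is rising, look at which commands appear and execute most often.

    If queries are slowing down, check whether potentially expensive commands show up frequently.

    If memory usage is rising, check for high-frequency or large-payload insertions into the five core data structures (string, list, hash, set, sorted set).

    It can also reveal whether any forbidden commands are being run in production, which makes it very useful.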
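    The analysis steps above amount to counting. A rough sketch of that counting, assuming a short monitor capture has been saved as text: it tallies commands and client IPs and flags commands from a deny list. The FORBIDDEN set and the function names are assumptions for illustration; adjust the list to whatever your team bans. Lines with unbalanced quotes would make shlex raise ValueError, which this sketch does not handle.

```python
import re
import shlex
from collections import Counter
from typing import Iterable, List, Optional, Tuple

# timestamp [db client-address] "COMMAND" "arg" ...
MONITOR_LINE = re.compile(r"^(\d+\.\d+) \[(\d+) ([^\]]+)\] (.+)$")

# Hypothetical deny list, not defined by Redis.
FORBIDDEN = {"KEYS", "FLUSHALL", "FLUSHDB"}


def parse_line(line: str) -> Optional[Tuple[str, str, List[str]]]:
    """Return (client, COMMAND, args), or None for lines such as 'OK'."""
    match = MONITOR_LINE.match(line.strip())
    if match is None:
        return None
    _timestamp, _db, client, rest = match.groups()
    args = shlex.split(rest)  # strips the double quotes around each argument
    if not args:
        return None
    return client, args[0].upper(), args[1:]


def summarize(lines: Iterable[str]) -> Tuple[Counter, Counter]:
    """Count how often each command and each client IP appears."""
    commands: Counter = Counter()
    clients: Counter = Counter()
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        client, command, _args = parsed
        commands[command] += 1
        clients[client.rsplit(":", 1)[0]] += 1  # group by IP, drop the port
    return commands, clients


def forbidden_seen(commands: Counter) -> List[str]:
    """Return the deny-listed commands that appear in the capture."""
    return sorted(FORBIDDEN & set(commands))


# The capture shown earlier.
SAMPLE = '''OK
1558862420.921783 [0 10.10.10.96:39663] "RPOP" "video:test:0"
1558862420.921922 [0 10.10.10.96:39651] "RPOP" "video:test:8"
1558862420.921929 [0 10.10.10.96:39657] "RPOP" "video:test:3"
1558862420.933309 [0 10.10.10.96:39651] "RPOP" "video:test:4"
1558862420.933359 [0 10.10.10.96:39657] "RPOP" "video:test:5"
'''

commands, clients = summarize(SAMPLE.splitlines())
print(commands.most_common(3), clients.most_common(3), forbidden_seen(commands))
```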

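    A service is about to launch and is expected to put heavy load on Redis. How can you measure Redis performance?

    Why it matters: know yourself and know your enemy, and you will never be defeated; naturally important.

    In practice: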

    src/redis-benchmark -c 10 -n 100000 -q
    PING_INLINE: 51020.41 requests per second
    PING_BULK: 50607.29 requests per second
    SET: 37257.82 requests per second
    GET: 49800.80 requests per second
    INCR: 38699.69 requests per second
    LPUSH: 38910.51 requests per second
    LPOP: 39277.30 requests per second
    SADD: 54614.96 requests per second
    SPOP: 51948.05 requests per second
    LPUSH (needed to benchmark LRANGE): 38819.88 requests per second
    LRANGE_100 (first 100 elements): 20112.63 requests per second
    LRANGE_300 (first 300 elements): 9025.27 requests per second
    LRANGE_500 (first 450 elements): 6836.67 requests per second
    LRANGE_600 (first 600 elements): 5406.28 requests per second
    MSET (10 keys): 19394.88 requests per second

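    Redis ships with its own benchmarking tool, redis-benchmark, which is simple and convenient and gives a rough estimate of what the instance can sustain. In the command above, -c sets the number of parallel connections, -n the total number of requests, and -q prints only the requests-per-second summary.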
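    To turn the benchmark into a go/no-go check before launch, the quiet-mode output can be compared against the throughput the new service is expected to need. A small sketch under that assumption follows; parse_benchmark, below_target, and the target numbers are all hypothetical, and the sample lines are taken from the run above.

```python
import re
from typing import Dict, List

# "<test name>: <number> requests per second"
RESULT_LINE = re.compile(r"^(.+): ([\d.]+) requests per second")


def parse_benchmark(text: str) -> Dict[str, float]:
    """Map each test name in `redis-benchmark -q` output to requests/second."""
    results: Dict[str, float] = {}
    for raw in text.splitlines():
        match = RESULT_LINE.match(raw.strip())
        if match:
            results[match.group(1)] = float(match.group(2))
    return results


def below_target(results: Dict[str, float], targets: Dict[str, float]) -> List[str]:
    """Return the tests whose measured rate is below the required rate."""
    return [name for name, need in targets.items()
            if results.get(name, 0.0) < need]


# Excerpt copied from the benchmark output shown earlier.
SAMPLE = """\
PING_INLINE: 51020.41 requests per second
SET: 37257.82 requests per second
GET: 49800.80 requests per second
LPUSH (needed to benchmark LRANGE): 38819.88 requests per second
MSET (10 keys): 19394.88 requests per second
"""

results = parse_benchmark(SAMPLE)
# Hypothetical requirements for the service being launched.
print(below_target(results, {"SET": 40000, "GET": 40000}))
```

    Treat the comparison as a coarse filter only: the benchmark exercises small synthetic payloads, so real traffic with larger values or more expensive commands will usually see lower numbers.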

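    That concludes this brief summary of practical techniques; it cannot cover everything.

    The Redis Deepen series analyzes these topics layer by layer, from practice down to principles. Some of its articles:

    Redis Deepen Series, Move 6: Improving Performance with Batching

    Redis Deepen Series, Move 9: How List Is Implemented and Its Classic Use Cases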
