MySQL "Too many connections" analysis

    xiaoxiao  2025-09-16

    Symptom

    The instance started rejecting new connections with Too many connections:

    ERROR 1040 (08004): Too many connections

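    After changing max_connections with gdb so that we could log in, we checked the processlist. There were threads in Waiting for backup lock, the SQL thread was blocked, and a large number of connections were running show slave status: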

    | 131945 | system user | | mysql | Connect | 302156 | Waiting for Slave Worker to release partition | NULL |
    | 131946 | system user | | NULL | Connect | 302832 | Waiting for an event from Coordinator | NULL |
    | 131947 | system user | | NULL | Connect | 381957 | Waiting for an event from Coordinator | NULL |
    | 131948 | system user | | NULL | Connect | 302167 | Waiting for an event from Coordinator | NULL |
    | 131949 | system user | | NULL | Connect | 302520 | Waiting for backup lock | NULL |
    | 131950 | system user | | NULL | Connect | 302531 | Waiting for backup lock | NULL |
    | 131951 | system user | | NULL | Connect | 302531 | Waiting for backup lock | NULL |
    | 131952 | system user | | NULL | Connect | 302537 | Waiting for backup lock | NULL |
    | 131953 | system user | | NULL | Connect | 302554 | Waiting for backup lock | NULL |
    | 187069 | root | 127.0.0.1:49991 | NULL | Sleep | 9 | | NULL |
    | 211141 | root | 127.0.0.1:49251 | NULL | Query | 297261 | init | show slave status for channel '' |
    | 245974 | root | 127.0.0.1:48726 | NULL | Query | 297194 | init | SHOW SLAVE STATUS |
    | 247341 | aurora | 10.143.33.57:36949 | NULL | Query | 297336 | Killing slave | stop slave |
    | 247346 | root | 127.0.0.1:58466 | NULL | Killed | 297335 | init | show slave status |
    | 247349 | root | 127.0.0.1:58565 | NULL | Killed | 297327 | init
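With hundreds of rows, it is easier to summarize the processlist than to read it. The following is a minimal sketch, assuming the rows are in the mysql client's pipe-delimited layout shown above (Id, User, Host, db, Command, Time, State, Info). The names `SAMPLE`, `parse_row`, `count_states` and `count_show_slave_status` are made up for this illustration, and the sample holds only a few rows copied from the output above.

```python
from collections import Counter

# A few rows copied from the processlist above.
SAMPLE = """\
| 131945 | system user | | mysql | Connect | 302156 | Waiting for Slave Worker to release partition | NULL |
| 131949 | system user | | NULL | Connect | 302520 | Waiting for backup lock | NULL |
| 131950 | system user | | NULL | Connect | 302531 | Waiting for backup lock | NULL |
| 211141 | root | 127.0.0.1:49251 | NULL | Query | 297261 | init | show slave status for channel '' |
| 245974 | root | 127.0.0.1:48726 | NULL | Query | 297194 | init | SHOW SLAVE STATUS |
| 247341 | aurora | 10.143.33.57:36949 | NULL | Query | 297336 | Killing slave | stop slave |
"""

COLUMNS = ("id", "user", "host", "db", "command", "time", "state", "info")


def parse_row(line):
    """Turn one pipe-delimited row into a dict, or None if it is not a full row."""
    body = line.strip().strip("|")
    # maxsplit=7 keeps any '|' inside the Info column in the last cell.
    cells = [cell.strip() for cell in body.split("|", 7)]
    if len(cells) != len(COLUMNS):
        return None  # truncated or malformed row
    return dict(zip(COLUMNS, cells))


def parse_processlist(text):
    rows = (parse_row(line) for line in text.splitlines() if line.strip())
    return [row for row in rows if row is not None]


def count_states(rows):
    """Count threads per State value."""
    return Counter(row["state"] for row in rows)


def count_show_slave_status(rows):
    """Count connections whose statement starts with SHOW SLAVE STATUS."""
    return sum(1 for row in rows
               if row["info"].lower().startswith("show slave status"))


if __name__ == "__main__":
    rows = parse_processlist(SAMPLE)
    for state, n in count_states(rows).most_common():
        print(n, state)
    print("show slave status connections:", count_show_slave_status(rows))
```

A truncated row, such as the last line of the pasted output, is skipped rather than guessed at.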

    A backup process was running at the same time:

    root 86809 86803 0 May14 ? 00:00:00 innobackupex --defaults-file=/etc/my.cnf ......

    Analysis

    We had adopted Percona's Backup Locks feature, so the backup executes LOCK TABLES FOR BACKUP.

    We used pt-pmp to collect and analyze the thread stacks.

    show slave status is waiting for LOCK_msr_map:

    __lll_lock_wait(libpthread.so.0),_L_lock_995(libpthread.so.0),pthread_mutex_lock(libpthread.so.0),inline_mysql_mutex_lock(mysql_thread.h:690),show_slave_status_cmd(mysql_thread.h:690),mysql_execute_command(sql_parse.cc:3347),mysql_parse(sql_parse.cc:7158),dispatch_command(sql_parse.cc:1597),do_handle_one_connection(sql_connect.cc:1006),handle_one_connection(sql_connect.cc:922),start_thread(libpthread.so.0),clone(libc.so.6)

    stop slave holds LOCK_msr_map and waits on stop_cond for the IO and SQL threads to exit:

    1 pthread_cond_timedwait,inline_mysql_cond_timedwait(mysql_thread.h:1199),terminate_slave_thread(mysql_thread.h:1199),terminate_slave_thread(rpl_slave.cc:1268),terminate_slave_threads(rpl_slave.cc:1268),terminate_slave_threads(rpl_slave.cc:9768),stop_slave(rpl_slave.cc:9768),stop_slave(rpl_slave.cc:611),stop_slave_cmd(rpl_slave.cc:756),mysql_execute_command(sql_parse.cc:3707),mysql_parse(sql_parse.cc:7158),dispatch_command(sql_parse.cc:1597),do_handle_one_connection(sql_connect.cc:1006),handle_one_connection(sql_connect.cc:922),start_thread(libpthread.so.0),clone(libc.so.6)

    The SQL thread is waiting for the worker threads to finish their transactions (slave_worker_hash_cond):

    1 pthread_cond_wait,inline_mysql_cond_wait(mysql_thread.h:1162),wait_for_workers_to_finish(mysql_thread.h:1162),slave_stop_workers(rpl_slave.cc:6471),handle_slave_sql(rpl_slave.cc:6997),start_thread(libpthread.so.0),clone(libc.so.6)

    The workers are waiting for the backup_tables_lock lock:

    pthread_cond_timedwait,inline_mysql_cond_timedwait(mysql_thread.h:1199),MDL_wait::timed_wait(mysql_thread.h:1199),MDL_context::acquire_lock(mdl.cc:2416),Global_backup_lock::acquire_protection(lock.cc:1221),open_table(sql_base.cc:3173),open_and_process_table(sql_base.cc:4630),open_tables(sql_base.cc:4630),open_and_lock_tables(sql_base.cc:5735),open_and_lock_tables(sql_base.h:476),Rows_log_event::do_apply_event(sql_base.h:476),slave_worker_exec_job(rpl_rli_pdb.cc:2061),handle_slave_worker(rpl_slave.cc:5696),start_thread(libpthread.so.0),clone(libc.so.6)

    Meanwhile, our backup is holding the backup_tables_lock lock.
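The stacks above are single lines of comma-separated frames, optionally prefixed with a thread count, which makes them hard to scan. As a rough sketch (the helper names `parse_stack` and `first_server_frame` are invented here, and the line format is assumed to be exactly what was pasted above), such a line can be split into frames to find the first frame that lives in a server .cc file:

```python
import re

# function name, optionally followed by "(file:line)" or "(library)"
FRAME = re.compile(r"^(?P<func>[^()]+)(?:\((?P<loc>[^()]*)\))?$")

# The SQL thread stack copied from the analysis above.
SQL_THREAD_STACK = (
    "1 pthread_cond_wait,inline_mysql_cond_wait(mysql_thread.h:1162),"
    "wait_for_workers_to_finish(mysql_thread.h:1162),"
    "slave_stop_workers(rpl_slave.cc:6471),handle_slave_sql(rpl_slave.cc:6997),"
    "start_thread(libpthread.so.0),clone(libc.so.6)"
)


def parse_stack(line):
    """Return (count, frames); count is None when the line has no count prefix."""
    line = line.strip()
    count = None
    head, sep, rest = line.partition(" ")
    if sep and head.isdigit():
        count, line = int(head), rest.strip()
    frames = []
    # Assumes function names contain no commas or parentheses (true for the stacks above).
    for raw in line.split(","):
        match = FRAME.match(raw.strip())
        if match:
            frames.append((match.group("func"), match.group("loc")))
    return count, frames


def first_server_frame(frames):
    """First frame whose location is a .cc source file, skipping libc and header wrappers."""
    for func, loc in frames:
        if loc and loc.split(":")[0].endswith(".cc"):
            return func, loc
    return None


if __name__ == "__main__":
    count, frames = parse_stack(SQL_THREAD_STACK)
    print(count, first_server_frame(frames))
```

The first .cc frame is only a starting point for reading the stack; the frames around it still need to be checked by hand.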

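    The lock waits therefore form a chain: show slave status waits for stop slave (LOCK_msr_map), stop slave waits for the SQL thread to exit, the SQL thread waits for the workers, and the workers wait for the backup. As a result, a large number of show slave status statements were blocked and used up all the connections available to root.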
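The dependency order can be written down as a small wait-for map and walked to its root blocker. This is only an illustrative model of the stacks above, not something MySQL exposes: the `WAITS_ON` table and the `blocking_chain` function are hypothetical names, and the entries are filled in by hand from the analysis.

```python
# waiter -> (resource it waits on, who has to release or signal it),
# filled in by hand from the thread stacks above.
WAITS_ON = {
    "show slave status": ("LOCK_msr_map", "stop slave"),
    "stop slave": ("stop_cond", "SQL thread"),
    "SQL thread": ("slave_worker_hash_cond", "worker threads"),
    "worker threads": ("backup_tables_lock", "backup (LOCK TABLES FOR BACKUP)"),
}


def blocking_chain(start, waits_on):
    """Follow the wait-for edges from `start` until reaching a party that waits on nothing."""
    chain = [start]
    seen = {start}
    current = start
    while current in waits_on:
        _resource, holder = waits_on[current]
        if holder in seen:
            raise ValueError("wait-for cycle detected at %r" % holder)
        chain.append(holder)
        seen.add(holder)
        current = holder
    return chain


if __name__ == "__main__":
    chain = blocking_chain("show slave status", WAITS_ON)
    print(" -> ".join(chain))
    print("root blocker:", chain[-1])
```

The chain ends at the backup and contains no cycle, so this is a pile-up behind one long-held lock rather than a true deadlock.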

    Fix

    Killing the backup process releases the lock and resolves the problem.

    How to avoid it

    1 Avoid MyISAM tables where possible, to shorten the time the backup holds LOCK TABLES FOR BACKUP. In this case there were more than 200 MyISAM tables.

    2 Avoid running stop slave while a backup is in progress.
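The second rule can be enforced with a small guard in whatever tooling issues stop slave. The sketch below is one possible shape, not the setup used in this incident: `find_backup_processes` and `safe_to_stop_slave` are hypothetical helpers, and the list of binary names is an assumption that has to match how backups are actually launched on the host.

```python
import os

# Assumed backup binary names; adjust to the local backup tooling.
BACKUP_BINARIES = ("innobackupex", "xtrabackup")


def find_backup_processes(cmdlines, backup_binaries=BACKUP_BINARIES):
    """Return the command lines whose executable is a known backup binary."""
    hits = []
    for cmdline in cmdlines:
        parts = cmdline.split()
        # Only the executable (first token) is checked, so a wrapper such as
        # "perl /usr/bin/innobackupex ..." would not be detected by this sketch.
        if parts and os.path.basename(parts[0]) in backup_binaries:
            hits.append(cmdline)
    return hits


def safe_to_stop_slave(cmdlines):
    """True when no backup process is found in the given command lines."""
    return not find_backup_processes(cmdlines)


if __name__ == "__main__":
    running = [
        "/usr/sbin/mysqld --defaults-file=/etc/my.cnf",
        "innobackupex --defaults-file=/etc/my.cnf",
    ]
    if safe_to_stop_slave(running):
        print("no backup running, stop slave can proceed")
    else:
        print("backup in progress, postpone stop slave:",
              find_backup_processes(running))
```

On Linux, the command lines could be collected with something like `ps -eo args`; how they are gathered is left out here so that the check itself stays easy to test.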
