http://dev.mysql.com/doc/refman/5.7/en/replication-options-gtids.html#sysvar_gtid_next
三种取值 * AUTOMATIC: Use the next automatically-generated global transaction ID. * ANONYMOUS: Transactions do not have global identifiers, and are identified by file and position only. * A global transaction ID in UUID:NUMBER format. QA: GTID 0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-50 对应的事务顺序,从小到大,一定是顺序执行的吗?答案:错,一般情况下事务是从小到大,顺序执行的。 但是如果再MTS场景,或者是人工设置gtid_next的情况下,就可能不是顺序执行了
dba:(none)> show master status; +--------------------+----------+--------------+------------------+-------------------------------------------+ | File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set | +--------------------+----------+--------------+------------------+-------------------------------------------+ | xx.000009 | 1719 | | | 0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-46 | +--------------------+----------+--------------+------------------+-------------------------------------------+ 1 row in set (0.00 sec) dba:(none)> set gtid_next='0923e916-3c36-11e6-82a5-ecf4bbf1f518:50'; Query OK, 0 rows affected (0.00 sec) dba:lc> insert into gtid_1 values(5); Query OK, 1 row affected (0.00 sec) dba:lc> set gtid_next=AUTOMATIC; Query OK, 0 rows affected (0.00 sec) dba:lc> flush logs; Query OK, 0 rows affected (0.01 sec) dba:lc> show master status; +--------------------+----------+--------------+------------------+----------------------------------------------+ | File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set | +--------------------+----------+--------------+------------------+----------------------------------------------+ | xx.000010 | 210 | | | 0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-46:50 | +--------------------+----------+--------------+------------------+----------------------------------------------+ 1 row in set (0.00 sec) dba:lc> insert into gtid_1 values(6); Query OK, 1 row affected (0.00 sec) dba:lc> insert into gtid_1 values(6); Query OK, 1 row affected (0.00 sec) dba:lc> insert into gtid_1 values(6); Query OK, 1 row affected (0.00 sec) dba:lc> show master status; +--------------------+----------+--------------+------------------+-------------------------------------------+ | File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set | +--------------------+----------+--------------+------------------+-------------------------------------------+ | xx.000010 | 1125 | | | 0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-50 | +--------------------+----------+--------------+------------------+-------------------------------------------+ 1 row in set (0.00 sec) 在这里面,很明显0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-50 事务执行顺序为: 1-46(最先执行) , 50(其次执行) , 47-49(最后执行)请参考MHA源码解析
GTID模式下,需要relay-log吗?purge_relay_log设置为on可以吗? * replication 架构 host_1(host_1:3306) (current master) +--host_2(host_2:3306 candidate master) +--host_3(host_3:3306 no candidate) * 模拟: 1. 大量并发的写入,一直持续的往host_1写数据,造成并发写入很大的样子 2. host_2: stop slave , 造成host_2 延迟master很多的样子 3. host_1: purge binary logs, 造成master删掉了日志,导致host_2 修复的时候拿不到master的最新binlog 4. host_3: 一直正常同步master,拥有最新的binlog 5. host_3: flush logs; purge_relay_log=on; flush logs;一直循环flush logs,造成host_3已经将最新的relay log删掉了,host_2 是肯定拿不到host_3的relay 来修复自己了 6. 好了,一切条件均已经准备完毕,这个时候让master 宕机,这样就能模拟出在relay log没有的情况下,是否可以正常完成mha 切换了 ............... 7. 结果完成了正常切换,那mha是怎么再gtid模式下,在没有relay log的情况下,正常切换的恩? 8. 原理:host_2发现自己不是最新的slave,所以就去change master到host_3,通过host_3的binlog来恢复 9. 最后,当host_2和host_3都一致的情况下,再让host_3 重新指向host_2,完毕... *结论: gtid模式下,mha恢复切换的原理是不需要relay log的,只需要binlog物理备份:xtrabackup,其他等逻辑备份:mysqldump,mydumper,mysqlpump等
物理备份 备份的时候,只要在备份的时候记录下Executed_Gtid_Set($gtid_dump)即可,这个可以用于重新change master; reset master; SET @@GLOBAL.GTID_PURGED='$gtid_dump'; change master to master_auto_position=1; 逻辑备份 * mysqldump 中 sql_log_bin 默认是关闭的。 SET @@SESSION.SQL_LOG_BIN= 0; 所以这里用途非常重要 * 如果dump文件,你要在master上执行,那么必须这样备份: mysqldump xx --set-gtid-purged=OFF , 这样dump文件不会有SET @@SESSION.SQL_LOG_BIN= 0存在 * 如果dump文件,你要在slave上执行,想重新搭建一套slave环境。那么必须这样备份: mysqldump xx --set-gtid-purged=ONslave relay log 不完整怎么办?(relay-log-recover=0)relay-log-recover=1 不考虑,因为它会舍弃掉relay log
为何要讨论这个 * 官方解释: 1) 非GTID模式下,如何保证slave crash safe 呢? relay_log_recovery=1,relay_log_info_repository=TABLE,master_info_repository=TABLE,innodb_flush_log_at_trx_commit=1,sync_binlog=1 2) GTID模式下,如何保证slave crash safe呢? relay_log_recovery=(1|0),relay_log_info_repository=TABLE,master_info_repository=TABLE,innodb_flush_log_at_trx_commit=1,sync_binlog=1 以上两种情况配置,可以保证crash safe 这里看到区别就是relay_log_recovery了,gtid可以是any,这就需要讨论下了。 当relay_log_recovery=1时,当mysql crash的时候,会丢弃掉之前获取的relay,所以这个不会产生一致性问题。 当relay_log_recovery=0时 如果是非GTID模式,因为没办法保证写master_info.log和relay log file之间的原子性,会导致slave有可能多拉取一个事务,这样就有一致性问题。 如果是GTID模式,因为binlog-dump协议变了,master_info.log已经不用,slave会将已经exected_GTID与retrieve_gtid的并集发送给master,以此来获取没有执行过的gtid,所以没问题。 这里面的retrieve_gtid就是IO_thread从master获取的gtid,会写入到relay log。 模拟relay log不完整的情况从上面可以知道,relay log的记录非常重要,那么relay log 不完整,会怎么样呢?
1) master 创建一张10G的表,然后执行全表更新操作。 2)这时候,slave就在狂写relay log了 3)此时,去slave kill掉mysql进程 4)这时候,relay log就不完整了 WARNING: The range of printed events ends with a row event or a table map event that does not have the STMT_END_F flag set. This might be because the last statement was not fully written to the log, or because you are using a --stop-position or --stop-datetime that refers to an event in the middle of a statement. The event(s) from the partial statement have not been written to output.总结: relay log不完整,mysql起来后,会重新获取不完整的这个events,sql_thread在回放的时候,如果发现events不完整,会跳过,不会影响到同步。
MTS_GAPS
如果MTS遇到Gap transction怎么办? 1. 先解决问题 START SLAVE UNTIL SQL_AFTER_MTS_GAPS 2. 考虑设置slave_preserve_commit_order=1Migration to GTID replicationNon transactionally safe statement will raise errors nowMySQL Performance in GTIDmysql_upgrade scriptErrant transactionsFiltration on the slaveInjecting empty transactions以上问题请参考 GTID原理与实战
online升级丢数据?online升级会报错吗?online升级步骤请参考 GTID原理与实战
故障案例一Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'Cannot replicate anonymous transaction when @@GLOBAL.GTID_MODE = ON ...'
两种情况: 1)slave的gtid_mode=on时,却还接受着来自master的non-gtid transaction的时候,会报以上错误。 2)事实上,不管slave的gtid_mode是on,还是off,只要master的gtid_mode=on,那么整个replication slave,都必须是gtid的事务 解决方案:在master上从gtid_mode=ON_PERMISSIVE 设置到 gtid_mode=ON之前,如何保证现在所有non-gtid事务都已经在slave执行完毕了? 很简单,两种方法: 第一种方案: 1) 在master上,当设置gtid_mode=ON_PERMISSIVE的时候,其实就已经产生gtid事务了,这个时候show master status;记下这个位置 $pos 2)然后再每个slave上,执行 SELECT MASTER_POS_WAIT(file, position); 第二种更加直接方案: 0)默认情况下,slave的gtid_mode都是off,所以去slave上show master status 都应该是file,position 1) 先在master上,设置gtid_mode=ON_PERMISSIVE 2)然后再每台slave上再次执行show master status,如果发现结果由file,position 变成 GTID_EXECUTED,那么说明slave已经将non-gtid全部执行完毕了 故障案例二Last_IO_Error: The replication receiver thread cannot start because the master has GTID_MODE = ON and this server has GTID_MODE = OFF.
slave的gtid_mode=off时,却还接受着来自master的gtid transaction的时候,会报以上错误。mysqlbinlog 参数:
* --exclude-gtids : 排除这些gtid * --include-gtids : 只打印这些gtid * --skip-gtids : 所有gtid都不打印 可以用--skip-gtids 做传统模式的恢复。但是这个是官方不推荐的。 mysqlbinlog --skip-gtids binlog.000001 > /tmp/dump.sqlgtid_set 用引号扩起来
NameDescriptionGTID_SUBSET(subset,set)returns true (1) if all GTIDs in subset are also in setGTID_SUBTRACT(set,subset)returns only those GTIDs from set that are not in subsetWAIT_FOR_EXECUTED_GTID_SET(gtid_set[, timeout])Wait until the given GTIDs have executed on slave.WAIT_UNTIL_SQL_THREAD_AFTER_GTIDS(gtid_set, timeout)Wait until the given GTIDs have executed on slave GTID_SUBSET(subset,set)subset 是否是 set 的子集,如果是返回1,不是返回0
dba:(none)> SELECT GTID_SUBSET('3E11FA47-71CA-11E1-9E33-C80AA9429562:23','3E11FA47-71CA-11E1-9E33-C80AA9429562:21-57'); +-----------------------------------------------------------------------------------------------------+ | GTID_SUBSET('3E11FA47-71CA-11E1-9E33-C80AA9429562:23','3E11FA47-71CA-11E1-9E33-C80AA9429562:21-57') | +-----------------------------------------------------------------------------------------------------+ | 1 | +-----------------------------------------------------------------------------------------------------+ 1 row in set (0.00 sec) dba:(none)> SELECT GTID_SUBSET('3E11FA47-71CA-11E1-9E33-C80AA9429562:23-25','3E11FA47-71CA-11E1-9E33-C80AA9429562:21-57'); +--------------------------------------------------------------------------------------------------------+ | GTID_SUBSET('3E11FA47-71CA-11E1-9E33-C80AA9429562:23-25','3E11FA47-71CA-11E1-9E33-C80AA9429562:21-57') | +--------------------------------------------------------------------------------------------------------+ | 1 | +--------------------------------------------------------------------------------------------------------+ 1 row in set (0.00 sec) dba:(none)> SELECT GTID_SUBSET('3E11FA47-71CA-11E1-9E33-C80AA9429562:23','3E11FA47-71CA-11E1-9E33-C80AA9429562:23'); +--------------------------------------------------------------------------------------------------+ | GTID_SUBSET('3E11FA47-71CA-11E1-9E33-C80AA9429562:23','3E11FA47-71CA-11E1-9E33-C80AA9429562:23') | +--------------------------------------------------------------------------------------------------+ | 1 | +--------------------------------------------------------------------------------------------------+ 1 row in set (0.00 sec) dba:(none)> SELECT GTID_SUBSET('3E11FA47-71CA-11E1-9E33-C80AA9429562:20-25','3E11FA47-71CA-11E1-9E33-C80AA9429562:21-57'); +--------------------------------------------------------------------------------------------------------+ | GTID_SUBSET('3E11FA47-71CA-11E1-9E33-C80AA9429562:20-25','3E11FA47-71CA-11E1-9E33-C80AA9429562:21-57') | +--------------------------------------------------------------------------------------------------------+ | 0 | +--------------------------------------------------------------------------------------------------------+ 1 row in set (0.00 sec) GTID_SUBTRACT(set,subset)哪些gtids仅仅是set独有的,subset没有的
dba:(none)> SELECT GTID_SUBTRACT('3E11FA47-71CA-11E1-9E33-C80AA9429562:21-57','3E11FA47-71CA-11E1-9E33-C80AA9429562:21'); +-------------------------------------------------------------------------------------------------------+ | GTID_SUBTRACT('3E11FA47-71CA-11E1-9E33-C80AA9429562:21-57','3E11FA47-71CA-11E1-9E33-C80AA9429562:21') | +-------------------------------------------------------------------------------------------------------+ | 3e11fa47-71ca-11e1-9e33-c80aa9429562:22-57 | +-------------------------------------------------------------------------------------------------------+ 1 row in set (0.00 sec) dba:(none)> SELECT GTID_SUBTRACT('3E11FA47-71CA-11E1-9E33-C80AA9429562:21-57','3E11FA47-71CA-11E1-9E33-C80AA9429562:20-25'); +----------------------------------------------------------------------------------------------------------+ | GTID_SUBTRACT('3E11FA47-71CA-11E1-9E33-C80AA9429562:21-57','3E11FA47-71CA-11E1-9E33-C80AA9429562:20-25') | +----------------------------------------------------------------------------------------------------------+ | 3e11fa47-71ca-11e1-9e33-c80aa9429562:26-57 | +----------------------------------------------------------------------------------------------------------+ 1 row in set (0.00 sec) dba:(none)> SELECT GTID_SUBTRACT('3E11FA47-71CA-11E1-9E33-C80AA9429562:21-57','3E11FA47-71CA-11E1-9E33-C80AA9429562:23-24'); +----------------------------------------------------------------------------------------------------------+ | GTID_SUBTRACT('3E11FA47-71CA-11E1-9E33-C80AA9429562:21-57','3E11FA47-71CA-11E1-9E33-C80AA9429562:23-24') | +----------------------------------------------------------------------------------------------------------+ | 3e11fa47-71ca-11e1-9e33-c80aa9429562:21-22:25-57 | +----------------------------------------------------------------------------------------------------------+ 1 row in set (0.00 sec)以上两个函数可以用来干嘛呢?通过GTID_SUBSET,master可以知道slave是否是自己的子集,可以很方便的检查数据一致性通过GTID_SUBTRACT,假设slave是master的子集,那么可以很轻松的将slave没有,master有的gtid发送给slave,以便达到最终一致性
WAIT_UNTIL_SQL_THREAD_AFTER_GTIDS(gtid_set, timeout)timeout 默认为0,表示无限等待slave gtid_set全部执行完毕如果全部执行完毕,会返回执行的gtid的数量。如果没有执行完,会等待timeout秒。如果slave没有起来,或者没有开启gtid,会返回NULL
dba:lc> SELECT WAIT_UNTIL_SQL_THREAD_AFTER_GTIDS('0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-3'); +---------------------------------------------------------------------------------+ | WAIT_UNTIL_SQL_THREAD_AFTER_GTIDS('0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-3',1) | +---------------------------------------------------------------------------------+ | 0 | +---------------------------------------------------------------------------------+ 1 row in set (0.00 sec) stop slave; dba:lc> SELECT WAIT_UNTIL_SQL_THREAD_AFTER_GTIDS('0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-3'); +---------------------------------------------------------------------------------+ | WAIT_UNTIL_SQL_THREAD_AFTER_GTIDS('0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-3',1) | +---------------------------------------------------------------------------------+ | NULL | ## 如果slave的IO,SQL thread 没有running,返回NULL,不管gtid set 有木有执行完毕 +---------------------------------------------------------------------------------+ 1 row in set (0.00 sec) WAIT_FOR_EXECUTED_GTID_SET(gtid_set[, timeout])含义跟WAIT_UNTIL_SQL_THREAD_AFTER_GTIDS一样,唯一一个区别就是:如果slave 的replication 线程没有起来,不会返回NULL。
stop slave; dba:lc> SELECT WAIT_FOR_EXECUTED_GTID_SET('0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-3'); +------------------------------------------------------------------------+ | WAIT_FOR_EXECUTED_GTID_SET('0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-3') | +------------------------------------------------------------------------+ | 0 | ## 如果都执行了,返回0 , 跟slave的IO,SQL thread 起没起来无关 +------------------------------------------------------------------------+ 1 row in set (0.00 sec) dba:lc> SELECT WAIT_FOR_EXECUTED_GTID_SET('0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-4',1); +--------------------------------------------------------------------------+ | WAIT_FOR_EXECUTED_GTID_SET('0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-4',1) | +--------------------------------------------------------------------------+ | 1 | +--------------------------------------------------------------------------+ 1 row in set (1.00 sec)