下面是MHA从启动,到failover的完整manager.log,以及集群中所有mysql的general.log.我们可以通过这些日志,更多的了解和分析MHA的运行原理.对用好MHA有很多帮助。
架构如下:
DB1 MASTER
DB2 SLAVE,candidate_master
DB3 SLAVE
DB1:
[root@localhost mysql]# cat /var/lib/mysql/localhost.log
140928 10:17:04 5 Connect [email protected] on
5 Query set autocommit=1
5 Query SELECT CONNECTION_ID() AS Value
6 Connect [email protected] on
6 Query set autocommit=1
6 Query SELECT CONNECTION_ID() AS Value
6 Query SET wait_timeout=86400
6 Query SELECT @@global.server_id As Value
6 Query SELECT VERSION() AS Value
6 Query SHOW GLOBAL VARIABLES LIKE 'log_bin'
6 Query SHOW MASTER STATUS
6 Query SELECT @@global.relay_log_info_repository AS Value
6 Query SELECT @@global.datadir AS Value
6 Query SELECT @@global.relay_log_info_file AS Value
6 Query SELECT @@global.datadir AS Value
6 Query SHOW SLAVE STATUS
6 Query SELECT @@global.read_only As Value
6 Query SELECT @@global.relay_log_purge As Value
140928 10:17:11 7 Connect [email protected] on
7 Query select @@version_comment limit 1
7 Query select count(*) as c from mysql.user
6 Quit
7 Quit
8 Connect [email protected] on
8 Query set autocommit=1
8 Query SET wait_timeout=6
8 Query SELECT GET_LOCK('MHA_Master_High_Availability_Monitor', '12') AS Value
8 Query SELECT 1 As Value
140928 10:17:14 8 Query SELECT 1 As Value
140928 10:17:17 8 Query SELECT 1 As Value
140928 10:17:20 8 Query SELECT 1 As Value
140928 10:17:23 8 Query SELECT 1 As Value
140928 10:17:26 8 Query SELECT 1 As Value
140928 10:17:29 8 Query SELECT 1 As Value
140928 10:17:32 8 Query SELECT 1 As Value
140928 10:17:35 8 Query SELECT 1 As Value
140928 10:17:38 8 Query SELECT 1 As Value
DB2:
[root@localhost mysql]# cat /var/lib/mysql/localhost.log
140928 10:17:04 214 Connect [email protected] on
214 Query set autocommit=1
214 Query SELECT CONNECTION_ID() AS Value
215 Connect [email protected] on
215 Query set autocommit=1
215 Query SELECT CONNECTION_ID() AS Value
215 Query SET wait_timeout=86400
215 Query SELECT @@global.server_id As Value
215 Query SELECT VERSION() AS Value
215 Query SHOW GLOBAL VARIABLES LIKE 'log_bin'
215 Query SHOW MASTER STATUS
215 Query SELECT @@global.relay_log_info_repository AS Value
215 Query SELECT @@global.datadir AS Value
215 Query SELECT @@global.relay_log_info_file AS Value
215 Query SELECT @@global.datadir AS Value
215 Query SHOW SLAVE STATUS
215 Query SELECT @@global.read_only As Value
215 Query SELECT @@global.relay_log_purge As Value
215 Query SHOW SLAVE STATUS
215 Query SELECT Repl_slave_priv AS Value FROM mysql.user WHERE user = 'slave'
140928 10:17:09 216 Connect [email protected] on
216 Query select @@version_comment limit 1
216 Query set sql_log_bin=0
216 Query create table if not exists mysql.apply_diff_relay_logs_test(id int)
216 Query insert into mysql.apply_diff_relay_logs_test values(1)
216 Query update mysql.apply_diff_relay_logs_test set id=id+1 where id=1
216 Query delete from mysql.apply_diff_relay_logs_test
216 Query drop table mysql.apply_diff_relay_logs_test
140928 10:17:10 216 Quit
140928 10:17:11 215 Quit
140928 10:17:50 217 Connect [email protected] on
217 Query set autocommit=1
217 Query SELECT CONNECTION_ID() AS Value
218 Connect [email protected] on
218 Query set autocommit=1
218 Query SELECT CONNECTION_ID() AS Value
218 Query SET wait_timeout=86400
218 Query SELECT @@global.server_id As Value
218 Query SELECT VERSION() AS Value
218 Query SHOW GLOBAL VARIABLES LIKE 'log_bin'
218 Query SHOW MASTER STATUS
218 Query SELECT @@global.relay_log_info_repository AS Value
218 Query SELECT @@global.datadir AS Value
218 Query SELECT @@global.relay_log_info_file AS Value
218 Query SELECT @@global.datadir AS Value
218 Query SHOW SLAVE STATUS
218 Query SELECT @@global.read_only As Value
218 Query SELECT @@global.relay_log_purge As Value
218 Query SHOW SLAVE STATUS
219 Connect [email protected] on
219 Query set autocommit=1
219 Query SELECT CONNECTION_ID() AS Value
220 Connect [email protected] on
220 Query set autocommit=1
220 Query SELECT CONNECTION_ID() AS Value
220 Query SET wait_timeout=86400
220 Query SELECT @@global.server_id As Value
220 Query SELECT VERSION() AS Value
220 Query SHOW GLOBAL VARIABLES LIKE 'log_bin'
220 Query SHOW MASTER STATUS
220 Query SELECT @@global.relay_log_info_repository AS Value
220 Query SELECT @@global.datadir AS Value
220 Query SELECT @@global.relay_log_info_file AS Value
220 Query SELECT @@global.datadir AS Value
220 Query SHOW SLAVE STATUS
220 Query SELECT @@global.read_only As Value
220 Query SELECT @@global.relay_log_purge As Value
220 Query SHOW SLAVE STATUS
220 Query SELECT GET_LOCK('MHA_Master_High_Availability_Failover', '1') AS Value
220 Query SHOW SLAVE STATUS
220 Query STOP SLAVE IO_THREAD
220 Query SHOW SLAVE STATUS
140928 10:17:51 220 Query SHOW SLAVE STATUS
220 Query SHOW SLAVE STATUS
220 Query STOP SLAVE SQL_THREAD
220 Query SHOW SLAVE STATUS
220 Query SHOW MASTER STATUS
220 Query SELECT @@global.read_only As Value
221 Connect [email protected] on
221 Query SELECT UNIX_TIMESTAMP()
221 Query SHOW VARIABLES LIKE 'SERVER_ID'
221 Query SET @master_heartbeat_period= 1799999979520
221 Query SET @master_binlog_checksum= @@global.binlog_checksum
221 Query SELECT @master_binlog_checksum
221 Query SELECT @@GLOBAL.GTID_MODE
221 Query SHOW VARIABLES LIKE 'SERVER_UUID'
221 Query SET @slave_uuid= 'dda2f003-4244-11e4-b851-000c29da163f'
221 Binlog Dump Log: 'mysql-bin.000016' Pos: 688
220 Query STOP SLAVE
220 Query SHOW SLAVE STATUS
220 Query RESET SLAVE /*!50516 ALL */
220 Query SHOW SLAVE STATUS
220 Query SELECT RELEASE_LOCK('MHA_Master_High_Availability_Failover') As Value
220 Quit
DB3:
[root@localhost ~]# cat /var/lib/mysql/localhost.log
140928 10:17:04 223 Connect [email protected] on
223 Query set autocommit=1
223 Query SELECT CONNECTION_ID() AS Value
224 Connect [email protected] on
224 Query set autocommit=1
224 Query SELECT CONNECTION_ID() AS Value
224 Query SET wait_timeout=86400
224 Query SELECT @@global.server_id As Value
224 Query SELECT VERSION() AS Value
224 Query SHOW GLOBAL VARIABLES LIKE 'log_bin'
224 Query SHOW MASTER STATUS
224 Query SELECT @@global.relay_log_info_repository AS Value
224 Query SELECT @@global.datadir AS Value
224 Query SELECT @@global.relay_log_info_file AS Value
224 Query SELECT @@global.datadir AS Value
224 Query SHOW SLAVE STATUS
224 Query SELECT @@global.read_only As Value
224 Query SELECT @@global.relay_log_purge As Value
224 Query SHOW SLAVE STATUS
140928 10:17:10 225 Connect [email protected] on
225 Query select @@version_comment limit 1
225 Query set sql_log_bin=0
225 Query create table if not exists mysql.apply_diff_relay_logs_test(id int)
225 Query insert into mysql.apply_diff_relay_logs_test values(1)
225 Query update mysql.apply_diff_relay_logs_test set id=id+1 where id=1
225 Query delete from mysql.apply_diff_relay_logs_test
225 Query drop table mysql.apply_diff_relay_logs_test
140928 10:17:11 225 Quit
224 Quit
#开始failover流程的监测
140928 10:17:50 226 Connect [email protected] on
226 Query set autocommit=1
226 Query SELECT CONNECTION_ID() AS Value
227 Connect [email protected] on
227 Query set autocommit=1
227 Query SELECT CONNECTION_ID() AS Value
227 Query SET wait_timeout=86400
227 Query SELECT @@global.server_id As Value
227 Query SELECT VERSION() AS Value
227 Query SHOW GLOBAL VARIABLES LIKE 'log_bin'
227 Query SHOW MASTER STATUS
227 Query SELECT @@global.relay_log_info_repository AS Value
227 Query SELECT @@global.datadir AS Value
227 Query SELECT @@global.relay_log_info_file AS Value
227 Query SELECT @@global.datadir AS Value
227 Query SHOW SLAVE STATUS
227 Query SELECT @@global.read_only As Value
227 Query SELECT @@global.relay_log_purge As Value
227 Query SHOW SLAVE STATUS
#正式开始failover的监测
228 Connect [email protected] on
228 Query set autocommit=1
228 Query SELECT CONNECTION_ID() AS Value
229 Connect [email protected] on
229 Query set autocommit=1
229 Query SELECT CONNECTION_ID() AS Value
229 Query SET wait_timeout=86400
229 Query SELECT @@global.server_id As Value
229 Query SELECT VERSION() AS Value
229 Query SHOW GLOBAL VARIABLES LIKE 'log_bin'
229 Query SHOW MASTER STATUS
229 Query SELECT @@global.relay_log_info_repository AS Value
229 Query SELECT @@global.datadir AS Value
229 Query SELECT @@global.relay_log_info_file AS Value
229 Query SELECT @@global.datadir AS Value
229 Query SHOW SLAVE STATUS
229 Query SELECT @@global.read_only As Value
229 Query SELECT @@global.relay_log_purge As Value
229 Query SHOW SLAVE STATUS
229 Query SELECT GET_LOCK('MHA_Master_High_Availability_Failover', '1') AS Value
229 Query SHOW SLAVE STATUS
229 Query STOP SLAVE IO_THREAD
229 Query SHOW SLAVE STATUS
140928 10:17:51 229 Query SHOW SLAVE STATUS
229 Query SHOW SLAVE STATUS
229 Query SHOW SLAVE STATUS
229 Query SHOW SLAVE STATUS
229 Query STOP SLAVE SQL_THREAD
229 Query SHOW SLAVE STATUS
229 Query STOP SLAVE
229 Query SHOW SLAVE STATUS
229 Query RESET SLAVE
229 Query CHANGE MASTER TO MASTER_HOST = '192.168.153.151' MASTER_USER = 'slave' MASTER_PASSWORD = MASTER_PORT = 3306 MASTER_LOG_FILE = 'mysql-bin.000016' MASTER_LOG_POS = 688
229 Query START SLAVE
230 Connect Out [email protected]:3306
229 Query SHOW SLAVE STATUS
229 Query SELECT RELEASE_LOCK('MHA_Master_High_Availability_Failover') As Value
229 Quit
140928 10:18:07 236 Connect root@localhost on
236 Query select @@version_comment limit 1
236 Query select USER()
140928 10:18:10 236 Quit
MHA Manage.log:
[root@localhost ~]# cat /var/log/masterha/app1/manager.log
#启动前的准备工作
#检查服务器状态,获取相关参数设置
Sun Sep 28 10:17:03 2014 - [info] MHA::MasterMonitor version 0.55.
Sun Sep 28 10:17:04 2014 - [info] Dead Servers:
Sun Sep 28 10:17:04 2014 - [info] Alive Servers:
Sun Sep 28 10:17:04 2014 - [info] 192.168.153.150(192.168.153.150:3306)
Sun Sep 28 10:17:04 2014 - [info] 192.168.153.151(192.168.153.151:3306)
Sun Sep 28 10:17:04 2014 - [info] 192.168.153.152(192.168.153.152:3306)
Sun Sep 28 10:17:04 2014 - [info] Alive Slaves:
Sun Sep 28 10:17:04 2014 - [info] 192.168.153.151(192.168.153.151:3306) Version=5.6.20-68.0-log (oldest major version between slaves) log-bin:enabled
Sun Sep 28 10:17:04 2014 - [info] Replicating from 192.168.153.150(192.168.153.150:3306)
Sun Sep 28 10:17:04 2014 - [info] Primary candidate for the new Master (candidate_master is set)
Sun Sep 28 10:17:04 2014 - [info] 192.168.153.152(192.168.153.152:3306) Version=5.6.20-68.0-log (oldest major version between slaves) log-bin:enabled
Sun Sep 28 10:17:04 2014 - [info] Replicating from 192.168.153.150(192.168.153.150:3306)
Sun Sep 28 10:17:04 2014 - [info] Not candidate for the new Master (no_master is set)
Sun Sep 28 10:17:04 2014 - [info] Current Alive Master: 192.168.153.150(192.168.153.150:3306)
Sun Sep 28 10:17:04 2014 - [info] Checking slave configurations..
Sun Sep 28 10:17:04 2014 - [info] read_only=1 is not set on slave 192.168.153.151(192.168.153.151:3306).
Sun Sep 28 10:17:04 2014 - [warning] relay_log_purge=0 is not set on slave 192.168.153.151(192.168.153.151:3306).
Sun Sep 28 10:17:04 2014 - [info] read_only=1 is not set on slave 192.168.153.152(192.168.153.152:3306).
Sun Sep 28 10:17:04 2014 - [warning] relay_log_purge=0 is not set on slave 192.168.153.152(192.168.153.152:3306).
Sun Sep 28 10:17:04 2014 - [info] Checking replication filtering settings..
Sun Sep 28 10:17:04 2014 - [info] binlog_do_db= , binlog_ignore_db=
Sun Sep 28 10:17:04 2014 - [info] Replication filtering check ok.
#测试ssh连接是否成功
Sun Sep 28 10:17:04 2014 - [info] Starting SSH connection tests..
Sun Sep 28 10:17:07 2014 - [info] All SSH connection tests passed successfully.
#测试MHA node是否可用
Sun Sep 28 10:17:07 2014 - [info] Checking MHA Node version..
Sun Sep 28 10:17:08 2014 - [info] Version check ok.
Sun Sep 28 10:17:08 2014 - [info] Checking SSH publickey authentication settings on the current master..
Sun Sep 28 10:17:08 2014 - [info] HealthCheck: SSH to 192.168.153.150 is reachable.
Sun Sep 28 10:17:09 2014 - [info] Master MHA Node version is 0.54.
Sun Sep 28 10:17:09 2014 - [info] Checking recovery script configurations on the current master..
Sun Sep 28 10:17:09 2014 - [info] Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/var/lib/mysql,/var/log/mysql --output_file=/var/log/masterha/app1/save_binary_logs_test --manager_version=0.55 --start_file=mysql-bin.000026
#创建MHA日志目录
Sun Sep 28 10:17:09 2014 - [info] Connecting to [email protected](192.168.153.150)..
Creating /var/log/masterha/app1 if not exists.. ok.
Checking output directory is accessible or not..
ok.
Binlog found at /var/lib/mysql, up to mysql-bin.000026
Sun Sep 28 10:17:09 2014 - [info] Master setting check done.
#master的检查到此完毕
#开始检查slave的差异日志应用权限
Sun Sep 28 10:17:09 2014 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Sun Sep 28 10:17:09 2014 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='mha' --slave_host=192.168.153.151 --slave_ip=192.168.153.151 --slave_port=3306 --workdir=/var/log/masterha/app1 --target_version=5.6.20-68.0-log --manager_version=0.55 --relay_log_info=/var/lib/mysql/relay-log.info --relay_dir=/var/lib/mysql/ --slave_pass=xxx
Sun Sep 28 10:17:09 2014 - [info] Connecting to [email protected](192.168.153.151:22)..
Checking slave recovery environment settings..
Opening /var/lib/mysql/relay-log.info ... ok.
Relay log found at /var/lib/mysql, up to mysql-relay.000003
Temporary relay log file is /var/lib/mysql/mysql-relay.000003
Testing mysql connection and privileges..Warning: Using a password on the command line interface can be insecure.
done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Sun Sep 28 10:17:10 2014 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='mha' --slave_host=192.168.153.152 --slave_ip=192.168.153.152 --slave_port=3306 --workdir=/var/log/masterha/app1 --target_version=5.6.20-68.0-log --manager_version=0.55 --relay_log_info=/var/lib/mysql/relay-log.info --relay_dir=/var/lib/mysql/ --slave_pass=xxx
Sun Sep 28 10:17:10 2014 - [info] Connecting to [email protected](192.168.153.152:22)..
Checking slave recovery environment settings..
Opening /var/lib/mysql/relay-log.info ... ok.
Relay log found at /var/lib/mysql, up to mysql-relay.000003
Temporary relay log file is /var/lib/mysql/mysql-relay.000003
Testing mysql connection and privileges..Warning: Using a password on the command line interface can be insecure.
done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Sun Sep 28 10:17:11 2014 - [info] Slaves settings check done.
Sun Sep 28 10:17:11 2014 - [info]
192.168.153.150 (current master)
+--192.168.153.151
+--192.168.153.152
#调试master_ip_failover_script
Sun Sep 28 10:17:11 2014 - [warning] master_ip_failover_script is not defined.
#调试shutdown_script
Sun Sep 28 10:17:11 2014 - [info] Checking shutdown script status:
Sun Sep 28 10:17:11 2014 - [info] /opt/master_ip_failover.sh --command=status --ssh_user=root --host=192.168.153.150 --ip=192.168.153.150
Sun Sep 28 10:17:11 2014 - [info] OK.
Sun Sep 28 10:17:11 2014 - [info] Set master ping interval 3 seconds.
#设置二次检查的主机
Sun Sep 28 10:17:11 2014 - [info] Set secondary check script: masterha_secondary_check -s 192.168.153.151 -s 192.168.153.152
Sun Sep 28 10:17:11 2014 - [info] Starting ping health check on 192.168.153.150(192.168.153.150:3306)..
#MHA启动完毕,进入监测状态
Sun Sep 28 10:17:11 2014 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..
#监测DB1服务器挂了
Sun Sep 28 10:17:41 2014 - [warning] Got error on MySQL select ping: 2006 (MySQL server has gone away)
#通过定义的二次监测,确认master是否挂了
Sun Sep 28 10:17:41 2014 - [info] Executing seconary network check script: masterha_secondary_check -s 192.168.153.151 -s 192.168.153.152 --user=root --master_host=192.168.153.150 --master_ip=192.168.153.150 --master_port=3306
Sun Sep 28 10:17:41 2014 - [info] Executing SSH check script: save_binary_logs --command=test --start_pos=4 --binlog_dir=/var/lib/mysql,/var/log/mysql --output_file=/var/log/masterha/app1/save_binary_logs_test --manager_version=0.55 --binlog_prefix=mysql-bin
Monitoring server 192.168.153.151 is reachable, Master is not reachable from 192.168.153.151. OK.
Sun Sep 28 10:17:42 2014 - [info] HealthCheck: SSH to 192.168.153.150 is reachable.
Monitoring server 192.168.153.152 is reachable, Master is not reachable from 192.168.153.152. OK.
#确认master挂了,开始进入failover流程
#再试尝试连接master和master的ssh
Sun Sep 28 10:17:42 2014 - [info] Master is not reachable from all other monitoring servers. Failover should start.
Sun Sep 28 10:17:44 2014 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Sun Sep 28 10:17:44 2014 - [warning] Connection failed 1 time(s)..
Sun Sep 28 10:17:47 2014 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Sun Sep 28 10:17:47 2014 - [warning] Connection failed 2 time(s)..
Sun Sep 28 10:17:50 2014 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Sun Sep 28 10:17:50 2014 - [warning] Connection failed 3 time(s)..
Sun Sep 28 10:17:50 2014 - [warning] Master is not reachable from health checker!
Sun Sep 28 10:17:50 2014 - [warning] Master 192.168.153.150(192.168.153.150:3306) is not reachable!
Sun Sep 28 10:17:50 2014 - [warning] SSH is reachable.
#通过配置文件,监测其他slave的状态
Sun Sep 28 10:17:50 2014 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /etc/app1.cnf again, and trying to connect to all servers to check server status..
Sun Sep 28 10:17:50 2014 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sun Sep 28 10:17:50 2014 - [info] Reading application default configurations from /etc/app1.cnf..
Sun Sep 28 10:17:50 2014 - [info] Reading server configurations from /etc/app1.cnf..
Sun Sep 28 10:17:50 2014 - [info] Dead Servers:
Sun Sep 28 10:17:50 2014 - [info] 192.168.153.150(192.168.153.150:3306)
Sun Sep 28 10:17:50 2014 - [info] Alive Servers:
Sun Sep 28 10:17:50 2014 - [info] 192.168.153.151(192.168.153.151:3306)
Sun Sep 28 10:17:50 2014 - [info] 192.168.153.152(192.168.153.152:3306)
Sun Sep 28 10:17:50 2014 - [info] Alive Slaves:
Sun Sep 28 10:17:50 2014 - [info] 192.168.153.151(192.168.153.151:3306) Version=5.6.20-68.0-log (oldest major version between slaves) log-bin:enabled
Sun Sep 28 10:17:50 2014 - [info] Replicating from 192.168.153.150(192.168.153.150:3306)
Sun Sep 28 10:17:50 2014 - [info] Primary candidate for the new Master (candidate_master is set)
Sun Sep 28 10:17:50 2014 - [info] 192.168.153.152(192.168.153.152:3306) Version=5.6.20-68.0-log (oldest major version between slaves) log-bin:enabled
Sun Sep 28 10:17:50 2014 - [info] Replicating from 192.168.153.150(192.168.153.150:3306)
Sun Sep 28 10:17:50 2014 - [info] Not candidate for the new Master (no_master is set)
#再次监测slave的配置是否有变化,是否符合failover条件
Sun Sep 28 10:17:50 2014 - [info] Checking slave configurations..
Sun Sep 28 10:17:50 2014 - [info] read_only=1 is not set on slave 192.168.153.151(192.168.153.151:3306).
Sun Sep 28 10:17:50 2014 - [warning] relay_log_purge=0 is not set on slave 192.168.153.151(192.168.153.151:3306).
Sun Sep 28 10:17:50 2014 - [info] read_only=1 is not set on slave 192.168.153.152(192.168.153.152:3306).
Sun Sep 28 10:17:50 2014 - [warning] relay_log_purge=0 is not set on slave 192.168.153.152(192.168.153.152:3306).
Sun Sep 28 10:17:50 2014 - [info] Checking replication filtering settings..
Sun Sep 28 10:17:50 2014 - [info] Replication filtering check ok.
Sun Sep 28 10:17:50 2014 - [info] Master is down!
Sun Sep 28 10:17:50 2014 - [info] Terminating monitoring script.
Sun Sep 28 10:17:50 2014 - [info] Got exit code 20 (Master dead).
Sun Sep 28 10:17:50 2014 - [info] MHA::MasterFailover version 0.55.
#正式开始failover
Sun Sep 28 10:17:50 2014 - [info] Starting master failover.
Sun Sep 28 10:17:50 2014 - [info]
#再次对slave配置做检查
Sun Sep 28 10:17:50 2014 - [info] * Phase 1: Configuration Check Phase..
Sun Sep 28 10:17:50 2014 - [info]
Sun Sep 28 10:17:50 2014 - [info] Dead Servers:
Sun Sep 28 10:17:50 2014 - [info] 192.168.153.150(192.168.153.150:3306)
Sun Sep 28 10:17:50 2014 - [info] Checking master reachability via mysql(double check)..
Sun Sep 28 10:17:50 2014 - [info] ok.
Sun Sep 28 10:17:50 2014 - [info] Alive Servers:
Sun Sep 28 10:17:50 2014 - [info] 192.168.153.151(192.168.153.151:3306)
Sun Sep 28 10:17:50 2014 - [info] 192.168.153.152(192.168.153.152:3306)
Sun Sep 28 10:17:50 2014 - [info] Alive Slaves:
Sun Sep 28 10:17:50 2014 - [info] 192.168.153.151(192.168.153.151:3306) Version=5.6.20-68.0-log (oldest major version between slaves) log-bin:enabled
Sun Sep 28 10:17:50 2014 - [info] Replicating from 192.168.153.150(192.168.153.150:3306)
Sun Sep 28 10:17:50 2014 - [info] Primary candidate for the new Master (candidate_master is set)
Sun Sep 28 10:17:50 2014 - [info] 192.168.153.152(192.168.153.152:3306) Version=5.6.20-68.0-log (oldest major version between slaves) log-bin:enabled
Sun Sep 28 10:17:50 2014 - [info] Replicating from 192.168.153.150(192.168.153.150:3306)
Sun Sep 28 10:17:50 2014 - [info] Not candidate for the new Master (no_master is set)
Sun Sep 28 10:17:50 2014 - [info] ** Phase 1: Configuration Check Phase completed.
Sun Sep 28 10:17:50 2014 - [info]
#对原Master做master_ip_failover_script和shutdown_script的操作
Sun Sep 28 10:17:50 2014 - [info] * Phase 2: Dead Master Shutdown Phase..
Sun Sep 28 10:17:50 2014 - [info]
Sun Sep 28 10:17:50 2014 - [info] Forcing shutdown so that applications never connect to the current master..
Sun Sep 28 10:17:50 2014 - [warning] master_ip_failover_script is not set. Skipping invalidating dead master ip address.
Sun Sep 28 10:17:50 2014 - [info] Executing SHUTDOWN script:
Sun Sep 28 10:17:50 2014 - [info] /opt/master_ip_failover.sh --command=stopssh --ssh_user=root --host=192.168.153.150 --ip=192.168.153.150 --port=3306
Sun Sep 28 10:17:51 2014 - [info] Power off done.
Sun Sep 28 10:17:51 2014 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Sun Sep 28 10:17:51 2014 - [info]
#开始差异日志的恢复:获取slave最后得到的binlog位置
Sun Sep 28 10:17:51 2014 - [info] * Phase 3: Master Recovery Phase..
Sun Sep 28 10:17:51 2014 - [info]
Sun Sep 28 10:17:51 2014 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Sun Sep 28 10:17:51 2014 - [info]
Sun Sep 28 10:17:51 2014 - [info] The latest binary log file/position on all slaves is mysql-bin.000026:120
Sun Sep 28 10:17:51 2014 - [info] Latest slaves (Slaves that received relay log files to the latest):
Sun Sep 28 10:17:51 2014 - [info] 192.168.153.151(192.168.153.151:3306) Version=5.6.20-68.0-log (oldest major version between slaves) log-bin:enabled
Sun Sep 28 10:17:51 2014 - [info] Replicating from 192.168.153.150(192.168.153.150:3306)
Sun Sep 28 10:17:51 2014 - [info] Primary candidate for the new Master (candidate_master is set)
Sun Sep 28 10:17:51 2014 - [info] 192.168.153.152(192.168.153.152:3306) Version=5.6.20-68.0-log (oldest major version between slaves) log-bin:enabled
Sun Sep 28 10:17:51 2014 - [info] Replicating from 192.168.153.150(192.168.153.150:3306)
Sun Sep 28 10:17:51 2014 - [info] Not candidate for the new Master (no_master is set)
Sun Sep 28 10:17:51 2014 - [info] The oldest binary log file/position on all slaves is mysql-bin.000026:120
Sun Sep 28 10:17:51 2014 - [info] Oldest slaves:
Sun Sep 28 10:17:51 2014 - [info] 192.168.153.151(192.168.153.151:3306) Version=5.6.20-68.0-log (oldest major version between slaves) log-bin:enabled
Sun Sep 28 10:17:51 2014 - [info] Replicating from 192.168.153.150(192.168.153.150:3306)
Sun Sep 28 10:17:51 2014 - [info] Primary candidate for the new Master (candidate_master is set)
Sun Sep 28 10:17:51 2014 - [info] 192.168.153.152(192.168.153.152:3306) Version=5.6.20-68.0-log (oldest major version between slaves) log-bin:enabled
Sun Sep 28 10:17:51 2014 - [info] Replicating from 192.168.153.150(192.168.153.150:3306)
Sun Sep 28 10:17:51 2014 - [info] Not candidate for the new Master (no_master is set)
Sun Sep 28 10:17:51 2014 - [info]
#获取原master的binlog日志
Sun Sep 28 10:17:51 2014 - [info] * Phase 3.2: Saving Dead Master's Binlog Phase..
Sun Sep 28 10:17:51 2014 - [info]
Sun Sep 28 10:17:51 2014 - [warning] Dead Master is not SSH reachable. Could not save it's binlogs. Transactions that were not sent to the latest slave (Read_Master_Log_Pos to the tail of the dead master's binlog) were lost.
Sun Sep 28 10:17:51 2014 - [info]
#确定新的master
Sun Sep 28 10:17:51 2014 - [info] * Phase 3.3: Determining New Master Phase..
Sun Sep 28 10:17:51 2014 - [info]
Sun Sep 28 10:17:51 2014 - [info] Finding the latest slave that has all relay logs for recovering other slaves..
Sun Sep 28 10:17:51 2014 - [info] All slaves received relay logs to the same position. No need to resync each other.
Sun Sep 28 10:17:51 2014 - [info] Searching new master from slaves..
Sun Sep 28 10:17:51 2014 - [info] Candidate masters from the configuration file:
Sun Sep 28 10:17:51 2014 - [info] 192.168.153.151(192.168.153.151:3306) Version=5.6.20-68.0-log (oldest major version between slaves) log-bin:enabled
Sun Sep 28 10:17:51 2014 - [info] Replicating from 192.168.153.150(192.168.153.150:3306)
Sun Sep 28 10:17:51 2014 - [info] Primary candidate for the new Master (candidate_master is set)
Sun Sep 28 10:17:51 2014 - [info] Non-candidate masters:
Sun Sep 28 10:17:51 2014 - [info] 192.168.153.152(192.168.153.152:3306) Version=5.6.20-68.0-log (oldest major version between slaves) log-bin:enabled
Sun Sep 28 10:17:51 2014 - [info] Replicating from 192.168.153.150(192.168.153.150:3306)
Sun Sep 28 10:17:51 2014 - [info] Not candidate for the new Master (no_master is set)
Sun Sep 28 10:17:51 2014 - [info] Searching from candidate_master slaves which have received the latest relay log events..
Sun Sep 28 10:17:51 2014 - [info] New master is 192.168.153.151(192.168.153.151:3306)
Sun Sep 28 10:17:51 2014 - [info] Starting master failover..
Sun Sep 28 10:17:51 2014 - [info]
From:
192.168.153.150 (current master)
+--192.168.153.151
+--192.168.153.152
To:
192.168.153.151 (new master)
+--192.168.153.152
Sun Sep 28 10:17:51 2014 - [info]
#在new master上应用差异的binlog日志
Sun Sep 28 10:17:51 2014 - [info] * Phase 3.3: New Master Diff Log Generation Phase..
Sun Sep 28 10:17:51 2014 - [info]
Sun Sep 28 10:17:51 2014 - [info] This server has all relay logs. No need to generate diff files from the latest slave.
Sun Sep 28 10:17:51 2014 - [info]
Sun Sep 28 10:17:51 2014 - [info] * Phase 3.4: Master Log Apply Phase..
Sun Sep 28 10:17:51 2014 - [info]
Sun Sep 28 10:17:51 2014 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed.
Sun Sep 28 10:17:51 2014 - [info] Starting recovery on 192.168.153.151(192.168.153.151:3306)..
Sun Sep 28 10:17:51 2014 - [info] This server has all relay logs. Waiting all logs to be applied..
Sun Sep 28 10:17:51 2014 - [info] done.
Sun Sep 28 10:17:51 2014 - [info] All relay logs were successfully applied.
#获取new master的binlog位置。
Sun Sep 28 10:17:51 2014 - [info] Getting new master's binlog name and position..
Sun Sep 28 10:17:51 2014 - [info] mysql-bin.000016:688
Sun Sep 28 10:17:51 2014 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.153.151', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000016', MASTER_LOG_POS=688, MASTER_USER='slave', MASTER_PASSWORD='xxx';
#如果有master_ip_failover_script,那么给new master设置VIP
Sun Sep 28 10:17:51 2014 - [warning] master_ip_failover_script is not set. Skipping taking over new master ip address.
Sun Sep 28 10:17:51 2014 - [info] ** Finished master recovery successfully.
Sun Sep 28 10:17:51 2014 - [info] * Phase 3: Master Recovery Phase completed.
Sun Sep 28 10:17:51 2014 - [info]
#开始恢复其他slave,也是从原master的binlog对比来做恢复
Sun Sep 28 10:17:51 2014 - [info] * Phase 4: Slaves Recovery Phase..
Sun Sep 28 10:17:51 2014 - [info]
Sun Sep 28 10:17:51 2014 - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase..
Sun Sep 28 10:17:51 2014 - [info]
Sun Sep 28 10:17:51 2014 - [info] -- Slave diff file generation on host 192.168.153.152(192.168.153.152:3306) started, pid: 8387. Check tmp log /var/log/masterha/app1/192.168.153.152_3306_20140928101750.log if it takes time..
Sun Sep 28 10:17:51 2014 - [info]
Sun Sep 28 10:17:51 2014 - [info] Log messages from 192.168.153.152 ...
Sun Sep 28 10:17:51 2014 - [info]
Sun Sep 28 10:17:51 2014 - [info] This server has all relay logs. No need to generate diff files from the latest slave.
Sun Sep 28 10:17:51 2014 - [info] End of log messages from 192.168.153.152.
Sun Sep 28 10:17:51 2014 - [info] -- 192.168.153.152(192.168.153.152:3306) has the latest relay log events.
Sun Sep 28 10:17:51 2014 - [info] Generating relay diff files from the latest slave succeeded.
Sun Sep 28 10:17:51 2014 - [info]
Sun Sep 28 10:17:51 2014 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase..
Sun Sep 28 10:17:51 2014 - [info]
Sun Sep 28 10:17:51 2014 - [info] -- Slave recovery on host 192.168.153.152(192.168.153.152:3306) started, pid: 8389. Check tmp log /var/log/masterha/app1/192.168.153.152_3306_20140928101750.log if it takes time..
Sun Sep 28 10:17:51 2014 - [info]
Sun Sep 28 10:17:51 2014 - [info] Log messages from 192.168.153.152 ...
Sun Sep 28 10:17:51 2014 - [info]
Sun Sep 28 10:17:51 2014 - [info] Starting recovery on 192.168.153.152(192.168.153.152:3306)..
Sun Sep 28 10:17:51 2014 - [info] This server has all relay logs. Waiting all logs to be applied..
Sun Sep 28 10:17:51 2014 - [info] done.
Sun Sep 28 10:17:51 2014 - [info] All relay logs were successfully applied.
#差异日志应用完成以后,切换所有slave到new master。
Sun Sep 28 10:17:51 2014 - [info] Resetting slave 192.168.153.152(192.168.153.152:3306) and starting replication from the new master 192.168.153.151(192.168.153.151:3306)..
Sun Sep 28 10:17:51 2014 - [info] Executed CHANGE MASTER.
Sun Sep 28 10:17:51 2014 - [info] Slave started.
Sun Sep 28 10:17:51 2014 - [info] End of log messages from 192.168.153.152.
Sun Sep 28 10:17:51 2014 - [info] -- Slave recovery on host 192.168.153.152(192.168.153.152:3306) succeeded.
Sun Sep 28 10:17:51 2014 - [info] All new slave servers recovered successfully.
Sun Sep 28 10:17:51 2014 - [info]
Sun Sep 28 10:17:51 2014 - [info] * Phase 5: New master cleanup phase..
Sun Sep 28 10:17:51 2014 - [info]
Sun Sep 28 10:17:51 2014 - [info] Resetting slave info on the new master..
Sun Sep 28 10:17:51 2014 - [info] 192.168.153.151: Resetting slave info succeeded.
Sun Sep 28 10:17:51 2014 - [info] Master failover to 192.168.153.151(192.168.153.151:3306) completed successfully.
Sun Sep 28 10:17:51 2014 - [info]
#failover操作完成,生成failover报告
----- Failover Report -----
app1: MySQL Master failover 192.168.153.150 to 192.168.153.151 succeeded
Master 192.168.153.150 is down!
Check MHA Manager logs at localhost.localdomain:/var/log/masterha/app1/manager.log for details.
Started automated(non-interactive) failover.
Power off 192.168.153.150.
The latest slave 192.168.153.151(192.168.153.151:3306) has all relay logs for recovery.
Selected 192.168.153.151 as a new master.
192.168.153.151: OK: Applying all logs succeeded.
192.168.153.152: This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
192.168.153.152: OK: Applying all logs succeeded. Slave started, replicating from 192.168.153.151.
192.168.153.151: Resetting slave info succeeded.
Master failover to 192.168.153.151(192.168.153.151:3306) completed