Keepalived
Why am I getting a split-brain problem in keepalived?
When I start my BACKUP keepalived instance, it also assumes the MASTER state, as shown below:
Mar 28 02:38:05 localhost Keepalived_vrrp[23527]: VRRP_Instance(VI_01) Entering BACKUP STATE
Mar 28 02:38:05 localhost Keepalived_vrrp[23527]: VRRP sockpool: [ifindex(2), proto(112), unicast(1), fd(10,11)]
Mar 28 02:38:05 localhost Keepalived_vrrp[23527]: VRRP_Script(check_haproxy) succeeded
Mar 28 02:38:17 localhost Keepalived_vrrp[23527]: VRRP_Instance(VI_01) Transition to MASTER STATE
Mar 28 02:38:21 localhost Keepalived_vrrp[23527]: VRRP_Instance(VI_01) Entering MASTER STATE
Master configuration:
# Script used to check if HAProxy is running
vrrp_script check_haproxy {
    script "/usr/sbin/pidof haproxy"
    interval 2
}

# Virtual interface
# The priority specifies the order in which the assigned interface to take over in a failover
vrrp_instance VI_01 {
    state MASTER
    interface eth0
    advert_int 4
    unicast_src_ip 10.1.2.50
    unicast_peer {
        10.1.2.51
    }
    virtual_router_id 51
    priority 150

    # The virtual ip address shared between the two loadbalancers
    virtual_ipaddress {
        10.1.2.100
    }

    track_script {
        check_haproxy
    }
}
Backup configuration:
# Script used to check if HAProxy is running
vrrp_script check_haproxy {
    script "/usr/sbin/pidof haproxy"
    interval 2
}

# Virtual interface
# The priority specifies the order in which the assigned interface to take over in a failover
vrrp_instance VI_01 {
    state BACKUP
    advert_int 4
    interface eth0
    unicast_src_ip 10.1.2.51
    unicast_peer {
        10.1.2.50
    }
    virtual_router_id 51
    priority 100

    # The virtual ip address shared between the two loadbalancers
    virtual_ipaddress {
        10.1.2.100
    }

    track_script {
        check_haproxy
    }
}
I then went on to check whether the two instances were talking to each other:
Master
$ tcpdump -i eth0 'ip proto 112'
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
02:48:33.557462 IP host1.novalocal > 10.1.2.51: VRRPv2, Advertisement, vrid 51, prio 101, authtype none, intvl 4s, length 20
02:48:37.558487 IP host1.novalocal > 10.1.2.51: VRRPv2, Advertisement, vrid 51, prio 101, authtype none, intvl 4s, length 20
02:48:41.559496 IP host1.novalocal > 10.1.2.51: VRRPv2, Advertisement, vrid 51, prio 101, authtype none, intvl 4s, length 20
Backup
$ tcpdump -i eth0 'ip proto 112'
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
02:49:38.269751 IP host2.novalocal > 10.1.2.50: VRRPv2, Advertisement, vrid 51, prio 100, authtype none, intvl 1s, length 20
02:49:39.270461 IP host2.novalocal > 10.1.2.50: VRRPv2, Advertisement, vrid 51, prio 100, authtype none, intvl 1s, length 20
02:49:40.271197 IP host2.novalocal > 10.1.2.50: VRRPv2, Advertisement, vrid 51, prio 100, authtype none, intvl 1s, length 20
Any hints on why the BACKUP instance fails to recognize the MASTER?
Update 1:
iptables results:
Master
Chain INPUT (policy ACCEPT)
target     prot opt source               destination

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination
Backup
Chain INPUT (policy ACCEPT)
target     prot opt source               destination

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination
Solution
It turned out to be a firewall problem. I was able to confirm this by running tcpdump on the destination host to check whether the advertisements were actually being received. After fixing the firewall problem, I now see VRRP advertisements that were not arriving before. The following was run on the backup host:
tcpdump -i eth0 src host 10.1.2.50
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
01:06:42.709813 IP 10.1.2.50 > sntstsvmrla2a02.novalocal: VRRPv2, Advertisement, vrid 51, prio 101, authtype none, intvl 1s, length 20
01:06:43.709901 IP 10.1.2.50 > sntstsvmrla2a02.novalocal: VRRPv2, Advertisement, vrid 51, prio 101, authtype none, intvl 1s, length 20
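As your tcpdump output shows, both systems try to talk to each other, but neither receives an answer. Each one therefore believes the other system is down, and the backup does exactly what it is there for: it takes over. You need to find out what is blocking the communication.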
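To make that check repeatable, here is a minimal sketch that reads tcpdump text output on stdin and reports who actually sent VRRP advertisements. The helper names `vrrp_senders` and `peer_heard` are made up for illustration, and the parsing assumes the line layout shown in the captures above (timestamp, `IP`, source, `>`, destination).

```shell
#!/bin/sh
# Hypothetical helpers, not part of keepalived or tcpdump.

# Print the distinct senders of VRRP advertisements found in tcpdump
# text output read from stdin. Field 3 of each line is the source host.
vrrp_senders() {
    awk '/VRRPv[0-9]+, Advertisement/ { print $3 }' | sort -u
}

# Succeed (exit 0) only if the given peer address appears as a sender.
peer_heard() {
    vrrp_senders | grep -qxF "$1"
}

# Example (needs root; -n keeps addresses numeric, -c 10 stops after 10 packets):
#   tcpdump -n -l -i eth0 -c 10 'ip proto 112' | peer_heard 10.1.2.50 \
#       && echo "peer advertisements received" \
#       || echo "only local advertisements seen"
```

If a node only ever lists its own address as a sender, the peer's advertisements are being dropped somewhere between the two hosts.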
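This applies only to my environment (it is questionable as a general-purpose approach): my split brain occurs only during a transient core switch failover, while the SPT algorithm recalculates routes (up to 30 seconds), which can happen during a core switch firmware upgrade.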
#MASTER
track_script {
    chk_haproxy             # 20 points
}

#BACKUP
track_script {
    chk_haproxy             # 10 points
    chk_ping_core_switch    # 10 points
    # if not core switch -> brain splitted
}
With this configuration, the backup node is not eligible to become master until the core switch is back to normal.
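The answer does not show what `chk_ping_core_switch` runs, so the following is only a guess at what such a check script might look like. The `PING_CMD` override and passing the switch address as an argument are assumptions made for illustration; the `-c` and `-W` flags are those of Linux iputils ping.

```shell
#!/bin/sh
# Hypothetical check script for chk_ping_core_switch.
# PING_CMD can be overridden for testing; by default the system ping is used.
PING_CMD="${PING_CMD:-ping}"

# Succeed (exit 0) if the address answers a single probe.
# -c 1 sends one echo request; -W 1 waits at most one second for the reply.
check_core_switch() {
    "$PING_CMD" -c 1 -W 1 "$1" >/dev/null 2>&1
}

# When called with an address, run the check and exit with its status.
if [ "$#" -gt 0 ]; then
    check_core_switch "$1"
    exit $?
fi
```

keepalived treats exit status 0 from a tracked script as success and any other status as failure, so a `vrrp_script chk_ping_core_switch` block could point its `script` line at this file followed by the core switch address, with `weight 10` to match the "10 points" comment above.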