Cluster

Failover Pacemaker cluster with two network interfaces?

  • September 16, 2020

So, I have two test servers in one VLAN.

srv1
 eth1 10.10.10.11
 eth2 10.20.10.11

srv2
 eth1 10.10.10.12
 eth2 10.20.10.12

Cluster VIP - 10.10.10.100

Corosync configuration with two interfaces:

 rrp_mode: passive

 interface {
   ringnumber: 0
   bindnetaddr: 10.10.10.0
   mcastaddr: 226.94.1.1
   mcastport: 5405
 }

 interface {
   ringnumber: 1
   bindnetaddr: 10.20.10.0
   mcastaddr: 226.94.1.1
   mcastport: 5407
 }
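
For context, the rrp_mode line and the two interface blocks live inside the totem section of corosync.conf; a minimal sketch of the enclosing structure (version: 2 is the standard totem version for this corosync generation, everything else is copied from above):

 totem {
   version: 2
   rrp_mode: passive

   interface {
     ringnumber: 0
     bindnetaddr: 10.10.10.0
     mcastaddr: 226.94.1.1
     mcastport: 5405
   }

   interface {
     ringnumber: 1
     bindnetaddr: 10.20.10.0
     mcastaddr: 226.94.1.1
     mcastport: 5407
   }
 }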

Pacemaker configuration:

# crm configure show
node srv1
node srv2
primitive cluster-ip ocf:heartbeat:IPaddr2 \
   params ip="10.10.10.100" cidr_netmask="24" \
   op monitor interval="5s"
primitive ha-nginx lsb:nginx \
   op monitor interval="5s"
location prefer-srv-2 ha-nginx 50: srv2
colocation nginx-and-cluster-ip +inf: ha-nginx cluster-ip
property $id="cib-bootstrap-options" \
   dc-version="1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c" \
   cluster-infrastructure="openais" \
   expected-quorum-votes="2" \
   no-quorum-policy="ignore" \
   stonith-enabled="false"
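
Before touching any interfaces, the failover path itself can be sanity-checked from the crm shell by putting the active node in standby; a quick sketch, assuming the node names above (after the standby command, crm status should show both resources running on srv1):

# crm node standby srv2
# crm status
# crm node online srv2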

Status:

# crm status
============
Last updated: Thu Jan 29 13:40:16 2015
Last change: Thu Jan 29 12:47:25 2015 via crmd on srv1
Stack: openais
Current DC: srv2 - partition with quorum
Version: 1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c
2 Nodes configured, 2 expected votes
2 Resources configured.
============

Online: [ srv1 srv2 ]

cluster-ip (ocf::heartbeat:IPaddr2):   Started srv2
ha-nginx   (lsb:nginx):    Started srv2

Rings:

# corosync-cfgtool -s
Printing ring status.
Local node ID 185207306
RING ID 0
   id  = 10.10.10.11
   status  = ring 0 active with no faults
RING ID 1
   id  = 10.20.10.11
   status  = ring 1 active with no faults

And if I run ifconfig eth1 down on srv2, Pacemaker keeps working over eth2, which is fine. But nginx becomes unreachable at 10.10.10.100 (because eth1 is down, yes), and Pacemaker still says everything is OK, since nothing in this configuration actually monitors connectivity on eth1.

But I want nginx to move to srv1 after eth1 dies on srv2.

So, what can I do?

So, thanks to @Dok, I solved my problem with ocf:pacemaker:ping.

# crm configure show
node srv1
node srv2
primitive P_INTRANET ocf:pacemaker:ping \
 params host_list="10.10.10.11 10.10.10.12" multiplier="100" name="ping_intranet" \
 op monitor interval="5s" timeout="5s"
primitive cluster-ip ocf:heartbeat:IPaddr2 \
 params ip="10.10.10.100" cidr_netmask="24" \
 op monitor interval="5s"
primitive ha-nginx lsb:nginx \
 op monitor interval="5s"
clone CL_INTRANET P_INTRANET \
 meta globally-unique="false"
location L_CLUSTER_IP_PING_INTRANET cluster-ip \
 rule $id="L_CLUSTER_IP_PING_INTRANET-rule" ping_intranet: defined ping_intranet
location L_HA_NGINX_PING_INTRANET ha-nginx \
 rule $id="L_HA_NGINX_PING_INTRANET-rule" ping_intranet: defined ping_intranet
location L_INTRANET_01 CL_INTRANET 100: srv1
location L_INTRANET_02 CL_INTRANET 100: srv2
colocation nginx-and-cluster-ip 1000: ha-nginx cluster-ip
property $id="cib-bootstrap-options" \
 dc-version="1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c" \
 cluster-infrastructure="openais" \
 expected-quorum-votes="2" \
 no-quorum-policy="ignore" \
 stonith-enabled="false"
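
How this works: the CL_INTRANET clone publishes a node attribute called ping_intranet, set to multiplier times the number of reachable hosts from host_list (200 here while both targets answer, 0 when none do), and the two location rules use that attribute's value as the score for cluster-ip and ha-nginx. A node whose eth1 loses connectivity therefore scores lower, and the resources move away from it. To watch the attribute, crm_mon can print node attributes; a usage sketch (exact output format varies by Pacemaker version):

# crm_mon -A1
...
Node Attributes:
* Node srv1:
    + ping_intranet : 200
* Node srv2:
    + ping_intranet : 200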

The ocf:pacemaker:pingd resource is designed specifically for failing over nodes when they lose connectivity. You can find a very brief example on the Clusterlabs wiki here: http://clusterlabs.org/wiki/Example_configurations#Set_up_pingd

Somewhat unrelated, but I have seen problems in the past with using ifconfig down to test loss of connectivity. I highly recommend using iptables to drop traffic instead when testing loss of connectivity.
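
A sketch of that approach on srv2, assuming the eth1 interface from this setup (the -D lines remove the rules again after the test):

# iptables -A INPUT -i eth1 -j DROP
# iptables -A OUTPUT -o eth1 -j DROP
...
# iptables -D INPUT -i eth1 -j DROP
# iptables -D OUTPUT -o eth1 -j DROP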

Quoted from: https://serverfault.com/questions/663394