Cluster

Failover Pacemaker cluster with two network interfaces?

  • September 16, 2020

So, I have two test servers in one VLAN.

srv1
 eth1 10.10.10.11
 eth2 10.20.10.11

srv2
 eth1 10.10.10.12
 eth2 10.20.10.12

Cluster VIP - 10.10.10.100

Corosync configuration with two interfaces:

 rrp_mode: passive

 interface {
   ringnumber: 0
   bindnetaddr: 10.10.10.0
   mcastaddr: 226.94.1.1
   mcastport: 5405
 }

 interface {
   ringnumber: 1
   bindnetaddr: 10.20.10.0
   mcastaddr: 226.94.1.1
   mcastport: 5407
 }
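
For context, the rrp_mode line and the two interface blocks live inside the totem section of corosync.conf; a minimal sketch of the enclosing structure (version: 2 is the standard totem version for this corosync generation, everything else is copied from above):

 totem {
   version: 2
   rrp_mode: passive

   interface {
     ringnumber: 0
     bindnetaddr: 10.10.10.0
     mcastaddr: 226.94.1.1
     mcastport: 5405
   }

   interface {
     ringnumber: 1
     bindnetaddr: 10.20.10.0
     mcastaddr: 226.94.1.1
     mcastport: 5407
   }
 }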

Pacemaker configuration:

# crm configure show
node srv1
node srv2
primitive cluster-ip ocf:heartbeat:IPaddr2 \
   params ip="10.10.10.100" cidr_netmask="24" \
   op monitor interval="5s"
primitive ha-nginx lsb:nginx \
   op monitor interval="5s"
location prefer-srv-2 ha-nginx 50: srv2
colocation nginx-and-cluster-ip +inf: ha-nginx cluster-ip
property $id="cib-bootstrap-options" \
   dc-version="1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c" \
   cluster-infrastructure="openais" \
   expected-quorum-votes="2" \
   no-quorum-policy="ignore" \
   stonith-enabled="false"
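
Before touching any interfaces, the failover path itself can be sanity-checked from the crm shell by putting the active node in standby; a quick sketch, assuming the node names above (after the standby command, crm status should show both resources running on srv1):

# crm node standby srv2
# crm status
# crm node online srv2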

Status:

# crm status
============
Last updated: Thu Jan 29 13:40:16 2015
Last change: Thu Jan 29 12:47:25 2015 via crmd on srv1
Stack: openais
Current DC: srv2 - partition with quorum
Version: 1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c
2 Nodes configured, 2 expected votes
2 Resources configured.
============

Online: [ srv1 srv2 ]

cluster-ip (ocf::heartbeat:IPaddr2):   Started srv2
ha-nginx   (lsb:nginx):    Started srv2

Rings:

# corosync-cfgtool -s
Printing ring status.
Local node ID 185207306
RING ID 0
   id  = 10.10.10.11
   status  = ring 0 active with no faults
RING ID 1
   id  = 10.20.10.11
   status  = ring 1 active with no faults

And if I run ifconfig eth1 down on srv2, Pacemaker keeps working over eth2, which is fine. But nginx becomes unreachable at 10.10.10.100 (because eth1 is down, yes), and Pacemaker still says everything is OK, since nothing in this configuration actually monitors connectivity on eth1.

But I want nginx to move to srv1 after eth1 dies on srv2.

So, what can I do?

So, thanks to @Dok, I solved my problem with ocf:pacemaker:ping.

# crm configure show
node srv1
node srv2
primitive P_INTRANET ocf:pacemaker:ping \
 params host_list="10.10.10.11 10.10.10.12" multiplier="100" name="ping_intranet" \
 op monitor interval="5s" timeout="5s"
primitive cluster-ip ocf:heartbeat:IPaddr2 \
 params ip="10.10.10.100" cidr_netmask="24" \
 op monitor interval="5s"
primitive ha-nginx lsb:nginx \
 op monitor interval="5s"
clone CL_INTRANET P_INTRANET \
 meta globally-unique="false"
location L_CLUSTER_IP_PING_INTRANET cluster-ip \
 rule $id="L_CLUSTER_IP_PING_INTRANET-rule" ping_intranet: defined ping_intranet
location L_HA_NGINX_PING_INTRANET ha-nginx \
 rule $id="L_HA_NGINX_PING_INTRANET-rule" ping_intranet: defined ping_intranet
location L_INTRANET_01 CL_INTRANET 100: srv1
location L_INTRANET_02 CL_INTRANET 100: srv2
colocation nginx-and-cluster-ip 1000: ha-nginx cluster-ip
property $id="cib-bootstrap-options" \
 dc-version="1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c" \
 cluster-infrastructure="openais" \
 expected-quorum-votes="2" \
 no-quorum-policy="ignore" \
 stonith-enabled="false"
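
How this works: the CL_INTRANET clone publishes a node attribute called ping_intranet, set to multiplier times the number of reachable hosts from host_list (200 here while both targets answer, 0 when none do), and the two location rules use that attribute's value as the score for cluster-ip and ha-nginx. A node whose eth1 loses connectivity therefore scores lower, and the resources move away from it. To watch the attribute, crm_mon can print node attributes; a usage sketch (exact output format varies by Pacemaker version):

# crm_mon -A1
...
Node Attributes:
* Node srv1:
    + ping_intranet : 200
* Node srv2:
    + ping_intranet : 200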

The ocf:pacemaker:pingd resource is designed specifically for failing over nodes when they lose connectivity. You can find a very brief example on the Clusterlabs wiki here: http://clusterlabs.org/wiki/Example_configurations#Set_up_pingd

Somewhat unrelated, but I have seen problems in the past with using ifconfig down to test loss of connectivity. I highly recommend using iptables to drop traffic instead when testing loss of connectivity.
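
A sketch of that approach on srv2, assuming the eth1 interface from this setup (the -D lines remove the rules again after the test):

# iptables -A INPUT -i eth1 -j DROP
# iptables -A OUTPUT -o eth1 -j DROP
...
# iptables -D INPUT -i eth1 -j DROP
# iptables -D OUTPUT -o eth1 -j DROP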

Quoted from: https://serverfault.com/questions/663394