Pacemaker - cluster does not fail over to the other node after an interface is disconnected

July 1, 2017

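I have the following scenario with Corosync + Pacemaker:
Node 1:
eth0: 10.143.0.21/24
eth1: 10.10.10.1/30 (Corosync communication)
eth2: 192.168.5.2/24
Node 2:
eth0: 10.143.0.22/24
eth1: 10.10.10.2/30 (Corosync communication)
eth2: 192.168.5.3/24
Floating IPs:
eth0: 10.143.0.23/24
eth2: 192.168.5.1/24
Interface eth1 is used only for Corosync communication.
For example, I unplugged the network cable from eth0 and nothing happened. As another test, I unplugged the cable from eth2 and got the same result. But when I unplugged the cable from eth1 (Corosync communication), the floating IPs moved to the other node.
How can I make the resources move to the other node when any interface is disconnected?
Regards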
UPDATE
I tested with the following configuration:
crm configure primitive PING-WAN ocf:pacemaker:ping params host_list="10.143.0.1" multiplier="1000" dampen="1s" op monitor interval="1s"
crm configure primitive Failover-WAN ocf:heartbeat:IPaddr2 params ip=10.143.0.23 nic=eth0 op monitor interval=10s meta is-managed=true
crm configure primitive Failover-LAN ocf:heartbeat:IPaddr2 params ip=192.168.5.1 nic=eth2 op monitor interval=10s meta is-managed=true
crm configure group Cluster Failover-WAN Failover-LAN
crm configure location Best_Connectivity Cluster rule pingd: defined pingd
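This worked for me: when I unplugged the network cable from eth0 and lost ping to the target 10.143.0.1 (the gateway), the resources moved to the other node. But my scenario has 3 interfaces, so I decided to add one more ping test: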
crm configure primitive PING-LAN ocf:pacemaker:ping params host_list="192.168.5.4" multiplier="1000" dampen="1s" op monitor interval="1s"
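But now connectivity to both hosts (10.143.0.1 and 192.168.5.4) has to be lost before the resources move to the other node.
I have been looking for information, but I cannot get the following scenario to work:
If a node loses connectivity to any one of the hosts in the ping tests, the resources move to the other node, without all ping tests having to fail at the same time.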

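You need to tell Pacemaker that you care about interface failures. Look at the ocf:pacemaker:ping resource. You can use that resource agent to ping a list of other hosts on the networks behind your different interfaces, and Pacemaker will react if those pings fail.
If you group the ocf:pacemaker:ping resources with, or use constraints to tie them to, whatever else you manage in Pacemaker, they will all move together.
Also, I would bet that when you unplugged eth1 in your earlier test, the IPs did not "move" but were instead started on both cluster nodes at the same time; from each node's point of view, its peer had gone missing. You were effectively testing what happens when the cluster partitions.
On that point, you should definitely configure a second redundant ring in your Corosync configuration, as suggested in the other answer, but that will not produce the effect you are looking for.
**Update 0:** Instead of adding an additional ping primitive, you should add both IPs to the host_list of the same ping primitive and set failure_score on that primitive to an acceptable value.
From the ocf:pacemaker:ping resource agent info (# crm ra info ocf:pacemaker:ping):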
...
failure_score (integer):
Resource is failed if the score is less than failure_score.
Default never fails.

host_list* (string): Host list
A space separated list of ping nodes to count.
...
Something like:
# crm configure primitive PING-O-DOOM ocf:pacemaker:ping params host_list="10.143.0.1 192.168.5.4" failure_score="2" op monitor interval="10s"
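
As a rough sketch of how failure_score might be chosen for "fail over if any one host is unreachable": each reachable host adds multiplier to the score, so requiring the full score (number of hosts times multiplier) makes the ping resource fail as soon as one host is lost. The script below only prints the crm commands for review and does not run them. The host list and the Cluster group come from the question; the constraint name Cluster-with-ping is made up for illustration, multiplier is written out explicitly at what I believe is the agent's default of 1, and the colocation constraint is one assumed way of tying the group to the ping resource, not something tested on this cluster.

```shell
#!/bin/sh
# Sketch only: prints the crm commands instead of executing them.
hosts="10.143.0.1 192.168.5.4"   # ping targets from the question
multiplier=1                     # score added per reachable host (assumed default)

# Count the hosts; requiring the full score means the resource
# fails as soon as ANY single host stops answering.
set -- $hosts
failure_score=$(( $# * multiplier ))

echo "crm configure primitive PING-O-DOOM ocf:pacemaker:ping params host_list=\"$hosts\" multiplier=\"$multiplier\" failure_score=\"$failure_score\" op monitor interval=\"10s\""

# Illustrative constraint name: keep the floating-IP group on the node
# where the ping resource is running.
echo "crm configure colocation Cluster-with-ping inf: Cluster PING-O-DOOM"
```

With two hosts and a multiplier of 1 this yields failure_score=2, matching the command above. If you keep multiplier="1000" as in the question, the same arithmetic gives 2000.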

Source: https://serverfault.com/questions/858369
