Corosync :: Restarting some resources after a LAN connectivity issue

September 1, 2014

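I am currently looking into corosync to build a two-node cluster. I have it working fine, and it does what I want it to do, which is:

A lost connection between the two nodes gives the first node, "10node", both failover WAN IPs (i.e. the resources WanCluster100 and WanCluster101).
"11node" does nothing. It "thinks" it still holds its failover WAN IP (i.e. WanCluster101).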

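But it does not do this:

When the connection to the other node is restored, "11node" should restart the WanCluster101 resource.

This is to prevent the situation where node10 simply dies (and so never takes over 11node's failover WAN IP), which would leave neither node holding 10node's failover IP: 10node is down, and 11node has "given back" its failover WAN IP.

This is the current configuration I am working with.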

node 10sch \
   attributes standby="off"
node 11sch \
   attributes standby="off"
primitive LanCluster100 ocf:heartbeat:IPaddr2 \
   params ip="172.25.0.100" cidr_netmask="32" nic="eth3" \
   op monitor interval="10s" \
   meta is-managed="true" target-role="Started"
primitive LanCluster101 ocf:heartbeat:IPaddr2 \
   params ip="172.25.0.101" cidr_netmask="32" nic="eth3" \
   op monitor interval="10s" \
   meta is-managed="true" target-role="Started"
primitive Ping100 ocf:pacemaker:ping \
   params host_list="192.0.2.1" multiplier="500" dampen="15s" \
   op monitor interval="5s" \
   meta target-role="Started"
primitive Ping101 ocf:pacemaker:ping \
   params host_list="192.0.2.1" multiplier="500" dampen="15s" \
   op monitor interval="5s" \
   meta target-role="Started"
primitive WanCluster100 ocf:heartbeat:IPaddr2 \
   params ip="192.0.2.100" cidr_netmask="32" nic="eth2" \
   op monitor interval="10s" \
   meta target-role="Started"
primitive WanCluster101 ocf:heartbeat:IPaddr2 \
   params ip="192.0.2.101" cidr_netmask="32" nic="eth2" \
   op monitor interval="10s" \
   meta target-role="Started"
primitive Website0 ocf:heartbeat:apache \
   params configfile="/etc/apache2/apache2.conf" options="-DSSL" \
   operations $id="Website-one" \
   op start interval="0" timeout="40" \
   op stop interval="0" timeout="60" \
   op monitor interval="10" timeout="120" start-delay="0" statusurl="http://127.0.0.1/server-status/" \
   meta target-role="Started"
primitive Website1 ocf:heartbeat:apache \
   params configfile="/etc/apache2/apache2.conf.1" options="-DSSL" \
   operations $id="Website-two" \
   op start interval="0" timeout="40" \
   op stop interval="0" timeout="60" \
   op monitor interval="10" timeout="120" start-delay="0" statusurl="http://127.0.0.1/server-status/" \
   meta target-role="Started"
group All100 WanCluster100 LanCluster100
group All101 WanCluster101 LanCluster101
location AlwaysPing100WithNode10 Ping100 \
   rule $id="AlWaysPing100WithNode10-rule" inf: #uname eq 10sch
location AlwaysPing101WithNode11 Ping101 \
   rule $id="AlWaysPing101WithNode11-rule" inf: #uname eq 11sch
location NeverLan100WithNode11 LanCluster100 \
   rule $id="RAND1083308" -inf: #uname eq 11sch
location NeverPing100WithNode11 Ping100 \
   rule $id="NeverPing100WithNode11-rule" -inf: #uname eq 11sch
location NeverPing101WithNode10 Ping101 \
   rule $id="NeverPing101WithNode10-rule" -inf: #uname eq 10sch
location Website0NeedsConnectivity Website0 \
   rule $id="Website0NeedsConnectivity-rule" -inf: not_defined pingd or pingd lte 0
location Website1NeedsConnectivity Website1 \
   rule $id="Website1NeedsConnectivity-rule" -inf: not_defined pingd or pingd lte 0
colocation Never -inf: LanCluster101 LanCluster100
colocation Never2 -inf: WanCluster100 LanCluster101
colocation NeverBothWebsitesTogether -inf: Website0 Website1
property $id="cib-bootstrap-options" \
   dc-version="1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff" \
   cluster-infrastructure="openais" \
   expected-quorum-votes="2" \
   no-quorum-policy="ignore" \
   stonith-enabled="false" \
   last-lrm-refresh="1408954702" \
   maintenance-mode="false"
rsc_defaults $id="rsc-options" \
   resource-stickiness="100" \
   migration-threshold="3"

I also have a less important question about this line:

colocation NeverBothLans -inf: LanCluster101 LanCluster100

How do I tell it that this colocation applies only to "11node"?

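1: Before testing your cluster's connectivity, you need to configure your stonith device. Stonith is very important in a cluster because it resolves split-brain situations.
2: For the less important question, you can try using a location constraint. You could start with something like this: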
location mycol dummy1 \
       rule $id="myrule" -inf: defined dummy2 and #uname eq suse02
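
To illustrate the first point, a fencing setup in crm shell syntax might look roughly like the sketch below. It assumes IPMI-capable hardware and the external/ipmi stonith plugin from cluster-glue, neither of which is mentioned in the question. The resource and constraint names, BMC addresses, user name, and password are all placeholders.

```
# One fencing device per node. Every params value here is a placeholder.
primitive Fence10sch stonith:external/ipmi \
   params hostname="10sch" ipaddr="198.51.100.10" userid="admin" passwd="secret" interface="lan" \
   op monitor interval="60s"
primitive Fence11sch stonith:external/ipmi \
   params hostname="11sch" ipaddr="198.51.100.11" userid="admin" passwd="secret" interface="lan" \
   op monitor interval="60s"
# A node should not run the device that is meant to fence it.
location Fence10schNotOn10sch Fence10sch -inf: 10sch
location Fence11schNotOn11sch Fence11sch -inf: 11sch
# The posted configuration has stonith-enabled="false"; fencing only acts once this is true.
property stonith-enabled="true"
```

With fencing active, a node that loses contact with its peer would try to power the peer off before taking over its resources, rather than both nodes continuing to believe they own the same IP.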

If I understand your requirement correctly, you can do this by setting location constraints:
pcs constraint location WanCluster101 prefers 11sch=10
pcs constraint location WanCluster101 prefers 10sch=5
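What I have done in the past is put constraints on both IPs. That way, when one node goes down, the other node holds both IPs, no matter which of the two went down. This means adding constraints with opposite priorities for each IP: one IP has the higher priority on the first node and the lower priority on the second, and the other IP has the higher priority on the second node and the lower priority on the first.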
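
A minimal sketch of that symmetric setup in the crm shell syntax used in the question, assuming the node names 10sch and 11sch from the posted configuration. The constraint IDs and the scores 10 and 5 are illustrative and have not been tested against this cluster.

```
# WanCluster100 prefers 10sch and falls back to 11sch.
location Wan100-prefers-10sch WanCluster100 10: 10sch
location Wan100-fallback-11sch WanCluster100 5: 11sch
# WanCluster101 prefers 11sch and falls back to 10sch.
location Wan101-prefers-11sch WanCluster101 10: 11sch
location Wan101-fallback-10sch WanCluster101 5: 10sch
```

Two caveats apply to the posted configuration. The existing -inf constraints (for example NeverLan100WithNode11 and Never2) still override these finite scores, so they would probably need to be relaxed before WanCluster100 can move to 11sch. Also, with resource-stickiness="100" in rsc_defaults, a resource that has failed over will most likely stay on its current node when the preferred node returns, because the stickiness outweighs the 5-point difference in preference.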

Source: https://serverfault.com/questions/623796
