Pinging the virtual IP of a Linux HA cluster from a different subnet does not work

June 28, 2011

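I have set up a Linux cluster with Corosync/Pacemaker. The two cluster nodes are on the same subnet and share a virtual IP. Machines within the same subnet can ping the virtual IP "135.121.192.104".

However, if I try to ping the virtual IP "135.121.192.104" from a machine on a different subnet, it does not respond. The other machines are on the subnet "135.121.196.x".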

On my machines, the ifcfg-eth0 file contains the following subnet mask:

NETMASK=255.255.254.0
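The two address ranges can be checked against this netmask. The following is a minimal sketch, assuming Python 3 and its standard `ipaddress` module. The helper name `same_subnet` is hypothetical, and the addresses are the ones quoted in this post (135.121.196.122 stands in for a host on the 135.121.196.x subnet). It suggests that 255.255.254.0 is a /23 prefix and that the remote host lies outside the virtual IP's network, so its traffic has to pass through a router.

```python
import ipaddress

def same_subnet(addr_a, addr_b, netmask):
    """Return True if addr_b lies in the network defined by addr_a and netmask."""
    # strict=False lets us pass a host address instead of the network address.
    network = ipaddress.ip_network(f"{addr_a}/{netmask}", strict=False)
    return ipaddress.ip_address(addr_b) in network

NETMASK = "255.255.254.0"   # from ifcfg-eth0
VIP = "135.121.192.104"     # cluster virtual IP

print(ipaddress.ip_network(f"{VIP}/{NETMASK}", strict=False))  # 135.121.192.0/23
print(same_subnet(VIP, "135.121.192.102", NETMASK))  # True: cluster node, same subnet
print(same_subnet(VIP, "135.121.196.122", NETMASK))  # False: host on 135.121.196.x
```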

Below is my crm configure show output:

[root@h-008 crm]# crm configure show
node h-008 \
       attributes standby="off"
node h-009 \
       attributes standby="off"
primitive GAXClusterIP ocf:heartbeat:IPaddr2 \
       params ip="135.121.192.104" cidr_netmask="23" \
       op monitor interval="30s" clusterip_hash="sourceip"
clone GAXClusterIP2 GAXClusterIP \
       meta globally-unique="true" clone-node-max="2"
property $id="cib-bootstrap-options" \
       dc-version="1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87" \
       cluster-infrastructure="openais" \
       expected-quorum-votes="2" \
       no-quorum-policy="ignore" \
       stonith-enabled="false"
rsc_defaults $id="rsc-options" \
       resource-stickiness="100"

And the output of crm_mon status:

[root@h-009 crm]# crm_mon status --one-shot
non-option ARGV-elements: status
============
Last updated: Thu Jun 23 08:12:21 2011
Stack: openais
Current DC: h-008 - partition with quorum
Version: 1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87
2 Nodes configured, 2 expected votes
1 Resources configured.
============

Online: [ h-008 h-009 ]

Clone Set: GAXClusterIP2 (unique)
    GAXClusterIP:0     (ocf::heartbeat:IPaddr2):       Started h-008
    GAXClusterIP:1     (ocf::heartbeat:IPaddr2):       Started h-009

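I am new to Linux HA cluster setup and cannot find the root cause of the problem. Is there any configuration I can check to diagnose this issue?

Additional comments: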

Below is the output of "route -n":

[root@h-008 crm]# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
135.121.192.0   0.0.0.0         255.255.254.0   U     0      0        0 eth0
169.254.0.0     0.0.0.0         255.255.0.0     U     0      0        0 eth0
0.0.0.0         135.121.192.1   0.0.0.0         UG    0      0        0 eth0

Below is the traceroute output from a cluster machine to a machine outside the cluster:

[root@h-008 crm]# traceroute 135.121.196.122
traceroute to 135.121.196.122 (135.121.196.122), 30 hops max, 40 byte packets
1  135.121.192.1 (135.121.192.1)  6.750 ms  6.967 ms  7.634 ms
2  135.121.205.225 (135.121.205.225)  12.296 ms  14.385 ms  16.101 ms
3  s2h-003.hpe.test.com (135.121.196.122)  0.172 ms  0.170 ms  0.170 ms

Below is the traceroute output from a machine outside the cluster to the virtual IP 135.121.192.104:

[root@s2h-003 ~]# traceroute 135.121.192.104
traceroute to 135.121.192.104 (135.121.192.104), 30 hops max, 40 byte packets
1  135.121.196.1 (135.121.196.1)  10.558 ms  10.895 ms  11.556 ms
2  135.121.205.226 (135.121.205.226)  11.016 ms  12.797 ms  14.152 ms
3  * * *
4  * * *
5  * * *
6  * * *
7  * * *
8  *

But when I traceroute to the real IP address of one of the cluster nodes, the traceroute succeeds:

[root@s2h-003 ~]# traceroute 135.121.192.102
traceroute to 135.121.192.102 (135.121.192.102), 30 hops max, 40 byte packets
1  135.121.196.1 (135.121.196.1)  4.994 ms  5.315 ms  5.951 ms
2  135.121.205.226 (135.121.205.226)  3.816 ms  6.016 ms  7.158 ms
3  h-009.msite.pr.hpe.test.com (135.121.192.102)  0.236 ms  0.229 ms  0.216 ms
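The traces above can also be compared mechanically. The following is a rough sketch, assuming Python 3 and traceroute output in the format shown in this post. The helper `last_responding_hop` and the regular expression are hypothetical, not part of any tool. It reports the last hop that replied, which for the failing trace is the router at 135.121.205.226.

```python
import re

# Matches hop lines such as " 2  135.121.205.226 (135.121.205.226)  11.016 ms".
# Lines with only "* * *" (no reply) and the header line do not match.
HOP_RE = re.compile(r"^\s*(\d+)\s+(?:\S+\s+)?\((\d+\.\d+\.\d+\.\d+)\)")

def last_responding_hop(traceroute_output):
    """Return (hop_number, ip) of the last hop that answered, or None."""
    last = None
    for line in traceroute_output.splitlines():
        match = HOP_RE.match(line)
        if match:
            last = (int(match.group(1)), match.group(2))
    return last

failed_trace = """\
traceroute to 135.121.192.104 (135.121.192.104), 30 hops max, 40 byte packets
1  135.121.196.1 (135.121.196.1)  10.558 ms  10.895 ms  11.556 ms
2  135.121.205.226 (135.121.205.226)  11.016 ms  12.797 ms  14.152 ms
3  * * *
4  * * *
"""

print(last_responding_hop(failed_trace))  # (2, '135.121.205.226')
```

Run against the successful trace, the same function returns the node itself as hop 3. Both paths therefore share the first two hops and diverge only at the final delivery step.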

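You are wrongly assuming that your cluster configuration has something to do with the problem you are seeing, just because it is a new area for you. All the cluster software does is manage (and monitor) resources, in this case an IP address, which it configures on a host in the cluster. You could just as well remove the whole cluster configuration and add the IP address manually on one of the nodes, and you would see exactly the same problem.
Clearly, if you can reach the IP from the same network but not from another network, there is a routing problem. Check your router configuration.
By the way, disabling stonith in a cluster is a way to lose or corrupt data. I hope you have disabled it only for testing.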

Source: https://serverfault.com/questions/283363
